Prebid Availability Impacted
Incident Report for Xandr
Postmortem

Incident Summary

Between 2020-01-07 16:00 UTC and 2020-01-08 02:10 UTC, Prebid availability in LAX was impacted.

Scope of Impact

Between 2020-01-07 16:00 UTC and 2020-01-08 02:10 UTC, bid requests that came through the Prebid Server experienced a failure rate of roughly 75% when contacting bidding services, including Xandr’s DSP.

Timeline

2020-01-07 16:00: Incident started: an oddity in infrastructure settings combined with maintenance activity caused availability issues

2020-01-07 17:24: Incident escalated to engineers

2020-01-07 19:31: Release that started the incident reverted

2020-01-07 20:29: Second-level escalation began, which led to further errors causing the issue

2020-01-08 02:10: Incident Resolved

2020-01-08 02:45: Incident closed and client notification sent out

Cause Analysis

The incident was caused by a bug triggered by a backend release. The bug caused connectivity issues, which resulted in failures for Prebid requests.

Resolution Steps

Mitigation is in progress and involves changing connectivity settings to greatly reduce the likelihood of this issue recurring.

Next Steps

The engineering team will follow up and implement monitoring so that this kind of issue can be caught before it impacts applications.
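As a purely illustrative sketch of what such monitoring could look like, the Python snippet below tracks the failure rate over a sliding window of recent requests and signals when it crosses a threshold. The class name, window size, threshold, and minimum sample count are all assumptions made for this example; they do not describe Xandr's actual monitoring or tooling.

```python
from collections import deque


class FailureRateMonitor:
    """Hypothetical monitor: tracks recent request outcomes in a sliding window."""

    def __init__(self, window_size: int = 1000, threshold: float = 0.05,
                 min_samples: int = 100) -> None:
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold      # assumed alerting threshold (5% here)
        self.min_samples = min_samples  # avoid alerting on too little data

    def record(self, success: bool) -> None:
        """Record the outcome of one request to a bidding service."""
        self.outcomes.append(success)

    def failure_rate(self) -> float:
        """Fraction of failed requests in the current window (0.0 if empty)."""
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def should_alert(self) -> bool:
        """True when enough samples exist and the failure rate exceeds the threshold."""
        return (len(self.outcomes) >= self.min_samples
                and self.failure_rate() > self.threshold)


if __name__ == "__main__":
    monitor = FailureRateMonitor(window_size=100, threshold=0.05, min_samples=20)
    # Simulate a window in which three out of every four requests fail.
    for i in range(100):
        monitor.record(i % 4 == 0)
    print(monitor.failure_rate(), monitor.should_alert())
```

With a failure rate like the one seen in this incident, a check of this kind would trip as soon as the minimum number of samples had been collected, well before the 84 minutes that elapsed between the start of the incident and the first escalation.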

Posted Jan 20, 2020 - 14:37 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Jan 08, 2020 - 02:56 UTC

Investigating

We are currently investigating the following issue:

  • Component(s): Bidding
  • Impact(s):
    • Drop in delivery on external supply
  • Severity: Major Outage
  • Datacenter(s): LAX1

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jan 07, 2020 - 22:07 UTC