Ad Serving Possibly Degraded
Incident Report for Xandr
Postmortem

Incident Summary
From approximately 20:40 to 22:05 UTC on Monday, November 23rd, the impression bus was partially down in most datacenters, resulting in a high percentage of blanks and timeouts.

Scope of Impact
During the incident window, customers experienced a high percentage of blanks and timeouts in all datacenters except NYM.

Timeline (UTC)
2020-11-23 19:34:00: Incident started; first crashes occurred in SIN datacenter

2020-11-23 19:37:00: Alert received for SIN datacenter crashes

2020-11-23 19:42:00: Incident escalated

2020-11-23 20:06:00: Underlying problem identified

2020-11-23 20:21:00: Initial fix implemented

2020-11-23 20:39:00: Additional alert received indicating that other datacenters were experiencing issues

2020-11-23 21:02:00: Engineering released a hotfix

2020-11-23 22:21:00: Impression bus started to recover

2020-11-24 02:14:00: Incident resolved

Cause Analysis
The incident was caused by the removal of a Prebid Server object, which exposed an old, erroneous assumption in the impression bus and resulted in the observed datacenter crashes.

Resolution Steps
Our engineers resolved the issue by stopping the source of data causing the issue while simultaneously correcting the underlying error in a new release.

Next Steps
- Schedule additional training to help identify and prevent this issue type in the future.

Posted Dec 03, 2020 - 21:32 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Nov 24, 2020 - 17:32 UTC

Monitoring

We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.

Posted Nov 24, 2020 - 03:24 UTC

Identified

We have identified the following issue:

  • Component(s): Ad Serving
  • Impact(s):
    • Possible issues with ad delivery
  • Severity: Partially Degraded
  • Datacenter(s): Global

Our engineers are actively working towards a resolution, and we will provide an update as soon as possible. Thank you for your patience.

Posted Nov 23, 2020 - 21:04 UTC