Requests to Real-Time Data Providers Dropped
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 22:00 UTC on Friday, January 8th, 2021, to 02:00 UTC on Saturday, January 9th, 2021, the Impression Bus stopped sending requests to all data providers in all datacenters. From approximately 02:00 to 12:00 UTC on Saturday, January 9th, requests to all data providers in all datacenters were gradually restored, until the issue was fully resolved.

Scope of Impact

During the incident window, data providers did not receive requests from the Xandr Impression Bus. Clients leveraging data provider segments for buying may have experienced drops in delivery of buy-side objects targeting those segments.

Timeline (UTC)

2021-01-08 22:15: Incident Started.

2021-01-09 00:41: Incident Escalated and Engineering Notified.

2021-01-09 01:45: Problematic data provider causing incident identified.

2021-01-09 01:50: Potential mitigation tested: Impression Bus rolled back to an older configuration.

2021-01-09 03:55: All impression buses rolled back.

2021-01-09 04:37: The first 10% of impression buses resumed sending requests to data providers.

2021-01-09 13:50: Incident Resolved: Mitigation rollback complete. All data providers are receiving the full request stream.

Cause Analysis

The incident was caused by the Impression Bus procedure that applies activation/deactivation and data provider toggles to bidder objects. This procedure handled a series of changes to one specific bidder object incorrectly, which led to incorrect accounting for all data providers and stopped all requests to them.

Resolution Steps

Our engineers resolved the issue by deactivating the offending bidder object and rolling back the Impression Bus to a previous version.

Next Steps

  • Implement additional monitoring to detect drops in requests to data providers.
  • Fix the underlying Impression Bus logic that introduced this defect.
Posted Jan 14, 2021 - 17:23 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Jan 11, 2021 - 14:01 UTC
Monitoring

We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.

Posted Jan 09, 2021 - 14:14 UTC
Identified

We have identified the cause of the issue, and our engineers are actively working towards a resolution. We will provide an update as soon as possible. Thank you for your patience.

Posted Jan 09, 2021 - 02:13 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Userdata
  • Impact(s):
    • Real-time data provider segment targeting failing
  • Severity: Minor Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jan 09, 2021 - 01:13 UTC