Adserving down in AMS1, LAX1, NYM2 and SIN3 datacenters
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 16:33 UTC to 19:10 UTC on July 30th, we experienced traffic outages of >90% in the SIN3 datacenter and >95% in the LAX1 datacenter. During this window, traffic in AMS1 was shifted to FRA1 after a 20-30% outage. From approximately 16:33 UTC to 17:15 UTC on the same day, there was a >85% traffic outage in the NYM2 datacenter.

Scope of Impact

During the incident window, ad serving was disrupted across NYM2, LAX1, AMS1 and SIN3 datacenters.

Timeline (UTC)

2020-07-30 16:33: Incident started and was immediately escalated to the engineering team

2020-07-30 19:10: Ad serving returned to normal

2020-07-31 01:04: Incident resolved: release to correct the issue in Impbus

Cause Analysis

An update to the bidder uncovered an issue in Impbus, causing Impbus to go down across four datacenters.

Resolution Steps

Our engineers resolved the issue by pushing a temporary fix and restarting impacted instances of Impbus.

Next Steps

Improve detection, monitoring and alerts for changes in Impbus traffic.
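
As an illustration of what detecting changes in traffic could look like, the following is a minimal sketch and not a description of Xandr's actual monitoring. It compares the current request rate for a datacenter against the median of recent samples and flags a drop beyond a threshold. The function names, the sample values, and the 20% threshold are all assumptions chosen for this example.

```python
from statistics import median


def traffic_drop_fraction(baseline_samples, current_rate):
    """Return how far current_rate is below the baseline median, as a fraction.

    Returns 0.0 when traffic is at or above the baseline, or when the
    baseline is not positive (nothing meaningful to compare against).
    """
    baseline = median(baseline_samples)
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - current_rate) / baseline)


def should_alert(baseline_samples, current_rate, threshold=0.2):
    """Flag a drop of at least `threshold` (hypothetical default: 20%)."""
    return traffic_drop_fraction(baseline_samples, current_rate) >= threshold


# Hypothetical requests-per-second samples for one datacenter.
recent = [1000, 1020, 980, 1010, 990]
severe_drop = should_alert(recent, 100)   # traffic far below baseline
normal_noise = should_alert(recent, 950)  # small fluctuation
```

Using the median rather than the mean keeps a single anomalous sample from skewing the baseline. A production check would also need to account for normal daily traffic patterns, which this sketch ignores.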

Posted Aug 03, 2020 - 22:04 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Jul 31, 2020 - 11:57 UTC
Monitoring

We have resolved the ad serving issue. We are monitoring an increase in data age, which delays object changes reaching the ad server.

Posted Jul 30, 2020 - 19:54 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Ad Serving
  • Impact(s):
    • Adserving is down in impacted datacenters
  • Severity: Major Outage
  • Datacenter(s): SIN3, NYM2, AMS1, LAX1

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jul 30, 2020 - 17:25 UTC