External Bidder request drop in SIN3 datacenter
Incident Report for Xandr
Postmortem

Incident Summary:

From approximately 20:25 UTC to 20:45 UTC on Wednesday, March 18, 2020, the Impression Bus stopped sending bid requests to a set of bidders in the Singapore (SIN3) data center.

Impact:

During the incident window, buy-side activity from the affected bidders stopped in the Singapore (SIN3) data center.

Timeline (UTC):
2020-03-18 20:25:00: Incident started
2020-03-18 20:40:00: Anomalous behavior observed in metrics; rollback started immediately
2020-03-18 20:45:00: Rollback completed and metrics confirmed full recovery
2020-03-18 21:04:00: Issue escalated to Engineering
2020-03-19 00:36:00: Incident resolved and retrospective Incident Management ticket created
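
For reference, the intervals implied by the timeline above can be derived with a short calculation. This is only an illustrative sketch: the variable and function names are assumptions made for this example and do not correspond to any Xandr tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all UTC).
FMT = "%Y-%m-%d %H:%M:%S"
started = datetime.strptime("2020-03-18 20:25:00", FMT)
detected = datetime.strptime("2020-03-18 20:40:00", FMT)
recovered = datetime.strptime("2020-03-18 20:45:00", FMT)


def minutes_between(start, end):
    """Return the whole number of minutes from start to end."""
    return int((end - start).total_seconds() // 60)


# Hypothetical metric names, chosen for illustration only.
time_to_detect = minutes_between(started, detected)
time_to_recover = minutes_between(detected, recovered)
total_impact = minutes_between(started, recovered)

print(f"Time to detect:  {time_to_detect} min")
print(f"Time to recover: {time_to_recover} min")
print(f"Total impact:    {total_impact} min")
```

The total matches the 20-minute impact window stated in the summary.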

Cause Analysis:

The root cause of the incident was an erroneous configuration deployed in the Singapore (SIN3) data center, which caused the QPS system to malfunction. As a result, the impression buses stopped sending traffic to bidders.

Resolution steps:

Our engineering team rolled back to the previous configuration, which resolved the issue immediately.

Next Steps:

  • Re-evaluate the safeguard mechanisms on the impression buses
  • Investigate existing cluster-size alerts

Posted Mar 23, 2020 - 23:30 UTC

Resolved

The following incident has been fully resolved, and we will post a post-mortem as soon as we have completed one:

  • Component(s): Bidding
  • Impact(s):
    • Decrease in requests and drop in buying from some external bidders for 20 minutes (20:25-20:45 UTC)
  • Severity: Minor Outage
  • Datacenter(s): SIN3

We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Mar 19, 2020 - 01:12 UTC