Delayed data might cause overspend for objects whose budgets are set to lifetime pacing.

Incident Report for Xandr

Postmortem

Incident Summary

From approximately 18:33 UTC on June 13 to 00:35 UTC on June 14, 2021, lifetime budgets on objects were not updated, potentially resulting in overspend on affected objects.

Scope of Impact

During the incident window, only objects with lifetime budgets potentially overspent. Objects using daily budgets were not affected.

Timeline (UTC)

2021-06-13 18:33: Incident started
2021-06-13 19:27: Engineering notified and mitigation efforts started
2021-06-13 22:01: Engineering declared an outage
2021-06-13 22:15: Clients notified
2021-06-13 22:25: Restarted affected hardware and reduced active running jobs
2021-06-14 00:35: Incident resolved
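
The intervals implied by this timeline can be worked out directly from the timestamps. The following is a minimal sketch in Python; the dictionary keys and the helper name `minutes_between` are illustrative labels chosen here, not terms from the report.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps (UTC) copied from the timeline above; the keys are illustrative labels.
timeline = {
    "started": "2021-06-13 18:33",
    "engineering_notified": "2021-06-13 19:27",
    "outage_declared": "2021-06-13 22:01",
    "clients_notified": "2021-06-13 22:15",
    "resolved": "2021-06-14 00:35",
}


def minutes_between(start_key: str, end_key: str) -> int:
    """Return whole minutes elapsed between two timeline entries."""
    start = datetime.strptime(timeline[start_key], FMT)
    end = datetime.strptime(timeline[end_key], FMT)
    return int((end - start).total_seconds() // 60)


time_to_notify_engineering = minutes_between("started", "engineering_notified")
time_to_notify_clients = minutes_between("started", "clients_notified")
total_duration = minutes_between("started", "resolved")

print(time_to_notify_engineering)  # 54 minutes
print(time_to_notify_clients)      # 222 minutes (3 h 42 min)
print(total_duration)              # 362 minutes (6 h 2 min)
```

Under these timestamps, engineering was notified 54 minutes after the incident started, clients were notified after 3 hours 42 minutes, and the incident lasted 6 hours 2 minutes in total.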

Cause Analysis

A hardware failure reduced capacity. While the system was recovering, an additional failure occurred within the same system, which caused the incident. An automated alert was triggered by the first failure but not by the second, so engineering was not immediately aware of the second failure.

Resolution Steps

Our engineering team resolved the issue by pausing non-critical loads and making database configuration changes to clear contention, which allowed critical jobs to complete.

Next Steps

Review the alert process so that an alert is triggered every time there is a failure.
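
The report does not say why the second failure produced no alert. One common pattern, assumed here purely for illustration, is alerting that fires only on the change from healthy to degraded, so a further failure in an already-degraded system stays silent. The sketch below shows the alternative of alerting on every failure event. The class `FailureAlerter`, its methods, and the component name are hypothetical and do not describe Xandr's actual alerting system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureAlerter:
    """Hypothetical alerter that records one alert per failure event."""

    alerts: List[str] = field(default_factory=list)
    degraded: bool = False

    def on_failure(self, component: str, detail: str) -> None:
        # Alert unconditionally: do not suppress the alert just because
        # the system is already marked as degraded by an earlier failure.
        self.degraded = True
        self.alerts.append(f"{component}: {detail}")

    def on_recovered(self) -> None:
        # Clear the degraded flag once the system is healthy again.
        self.degraded = False


alerter = FailureAlerter()
alerter.on_failure("system", "hardware failure reduced capacity")
alerter.on_failure("system", "additional failure during recovery")
print(len(alerter.alerts))  # 2: both failures produce an alert
```

In this sketch the second failure raises its own alert even though the system is still recovering from the first.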

Posted Jun 18, 2021 - 17:58 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Jun 14, 2021 - 00:47 UTC

Identified

We have identified the following issue:

Component(s): Bidding
Impact(s):
- Some objects may spend over budgets
Severity: Minor Outage
Datacenter(s): Global

Our engineers are actively working towards a resolution, and we will provide an update as soon as possible. Thank you for your patience.

Posted Jun 13, 2021 - 22:13 UTC