From approximately 18:33 UTC on June 13 to 00:35 UTC on June 14, 2021, lifetime budgets on objects were not updated, potentially resulting in overspend on affected objects.
During the incident window, only objects with lifetime budgets could have overspent. Objects using daily budgets were not affected.
2021-06-13 18:33: Incident started
2021-06-13 19:27: Engineering notified and mitigation efforts started
2021-06-13 22:01: Engineering declared an outage
2021-06-13 22:15: Clients notified
2021-06-13 22:25: Restarted affected hardware and reduced active running jobs
2021-06-14 00:35: Incident resolved
A hardware failure reduced capacity. While the system was recovering, a second failure occurred within the same system, and this second failure caused the incident. An automated alert was triggered for the first failure but not for the second, so engineering was not immediately aware of the second failure.
Our engineering team resolved the issue by pausing non-critical loads and making database configuration changes to help clear contention, which allowed critical jobs to complete.
The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.
We have identified the following issue:
Our engineers are actively working towards a resolution, and we will provide an update as soon as possible. Thank you for your patience.