Incident Summary
From approximately 14:21 UTC to 15:00 UTC on Wednesday, October 9th, 2019, users may have been unable to access objects they had created, due to delays in database replication.
Scope of Impact
During the incident window, users may have had trouble accessing objects they created through the UI or API, and may have seen unexpected UI or API errors.
Timeline (UTC)
2019-10-09 14:21: Incident started: High replication lag noticed by engineering.
2019-10-09 14:43: Issue escalated to an incident; investigation continues.
2019-10-09 14:45: Issue reported on status.xandr.com.
2019-10-09 14:47: Rate limits applied to specific users of the domain-list service.
2019-10-09 15:00: Incident resolved: replication lag recovered for all database instances.
Cause Analysis
A large spike in DELETE traffic to the domain-list service slowed down MySQL replication.
Resolution Steps
Our engineers resolved the issue by reducing rate limits for the user making the large number of DELETE requests.
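For illustration only, a per-user rate limit of this kind can be sketched as a token bucket. This is a minimal sketch under assumptions, not the mechanism the domain-list service actually uses: the class name `TokenBucket` and its `rate`, `capacity`, and `clock` parameters are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-user limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()      # time of the last refill

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Lowering `rate` or `capacity` for a single user throttles that user's requests without affecting anyone else, which is the general effect described above.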
Next Steps
* Re-architect the domain-list service so that it deletes large amounts of data at a more regulated pace.
* Add a maximum number of URLs that a domain list can contain.
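As a rough illustration of the first item, large deletes can be split into small batches with a pause between them, so that replicas have time to apply each batch before the next one arrives. This is a sketch under assumptions, not the planned design: `delete_in_batches`, `delete_batch`, `batch_size`, and `pause_seconds` are hypothetical names, and `delete_batch` stands in for whatever call actually issues the DELETE.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(
    ids: Sequence[int],
    delete_batch: Callable[[Sequence[int]], None],
    batch_size: int = 500,
    pause_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete `ids` in small batches, pausing between batches.

    `delete_batch` is a hypothetical callback that removes one batch of rows.
    Returns the number of batches issued.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = 0
    for batch in chunked(ids, batch_size):
        if batches:
            # Pause before every batch except the first.
            sleep(pause_seconds)
        delete_batch(batch)
        batches += 1
    return batches
```

In practice the fixed pause could be replaced by a check of measured replication lag before each batch; the fixed interval is used here only to keep the sketch short.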
The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.