From approximately 07:43 to 09:09 UTC on Tuesday, July 28, 2020, clients were unable to access line items and insertion orders via the Order Home dashboard.
During the incident window, customers could not load line items or insertion orders in the Order Home dashboard and instead received a "Gateway Timeout" error.
2020-07-28 07:43: Incident started
2020-07-28 08:26: Incident escalated to on-call engineer
2020-07-28 09:09: Incident resolved
A backend process that reads line item and insertion order IDs from a queue and updates their status stopped processing, causing the queue to grow rapidly until it reached its maximum capacity. After several hours of attempting to process at maximum capacity, the resulting resource load caused our caching mechanisms to fail, and the system stopped responding to user requests.
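To illustrate this failure mode, here is a minimal sketch of how a bounded queue could flag a stalled consumer before the queue fills up. It is not our production code: the class name `MonitoredQueue`, its parameters, and the 60-second threshold are hypothetical and chosen only for illustration.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MonitoredQueue:
    """Bounded FIFO queue that tracks when the consumer last made progress.

    Hypothetical illustration only; names and thresholds are assumptions.
    """
    max_size: int
    stall_after_s: float
    _items: deque = field(default_factory=deque, init=False)
    _last_progress: float = field(default=0.0, init=False)

    def put(self, item_id, now=None):
        """Enqueue an ID; return False when the queue is already full."""
        now = time.monotonic() if now is None else now
        if len(self._items) >= self.max_size:
            return False
        if not self._items:
            # Start the stall clock when work first becomes available.
            self._last_progress = now
        self._items.append(item_id)
        return True

    def get(self, now=None):
        """Dequeue the oldest ID and record that progress was made."""
        item = self._items.popleft()
        self._last_progress = time.monotonic() if now is None else now
        return item

    def is_stalled(self, now=None):
        """True when items are waiting but none was consumed recently."""
        now = time.monotonic() if now is None else now
        waited = now - self._last_progress
        return bool(self._items) and waited > self.stall_after_s
```

A monitoring job could call `is_stalled()` periodically and alert the on-call engineer while the queue still has headroom, rather than after it has reached capacity.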
Our engineers resolved the issue by switching the production environment over to the backup database cluster. The primary database cluster was then restarted and re-indexed, after which we confirmed that the issue was resolved.
The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.
We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.
We are currently investigating the following issue:
We will provide an update as soon as more information is available. Thank you for your patience.