Incident Summary
On June 8th at 19:30:00 UTC, a new version of the API was deployed. This version had a database connection bug and an authorization bug. The problem affected pixel creation only, and only when campaigns were attached. After discovery, the new API release was rolled back to the previous version.
Scope of Impact
POST requests to the /pixel service with campaigns attached timed out, which resulted in errors.
Timeline (UTC)
2020-01-08 18:59: Incident Started: New API version was deployed
2020-01-09 16:14: Incident escalated to engineering
2020-01-09 18:45: Incident Resolved: Rollback to previous version complete
Cause Analysis
In the most recent release of the API, new functionality was added in an attempt to work around an existing cross-data center sync issue. This caused the connection pool to become depleted, and new requests hung while waiting for a connection to become available. Ultimately, calls into the API did not receive responses, leading to timeouts in the calling service.
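The failure mode described above can be illustrated with a minimal sketch. It is not the API's actual code: the BoundedPool class, the PoolTimeout exception, the "conn-N" placeholder connections, and the timeout values are all hypothetical, and the report does not say which language or pool library the API uses. The sketch shows a fixed-size pool in which every connection is held, so the next caller waits; with an acquire timeout the caller fails fast instead of hanging.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the timeout (hypothetical)."""


class BoundedPool:
    """A toy fixed-size connection pool (hypothetical stand-in for a real one)."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            # Strings stand in for real database connections.
            self._free.put(f"conn-{i}")

    def acquire(self, timeout):
        # Without a timeout, get() would block until a connection is released,
        # which is the "hanging request" behaviour described in the report.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolTimeout(f"no connection available after {timeout}s") from None

    def release(self, conn):
        self._free.put(conn)


pool = BoundedPool(size=2)

# Two in-flight requests hold every connection in the pool.
held = [pool.acquire(timeout=0.1), pool.acquire(timeout=0.1)]

# A third request finds the pool depleted and times out instead of hanging.
try:
    pool.acquire(timeout=0.1)
    outcome = "acquired"
except PoolTimeout:
    outcome = "timed out"

# Once a connection is returned, new requests can proceed again.
pool.release(held.pop())
recovered = pool.acquire(timeout=0.1)
```

In this sketch the third acquire times out because both connections are held, and the acquire after the release succeeds. A bounded wait of this kind would not fix the underlying depletion, but it turns an indefinite hang into an explicit error that the calling service can detect.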
Unfortunately this was not caught in dev/staging due to the high level of traffic required to cause the outage. We are currently looking into better ways to test production-level volume in dev environments.
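One possible way to surface this class of problem at lower traffic, offered here only as a sketch and not as the approach the team has chosen, is to shrink the pool in a test so that modest concurrency already exceeds it. The run_load function and all of its parameters are hypothetical, and a semaphore stands in for the real connection pool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(pool_size, concurrency, hold_seconds, acquire_timeout):
    """Fire `concurrency` simulated requests at a pool of `pool_size` slots.

    Returns (served, timed_out). Hypothetical helper for illustration only.
    """
    slots = threading.BoundedSemaphore(pool_size)  # stand-in for the pool

    def one_request(_):
        # Wait a bounded time for a free slot, as a request would for a connection.
        if not slots.acquire(timeout=acquire_timeout):
            return False
        try:
            time.sleep(hold_seconds)  # simulated time spent holding the connection
            return True
        finally:
            slots.release()

    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        results = list(executor.map(one_request, range(concurrency)))
    return results.count(True), results.count(False)


# Pool smaller than the concurrent load: some requests cannot get a slot in time.
starved = run_load(pool_size=2, concurrency=6, hold_seconds=0.5, acquire_timeout=0.05)

# Pool sized for the load: every request is served.
healthy = run_load(pool_size=6, concurrency=6, hold_seconds=0.5, acquire_timeout=0.05)
```

With the undersized pool, requests that cannot obtain a slot within the acquire timeout are counted as timed out, while the adequately sized pool serves all of them. Scaling the pool down rather than the traffic up is an assumption about what would be practical in a dev environment.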
Resolution Steps
Rolled back the API to the previous version.
Next Steps
Added new detections
We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.