Pixel service returns "too many requests" error on POST
Incident Report for Xandr
Postmortem

Incident Summary

On January 8th at 19:30:00 UTC, a new version of the API was deployed. This version contained a database connection bug and an authorization bug. The problem affected pixel creation only, and only when campaigns were attached. Once it was discovered, the new API release was rolled back to the previous version.

Scope of Impact

POST requests to the /pixel service with campaigns attached timed out, which resulted in errors.

Timeline (UTC)

2020-01-08 18:59: Incident Started: New API version was deployed
2020-01-09 16:14: Incident escalated to engineering
2020-01-09 18:45: Incident Resolved: Rollback to previous version complete

Cause Analysis

In the most recent release of the API, new functionality was added in an attempt to work around an existing cross-data center sync issue. This caused the connection pool to become depleted, and new requests hung while waiting for a connection to become available. Ultimately, calls into the API did not receive responses, leading to timeouts in the calling service.
Unfortunately, this was not caught in dev/staging because of the high level of traffic required to cause the outage. We are currently looking into better ways to test production-level volume in dev environments.
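
The failure mode described above can be illustrated with a minimal Python sketch. This is not the API's actual code: the pool class, the function names, and the timings are all hypothetical, and the "connections" are placeholder strings. It only shows how a fixed-size pool runs dry when requests hold connections for a long time, and how a bounded wait turns an indefinite hang into a fast, visible failure.

```python
import queue
import threading
import time


class PoolExhausted(Exception):
    """Raised when no connection becomes available within the timeout."""


class BoundedPool:
    """Toy fixed-size pool (hypothetical); connections are placeholder strings."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout):
        # With timeout=None this call blocks indefinitely once the pool is
        # empty, which corresponds to the hanging requests described above.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted(f"no connection free after {timeout}s") from None

    def release(self, conn):
        self._free.put(conn)


def handle_request(pool, hold_seconds, acquire_timeout, results):
    """Simulate one request that needs a connection for hold_seconds."""
    try:
        conn = pool.acquire(acquire_timeout)
    except PoolExhausted:
        results.append("rejected")
        return
    try:
        time.sleep(hold_seconds)  # stands in for slow work that holds the connection
        results.append("ok")
    finally:
        pool.release(conn)


def simulate(pool_size, requests, hold_seconds, acquire_timeout):
    """Fire `requests` concurrent requests at a pool of `pool_size` connections."""
    pool = BoundedPool(pool_size)
    results = []
    threads = [
        threading.Thread(
            target=handle_request,
            args=(pool, hold_seconds, acquire_timeout, results),
        )
        for _ in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    outcome = simulate(pool_size=2, requests=6, hold_seconds=0.5, acquire_timeout=0.1)
    print(outcome.count("ok"), "served,", outcome.count("rejected"), "rejected")
```

With a small number of requests the pool is never exhausted, which is consistent with the issue not appearing in low-traffic environments; the rejections only show up once concurrent demand exceeds the pool size for longer than the acquire timeout.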

Resolution Steps

Rolled back to the previous version

Next Steps

Added new detections

Posted Jan 17, 2020 - 01:12 UTC

Resolved

This incident has been resolved.

Posted Jan 09, 2020 - 13:05 UTC
Monitoring

We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.

Posted Jan 09, 2020 - 10:05 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Console API
  • Impact(s):
    • Latency, timeouts and errors in API
  • Severity: Major Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jan 09, 2020 - 08:01 UTC