Pixel service returns "too many requests" error on POST
Incident Report for Xandr

Incident Summary

On January 8th at 19:30:00 UTC, a new version of the API was deployed. This version had a database connection bug and an authorization bug. The bugs affected pixel creation only, and only when campaigns were attached. After discovery, the new API release was rolled back to the previous version.

Scope of Impact

POST requests to the /pixel service with campaigns attached timed out, which resulted in errors.

Timeline (UTC)

2020-01-08 18:59: Incident Started: New API version was deployed
2020-01-09 16:14: Incident escalated to engineering
2020-01-09 18:45: Incident Resolved: Rollback to previous version complete

Cause Analysis

In the most recent release of the API, new functionality was added in an attempt to work around an existing cross-data center sync issue. This caused the connection pool to become depleted, and new requests hung while waiting for a connection to become available. Ultimately, calls into the API did not receive responses, leading to timeouts in the calling service.
Unfortunately, this was not caught in dev/staging because the outage only occurs under a high level of traffic. We are currently looking into better ways to test production-level volume in dev environments.
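
The failure mode described above can be illustrated with a small simulation. This is a minimal sketch, not the actual API code: the names BoundedPool, PoolExhausted, handle_request and simulate are hypothetical, the "connections" are plain integers, and the sleep stands in for slow database work. It shows how a fixed-size pool runs dry once concurrent requests exceed its size, and how a bounded wait on acquire turns a hanging request into a fast, explicit rejection.

```python
import queue
import threading
import time


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Toy fixed-size pool (hypothetical); connections are plain integers."""

    def __init__(self, size):
        self._free = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)

    def acquire(self, timeout):
        # Wait at most `timeout` seconds instead of blocking forever.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("no free connection after %.2fs" % timeout)

    def release(self, conn):
        self._free.put(conn)


def handle_request(pool, hold_seconds, acquire_timeout, results):
    """Simulate one request that holds a connection for `hold_seconds`."""
    try:
        conn = pool.acquire(acquire_timeout)
    except PoolExhausted:
        results.append("rejected")
        return
    try:
        time.sleep(hold_seconds)  # stands in for slow database work
        results.append("ok")
    finally:
        pool.release(conn)  # always return the connection to the pool


def simulate(pool_size, requests, hold_seconds, acquire_timeout):
    """Fire `requests` concurrent requests at a pool of `pool_size`."""
    pool = BoundedPool(pool_size)
    results = []
    threads = [
        threading.Thread(
            target=handle_request,
            args=(pool, hold_seconds, acquire_timeout, results),
        )
        for _ in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Six concurrent requests against two connections: four are rejected.
    print(sorted(simulate(2, 6, hold_seconds=0.3, acquire_timeout=0.05)))
```

Running the same simulation at low concurrency (for example, two requests against two connections) produces no rejections, which mirrors why the problem did not surface in dev/staging.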

Resolution Steps

Rolled back to the previous version

Next Steps

Added new detections

Posted Jan 17, 2020 - 01:12 UTC

This incident has been resolved.
Posted Jan 09, 2020 - 13:05 UTC

We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.

Posted Jan 09, 2020 - 10:05 UTC

We are currently investigating the following issue:

  • Component(s): Console API
  • Impact(s):
    • Latency, timeouts and errors in API
  • Severity: Major Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jan 09, 2020 - 08:01 UTC