Latency on both Monetize and Invest UI when editing/creating Line Items
Incident Report for Xandr
Postmortem

Incident Summary

On November 17th, we received reports of customers experiencing high latency on Monetize and Invest. Our investigation found that general latency had started to increase on November 10th. A review of what was deployed around that time identified an update to our Rate Limit service, and that change was rolled back once it was identified.

From approximately 18:10 UTC on November 10th, 2021 to 16:01 UTC on November 17th, 2021, users may have experienced latency issues when using Invest and Monetize. This specifically affected creating, editing, and loading objects, most noticeably as failing splits and long page loads within Invest and Monetize. Creative audit results were also delayed by a backlog caused by latency on the API service. As a result, the automated audit system failed over to less accurate methods, which impacted clients. The latency and the resulting backlog of calls also caused creative previews to time out intermittently.

Scope of Impact

From approximately 18:10 UTC on November 10th, 2021 to 16:01 UTC on November 17th, 2021, users may have experienced latency issues in both Invest and Monetize.

Timeline (UTC)

  • 2021-11-10 18:10: Incident started.
  • 2021-11-17 12:05: Incident escalated to engineering.
  • 2021-11-17 16:01: Incident resolved.

Cause Analysis

The incident was caused by a new logger in our Rate Limit service that added latency, which was not caught in testing. Our metrics page did not have response-time alerts configured for this service, although it should have, because this service's response time is critical to our platforms.

Resolution Steps

Once we determined that the new logger was the cause, the Rate Limit service was rolled back to a previous version to remove the faulty logger.

Next Steps

* Add response-time alerts for the Rate Limit service.

Posted Dec 23, 2021 - 16:50 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Nov 17, 2021 - 17:04 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Buy-side pages
  • Impact(s):
    • Unable to save/edit objects
    • Latency
  • Severity: Minor Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Nov 17, 2021 - 13:44 UTC