Reporting and Log Level Data Delays
Incident Report for Xandr
Postmortem

Incident Summary:

On June 22nd at approximately 19:00 UTC, capacity in the NYM2 data center was brought down by a network circuit issue between the LAX1 and NYM2 data centers. This caused a backlog in log processing, which delayed Log Level Data feeds and reporting.

Scope of Impact:

During the incident window, clients experienced longer-than-normal reporting delays for Log Level Data and Network Analytics reports.

Timeline (UTC):

2020-06-22 16:47: LAX1-NYM2 circuit goes down

2020-06-22 19:11: First alert of delayed data

2020-06-22 20:10: Incident Reported: Log Level Data and Network Analytics delays were reported

2020-06-22 21:20: Began installation of extra servers

2020-06-23 04:00: Incident Resolved: Reporting delays fell under threshold
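
As a rough illustration, the intervals implied by the timeline above can be worked out with a short Python sketch. The timestamps are copied from the timeline; the event labels and the helper names (`minutes_between`, `fmt`) are hypothetical and chosen only for this example.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all UTC).
# The labels on the left are illustrative names, not official terms.
EVENTS = {
    "circuit_down": "2020-06-22 16:47",
    "first_alert": "2020-06-22 19:11",
    "incident_reported": "2020-06-22 20:10",
    "extra_servers_started": "2020-06-22 21:20",
    "resolved": "2020-06-23 04:00",
}


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timeline entry into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def minutes_between(start_label, end_label):
    """Return whole minutes elapsed between two timeline events."""
    delta = parse(EVENTS[end_label]) - parse(EVENTS[start_label])
    return int(delta.total_seconds() // 60)


def fmt(minutes):
    """Format a minute count as 'Xh YYm'."""
    return f"{minutes // 60}h {minutes % 60:02d}m"


if __name__ == "__main__":
    pairs = [
        ("circuit_down", "first_alert"),
        ("first_alert", "resolved"),
        ("extra_servers_started", "resolved"),
        ("circuit_down", "resolved"),
    ]
    for start, end in pairs:
        print(f"{start} -> {end}: {fmt(minutes_between(start, end))}")
```

By this arithmetic, about 2h 24m passed between the circuit failure and the first alert of delayed data, and about 11h 13m passed between the circuit failure and resolution.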

Cause Analysis:

A network circuit between the NYM2 and LAX1 data centers failed and traffic was re-routed, which reduced overall network bandwidth. The reduced bandwidth was sufficient for about two hours. After that, platform traffic increased due to the time of day, the network became saturated, and requests started to fail.

Resolution Steps:

Our engineers resolved the issue by adding extra processing capacity in the NYM2 data center.

Next Steps:

  • Add more capacity
  • Investigate optimizing how data is handled when the network is backed up

Posted Jun 29, 2020 - 14:02 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Jun 23, 2020 - 16:15 UTC

Investigating

We are currently investigating the following issue:

  • Component(s): Log Level Data, Analytics reports
  • Impact(s):
    • Stale reporting data
  • Severity: Minor Outage
  • Datacenter(s): NYM2

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jun 23, 2020 - 03:17 UTC