Global Data Center Seeing Authentication Issues
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 09:25 UTC to 14:32 UTC on Thursday, May 28th, 2020, network maintenance in the NYM2 data center disrupted connectivity between services, and the affected services required manual restarts to recover. These services included Invest, Monetize, Console, and Platform.

Scope of Impact

During the incident window, some clients may have experienced issues logging in and editing objects. Batch-segment upload was also affected. Both UI and API clients were impacted.

Timeline (UTC)

2020-05-28 05:11 UTC: Planned Maintenance started

2020-05-28 09:25 UTC: Incident Reported: Console UI or API authentication alerts were reported

2020-05-28 14:32 UTC: Incident Resolved: Manual restarts were completed, and authentication and editing became available again.

Cause Analysis

The root cause of the incident was a loss of network connectivity in NYM2 during the maintenance. The affected applications were unable to recover automatically once connectivity was disrupted.

Resolution Steps

Our engineers resolved the issue by shifting the network connection to our LAX1 and AMS1 data centers and manually restarting the disrupted applications.

Next Steps

  • Proactively shift traffic to other data centers before planned maintenance
  • Fix the underlying issues in the applications to harden them against network disruptions
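
As an illustration of the second item, the following is a minimal sketch of one common hardening pattern: dropping a stale connection and reconnecting with exponential backoff instead of waiting for a manual restart. It is not the fix applied to the affected applications. The names `call_with_reconnect`, `connect`, and `operation`, as well as the default attempt count and delays, are hypothetical and chosen only for this example.

```python
import time


def call_with_reconnect(connect, operation, max_attempts=5,
                        base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Run operation(conn), reconnecting with exponential backoff.

    connect and operation are hypothetical callables supplied by the
    caller; they are assumptions for this sketch, not a real API.
    """
    conn = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()  # (re)establish the connection
            return operation(conn)
        except (ConnectionError, TimeoutError):
            conn = None  # discard the stale connection
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Wait 0.5s, 1s, 2s, ... capped at max_delay, then retry.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

A caller would pass its own connection factory and request function, for example `call_with_reconnect(open_session, fetch_profile)`, where both names are placeholders. Capping the delay keeps recovery time bounded after a long network interruption.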
Posted Jun 05, 2020 - 14:23 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted May 28, 2020 - 22:48 UTC
Monitoring

We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.

Posted May 28, 2020 - 14:39 UTC
Identified

We have identified the cause of the issue, and our engineers are actively working towards a resolution. We will provide an update as soon as possible. Thank you for your patience.

Posted May 28, 2020 - 14:01 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Console API, Console UI, Monetize UI, Invest UI
  • Impact(s):
    • Page load failures and errors in user interface
    • Latency, timeouts and errors in API
    • Some users unable to log in
  • Severity: Major Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted May 28, 2020 - 10:44 UTC