Console UI and API login failures
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 06:29 to 06:33 UTC on Friday, February 7th, a layer of the LAX datacenter was isolated during network maintenance. This disconnected the LAX datacenter's DNS from the internet and caused connection issues.

Scope of Impact

As a result of this incident, clients experienced Console UI and API login errors between 06:30 and 07:00 UTC on Friday, February 7th.

Timeline (UTC)

2020-02-07 06:18: Traffic switch performed in LAX
2020-02-07 06:29: LAX datacenter rebooted
2020-02-07 06:30: Incident started: Console UI and API login failures
2020-02-07 06:32: Engineering alerted to LAX datacenter connectivity issue
2020-02-07 06:37: Engineering confirmed LAX datacenter connectivity issue resolved
2020-02-07 07:00: Incident resolved
2020-02-07 07:12: Retroactive incident ticket created

Cause Analysis

The root cause of the incident was a reboot of the datacenter core switch, which connects devices within the datacenter. While the switch was rebooting, its links were down, which overloaded the core switch's routing engine. This overload caused the connection issue.

Resolution Steps

The connection issue resolved itself without manual intervention.

Next Steps

  • Re-evaluate whether to shift application traffic to other datacenters during maintenance to reduce potential customer impact.
Posted Feb 21, 2020 - 02:13 UTC

Resolved

The following incident has been fully resolved. We will post a postmortem as soon as it is complete:

  • Component(s): Console API, Console UI
  • Impact(s):
    • Page load failures and errors in user interface
    • Latency, timeouts and errors in API
    • Some users unable to log in
  • Severity: Minor Outage
  • Datacenter(s): SIN1, LAX1

We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Feb 07, 2020 - 07:54 UTC