On Friday, 7 February, between 06:29 and 06:33 UTC, during network maintenance in LAX1, the LAX1 Layer 2 network was isolated for approximately 4 minutes. As a result, Anycast DNS from LAX1 was withdrawn from the Internet and various connection issues were reported.
Scope of Impact
Console login and API were impacted from 06:30 to 07:00 UTC.
2020-02-07 06:18 Net engineer switched traffic
2020-02-07 06:29 Core Switch reboot
2020-02-07 06:30 Incident Start
2020-02-07 06:32 Sysops Team received alerts
2020-02-07 06:37 Core Switch came up
2020-02-07 06:44 DBA team informed the Net engineer team that the database was down
2020-02-07 06:44 Sysops Team confirmed the alerts started to clear
2020-02-07 07:00 Incident Resolved
2020-02-07 07:12 Incident Ticket created
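The intervals implied by the timeline can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary keys and the helper name `minutes_between` are made up for this example, and the timestamps are copied from the entries above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps (UTC) copied from the timeline above.
# The key names are illustrative labels, not official event names.
timeline = {
    "core_switch_reboot": "2020-02-07 06:29",
    "incident_start": "2020-02-07 06:30",
    "alerts_received": "2020-02-07 06:32",
    "core_switch_up": "2020-02-07 06:37",
    "incident_resolved": "2020-02-07 07:00",
    "ticket_created": "2020-02-07 07:12",
}


def minutes_between(start_key, end_key):
    """Return whole minutes elapsed between two timeline entries."""
    start = datetime.strptime(timeline[start_key], FMT)
    end = datetime.strptime(timeline[end_key], FMT)
    return int((end - start).total_seconds() // 60)


# Time from incident start to the first alerts.
time_to_detect = minutes_between("incident_start", "alerts_received")
# Total duration of customer impact.
incident_duration = minutes_between("incident_start", "incident_resolved")
# Time from the Core Switch reboot until it came back up.
reboot_duration = minutes_between("core_switch_reboot", "core_switch_up")

print(time_to_detect, incident_duration, reboot_duration)
```

With the timestamps above, alerts arrived 2 minutes after the incident started, the incident lasted 30 minutes, and the Core Switch took 8 minutes from reboot to coming back up.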
While the Core Switch was rebooting and its links were down, the Core Switch routing engine became overloaded.
This overload caused the connection issues.
The LAX1 Layer 2 network was isolated from the Layer 3 network, the Internet, and other data centres for approximately 4 minutes while the switch was booting.
The connection issues recovered on their own.
With the current architecture, this kind of connection issue cannot be prevented for the Layer 2 network, which has a long convergence time compared to Layer 3.
However, we could evaluate shifting application traffic to other data centres, where possible, to reduce customer impact.
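As a rough illustration of what shifting traffic away from one data centre could look like, the sketch below sets one site's traffic weight to zero and spreads its share proportionally over the others. Everything here is hypothetical: the function `drain_datacentre`, the weights, and the site names `DC-B` and `DC-C` are invented for the example and do not describe our actual traffic management.

```python
def drain_datacentre(weights, drained):
    """Return new traffic weights with `drained` set to 0.

    The drained site's share is redistributed over the remaining
    sites in proportion to their current weights, so the total
    weight stays the same.
    """
    if drained not in weights:
        raise KeyError(f"unknown data centre: {drained}")

    remaining = {dc: w for dc, w in weights.items() if dc != drained}
    remaining_total = sum(remaining.values())
    if remaining_total <= 0:
        raise ValueError("no other data centre can take the traffic")

    full_total = sum(weights.values())
    # Scale each remaining site so the weights still sum to full_total.
    new_weights = {
        dc: w * full_total / remaining_total for dc, w in remaining.items()
    }
    new_weights[drained] = 0.0
    return new_weights


# Hypothetical weights (percent of application traffic per site).
before = {"LAX1": 50, "DC-B": 30, "DC-C": 20}
after = drain_datacentre(before, "LAX1")
print(after)
```

Whether such a shift is practical depends on the spare capacity of the other sites and on how each application routes its traffic, which is what the evaluation would need to establish.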
The following incident has been fully resolved, and we will post a post-mortem as soon as we have completed one:
We apologize for the inconvenience this issue may have caused, and thank you for your continued support.