From approximately 16:11 UTC on Tuesday, Sep 19, 2023 to 23:12 UTC on Wednesday, Sep 20, 2023, users experienced delays in the delivery of LLD data, and reporting data was not accessible through the UI.
The issue was caused by ongoing database query locks. It originated from a non-responsive scheduler, which failed to schedule jobs and process messages from the queue. Further investigation revealed that threads were stuck holding a backlog of unacknowledged messages, because database operations were allowed to wait indefinitely instead of timing out, which caused lock contention. As a result, delivery of LLD data was delayed and reporting data was not accessible through the UI during the impact window.
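To illustrate the failure mode described above, here is a minimal sketch of a worker that bounds how long a database write may wait on a lock and hands the message back for redelivery when that wait expires, rather than holding it unacknowledged. It is not the fix that was deployed. SQLite stands in for the production database, which this report does not name, and the names `make_db`, `process_message`, `drain`, the `events` table, and the timeout and retry values are all hypothetical.

```python
import os
import queue
import sqlite3
import tempfile


def make_db():
    """Create a throwaway SQLite file with one table (illustrative schema)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE events (payload TEXT)")
    conn.close()
    return path


def process_message(db_path, message, lock_timeout_s=0.2):
    """Write one message, waiting at most lock_timeout_s for a database lock.

    Returns True if the write committed (safe to acknowledge) and False if
    the lock wait expired (the message should be redelivered).
    """
    # `timeout` bounds how long SQLite waits on a locked database.
    conn = sqlite3.connect(db_path, timeout=lock_timeout_s)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO events (payload) VALUES (?)", (message,))
        return True
    except sqlite3.OperationalError as exc:
        if "locked" not in str(exc):
            raise  # unrelated database error: do not hide it
        return False
    finally:
        conn.close()


def drain(db_path, pending, max_attempts=3, lock_timeout_s=0.2):
    """Process (message, attempts) pairs; requeue on timeout, then give up.

    Returns (acked, dead): messages written successfully, and messages that
    exhausted max_attempts and were set aside instead of blocking the worker.
    """
    acked, dead = [], []
    while not pending.empty():
        message, attempts = pending.get()
        if process_message(db_path, message, lock_timeout_s):
            acked.append(message)
        elif attempts + 1 < max_attempts:
            pending.put((message, attempts + 1))
        else:
            dead.append(message)
    return acked, dead
```

With a bounded wait, a contended lock surfaces as an explicit failure that the worker can act on, so a thread never sits on a backlog of unacknowledged messages. Appropriate timeout and retry values depend on the workload and are not taken from this incident.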
The issue resolved on its own once the system had no remaining backlog in the messaging queue to process. As a result, the code running on the failed node regained stability.
The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.
We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.
We are currently investigating the following issue:
Status: We will provide an update as soon as more information is available. Thank you for your patience.