Reporting Delays
Incident Report for Xandr
Postmortem

Incident Summary:

Between February 21st (Sunday) 23:24 UTC and February 22nd (Monday) 07:25 UTC, the System Operations team took down an Impbus validation engine (IVE) host for upgrades. The host did not come back up properly afterward, which caused the batch data age to rise. As a result, ad delivery acted on stale data, and any changes made by clients to campaign budget, targeting, or activation were not seen by the system for the duration of the incident.

Scope of Impact:

During the incident window, reporting may have been delayed and ad delivery may have been affected by stale data, because changes made to campaign budget, targeting, or activation were not seen by the Impbus.

Timeline (UTC):

2021-02-21 23:34 : Incident started, engineer notified
2021-02-22 03:16 : IM Ticket created
2021-02-22 07:09 : Internal server issues fixed and restarted
2021-02-22 07:25 : Incident resolved
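
As a reading aid, the elapsed time between the timeline entries above can be worked out with a short script. This is only a sketch: the timestamps are copied from the timeline as listed, and the names `TIMELINE` and `elapsed_minutes` are illustrative, not part of any Xandr tooling.

```python
from datetime import datetime

# Timeline entries copied from the report above (all times UTC).
TIMELINE = [
    ("2021-02-21 23:34", "Incident started, engineer notified"),
    ("2021-02-22 03:16", "IM Ticket created"),
    ("2021-02-22 07:09", "Internal server issues fixed and restarted"),
    ("2021-02-22 07:25", "Incident resolved"),
]

STAMP_FORMAT = "%Y-%m-%d %H:%M"


def elapsed_minutes(timeline):
    """Return (event, whole minutes since the first entry) for each entry."""
    start = datetime.strptime(timeline[0][0], STAMP_FORMAT)
    result = []
    for stamp, event in timeline:
        delta = datetime.strptime(stamp, STAMP_FORMAT) - start
        result.append((event, int(delta.total_seconds() // 60)))
    return result


if __name__ == "__main__":
    # Print each event with its offset from the first timeline entry.
    for event, minutes in elapsed_minutes(TIMELINE):
        print(f"+{minutes // 60}h{minutes % 60:02d}m  {event}")
```

Measured from the first timeline entry, the ticket was created after 3h42m, the fix was applied after 7h35m, and the incident was resolved after 7h51m.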

Cause Analysis:

The System Operations team shut down an Impbus validation engine (IVE) host for upgrades. The host did not come back up properly, which caused the batch data age to rise. As a result, ad delivery acted on stale data.

Resolution Steps:

Our engineers resolved the issue by switching to another Impbus validation engine (IVE) and increasing the size limit in the server's configuration file.

Next Steps:

The System Operations team will update and validate the runbooks covering the procedures for Impbus validation engine (IVE) host upgrades.

Posted Feb 24, 2021 - 02:05 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Feb 22, 2021 - 07:38 UTC
Monitoring

We have patched the issue and are monitoring our systems closely. We will provide an update as soon as the issue has been fully resolved.

Posted Feb 22, 2021 - 07:15 UTC
Identified

We have identified the cause of the issue, and our engineers are actively working towards a resolution. We will provide an update as soon as possible. Thank you for your patience.

Posted Feb 22, 2021 - 04:45 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Reporting
  • Impact(s):
    • Stale reporting data
    • Latency, timeouts, and errors for the affected service
    • Some data incomplete or incorrect until reprocessed (please repull data as necessary)
    • Slower report retrieval
  • Severity: Minor Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Feb 22, 2021 - 03:29 UTC