Native Image Fit Service was Unavailable
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 17:12 UTC to 17:52 UTC on December 2, 2019, the Native Image Fit application experienced network connectivity issues that prevented it from retrieving the images it needed to resize.

Scope of Impact

During the incident window, some publishers running Native inventory in Europe saw broken images. Based on our metrics, approximately 99,000 images were affected.

Timeline (UTC)

2019-12-02 17:12: Incident started. The application responsible for Native Image resizing was unable to reach the external internet.

2019-12-02 17:20: Escalated to engineers

2019-12-02 17:48: Network connectivity restored

2019-12-02 17:52: Issue was resolved

Cause Analysis

The incident was caused by the application responsible for Native Image resizing being unable to reach the internet. This occurred because the node that handles this service's outbound traffic in AMS1 became unresponsive and stopped forwarding traffic from internal networks to the internet.

Resolution Steps

The unresponsive node was rebooted, after which it resumed functioning properly.

Next Steps

Improve the existing checks so that they confirm the services running on the nodes are functioning properly.

Posted Dec 13, 2019 - 15:56 UTC

Resolved

The following incident has been fully resolved. We will publish a postmortem as soon as we have completed one:

  • Component(s): Ad Serving
  • Impact(s):
    • Some publishers running Native inventory in Europe may have seen broken images from 17:13 to 17:52 UTC
  • Severity: Major Outage
  • Datacenter(s): AMS1, FRA1

We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Dec 02, 2019 - 18:11 UTC