Incident: Service Outage

Posted: Friday, September 27th, 2019 at 16:09 by Adam Heath

Wednesday 25th September 22:30 – Thursday 26th September 01:04

A change to standardise access lists on some of our core network devices had an unexpected impact: some core devices were unable to learn routes from, or send routes to, our route reflectors.

We have raised a vendor case and updated our regression test procedures to avoid future issues. An RFO will follow in due course.

One Response to “Incident: Service Outage”

  1. Incident Management says:

    Impact: Services delivered through some core devices may have experienced a total loss of service.

    Root Cause: We attempted to standardise our management plane access lists across devices through a change intended to be non-service-affecting. The change had an unforeseen adverse effect on the control plane of each device, which caused the core devices to lose their established BGP sessions to our route reflectors. Once the issue was identified, we rolled back to the pre-change baseline and service was restored. We are continuing to liaise with our vendor to determine the root cause of the outage.

    Additional Actions: A review of the failed change has been carried out. Actions to prevent a recurrence of this service outage have been identified and will be progressed.
