Archive for the ‘Emergency Work’ Category

Emergency Maintenance: Telehouse DSL platform

Posted: Saturday, October 29th, 2016 at 17:22 by Steve Lalonde

Saturday 29 October 2016 17:30 – 23:59 

We will be making some emergency changes to the DSL platform due to a fault with a distribution switch.

Traffic entering the DSL platform via Telehouse will be rerouted via Interxion.

Approximately 100 customers will be disconnected and will reconnect to LNS/LTS in Interxion.

All customers may see some unusual routing as the network adjusts to the change in traffic flow.
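
As a rough way to see how routing changes while the reroute beds in, a path capture along the lines of the sketch below can be taken before and after the work and compared. This is a minimal illustration only: it wraps the standard Linux traceroute tool, and the target address is an arbitrary placeholder rather than anything specific to our network.

```python
#!/usr/bin/env python3
"""Capture the current forward path so captures taken before and after the
reroute can be compared. Sketch only: wraps the system `traceroute` binary,
and the target host below is an arbitrary placeholder."""
import datetime
import subprocess

TARGET = "8.8.8.8"  # placeholder: any stable external host will do


def capture_path(target: str) -> str:
    """Run a numeric traceroute and return its raw output."""
    result = subprocess.run(
        ["traceroute", "-n", target],
        capture_output=True, text=True, timeout=120,
    )
    return result.stdout


if __name__ == "__main__":
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    outfile = f"path-{stamp}.txt"
    with open(outfile, "w") as fh:
        fh.write(capture_path(TARGET))
    print(f"Saved current path to {outfile}; diff two captures to see the reroute.")
```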

  1. Steve Lalonde says:

    We have been unable to resolve the issue with the switch.

    Traffic will remain routed over alternative paths.

Emergency maintenance: Harbour Exchange 6/7 core

Posted: Friday, September 30th, 2016 at 18:19 by David Derrick

Saturday 1 October 2016 00:01 – 06:00

During the above window we will be rebooting our core router in Hex6/7 to address an urgent bug affecting its stability. Downtime is expected to be approximately 30 minutes, during which connected customers will lose service. Other traffic will take alternative paths.

  1. David Derrick says:

    This work is complete.

Emergency maintenance: VoIP platform

Posted: Monday, September 19th, 2016 at 17:18 by David Derrick

Tuesday 20 September 2016 00:01-06:00

During the window above we will be performing emergency maintenance on the back-end databases supporting our VoIP platform.

The work will mostly take place behind the scenes but there may be interruptions to service of up to 30 minutes during the period.

  1. Ming-Yu Hsieh says:

    This work is now complete.

Emergency maintenance – 10/08/2016 17:30hrs – IPStream Connect and WBC connections

Posted: Wednesday, August 10th, 2016 at 12:01 by Neil Watson

We are currently aware that some customers are experiencing a level of packet loss on IPSC and a small number of WBC nodes. As well as investigating internally, we have engaged with 3rd party suppliers, who have identified a configuration problem that is causing the issues being experienced. Correcting it requires changes that will cause a port flap on our equipment, which will effectively disconnect all DSL customers on the following nodes:

All IPSC nodes and the following WBC nodes
Peterborough
Preston
Sheffield
Slough
Stepney Green
Wolverhampton

We will therefore undertake this work at approximately 17:30hrs this evening to restore full service to all customers. It is anticipated that all customers will reconnect automatically; however, we would ask anyone who does not reconnect to power-cycle their on-site equipment as a first step.

We wish to apologise for the short notice of this emergency maintenance, but feel that it is necessary in the interest of the affected customers. We’d also like to apologise for any inconvenience that this may cause and wish to reassure you that we are working hard to minimise any impact.
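
As a rough way to check whether the packet loss has cleared on a particular circuit, a simple sampling script along the lines below could be used. This is a sketch only: it shells out to the standard Linux ping, and the default target address is a placeholder that should be replaced with the WAN IP of the circuit being tested.

```python
#!/usr/bin/env python3
"""Estimate packet loss to a circuit by sampling ICMP echoes.
Sketch only: uses the standard Linux `ping`; the default target is a
placeholder for the customer's WAN IP."""
import re
import subprocess
import sys


def loss_percent(target: str, count: int = 50) -> float:
    """Send `count` pings (roughly one per second) and return the loss %."""
    out = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True, text=True,
    ).stdout
    match = re.search(r"([\d.]+)% packet loss", out)
    return float(match.group(1)) if match else 100.0


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "192.0.2.1"  # placeholder
    print(f"{loss_percent(target):.1f}% packet loss to {target}")
```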

  1. Neil Watson says:

    We have been advised that the 3rd party supplier is not currently in a position to undertake the changes and is working on an alternative method of resolving the issue. We will therefore cancel the maintenance at 17:30 today. Should any further work be needed we will look to roll this into the pre-planned maintenance window from 23:00 tonight. Apologies for the short notice. We will continue to work to get the issue resolved and will update further on this post.

  2. Neil Watson says:

    Work was undertaken during the maintenance window last night to correct the packet loss issue that some customers were seeing. This work did not disconnect users and did not negatively impact users’ services. Initial reports this morning indicate that the levels of packet loss are greatly diminished or no longer present. We are, however, seeing some disruption to VoIP services which is impacting a minority of users, and we are working to diagnose this further. We apologise for the ongoing nature of this issue and would like to reassure you that it has high-level visibility within Entanet and we are working to resolve it as quickly as possible.

  3. Neil Watson says:

    Investigations are continuing and further escalations have been made with 3rd party suppliers. We are expecting a further update before the close of play today and will update as soon as we have further information.

  4. Neil Watson says:

    We have received a further update from our suppliers. They are now planning to introduce further monitoring and diagnostics to confirm the suspected cause of the VoIP issues being seen. Once confirmed, there is an anticipated route to a fix, which will be implemented as soon as possible. More information will be posted as soon as we have it.

  5. Neil Watson says:

    Further escalations have been made to push this issue as we’re keen to get a resolution as soon as possible. We apologise for the delays being experienced in returning full service to all customers. We will update with more concrete information once we have it.

  6. Neil Watson says:

    Investigation work has continued throughout the day and a potential problem with mis-categorised flows has been identified. Work is underway to confirm that this is indeed the cause, after which plans for rectification will be implemented. We will continue to update until resolved.

  7. Neil Watson says:

    Further work has continued over the weekend, and changes agreed with our 3rd party suppliers have been implemented. These changes appear to have resolved the VoIP issue, and our monitoring has been positive since they were made. Unfortunately we will only be able to confirm this once the normal weekday load has returned. We will continue to monitor closely and would encourage partners and customers to report any further issues they see. Once again, apologies for the incident and the time taken to resolve it.

  8. Adam Heath says:

    Following the previous update, early reports from users inform us that there is still disruption being experienced on some VoIP services. We can assure you this is being handled as the highest priority and we are continuing to work with the suppliers for a resolution.

  9. Neil Watson says:

    Following on from this morning’s reports of further VoIP disruption we have, at the manufacturer’s request, applied additional configuration changes to correct the miscategorised traffic flows. This action was determined based on the traffic dumps and monitoring that took place this morning as soon as the reports of further issues were received. Initial reports following the change suggest that this has had a positive effect. We will continue to monitor and ask partners to contact the support team should they experience further problems. (A rough sketch of the kind of DSCP check involved in spotting miscategorised flows appears at the end of this thread.)

  10. Neil Watson says:

    We are receiving reports that the issue has not been resolved and that some customers are still experiencing VoIP issues. We have already passed the issue back to the 3rd party suppliers, with whom we are closely engaged – a further update is expected soon. We are also looking to make changes within the Entanet network to try to further alleviate the problem. We anticipate those changes being made shortly, after which we will re-test again.

  11. Neil Watson says:

    Since our last post we have made further changes to try to alleviate the symptoms of the issues seen. Whilst these changes will improve the quality of some calls, they are not the full solution. We are continuing to work on the root cause of the issue and will update further as soon as we are able. Once again, apologies for the extended time to fix this issue.

  12. Neil Watson says:

    Further configuration changes were made late last night. We are currently assessing the impact of those changes and will advise soon if improvement is seen. We’ll continue to work today to get to a point of resolution.

  13. Neil Watson says:

    As a result of further diagnostics and investigations we have just made further changes to the platform to alleviate the problems being experienced by some VoIP customers. We will now monitor the support calls being received for any VoIP issues that occur after this post and the interfaces of the relevant systems for any loss or drops. Based on the results of this monitoring we will then advise next steps. Additional updates will be made with the results of that monitoring.
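
Several of the updates above refer to miscategorised traffic flows identified from traffic dumps. As a rough illustration of the kind of check involved, the sketch below counts VoIP-looking packets in a capture that are not marked with the DSCP value normally used for voice (EF, 46). It is an illustration only, not the procedure used by our suppliers: it assumes SIP on UDP port 5060 and RTP in a commonly used UDP port range, relies on the scapy library, and the capture filename is a placeholder.

```python
#!/usr/bin/env python3
"""Count VoIP-looking packets in a capture that are not marked DSCP EF.
Sketch only: the SIP/RTP port assumptions and the capture filename are
illustrative placeholders. Requires scapy (pip install scapy)."""
from collections import Counter

from scapy.all import IP, UDP, rdpcap

EF = 46  # DSCP Expedited Forwarding, conventionally used for voice traffic


def dscp(pkt) -> int:
    """The top six bits of the IPv4 TOS byte are the DSCP value."""
    return pkt[IP].tos >> 2


def audit(pcap_path: str) -> Counter:
    """Tally VoIP-looking UDP packets by whether they carry the EF marking."""
    counts = Counter()
    for pkt in rdpcap(pcap_path):
        if IP not in pkt or UDP not in pkt:
            continue
        ports = (pkt[UDP].sport, pkt[UDP].dport)
        looks_like_voip = 5060 in ports or any(16384 <= p <= 32767 for p in ports)
        if looks_like_voip:
            counts["marked EF" if dscp(pkt) == EF else "not EF"] += 1
    return counts


if __name__ == "__main__":
    print(audit("voip-sample.pcap"))  # placeholder capture file
```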

Emergency Maintenance: DSL network

Posted: Monday, April 11th, 2016 at 16:12 by Steve Lalonde

Tuesday 12 April 2016 00:00 – 08:00

During the above window we will be making some changes to the DSL network to reduce the packet loss some users have been reporting.

The work may cause some users to disconnect/reconnect.

No other impact is expected.

  1. Steve Lalonde says:

    This work was completed at approximately 08:30.

Emergency Supplier Maintenance: Peterborough & Colindale MSIL card changes

Posted: Monday, February 15th, 2016 at 14:25 by Scott Morgan

Tuesday 16th February 2016 00:01 – 06:00

During the above window our suppliers will be upgrading hardware and software for the following exchanges:

(Liverpool Birkenhead, London Stamford Hill, Manchester Denton, Newcastle West, Norwich, Peterborough)

This will impact DSL services routed through our Peterborough and Colindale nodes.
The planned work (PW) window is from 00:01 until 06:00. Work will start at 00:01 with non-disruptive pre-checks, and there will be no service impact between 00:01 and 01:00. From 02:01 our suppliers will start to reboot the device onto the new version of code, which will cause an outage of between 10 and 15 minutes for all customers whilst the device restarts; if a roll-back is required, the outage will be extended to 90 minutes. Some customers will experience a second outage of up to 15 minutes while BT upgrade the interface card connecting their circuit, and this outage could be extended to 45 minutes if a roll-back is required. All work will be complete by 06:00.


Emergency Work: VoIP Platform

Posted: Thursday, December 10th, 2015 at 17:09 by Steve Lalonde

Friday 11 December 2015 00:00-06:00 GMT

Following on from this week’s VoIP platform issues, we will be replacing the primary database server.

We expect the work to take approximately 2 hours, during which the VoIP platform will be unavailable.

The back-out plan will be to revert to the existing server.

  1. Ming-Yu Hsieh says:

    We are still working on the database server.

  2. Ming-Yu Hsieh says:

    This work is now complete.

Emergency maintenance: VoIP systems

Posted: Tuesday, December 8th, 2015 at 12:39 by David Derrick

Following on from Saturday’s disruption, we have discovered a problem with the VoIP databases which is affecting calls for some customers. Attempts to recover gracefully have failed, so we will have to briefly stop and restart the database. This may cause calls to drop. Downtime will be less than 60 seconds and is expected to begin at 12:45.
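
For context on what a bounded restart of this kind involves, the sketch below times a database service restart and waits for the listener to answer again. It is an illustration under assumed names only: the service name and port are placeholders and do not reflect the actual configuration of the platform described above.

```python
#!/usr/bin/env python3
"""Time a database service restart and confirm the listener returns.
Sketch only: the service name and port are assumed placeholders."""
import socket
import subprocess
import time

SERVICE = "postgresql"            # placeholder service name
HOST, PORT = "127.0.0.1", 5432    # placeholder listener
DEADLINE_SECONDS = 120            # give up if the listener is not back by then


def port_open(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    start = time.monotonic()
    subprocess.run(["systemctl", "restart", SERVICE], check=True)
    while not port_open(HOST, PORT):
        if time.monotonic() - start > DEADLINE_SECONDS:
            raise SystemExit("Database did not come back within the deadline")
        time.sleep(1)
    print(f"Database answering again after {time.monotonic() - start:.0f}s")
```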

  1. David Derrick says:

    The database has restarted successfully.

  2. Adam Heath says:

    There seems to have been a continuation of the database issue we previously identified, which may be causing issues with inbound calls and with amendments to call forwards. We are investigating on our side and will provide further information when it becomes available.

  3. Adam Heath says:

    We have resolved the immediate database issue and services now appear to be working again. Further remedial work is required and will be scheduled in due course.

Emergency maintenance: Sonicwall GMS

Posted: Thursday, November 12th, 2015 at 10:47 by David Derrick

Friday 13 November 2015 11:00-13:00

During the above window we will be applying the latest service pack to our GMS deployment to resolve a number of urgent issues. Customers and support team members will be unable to view reports or make changes, although the NOC team can still access appliances directly to make urgent changes.

Please note that the appliances themselves are unaffected by this work and the security of customer networks is not at risk.

  1. David Derrick says:

    This work is complete. Users may need to clear their browser cache if they have trouble logging in.

Emergency Work: Telford UPS

Posted: Thursday, August 13th, 2015 at 13:05 by Mark Yardley

We have experienced a failure of one of the UPS devices that protects telford-dc3.core. We will shortly be undertaking work to replace the failed unit and restore full power redundancy. Services connected to or through telford-dc3.core should be considered at risk until the work is completed.

Further updates will be provided within the hour.

  1. Mark Yardley says:

    Engineers are currently progressing with the work. Further updates will follow shortly.

  2. Mark Yardley says:

    The work is now complete and UPS redundancy has been restored.