As many of you realised, Darklite-SCE suffered a major outage starting early on Sunday morning (4th) and continuing until 2pm on Tuesday (6th). Since then all servers have been back online and functioning normally - we are continuing to monitor the situation and remain in contact with the data centre to ensure no further outages occur. Here is a step-by-step analysis of the timeline leading to this downtime:
- Darklite runs servers from two data centres, one in Dublin and one in Maryland, USA.
- Over the past year we have been migrating all customers to Dublin. There are currently 350 active sites on the final remaining American server.
- We have had an 8-year business partnership with the centre in Maryland.
- A few months ago the Maryland data centre was purchased by a company called NaviSite - a much larger, publicly traded company with similar years of experience.
- NaviSite indicated their desire to move all servers (approximately 500, hosting upwards of 200,000 individual sites) to their primary data centre in Minnesota, USA. This would only be done when they could guarantee minimal downtime.
- This migration was indefinitely postponed after they discovered inherent problems with the method they planned to employ to move servers.
- Early Sunday morning (2am) we were made aware that our American server was offline.
- Contact was made with the data centre, technical support, and other hosting companies which used this facility. Many of them, including ourselves, were informed that the migration had gone ahead, much to our surprise.
- At this time we were told the server would be offline for approximately 12 hours, as they copied all sites and email over a dedicated line to a duplicate setup in Minnesota.
- By Sunday afternoon it was clear that this window would not be met - their transport mechanism had failed and they had decided to load the individual servers onto trucks and drive them to the new data centre. We were informed that it would be an eight-hour drive to the new centre.
- At this point we joined a conference bridge call with many of the other affected hosting companies to share information and news, as contacting NaviSite had become impossible.
- Leading up to Monday morning we received upwards of eight promises that the server would be back online and functioning at various times.
- By Monday at 5am it was becoming obvious that, due to DNS issues, we would need to separate the link between the American server and the Dublin based servers - this work was completed at 9.30am, and thanks to the quick help of the IEDR (.ie domain registry) in this matter, the vast majority of our customers who are hosted in Ireland never experienced any downtime. The Dublin centre will now remain unlinked from the American centre - this isolates our Dublin hosted customers from any further potential problems.
- At 11 AM limited access was restored to the management console on the American server and approximately 50 sites came online.
- NaviSite informed us that they were experiencing significant network issues bringing the servers online, and that when they attempted to do so their entire network was being brought down.
- This required some additional equipment to be flown in to handle the new traffic.
- At 9.30 PM access was restored to approximately 200 accounts. IP routing issues with NaviSite blocked the remaining sites from going live.
- At 4.30 AM on Tuesday a further 100 accounts became active.
- At 9.20 AM all accounts went offline as NaviSite reloaded their internal routing tables.
- At 10.30 AM we began preparing to perform a mass migration of all American hosted sites to Dublin - the deadline for this to start was set at 3PM. This was a last-resort measure that would have restored e-mail, but it would have required up to 72 further hours, and NaviSite to fully recover, before our staff could successfully migrate all web sites.
- At 12PM access to the server was restored and, one by one, sites began to become available.
- At 2PM all sites and e-mail were functioning as normal.
During The Outage:
- From Sunday 2am until Tuesday 2pm, Jonathan Stein, Darklite's Managing Director, remained on the conference call to America and exerted all possible pressure on the NaviSite staff to restore full access.
- The remaining staff handled hundreds of phone calls and updated our phone system with the latest information possible.
- Information coming from NaviSite executives was extremely limited, despite bypassing the new outsourced Indian support and working directly with a NaviSite VP located in America for information.
- Other customers of NaviSite experiencing the same problems provided significantly more support and help during this time than NaviSite.
- All sites belonging to Darklite are currently online; however, a significant percentage of other NaviSite customers still have no access to their servers.
- We continue to watch the situation closely; however, as time passes the NaviSite systems seem to be growing more robust.
- We are performing customer-requested migrations to Dublin in a calm and systematic manner. A full plan will be created in the next few days to speed up this process.
- We have limited confidence in the abilities of NaviSite to provide useful information in a timely fashion.
- Many NaviSite customers are discussing what action to take against NaviSite; however, at this time it is more important to ensure customer sites and e-mail are online. We will release a further statement regarding NaviSite in the next couple of days, when we are satisfied that customers will experience no further outages.
I would personally like to thank the customers who called to say thank you for the updates provided and offered their support in this very difficult time - you are truly fantastic customers! I extend our sincere apologies for the significant inconvenience experienced, and although we could have done nothing to prevent this issue, my staff and I will be taking every possible step in the coming weeks to ensure that all American hosted sites are brought over quickly to the Dublin centre. Please be patient with us on this matter, so that we can move your accounts in a considered manner and prevent any further access issues. We will contact each of the American hosted site owners individually to schedule this migration, and it will involve no further downtime.
There will be a further release in a couple of days with more information.
Managing Director - Darklite-SCE