Despite an unprecedented number of critical system problems at our customers last week, we coped and our customers kept working. We had hard disk failures, weird Internet connection issues, an email server that had run impeccably for a couple of years crash three times for no obvious reason, and an old customer database refuse to start up after a regular server reboot.
This is what we do as a business IT support company. In addition to fixing the daily glitches on our customers’ systems and answering their questions, we help businesses prepare for critical problems and disasters as well as their budgets will allow.
Even so, I have never experienced a week like it in 20 years. Two sets of hard disk failures brought servers down. Thanks to a stable disk array, one server is still up and running while it awaits replacement of the disk under warranty. The other unfortunately had two simultaneous disk failures, so we have had to implement a temporary disaster recovery server.
At one of our newer customers, we are dealing with intermittent Internet connection dropouts (and once they are resolved we will be recommending a second line). At another customer we have a similarly unpredictable problem on their email server. What connects these problems, and makes them very hard to solve, is that they happen out of the blue, leave no obvious messages to indicate the cause, and are cleared by rebooting the firewall or the server respectively: the sledgehammer approach.
The final critical problem, which stopped a customer working for a couple of hours, was with their most important application: a customer database. After a routine system reboot it refused to start up. Worse still, this is an old application that is no longer supported by the software company that created it. We got it working, but without the safety net of the original developer to help us we will always be flying by the seat of our pants.
We would rather disasters didn’t happen at all. We want to be tuning and tweaking systems to keep them running smoothly so that our customers keep working. We don’t want to be working late at the drop of a hat, feeling stressed and pressured, rearranging our weekends, or dashing around London. Life and business computer systems don’t work like that, however. So we remain vigilant and keep working to do the very best for our customers. And although we coped, and our customers are all working, there is still a lot of tidying up to do.