My kit’s in a data centre – surely nothing can go wrong?

By Roger Keenan, Managing Director of City Lifeline Ltd, a leading professional colocation data centre in central London.

  • Posted Monday, 10th June 2013 by Phil Alsop

People put their mission-critical operations and mission-critical equipment in a professional colocation data centre for a variety of reasons. But the common denominator is safety and security. People want certainty that things will continue to work, that their equipment will not be disrupted or stolen and that they can sleep at night. Mostly, that is what happens, and the reliability of a professionally run data centre and its resilience to unexpected external events is generally far superior to what most in-house organisations can deliver.


The effects of failures can be catastrophic. Trading systems can lose millions in seconds if a failure affects the algorithms. Imagine the effect on eBay if it stopped working during live auctions. Or consider the very real effects when Amazon in Dublin went down in 2011. Not only are the ramifications for the customers’ operations huge, but they also drastically affect any data centre that allows such failures to happen. Reputation is everything, the market is competitive, and a professional colocation data centre which does not deliver and make its customers feel secure will soon find it has no customers. SLAs make no difference.


So what does a professional colocation data centre do to guard against such things? It comes down to design, operation and training. Statistics show that more than half of all outages are caused by human error – the “oh, I thought you said switch off D, not E, is that why it’s gone quiet?” syndrome. Organisations often boast about how their people are their most valuable asset. A data centre is one of the most crucial tests of whether they really mean it. Routine training is important and necessary, but the real test comes when things have not happened as expected, it is 3am and one person has to make the critical decisions on which the future of the organisation and its customers depends. That is when the value of good training and regular practice is seen.


Security is another area where people are critical. Many professional colocation data centres offer 24/7 access, so that if any equipment fault occurs, even at 4am on Christmas Day, it can be addressed immediately. In the world of professional security, there is no substitute for well-trained, properly qualified people who are on site all the time, know the facility intimately, can control access and equipment removals, and can respond intelligently to unexpected events. People on their own are not enough – they need to be supported by well-thought-through security systems, such as sophisticated CCTV with external motion detectors and electronic tripwires. Procedures need to be well designed and well balanced. There are plenty of horror stories about technicians who dismantled their equipment, went to the car park to get a screwdriver, then couldn’t get back into the facility. The balance between tight security and ease of access needs to be both right and appropriate to the type of customers using the data centre.


The core of any commercial data centre, after physical security, is the availability of electrical power. Most of the actions needed to ensure that power to the equipment never fails are automated. A good electrical design will be to Tier 3 data centre standards. The key requirement is “concurrent maintainability”: the ability to take any part of the electrical system out of service for maintenance without affecting the end user of the equipment. Such a design implies that any single failure (which has the same effect as taking a system element out of service) also does not affect the end user, and therefore that there are no single points of failure (commonly known as SPOFs).
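
The idea of “no single points of failure” can be checked mechanically, at least on paper. The short Python sketch below is purely illustrative and is not taken from any real facility: the component names (`switchboard`, `ups_a`, `board_b` and so on) and both layouts are hypothetical, and real electrical designs involve far more than reachability. It removes each component in turn, which is what both a fault and a maintenance shutdown amount to, and reports any component whose removal leaves the rack without a path to a power source.

```python
from collections import deque

# Hypothetical layouts: each component maps to the components that feed it.
SOURCES = {"utility", "generator"}

SINGLE_BOARD = {  # two UPS chains, but both hang off one switchboard
    "rack": ["pdu_a", "pdu_b"],
    "pdu_a": ["ups_a"], "pdu_b": ["ups_b"],
    "ups_a": ["switchboard"], "ups_b": ["switchboard"],
    "switchboard": ["utility", "generator"],
}

DUAL_BOARD = {  # same chains, each with its own switchboard
    "rack": ["pdu_a", "pdu_b"],
    "pdu_a": ["ups_a"], "pdu_b": ["ups_b"],
    "ups_a": ["board_a"], "ups_b": ["board_b"],
    "board_a": ["utility", "generator"],
    "board_b": ["utility", "generator"],
}

def load_is_powered(links, sources, load, removed=None):
    """True if `load` can still reach a source with `removed` out of service."""
    if load == removed:
        return False
    seen, queue = {load}, deque([load])
    while queue:  # breadth-first walk upstream from the load
        node = queue.popleft()
        if node in sources:
            return True
        for upstream in links.get(node, ()):
            if upstream != removed and upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return False

def single_points_of_failure(links, sources, load):
    """Components whose removal alone cuts power to `load`."""
    components = set(links) | {n for feeds in links.values() for n in feeds}
    components.discard(load)
    return sorted(c for c in components
                  if not load_is_powered(links, sources, load, removed=c))

print(single_points_of_failure(SINGLE_BOARD, SOURCES, "rack"))
print(single_points_of_failure(DUAL_BOARD, SOURCES, "rack"))
```

In the first layout the shared switchboard is flagged, because taking it out of service isolates both UPS chains at once. In the second, no single removal disconnects the rack, which is the property a concurrently maintainable design is aiming for.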


The old acronym KISS (Keep It Simple, Stupid) is invaluable in avoiding power outages. The more complex a design is, the less easy it is to understand and the more likely people are to make mistakes. The more elements are dependent on each other, the more likely that some unpredicted interaction will cause an outage, and the more likely that the technician trying to make critical decisions in the middle of the night will get one of them wrong. And using sophisticated technology is not advisable for the same reasons – old-fashioned, simple, modular electromechanical technology works just as well and is much easier to diagnose and deal with in a crisis.


Nothing in life is perfect, and no-one can give a cast-iron guarantee that nothing will ever go wrong. But a professional colocation data centre, which exists solely for the purpose of keeping its customers’ equipment safe, secure and operating, and whose reputation and existence depend on doing just that, is likely to be the best way for most managers to ensure they sleep soundly at night.