Watchers on the Wall

Human Error – How Can You Limit Mistakes with your Critical Infrastructure? By Mark Gaydos, Chief Marketing Officer for Nlyte Software.

  • Thursday, 11th April 2019. Posted by Phil Alsop

A data centre is often the heart of an organisation’s operation. While the majority of it is broadly mechanical, electrical and technological, including the racks, cabling, routers, switches, software, and data, there is a human element that watches over it all. Theoretically omniscient, these are the ‘watchers on the walls’, keeping sight of all that happens, stepping in to guide, correct, or replace an underperforming resource while keeping the zombie servers and cyber threats on the other side of the kingdom’s walls.

 

However, sometimes that human element can be less than optimal in the running of the centre and, in the worst case, can be a factor in data loss, corporate slowdown – and loss of revenue. This is no surprise: The data centre manager isn’t the all-powerful Wizard of Oz; usually it’s a real person with a name like Oscar Zoroaster Phadrig Isaac Norman Henkle Emmannuel Ambroise Diggs… (the man pretending to be the Great and Terrible Wizard of Oz – of course!) or perhaps something a little more ordinary.

 

And even the best of us ordinary mortals sometimes make mistakes. A few simple processes can stop those human errors getting in the way of the business of running the kingdom…

 

Watching the wall: Stopping human error

·         All stakeholders, from partners and contractors to employees, must share information and work together

·         Documentation and processes must be consistent and seen – and used

·         Understand performance and resilience – practice, test, simulate

·         See and understand the whole power chain

·         Monitor operations in real time, as they are now, not as they were

·         Identify changing trends and respond promptly to warning signs

·         Implement workflow management to enforce consistent processes

 

Human errors have played a crucial part in the fall of many a kingdom – like a power supply being unplugged in a major airline’s data centre (making rather big news last year), causing a power cascade that ended up knocking out global data operations.

 

But just as the great and powerful wizard used some little tricks to better rule his land, the data centre manager has ways to improve their powers and minimise the human (and technical) errors that are the undoing of peak performance. Capable and knowledgeable data centre managers take care of the whole company from their sub-kingdom estate – and their wise and just rule can spell prosperity for the whole corporate empire.

 

The biggest of the wizarding tricks that the wise data centre manager uses to stop the plagues of configuration errors, hordes of zombie servers, covens of energy inefficiencies, monitoring goblins, capacity planning crusades, performance management quests, and the ogre of real-time reporting, consists of four little words.

 

Data Centre Infrastructure Management (DCIM) – the sword in the stone

Simply put, DCIM helps data centre managers rule the kingdom without question of the legitimacy of their rule. The reason? The solution enables data centre managers to move away from Excel spreadsheets and notepads and gives them a personal ‘data centre assistant’ keeping track of a plethora of readings from throughout the data centre. It presents one view of the whole estate.

 

A real Excalibur will automate device discovery, workflow management, and reporting across the entire data centre technology stack: Physical, virtual, and edge, including software and IoT devices.

 

This means that data on all elements of the data centre domain is more accurate, immediate, and actionable, reducing the burden on the data centre manager and leaving them more time to act on this data and make positive changes. It also means time to plan and strategise, to become a wise and beneficent ruler, able to look over the horizon to see upcoming threats to the kingdom.

 

Thus, the organisation can bask in the best quality of service, uptime, and capacity planning it has ever received: Application uptime, speed of service, and ability to innovate all stem from the stable base of the data centre, because the ruler can see, understand, and step in not only to solve historic issues, but also to foretell the future with predictive analytics. These predictions include future power issues, capacity crunches, bandwidth squeezes, and hotspots in the aisles that need cooling down before equipment starts to wear too fast.
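To make ‘foretelling the future’ a little less magical, here is a minimal sketch of the simplest possible forecast: fit a straight line to daily power readings and estimate how many days remain before a capacity limit is reached. The function name, the readings, and the limit are all hypothetical, and this is an illustration of the idea only, not how any particular DCIM product does it.

```python
def days_until_limit(readings_kw, limit_kw):
    """Estimate days until a linear trend through daily readings hits limit_kw.

    readings_kw: one reading per day, oldest first (hypothetical sample data).
    Returns None when the trend is flat or falling, so no crossing is predicted.
    """
    n = len(readings_kw)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")

    # Ordinary least-squares fit of reading = intercept + slope * day_index.
    mean_x = (n - 1) / 2
    mean_y = sum(readings_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings_kw))
    slope = sxy / sxx
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    crossing_day = (limit_kw - intercept) / slope
    # Count from the most recent reading; never report a negative number of days.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical rack power draw rising by 1 kW per day towards a 50 kW limit.
print(days_until_limit([40, 41, 42, 43, 44], 50))
```

A real deployment would need far more than a straight line (seasonality, noise, step changes when new equipment arrives), but even this crude trend shows how a warning can arrive days before the limit rather than after it.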

 

To paint a simpler picture: DCIM exists to minimise risks whilst improving efficiency and transparency for the entire organisation.

 

The biggest network management challenges for data centres currently revolve around a drive to greater transparency and overall efficiency in a complex and rapidly growing landscape of applications and data volumes, so it’s a much-needed technology. The increasing number of assets being added to the data centre, from network servers to switches and racks, is adding to the complexity of the on-premises set-up – which has an adverse effect on network performance.

 

Even cloud computing providers are not immune to these challenges – in fact, they have them on an even grander scale. All the issues bedevilling an on-premises data centre are present in the cloud, so cloud service providers face an even more pressing need to manage their data centres to the ‘nth degree’ to ensure the five nines (99.999% availability), or whatever service levels they guarantee their customers.
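For a sense of how little room for human error ‘five nines’ leaves, the arithmetic can be sketched as below. The function name is hypothetical and a 365-day year is assumed; real service level agreements define their own measurement periods and exclusions.

```python
def allowed_downtime_minutes(availability_percent, period_days=365.0):
    """Return the downtime budget in minutes for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_percent / 100.0)


# Three, four, and five nines over an assumed 365-day year.
for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

At five nines the budget is a little over five minutes a year, which a single unplugged power supply can easily exhaust.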

 

The right solution shortens the time to action and response, reduces the man-hours required to fix breaks and install new solutions of all types, and minimises errors across the board. The result: A well-run kingdom, a better-run business, significant cost and time savings, and a stable platform for innovation in all things IT for the business to build on. Not to mention returning sanity to a frazzled data centre management team that is simply looking to deliver a great service to its users.