The Top Reason Most IT Organisations Can’t Quickly Resolve Critical Incidents

Most organisations today are digital businesses: they rely on electronic processes to run their core operations. Digital processes streamline operations and make them more accurate and often higher in quality. However, they also cause major disruption when something goes wrong with the applications or systems running these processes, or with the Internet or corporate network that connects them. Such failures are the source of critical events that become incidents requiring IT involvement. By Vincent Geffray, Senior Director, Product Marketing, Everbridge.

Friday, 14th June 2019. Posted by Phil Alsop

Quick resolution is essential to avoid a severe impact to the company, its customers, partners, and employees. Yet most organisations fail to achieve that goal, and they need to understand why so they can realise far better outcomes.

Unplanned downtime costs a fortune

If you think your business is safe from these critical incidents, think again. According to the Ponemon Institute, the chances of experiencing a data breach are 1 in 4. Add unforeseen Internet and power outages, plus network issues caused by natural disasters, weather and human error, and the odds are against you.

Whatever the incident, your goal should be minimising the duration of the disruption so you can, in turn, minimise the impact on your customers and business.

Let’s say unplanned downtime costs you £5,000 per minute. This figure factors in all related outlay and monetary impacts: diverting highly skilled IT staff from current projects and activities, lost productivity among IT and non-IT employees, regulatory fines, and lost revenue and opportunities (such as being unable to process an order or serve a prospective customer). Given everything it has to cover, £5,000 is a conservative estimate.

Simply put, the costs are high – and organisations are likely to lose more than they can calculate, since existing customers and prospective customers might turn to a competitor. Who knows what revenue you have left on the table over the lifetime relationship with a customer who walked away or never signed on to do business with you?

Add up all the related costs and one hour of unplanned downtime could easily cost your organisation £300,000. Experience just a handful of these a year – in line with IDG’s survey finding that most companies reported as many as five critical IT events each year – and you exceed £1 million.
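The arithmetic behind those figures can be checked with a short sketch. It assumes a flat per-minute rate and, purely for illustration, that each of the five incidents lasts one hour; the function and constant names are hypothetical.

```python
# Illustrative figures taken from the article; these are not measured data.
COST_PER_MINUTE_GBP = 5_000
INCIDENTS_PER_YEAR = 5  # "as many as five" critical IT events per the IDG survey


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_GBP) -> float:
    """Return the cost of an outage lasting `minutes`, assuming a flat per-minute rate."""
    return minutes * cost_per_minute


# Assumption: every incident lasts exactly one hour.
hourly_cost = downtime_cost(60)
annual_cost = hourly_cost * INCIDENTS_PER_YEAR

print(f"One hour of downtime: £{hourly_cost:,.0f}")
print(f"Five one-hour incidents a year: £{annual_cost:,.0f}")
```

Under these assumptions one hour costs £300,000 and five such incidents cost £1.5 million, comfortably above the £1 million mark.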

Minimising the mean time to resolve (MTTR) by collapsing the mean time to find someone!

In the IT world, the duration of an incident is measured by the average time that elapses from the incident being opened or reported until it is resolved or closed. The IDG survey found that it takes most companies an average of 39 minutes just to assemble the correct incident response team. The larger the organisation, the more distributed the IT staff and the more complex the applications, the larger the number of resolvers and the longer the time needed to identify the right IT staff and engage the team. It’s easy to see how the time to resolve an incident can extend well beyond an hour.
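As a minimal sketch of that definition, MTTR is the mean of (resolved minus opened) across incidents. The incident timestamps below are invented for illustration; only the 39-minute assembly figure comes from the IDG survey cited above.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (opened, resolved) timestamps.
incidents = [
    (datetime(2019, 6, 3, 9, 0), datetime(2019, 6, 3, 10, 30)),     # 90 minutes
    (datetime(2019, 6, 10, 14, 15), datetime(2019, 6, 10, 15, 0)),  # 45 minutes
    (datetime(2019, 6, 12, 22, 5), datetime(2019, 6, 12, 23, 50)),  # 105 minutes
]

# Average time to assemble the response team, as reported by the IDG survey.
ASSEMBLY_MINUTES = 39


def mean_time_to_resolve(records) -> timedelta:
    """Average elapsed time from an incident being opened to being resolved."""
    if not records:
        raise ValueError("at least one incident is required")
    total = sum((resolved - opened for opened, resolved in records), timedelta())
    return total / len(records)


mttr_minutes = mean_time_to_resolve(incidents).total_seconds() / 60
assembly_share = ASSEMBLY_MINUTES / mttr_minutes

print(f"MTTR: {mttr_minutes:.0f} minutes")
print(f"Share spent assembling the team: {assembly_share:.0%}")
```

With these sample incidents, team assembly alone would account for roughly half of the resolution time, which is why the heading above talks about collapsing the mean time to find someone.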

Once the team is assembled, it can be quite time-consuming to unravel the issue and work toward a resolution. Most organisations are deploying a growing number of vital IT services and applications. With that comes growing complexity in apps, IT services and cloud management, which makes it increasingly difficult to determine what’s gone wrong.

Those are not the only reasons IT organisations struggle to resolve incidents quickly. They spend significant time and money to do so. But they have invested little in the one step that can help them truly minimise incident duration and business impact.

Where 70% of IT organisations go wrong

For years, IT organisations have spent hundreds of thousands and even millions of pounds on tools and services such as SolarWinds, Nagios, AppDynamics, New Relic, Dynatrace, Jira and Splunk to monitor their infrastructure, track issues, and detect and diagnose problems faster. The idea is that earlier detection will lead to a faster fix. Although earlier detection is a necessary step, it is not sufficient to ensure fast incident resolution.

To investigate, fix and test issues, organisations turn to highly skilled IT staff who know the company’s infrastructure inside and out. In most organisations these are the most highly paid resources – and they call upon expensive tools that help automate part or most of their incident response process.

The culprit is the response phase, which can take up to 70% of overall resolution time according to industry analysts Forrester. That is mainly because organisations still handle most of this step manually. This includes everything from communicating with stakeholders and launching restoration calls to determining the best people to contact and setting up a conference call or meeting space.
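To make one of those manual steps concrete, here is a deliberately simple sketch of automating "determining the best people to contact". Everything in it is hypothetical: the service names, the skills mapping, the roster and the matching rule are assumptions for illustration, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responder:
    name: str
    skills: frozenset
    on_call: bool


# Hypothetical mapping from an affected service to the skills needed to fix it.
SERVICE_SKILLS = {
    "payments-api": {"database", "networking"},
    "web-frontend": {"cdn"},
}

# Hypothetical on-call roster.
ROSTER = [
    Responder("Asha", frozenset({"database"}), on_call=True),
    Responder("Ben", frozenset({"networking", "cdn"}), on_call=True),
    Responder("Chloe", frozenset({"database", "networking"}), on_call=False),
]


def select_responders(service: str, roster=ROSTER) -> list:
    """Return names of on-call staff with at least one skill the service needs."""
    needed = SERVICE_SKILLS.get(service, set())
    return [r.name for r in roster if r.on_call and r.skills & needed]


print(select_responders("payments-api"))
print(select_responders("web-frontend"))
```

A lookup like this takes milliseconds, whereas working through a phone list by hand is part of what stretches team assembly to the 39-minute average mentioned earlier. A real system would also need escalation, acknowledgement tracking and schedule handling, which this sketch leaves out.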

In other words, organisations have invested millions to optimise how quickly they can detect, investigate, fix issues and even test their fix. However, they’ve done little to streamline the response phase.

No surprise: Incident Response Automation is the only possible answer

If your organisation and its infrastructure are highly complex – which is the case for most organisations today – you need integrated processes and tools throughout the incident response process. By adopting an end-to-end incident resolution approach, automating everywhere possible, you address the shortcoming preventing you from minimising the impact of IT incidents. In other words, streamline all steps in the incident response process and you can truly optimise the entire process and get back to business as usual more quickly.

If you’ve overlooked the response phase of your incident management process, then it is most likely that all the gains provided by the ITOM, SIEM, ITIM and ITSM tools you currently use are wasted because of time-consuming, manual, inefficient and unpredictable incident response coordination.

This is how Incident Response Automation helps IT organisations every day to regain control of their performance and deliver quantifiable value to the business.