Uptime Institute introduces Outage Severity Rating

Uptime Institute has introduced its new Outage Severity Rating (OSR) to help the digital infrastructure and data center community better understand and articulate service outages in the context of how each incident affects the business. With OSR, infrastructure practitioners can finally share a common lexicon when forming their own service delivery capacity strategies and can view their own outages in terms of business impacts, rather than referencing outages based upon the number of physical infrastructure components that were involved.

  • Wednesday, 15th May 2019, by Phil Alsop

For the past three years, Uptime Institute’s Intelligence group has been studying publicly reported outages to understand the causes and impacts of unplanned downtime. Over that period, the number of public outages has climbed steadily: 27 in 2016, 57 in 2017 and 78 in 2018. This rise tracks the growing complexity of typical infrastructures, in which computing capacity and its associated data are delivered by a combination of in-house data center sites, co-location facilities and the cloud, all connected by high-capacity networks. Consequently, IT system and network problems have now surpassed mission critical and facilities issues as the leading causes of publicly recorded outages; in previous years, power was the biggest cause.

 

“Public awareness of outages is becoming more pronounced as the number and impact of outages increases. In most cases, we find it difficult to understand the true nature and magnitude of the outage since most practitioners still characterize the severity of an outage based on the amount of affected physical infrastructure equipment,” said Andy Lawrence, executive director of research, Uptime Institute. “The OSR was developed to allow the data center industry’s infrastructure practitioners to view outages from the top down, at the IT service delivery level, and then communicate with one another in an informed and normalized business impact fashion. The OSR eliminates the equipment-centric view of outages, and instead focuses on the ability for the hybrid digital infrastructure to support the required IT business services being delivered by the infrastructure.”

 

Hybrid infrastructures deliver business services at a level of designed capacity based on all components being available. When any part of the hybrid infrastructure fails, capacity degrades by degrees, depending on the complexity of the failure(s). As the IT industry continues to adopt hybrid infrastructure designs, the definition and scope of “outages” must also change. Historically, an “outage” was treated as a binary state of service delivery: entire data centers were described as online or offline. Uptime Institute has therefore been advising companies to pay more attention to business service resiliency: to understand how the hybrid system is designed and what the interdependencies are, and then to plan accordingly. The use of OSR will allow IT business managers to better understand their own outage trends and where to focus their investments to reduce business continuity vulnerabilities and other risks over time.

 

The Outage Severity Rating (OSR):

Negligible – This is a negligible outage, recorded and reported but with little or no obvious impact on business services, and no service disruptions.

Minimal – This is a minimal outage where some number of IT business services are disrupted or degraded but with minimal effect on users/customers/reputation.

Significant – This is a significant outage, with observable customer/user services disruptions, mainly of limited scope, duration or effect. Minimal or no financial effect. Some reputational or compliance impact(s) possible.

Serious – This is a serious outage, with disruption of service and/or operations. Ramifications include some financial losses, compliance breaches, damage to reputation, and possible safety concerns.

Severe – This is a mission critical outage, with major, damaging disruption of services and/or operations with ramifications including large financial losses, possible safety issues, compliance breaches, customer losses and reputational damage.
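For teams that want to record their own incidents against these five levels, the scale can be sketched as an ordered enumeration. This is only an illustrative sketch: the class name OutageSeverity, the helper functions and the integer values are assumptions made here so that the levels can be compared; the list above names the levels but does not number them.

```python
from enum import IntEnum


class OutageSeverity(IntEnum):
    """The five OSR levels, ordered from least to most severe.

    The integer values are an assumption used only for ordering;
    they are not part of the rating as described above.
    """
    NEGLIGIBLE = 1
    MINIMAL = 2
    SIGNIFICANT = 3
    SERIOUS = 4
    SEVERE = 5


def worst(ratings):
    """Return the most severe rating in a non-empty collection."""
    return max(ratings)


def count_at_or_above(ratings, threshold):
    """Count incidents rated at the threshold level or higher."""
    return sum(1 for rating in ratings if rating >= threshold)


# Hypothetical incident log, for illustration only.
incidents = [
    OutageSeverity.MINIMAL,
    OutageSeverity.SERIOUS,
    OutageSeverity.NEGLIGIBLE,
]

print(worst(incidents).name)
print(count_at_or_above(incidents, OutageSeverity.SIGNIFICANT))
```

Rating incidents on one ordered scale makes it straightforward to track trends by business impact, for example how many outages per year reached Significant or worse, rather than by the amount of equipment involved.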