AIOps in the spotlight - Part 4

AIOps is an emerging technology which offers the promise of helping IT and data centre teams get to grips with the growing complexity of their respective infrastructure environments – with the ultimate objective of ensuring application performance optimisation. In this issue of Digitalisation World, you’ll find a variety of thoughts and opinions as to just what AIOps offers, and why it matters.

Friday, 6th December 2019, by Phil Alsop

Today, most IT teams find themselves facing a number of challenges presented by the new and increasingly complex infrastructure that accompanies digitisation, including an exponential increase in data volumes and types. In fact, Gartner estimates that the data volumes generated by IT infrastructure and applications are increasing two- to three-fold every year (and that’s compounding growth). There’s clearly too much data for the humans on the IT team to sort through on their own.
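
To see what compounding at this rate means in practice, the arithmetic can be sketched in a few lines of Python. The function name and the three-year horizon are illustrative assumptions; only the two- to three-fold yearly factor comes from the estimate above.

```python
def projected_volume(initial, yearly_factor, years):
    """Volume after `years` of compounding growth at `yearly_factor` per year."""
    return initial * yearly_factor ** years


# Illustrative horizon of three years, starting from a volume of 1 unit.
for factor in (2, 3):
    growth = projected_volume(1, factor, 3)
    print(f"{factor}x per year for 3 years -> {growth}x the starting volume")
```

Under these assumptions, three years of growth leaves a team with between 8 and 27 times the data it started with.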

 

By Vijay Kurkal, COO at Resolve.

 

 

Expanding infrastructure also results in thousands of events streaming in every day to overtasked admins on the front lines. Given the immense volumes and the high rate of false alarms, IT teams are forced to simply ignore many of these alerts. On top of that, teams are tasked with tracking a dynamic, ever-morphing infrastructure that is heavily virtualised and spread across hybrid environments in the cloud and multiple data centres. Despite these challenges, IT is still expected to resolve requests, incidents, and performance issues in seconds, not days – without adding more people to its already overburdened teams.

 

AIOps, particularly when combined with automation, can help IT teams survive and thrive in this new era of increasing complexity. AIOps is a term coined to describe the use of artificial intelligence (AI) to aid in IT operations. AIOps technologies harness AI, machine learning, and advanced analytics to aggregate and analyse immense volumes of data collected from a wide variety of sources across the IT infrastructure. In doing so, AIOps quickly identifies existing or potential performance issues, spots anomalies, and pinpoints the root cause of problems. Through machine learning and advanced pattern matching, these solutions can even effectively predict future issues, enabling IT teams to automate proactive fixes before issues ever impact the business.
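
As a rough illustration of what "spotting anomalies" can mean at its simplest, the sketch below flags any metric sample that sits far outside the range of the samples just before it. It is a minimal example under stated assumptions, not a description of how any particular AIOps product works: the function name, the window of five samples, the threshold of three standard deviations, and the latency figures are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as an anomaly.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        # Flag the sample if it is more than `threshold` deviations from the mean.
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99]
print(find_anomalies(latency_ms))  # -> [6]
```

Real platforms use far richer models, but the principle is the same: learn what normal looks like from recent data, then surface only the departures from it.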

 

AIOps technologies also offer advanced correlation capabilities to determine how alarms relate to one another. This separates the signal from the noise and ensures IT teams focus their attention in the right place, streamlining operations. Additionally, many AIOps solutions can automatically map the dependencies between dynamic, changing infrastructure components to provide real-time visualisation of the relationships between applications and underlying technology. This makes it much easier to see how things are connected when troubleshooting and significantly reduces the time to solve problems.
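
The idea of correlating alarms through a dependency map can be sketched as follows. This is a simplified illustration only: the component names, the `DEPENDS_ON` map, and the fixed 60-second time buckets are assumptions made for the example, and it naively treats the deepest upstream component as the likely common cause.

```python
from collections import defaultdict

# Hypothetical dependency map: component -> the component it depends on.
DEPENDS_ON = {
    "web-1": "db-1",
    "web-2": "db-1",
    "db-1": "storage-1",
    "storage-1": None,
    "cache-1": None,
}


def root_of(component, depends_on):
    """Follow dependency links to the deepest upstream component."""
    seen = set()
    while depends_on.get(component) is not None and component not in seen:
        seen.add(component)  # guards against accidental cycles in the map
        component = depends_on[component]
    return component


def correlate(alerts, depends_on, window_seconds=60):
    """Group alerts that share an upstream root and fall in the same time bucket."""
    groups = defaultdict(list)
    for timestamp, component in sorted(alerts):
        bucket = timestamp // window_seconds
        groups[(root_of(component, depends_on), bucket)].append((timestamp, component))
    return dict(groups)


# Five hypothetical alerts as (seconds since start, component) pairs.
alerts = [(10, "web-1"), (12, "web-2"), (15, "db-1"), (20, "cache-1"), (500, "web-1")]
for (root, bucket), members in correlate(alerts, DEPENDS_ON).items():
    print(root, bucket, members)
```

Here the three near-simultaneous alerts on web-1, web-2 and db-1 collapse into a single group pointing at storage-1, so an admin sees one incident rather than three. Fixed buckets can split alerts that straddle a boundary; a sliding window would avoid that at the cost of a little more code.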

 

While AIOps on its own drives tremendous value, the magic really happens when it is combined with robust automation capabilities that can take immediate and automated actions on the insights powered by the AI. When these technologies come together, they deliver a closed-loop system of discovery, analysis, detection, prediction, and automation, bringing IT closer to achieving the long-awaited promise of truly “self-healing IT.”

 

For IT teams faced with managing increasing complexity, AIOps and automation are the key to improving operational efficiency, reducing mean time to resolution (MTTR), and increasing the performance of business-critical infrastructure. It’s finally IT’s turn to harness powerful AI-driven and automation technologies – and it must do so to ensure the success of its digital transformation efforts.

The fundamental promise of AIOps is to enhance or replace a range of IT operations processes by combining big data with AI or machine learning, explains Justyn Goodenough, International Area VP at Unravel Data:

 

This promise holds appeal across industries as enterprises recognise the potential for AIOps to solve expensive, challenging, and time-consuming problems in their big data deployments. Not only does AIOps have the potential to drastically reduce the cost of deployments, it can do so while improving performance.

 

These performance gains are largely achieved by automating or enhancing processes across an extensive range of use cases. For almost all workloads, AIOps has the potential to automate several integral tasks, including workload management, cloud cost management, and performance optimisation and remediation, among other processes. These tasks would typically fall to the data team, for whom they are time-consuming and repetitive; AI or ML can perform them independently and at much greater speed. This allows data teams to focus on value-adding initiatives instead of constantly firefighting.

 

However, getting AIOps right is a challenge: you only get out what you put in. That is to say, these AI solutions need quality data inputs in order to generate useful outputs. These inputs need to be relevant, accurate, timely and comprehensive. To ensure their data inputs satisfy these criteria, enterprises need to measure the business outcomes they care about (time to insight, transaction response time, job completion time etc.) accurately, frequently and consistently.
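
A minimal sketch of what checking inputs against these criteria might look like is shown below. Everything in it is an assumption made for illustration: the `Measurement` record, the metric names (taken loosely from the outcomes listed above), and the five-minute freshness limit. It only covers the mechanical checks of accuracy (a usable value), timeliness (recent enough) and comprehensiveness (nothing missing); relevance remains a human judgement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    metric: str
    value: Optional[float]  # None means the collector reported nothing usable
    age_seconds: float      # how long ago the value was measured


# Hypothetical set of business outcomes the enterprise has chosen to track.
REQUIRED_METRICS = {"time_to_insight", "transaction_response_time", "job_completion_time"}


def input_problems(measurements, max_age_seconds=300.0):
    """List the reasons a batch of inputs is not fit to feed an AIOps model."""
    problems = []
    seen = set()
    for m in measurements:
        seen.add(m.metric)
        if m.value is None or m.value < 0:
            problems.append(f"{m.metric}: missing or invalid value")
        if m.age_seconds > max_age_seconds:
            problems.append(f"{m.metric}: stale ({m.age_seconds:.0f}s old)")
    # Comprehensiveness: every required metric must appear at least once.
    for name in sorted(REQUIRED_METRICS - seen):
        problems.append(f"{name}: not reported")
    return problems


batch = [
    Measurement("time_to_insight", 12.0, 10),
    Measurement("job_completion_time", None, 900),
]
for problem in input_problems(batch):
    print(problem)
```

An empty list means the batch passes these basic checks; anything else is a reason to fix the data collection before trusting the model's output.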

 

That being said, integrating AIOps into big data workloads should be addressed on a case-by-case basis. What works for one organisation may not work for another. In recognition of this, several distinct approaches to AIOps have been developed:

  • Rule-based – This is arguably the least ‘intelligent’ form of AIOps, with data teams developing rules and automated responses to specific events in the data stack. While this is the easiest approach to implement, it only works for constrained and limited use cases and can be difficult to maintain over the long term.
  • Neural net-based – Unlike the rule-based approach, neural network systems ‘learn’ to perform tasks by considering examples, without being programmed with task-specific rules. While this is a truer example of AI, it does require training and can be problematic in dynamic environments.
  • Unsupervised self-learning – This approach requires the least involvement from the data team, as the AI is left to teach itself. However, it is difficult to focus the AI on the desired KPIs because its self-learning process lacks guidance.
  • Supervised self-learning – Addressing the issues of unsupervised self-learning, supervised self-learning combines human expertise with machine learning. This results in AI that more directly addresses the desired areas, but it requires more involvement from the data team.
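
To make the first of these approaches concrete, a rule-based setup can be sketched as a short list of conditions paired with automated responses. The metric names, thresholds and action names below are hypothetical and chosen purely for illustration.

```python
# Each rule pairs a condition on an incoming event with the name of a response.
# All metrics, thresholds and actions here are illustrative assumptions.
RULES = [
    (lambda e: e["metric"] == "disk_used_pct" and e["value"] > 90, "expand_volume"),
    (lambda e: e["metric"] == "queue_depth" and e["value"] > 1000, "add_worker"),
]


def respond(event, rules=RULES):
    """Return the action of the first matching rule, or None if no rule applies."""
    for condition, action in rules:
        if condition(event):
            return action
    return None


print(respond({"metric": "disk_used_pct", "value": 95}))  # -> expand_volume
print(respond({"metric": "cpu_pct", "value": 99}))        # -> None
```

The second call shows the limitation described above: an event nobody wrote a rule for simply goes unanswered, and every new case means another hand-written rule to maintain. The learning-based approaches trade that maintenance burden for the need to train and guide a model.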

 

For enterprises looking to optimise their big data deployments, AIOps is a necessary consideration. While it can be a daunting prospect, evaluating which of these approaches best fits the enterprise’s needs, and whether sufficient data inputs are available, is a good first step on the path to AIOps.