The benefits of AIOps
By Paul Mercina, Director of Product Management, Park Place Technologies.
AI technologies are all the rage, and in many cases their capabilities as they exist today are overhyped. One area where AI is delivering on its promise, however, is AIOps.
Originally coined to mean “algorithmic IT operations” and later updated to “artificial intelligence for IT operations,” AIOps, whichever expansion you prefer, remains firmly rooted in the mature fields of big data analytics and machine learning. Applied to increasingly complex IT infrastructure, these tools tame the barrage of information and provide real-time and predictive insights to improve operations.
AIOps has arrived just in time to relieve data center teams of manual oversight of IT systems and facilities, and of the overwhelming volume of alerts and information delivered across a variety of monitoring systems. Separating signal from noise, and the highest-impact issues from less critical problems, has become extremely difficult for any IT organization. Root cause analysis, similarly, is moving beyond what humans alone can manage.
By aggregating log files and monitoring system data, AIOps moves the data center toward the coveted single-pane-of-glass visibility. Automation features can take over the burgeoning variety of administrative tasks, reducing costs and stretching staff resources further. And machine learning systems, armed with large volumes of data, can perform pattern analysis and outlier identification, resulting in increasingly accurate and predictive fault identification and recommended interventions.
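To make “outlier identification” concrete, here is a minimal sketch of one of the simplest techniques in this family: a rolling z-score detector that flags a reading when it sits more than a chosen number of standard deviations away from the mean of the readings just before it. The function name, window size, and threshold are illustrative assumptions, not taken from any particular AIOps product, and commercial tools typically use considerably more sophisticated models.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_outliers(samples: Sequence[float], window: int = 10,
                  threshold: float = 3.0) -> List[int]:
    """Return indices of samples that deviate from the preceding window.

    Hypothetical helper: a sample is an outlier when its distance from the
    mean of the previous `window` samples exceeds `threshold` sample
    standard deviations.
    """
    outliers = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if samples[i] != mu:
                outliers.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            outliers.append(i)
    return outliers


# Example: inlet temperatures (degrees C) from a single hypothetical sensor.
readings = [22.0, 22.1, 21.9, 22.0, 22.2, 21.8, 22.1, 22.0, 21.9, 22.1,
            22.0, 29.5, 22.0]
print(find_outliers(readings))  # [11]: the 29.5 spike
```

Only the spike is flagged; the ordinary jitter of a tenth of a degree is not. The same idea scales to thousands of metric streams, which is where automated analysis starts to outpace manual alert review.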
These are powerful advantages, but the downstream impacts for the business and its customers are what’s truly driving adoption. AIOps is fast becoming indispensable for organizations determined to keep pace with customers’ high demands for availability, reliability, and performance. In fact, AIOps is among the most promising—and proven—options for boosting uptime. Interventions shift from after-the-fact repairs to real-time and even proactive solutions implemented before systems go down in the first place.
The applications for AIOps are quickly multiplying. Google, in a well-publicised example, has developed a system that monitors temperature, cooling-system, and other data from hundreds of sensors around the data center and recommends changes. The company recorded a 40 percent reduction in the energy used for cooling, and commercial AIOps-based facilities management systems are seeking similar results.
Park Place Technologies deploys AIOps for hardware monitoring and has achieved a 31 percent reduction in mean time to repair across thousands of customer sites. We’re benefiting from proactive fault detection and the automation of triage and trouble ticket generation to give engineers more timely and complete information to effect repairs and prevent downtime.
Additional use cases range from resource utilisation to storage management and cyberthreat analysis. Fortunately, off-the-shelf AIOps tools are available for a variety of purposes, and managed services providers are integrating AIOps applications into their offerings. These third-party solutions not only offer a turnkey way to adopt AIOps but also deliver faster time to value.
As progress continues, we can expect increasingly integrated, end-to-end AIOps systems capable of analysing language and other complex inputs, leveraging deep neural networks, and automating more of the adjustments recommended by the machine learning algorithms. Data center leaders will need to get used to the idea of turning over more functions to these powerful autonomic solutions, so they themselves can make better use of their precious time.
AIOps: do you really need it?
By Nigel Kersten, Field CTO, Puppet.
AIOps is the natural evolution of IT operations analytics, where we employ big data and machine learning to do real-time analysis of our IT data sources in order to automatically identify and remediate issues.
Everyone is dealing with the fact that their environments have become too complex and change far too frequently to manage manually, and that even a small amount of automation is insufficient to keep up with business demands. AIOps sounds great! Let’s get the robots to do all of that menial, manual and mundane work that most IT departments are suffering under! It’s a seductive proposition – just rub some machine learning on a problem and it will go away!
The reality is that, as we say in computing, “garbage in, garbage out”. Most companies don’t have data consistent enough to base automated decisions on, and where they do, that data is stuck inside an organizational silo and isn’t easily accessible to machine learning tools.
To get to a position where you can choose to take advantage of AIOps, you need consistent data about your infrastructure and release processes. To get consistent data about infrastructure state, invest in an automation platform, stop making changes manually, and adopt an infrastructure-as-code solution. To get consistent data about process times and change releases, invest in automating as much of your software delivery lifecycle as you can and minimise human interaction in your change management processes. Build robust automation and try to get rid of your change committees.
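Why does infrastructure-as-code produce consistent data? Once every machine’s intended configuration is declared in code, the gap between “what should be running” and “what is running” can be computed mechanically. Every host’s report then has the same shape, which is exactly the kind of clean input machine learning tools need. The sketch below is a hypothetical illustration of that comparison and not how Puppet or any other specific tool works internally. The resource names, attributes, and the `detect_drift` function are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical state model: resource name -> {attribute: value}.
State = Dict[str, Dict[str, str]]


def detect_drift(desired: State,
                 observed: State) -> List[Tuple[str, str, str, str]]:
    """Compare declared state with observed state.

    Returns one (resource, attribute, expected, actual) tuple per mismatch,
    in sorted order so reports are deterministic and comparable over time.
    Resources that exist but were never declared are ignored here.
    """
    drift = []
    for resource, attrs in sorted(desired.items()):
        actual_attrs = observed.get(resource, {})
        for key, expected in sorted(attrs.items()):
            actual = actual_attrs.get(key, "<missing>")
            if actual != expected:
                drift.append((resource, key, expected, actual))
    return drift


desired = {
    "web01": {"nginx_version": "1.24.0", "ntp": "enabled"},
    "db01": {"postgres_version": "15.4"},
}
# web01 was hand-patched back to an older nginx outside the automation.
observed = {
    "web01": {"nginx_version": "1.22.1", "ntp": "enabled"},
    "db01": {"postgres_version": "15.4"},
}
print(detect_drift(desired, observed))
```

Here the only reported difference is web01’s nginx version, and in an environment where changes are made by hand rather than through code, this kind of drift is what quietly erodes data quality.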
Many of the problems I see people trying to solve with AIOps could be more easily solved by taking an incremental approach to automation and addressing the underlying root causes. If your teams are drowning under a constant wave of menial and low-value tickets, don’t look to adopt an AIOps tool to tell you which issues are actually important so they can work on them – apply systems thinking, look for the underlying causes, automate away the inconsistency in your environments, automate your release processes and empower your teams to operate with minimum viable bureaucracy.
Once you’ve got to that point, then investigate whether AIOps can help. You might find that the problems no longer exist.