1. AIOps - what's all the fuss about? In other words, what is it and why does it matter?
In essence, AIOps combines AI, ML, and big data analysis to improve IT operations (IT Ops). It does this by intelligently and autonomously spotting issues and, in some cases, fixing them in real time. This supports a business’s need for speed, agility, and efficiency, while also safeguarding performance and improving customer experience.
Why do IT Ops teams need this? They face mounting and varied challenges: operational data volumes that have scaled far beyond any human capacity to handle, increasingly complex IT environments, and the speed and agility pressures of digital transformation itself. High-frequency app releases may come from Development, but the performance and management responsibilities fall on IT Ops.
In short, for IT Ops to stand any hope of succeeding in the future there needs to be an evolution toward intelligent autonomy, hence AIOps.
2. AIOps – does it replace existing technologies and approaches used to monitor and manage IT, or is it more of an add-on to what an organization is already doing?
Businesses’ digital infrastructures and requirements are becoming ever more complex. AIOps helps keep track of, manage, and streamline these disparate and expanding workflows, modernizing and automating existing processes with ML and analytics. It can assist in many ways, spanning event noise reduction, predictive alerting, probable cause analysis, and capacity analytics. Yes, some legacy tools may become obsolete, but AIOps is an additional, and vital, function rather than a wholesale replacement.
3. In other words, are we talking evolution or revolution?
It is an evolution: the natural convergence of IT operations infrastructure. Once it reaches maturity, though, it will cause a business revolution.
4. Bearing this in mind, how much is AIOps about the new breed of monitoring and management technology solutions and how much is it about an organization’s mindset and willingness to change?
It's really about both. It's about a mindset and a willingness to change by modernizing traditional monitoring and event management processes, and it's about adopting the new breed of technology solutions that make this possible.
Some IT organizations have been scared off by thinking that they need to invest in data scientist skill sets or staff up teams with people who hold data science degrees. This isn’t the case. All the intelligence should be built into the solution. Teams just need the operational skill sets to manage it and to take strategic advantage of the rich, actionable insights AIOps can deliver.
Part of the responsibility here actually lies with the industry. We need to make it more tangible and the value realization clearer. At the moment, it often comes across as quite abstract and esoteric, and companies just don’t have the IT budget to invest where there is not a clear path to realizing tangible benefits.
5. Is it right to break down AIOps into separate network monitoring/management, infrastructure monitoring/management and application performance monitoring disciplines, or should AIOps be considered as one integrated monitoring and management solution?
At BMC we think an integrated solution is best. For something to be seen as a ‘true’ AIOps solution it needs to cover the three key areas of Observe (monitoring), Engage (linking ITSM and ITOM processes), and Act (automation). It needs to be able to detect, analyze, and act all in one solution rather than piecemeal. This holistic approach suits AIOps because IT organizations are working across extremely complex, hybrid environments, so a piecemeal approach is not only more expensive but can quickly become unmanageable too. Additionally, the value of the solution increases with the amount of cross-silo data that you can observe.
6. AIOps seems to cover a whole range of tools and solutions, ranging from the passive – this is what’s happened, and maybe why; right through to the predictive or proactive – this is about to happen and here’s what you need to do about it. What are the relative merits and drawbacks of the range of the available AIOps approaches?
Often companies will start with the passive (i.e. what can we learn), but to get the full value you need to become proactive and predictive. A good AIOps platform should support that. Yes, the historical data is certainly part of it – it’s essential to know what happened, what is the normal course of action, what is normal behavior, and what is abnormal behavior. But where you see the true value is when you move to becoming predictive and proactive. After the machine has been trained to monitor and predictively alert, you can proactively trigger automated remediation. This way you can address issues in your environment before any service impact, or before the end user even knows about it or experiences a decrease in availability or performance of their systems. That remediation part is how you close the loop in AIOps.
7. In other words, how would you characterize the relative value in working through historical data as opposed to working with streaming, live data?
You really do need both. ML needs historical data for pattern learning. Once you’re able to identify patterns and understand normal system performance, you can identify anomalies in live, real-time data and respond in a timely way.
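As a rough illustration of how historical and live data work together, the sketch below learns a simple baseline from past samples and then checks streaming values against it. The z-score rule, the threshold of 3.0, and the sample CPU readings are assumptions made for this example. A real AIOps platform would use far richer models.

```python
from statistics import mean, stdev

def learn_baseline(history):
    """Summarize historical samples as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_anomaly(value, baseline, z_threshold=3.0):
    """Flag a live value that deviates strongly from the learned baseline."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > z_threshold

# Hypothetical CPU utilization samples (percent) from a past period.
history = [48, 52, 50, 49, 51, 50, 47, 53, 50, 50]
baseline = learn_baseline(history)

# Hypothetical live readings arriving one at a time.
live = [51, 49, 90, 50]
flags = [is_anomaly(v, baseline) for v in live]
print(flags)
```

Only the reading of 90 is flagged here, because it sits far outside the range seen in the historical data.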
8. AIOps – primarily, it seems to be about the optimization of an organization’s likely hybrid IT operations through better monitoring and management, but can it also offer valuable business insights at a more strategic level?
Yes, as AIOps adoption grows and evolves in maturity, it does have the potential to offer strategic insights. A good example of this is in the capacity optimization area. When looking at things like capacity management optimization, the system is analyzing historical data then making projections and forecast models. By understanding capacity metrics and workload patterns, you can predict resource saturation points, perform what-if simulations, and recommend and perform optimization actions that lower overall IT infrastructure costs.
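To make the capacity example more concrete, here is a minimal sketch of projecting a saturation point from historical usage and running a simple what-if. It assumes usage grows roughly linearly and fits a least-squares trend line. The function names, the daily storage figures, and the capacity values are all hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def days_until_saturation(usage, capacity):
    """Days from the latest sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking (no projected saturation).
    """
    slope, intercept = fit_line(usage)
    if slope <= 0:
        return None
    saturation_day = (capacity - intercept) / slope
    return max(0.0, saturation_day - (len(usage) - 1))

# Hypothetical daily storage usage in GB.
usage = [500, 510, 520, 530, 540]

print(days_until_saturation(usage, capacity=600))  # current capacity
print(days_until_saturation(usage, capacity=700))  # what-if: add 100 GB
```

Comparing the two results shows how much time a given capacity increase would buy, which is the kind of what-if question the answer above describes.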
9. So far, we’ve talked about what AIOps is, and isn’t, and the value it offers to organizations which embrace this new approach to IT operations. Before we finish, let’s look at how an organization goes about acquiring AIOps technology. For example, what are some of the key questions to ask an AIOps vendor?
There are a number of questions. First and foremost, “what parts of the AIOps value chain do you cover?” We’ve spoken a lot about piecemeal vs. holistic approaches so understanding which (if not all) elements - observe, engage, and act - the AIOps solution covers is vitally important.
The second question is which use cases are supported. IT teams need real, tangible value from an AIOps strategy. They’re not going to invest in a science experiment. So understanding and prioritizing use cases such as event noise reduction, predictive alerting, root cause analysis, and even remediation is essential.
You also need to understand how easily these new analytics and automation tools can integrate with existing IT Ops processes and cover the entire IT environment across on-prem, cloud, and even containers. Lastly, ask how immediately actionable these new automation capabilities will be.
10. And are there integrated, single vendor AIOps solutions available today, or is it more about acquiring two or three key pieces of software which together form the basis of an AIOps implementation?
Yes, there are certainly vendors (BMC being one of them) that cover the whole value chain, as well as other solutions providers which may cover only part of it. But bearing in mind the compute complexity, various hybrid environments, and huge increases in data, we see it as far more strategic to go for a holistic approach.
11. Bearing in mind that we've established the value of AIOps, where does an organization start in terms of introducing AIOps into the business? With previous technologies such as virtualization and cloud, it was possible to start with a single application in a test environment, before going more mainstream. AIOps would appear to be a bit more 'all or nothing'?
It doesn’t have to be all or nothing. There are steps to get started, and it comes back to aligning it to use cases or identifying the areas of friction within existing IT Ops processes that need to be addressed – pre-determining what the success criteria are beforehand. For example, one of the use cases we help a lot of customers with is event noise reduction. Large enterprises can have thousands of events per hour, far beyond any human scale to manage, so here AIOps can be deployed to suppress the ‘normal’ events, flag the abnormalities, and quickly provide root cause analysis and remediation guidance.
Part of the initial process is simply establishing data sources and models, and making data available and centralized in a single solution so that it can be analyzed.
Organizations certainly need to take a planned approach, but each business will have different priority use cases.
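The event noise reduction use case mentioned above can be sketched in a few lines. This is a simplified stand-in, not how any particular product works: it treats an event signature as ‘normal’ if it appeared often in historical data, suppresses those, and flags everything else once. The field names (`source`, `type`), the `min_count` cutoff, and the sample events are assumptions for illustration.

```python
from collections import Counter

def learn_normal_signatures(historical_events, min_count=3):
    """Treat (source, type) pairs seen at least min_count times as routine."""
    counts = Counter((e["source"], e["type"]) for e in historical_events)
    return {sig for sig, n in counts.items() if n >= min_count}

def reduce_noise(live_events, normal_signatures):
    """Suppress routine events and duplicates; return the ones worth attention."""
    seen = set()
    flagged = []
    for event in live_events:
        sig = (event["source"], event["type"])
        if sig in normal_signatures or sig in seen:
            continue  # routine event, or already flagged in this batch
        seen.add(sig)
        flagged.append(event)
    return flagged

# Hypothetical historical events: a frequent, harmless heartbeat warning.
history = [{"source": "web-01", "type": "heartbeat_late"}] * 4
history.append({"source": "db-01", "type": "disk_warn"})
normal = learn_normal_signatures(history)

# Hypothetical live events: two routine, two copies of something new.
live = [
    {"source": "web-01", "type": "heartbeat_late"},
    {"source": "db-02", "type": "replication_lag"},
    {"source": "db-02", "type": "replication_lag"},
    {"source": "web-01", "type": "heartbeat_late"},
]
flagged = reduce_noise(live, normal)
print(flagged)
```

Four incoming events collapse to a single flagged one, which hints at why this is a popular first use case when event volumes run to thousands per hour.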
12. Are you able to share one or two examples of customers who are already benefitting from implementing AIOps?
One example is a global manufacturer for the medical industry. A key part of the AIOps value chain is acting with and engaging the service desk. This customer has a central command center that monitors over 40 critical applications across its IT infrastructure and handles events worldwide. Part of its function is ensuring the availability and performance of its global IT infrastructure as well as issue resolution. With the BMC solutions, this customer is able to monitor the entire IT environment, predictively alert before thresholds are breached, and proactively remediate more than one-third of the critical incidents. This saves the customer time by reducing the mean-time-to-identification (MTTI) and mean-time-to-repair (MTTR), and it saves the customer money by automating analysis and remediation tasks.
13. And how do you see AIOps developing over the next 12 to 18 months, both in terms of the products/solutions available and the adoption rate amongst end users?
We recently did a customer poll which found that 70% of those surveyed are currently in the “exploring options and use cases” phase. So over this time period we’d expect to see a migration toward “planning to” or “actively deploying” AIOps, as well as far more tangible value coming through from AIOps deployments.
We’re also going to see more of a push from the data itself. Data volumes will just keep on increasing and become less and less manageable, especially with IoT and 5G adoption. We’ll see AIOps demand increase simply for businesses to keep meeting performance demands and SLAs in the face of this tsunami of data.
We’d also expect a strengthening of the link between DevOps and IT Ops, using the rich insights from AIOps in the app development process to ensure that performance management of applications is at the forefront.
And finally, we know cloud adoption is booming. Gartner predicts that by 2025, 80% of enterprises will have shut down their traditional data centers. AIOps will begin to take a central role in managing these cloud-based apps and services.
14. Finally, are there one or two key pieces of advice you’d like to offer individuals and organizations who are approaching AIOps for the first time?
At the core of a successful AIOps strategy and deployment lies the data. The solution needs to be able to learn patterns and apply those learnings to become more predictive and proactive. So you need to be able to ingest and consolidate diverse data sets, metrics, and logs into a single view for analysis and action.
It’s also important to be able to understand the service impact – creating a link between the events and the end services they are affecting, and getting better at analyzing the data to understand how performance is affected.
And finally, prioritize your use cases: identify the areas of pain in existing processes and use specific AIOps capabilities to remove friction, increase agility, and improve service quality.