AIOps applies Machine Learning techniques and Big Data Analytics to large volumes of data gathered from various IT operations tools in order to identify issues in real time and fix them proactively. The corporate network is a key resource that enables transactions between enterprise users and applications. This article examines the role of network monitoring in implementing a robust AIOps strategy for an enterprise.
The IT Operations Team’s Challenge
Over the last few years, we have seen a significant change in how enterprise applications and services are deployed and delivered. Legacy architectures relied heavily on enterprise IT infrastructure and networks to host, deliver, and run key business-centric applications. Enterprises had complete control over the performance of the network, the infrastructure, and the applications. Today, with the migration of applications and services to hybrid networks (i.e., on-premises and cloud), and with the increased adoption of distributed, microservices-based architectures, enterprises are struggling to ensure optimum performance of services across infrastructure over which they have little control.
Although many business-centric applications are offloaded to external service providers, the IT Operations team finds it ever more challenging to ensure the uninterrupted availability of these applications. Imagine the intense pressure on the IT Manager when a video conference session during an earnings call ends abruptly because of a problem somewhere in the corporate network or in the service provider’s network! It is therefore imperative that IT managers have end-to-end visibility across private and public networks, and across the various distributed applications that collaborate to deliver a seamless service to users. Essentially, an IT Operations team must be able to:
a) Monitor the performance of applications and the infrastructure (physical and virtual) over which the applications are running.
b) Monitor the network to quickly identify and locate bottlenecks and performance issues related to the network infrastructure that connects enterprise users to external, cloud-based services.
How Does AIOps Impact Application Performance?
According to Gartner Research, application performance monitoring (APM) involves using software and/or hardware components to monitor three key aspects:
1. Digital Experience Monitoring (DEM), which deals with monitoring how end-to-end application availability impacts the overall user experience.
2. Application discovery, tracing, and diagnostics (ADTD), which deals with the discovery of applications and visualization of their topology, logging of transactions between applications, and diagnostics related to application components.
3. Artificial Intelligence for IT Operations (AIOps), which has emerged from the adoption of Artificial Intelligence (AI) and Machine Learning (ML) techniques to analyze application behavior. Preventive Healing, the next frontier for this technology, involves generating lead signals or early warnings, flagging anomalies, predicting bottlenecks in application performance, and taking remedial action to mitigate or avert problems before they occur.
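To make the idea of flagging anomalies more concrete, here is a minimal sketch in Python. It is not how any particular AIOps product works; it simply assumes that application response times are sampled at regular intervals and flags a sample when it deviates sharply from the recent baseline (a rolling z-score). The function name, the window size, the threshold, and the sample data are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return the indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.

    Hypothetical helper for illustration; real AIOps tools use far richer models.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to compare against; skip it.
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Illustrative response times in milliseconds, with one obvious spike.
response_ms = [120, 118, 125, 122, 119, 121, 480, 123, 120]
print(flag_anomalies(response_ms))
```

In this made-up series only the spike at index 6 is flagged. A simple baseline like this reacts after the fact; the early-warning behavior described above would need models that look at trends and correlated signals as well.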
Monitoring the Network – Easier Said Than Done!
Traditional network management systems relied primarily on managing devices using Command Line Interfaces (CLI) and protocols like Simple Network Management Protocol (SNMP). However, these approaches are insufficient when dealing with today’s networks. With advances in networking and communication technologies, there is a proliferation of high-speed networks that use a variety of wired and wireless media to deliver different types of traffic such as voice, video, and data. Furthermore, with the growing appeal of cloud computing, corporate networks are required to be connected to the public Internet, which exposes them to unprecedented security risks. Whenever end-to-end service performance is degraded, the network is usually the first to be blamed, although the problem may in fact be related to a poorly performing application or to abnormal user-triggered activity. In such cases, without full visibility into the network, the IT Operations team is usually the last to hear about a problem, long after a user has experienced service degradation, and the IT response is, at best, reactive. A more proactive approach will help mitigate problems and ensure optimum service levels.
What is NPMD and Why is it Important?
As noted earlier, the biggest challenge for IT is to keep the network up and running all the time. IT Managers lack the granular network visibility required to understand what applications and services are running across corporate, public, and third-party networks, and to detect, troubleshoot, and resolve performance problems effectively when they surface. What is even more challenging is that cloud-based architectures make prolific use of microservices residing in containers that live on distributed networks, which are themselves virtual and dynamic in nature. Network virtualization thus adds a further level of complexity to managing present-day networks.
Thankfully, the simple management techniques of the past have been augmented in recent times by a plethora of new network protocols and technologies that provide much greater visibility into the network and its performance. The traditional network management function has taken on a broader scope and is widely known today as Network Performance Monitoring and Diagnostics (NPMD). NPMD includes the monitoring of physical network infrastructure and servers, the measurement of network- and application-level traffic flows, and the detection and remediation of problems across the network. NPMD tools can process data from a variety of sources, including network-device-generated logs, health metrics, and traffic statistics (such as application-specific bandwidth usage, network latency, and packet loss), to provide real-time and historical views of traffic patterns and to predict network behavior. Additionally, NPMD tools can integrate with AIOps to identify the root causes of application performance issues. Thus, NPMD tools essentially provide the means to understand how network performance impacts application performance and the resulting digital experience for the end user.
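As a rough illustration of the traffic statistics mentioned above, the following Python sketch summarizes a batch of probe results into packet loss and latency figures. It assumes a hypothetical input format in which each probe yields a round-trip time in milliseconds, or None if the probe was lost; the names LinkSummary and summarize_probes are invented for this example and do not belong to any NPMD product.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class LinkSummary:
    sent: int
    lost: int
    loss_pct: float
    avg_latency_ms: Optional[float]
    max_latency_ms: Optional[float]


def summarize_probes(rtts_ms: List[Optional[float]]) -> LinkSummary:
    """Summarize probe results, where None marks a lost probe (assumed format)."""
    sent = len(rtts_ms)
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    lost = sent - len(answered)
    loss_pct = 100 * lost / sent if sent else 0.0
    return LinkSummary(
        sent=sent,
        lost=lost,
        loss_pct=loss_pct,
        # Latency is undefined when every probe was lost.
        avg_latency_ms=mean(answered) if answered else None,
        max_latency_ms=max(answered) if answered else None,
    )


# Illustrative data: ten probes, two of which were lost.
probes = [20.0, 22.0, None, 21.0, None, 25.0, 20.0, 22.0, 24.0, 22.0]
summary = summarize_probes(probes)
print(summary)
```

For this made-up batch the summary reports 20% packet loss and an average latency of 22 ms. Figures like these, collected over time and per application, are the kind of input an AIOps layer could correlate with application-level symptoms.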
The Bottom Line
It is no surprise that APM and NPMD go hand in hand and are essential components of a good IT strategy for today’s complex networks. Enterprises need good NPMD tools in place, and need to ensure that they integrate well with AIOps software, so that the end-to-end service experience can be assured.