Preparing for the future
To prepare for the future, we must first understand the changes that are taking place now. There have been several notable shifts in the world of operations over the last few years.
The first was virtualisation. We went from a single bare-metal server running a few applications to a single server running many virtualised “servers”, made possible by virtualising the underlying hardware. This allowed operators to run multiple “servers” on one physical machine. The benefit was a more balanced workload across infrastructure and the ability to consolidate virtual machines onto fewer physical servers, meaning less initial investment for IT operators.
The second shift has been the advent and overwhelming adoption of containers. Containers are similar to virtual machines, except that they take the abstraction a level further. Instead of virtualising the hardware and running a full-blown operating system in each VM, containers run on top of the operating system of a host or node. This means many workloads share a single operating system.
These nodes don’t have to be bare metal. They could also be VMs. The idea is that one “server” can run many containers, and that the workload can be balanced across those servers to use them more efficiently.
The last, most recent shift is Functions as a Service (FaaS). Some people call this serverless since it eliminates the need for someone within the organisation to maintain a server. This doesn’t mean that there isn’t a server somewhere running the function; it’s just that someone else is making sure that it runs.
FaaS allows software developers to write only their business logic and then upload it to a FaaS offering from a public cloud provider such as AWS. The running of the servers that power the containers that run the business logic is completely abstracted away, leaving businesses free to focus only on the application’s development.
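As a rough illustration of what “only the business logic” can look like, here is a minimal sketch written in the style of an AWS Lambda Python handler. The event shape (an "items" list with "price" and "quantity" fields) and the order-total logic are assumptions invented for this example, not taken from any real service.

```python
import json


def handler(event, context):
    """Entry point the FaaS platform calls once per invocation.

    `event` carries the request payload; `context` carries runtime
    metadata and is unused here. The event shape is hypothetical.
    """
    items = event.get("items", [])
    # The business logic: total up an order.
    total = sum(item["price"] * item["quantity"] for item in items)
    return {
        "statusCode": 200,
        "body": json.dumps({"total": total}),
    }
```

Nothing in the function refers to servers, containers or operating systems; provisioning and scaling are left to the provider.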
Due to the abstraction away from hardware and the ephemeral nature of modern applications, within the next few years we won’t care about infrastructure anymore. The more we remove ourselves and our applications from the bare metal, the less we should have to care about it.
Think about it. If an operator is running a totally serverless application on a public cloud, not only is there no need for them to care about the infrastructure behind it, it is also not possible for them to monitor it. There’s no way to access the metrics from the network or servers behind the containers that are running the code.
In the case of containers, DevOps teams running applications in containers across a well-built Kubernetes cluster or a managed cluster running in the cloud shouldn’t have to think about the hardware that’s running it. More and more, the management of Kubernetes clusters or similar will be ‘outsourced’ to the cloud, and neither the hardware underneath these managed clusters nor the clusters themselves will be of any real concern to the company running the application.
Outsourcing this work makes sense because, as computing is abstracted, hardware and the running of it become more of a commodity.
Monitoring in the future
The question then arises: what does the future of monitoring look like? To answer it, we must focus on the application itself rather than on the infrastructure its workloads run on.
Observability is a good way to think about this. It includes metrics, logs and traces directly pulled or pushed from our workload or application. With this data, we are then able to infer the current state of a system from its external outputs and gain context in order to understand that state.
High cardinality in our monitoring data used to be an anti-pattern and something that everyone tried to avoid, but to make an application observable, storing high-cardinality data is a must in order to really delve into a problem when it occurs.
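To make the idea concrete, here is a minimal sketch of an application emitting one structured event per request. The field names (request_id, user_id, endpoint, db_node, duration_ms, status) and both helper functions are hypothetical, chosen only to show the sort of high-cardinality fields that make it possible to drill into a single request later.

```python
import json
import time
import uuid


def build_event(user_id, endpoint, db_node, duration_ms, status):
    """Build one wide, structured event describing a single request."""
    return {
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),  # unique per request: very high cardinality
        "user_id": user_id,               # one value per user: high cardinality
        "endpoint": endpoint,
        "db_node": db_node,
        "duration_ms": duration_ms,
        "status": status,
    }


def emit(event):
    """Write the event as one JSON line; a log shipper could collect stdout."""
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


emit(build_event("user-42", "/checkout", "db-3", 1870, "timeout"))
```

Because every event carries its own identifiers, a question such as “which users were affected by timeouts on this node?” can be answered after the fact, without having decided in advance which aggregates to keep.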
When a problem has been identified, IT staff can then review past logs to check whether, for example, writes to a specific node of the database cluster are taking far too long and causing timeouts.
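A check like the one described above could be sketched as follows, assuming the logs are already available as structured events. The field names (operation, db_node, duration_ms), the 1000 ms threshold and the sample data are all made up for illustration.

```python
from collections import defaultdict


def slow_write_nodes(events, threshold_ms=1000):
    """Return {db_node: (slow_writes, total_writes)} for nodes with slow writes."""
    totals = defaultdict(int)
    slow = defaultdict(int)
    for event in events:
        if event.get("operation") != "write":
            continue  # only writes are of interest here
        node = event["db_node"]
        totals[node] += 1
        if event["duration_ms"] > threshold_ms:
            slow[node] += 1
    return {node: (slow[node], totals[node]) for node in slow}


# Hypothetical sample of past log events.
events = [
    {"operation": "write", "db_node": "db-1", "duration_ms": 12},
    {"operation": "write", "db_node": "db-3", "duration_ms": 1870},
    {"operation": "read", "db_node": "db-3", "duration_ms": 2500},
    {"operation": "write", "db_node": "db-3", "duration_ms": 2210},
    {"operation": "write", "db_node": "db-2", "duration_ms": 15},
]

print(slow_write_nodes(events))  # {'db-3': (2, 2)}
```

In this sample every write to db-3 exceeds the threshold while the other nodes are healthy, which points at one node rather than at the application as a whole.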
Infrastructure is taking a backseat. As the demands around monitoring continue to shift, observability is going to play a bigger role than ever before. It’s time we utilise our data for greater business benefits.