Digitalisation World Q&A on Kafka

With Michael Noll, Principal Technologist, Office of the CTO at Confluent.


Where did Kafka originate from?

Kafka started life at LinkedIn, initially as high-throughput, real-time infrastructure for moving data into Hadoop. As time passed, it became apparent that this conduit was a precious resource in its own right, because it could store large volumes of data and deliver them not only to Hadoop but also to the plethora of other services that kept LinkedIn’s social network alive: people you may know, who’s viewed your profile, and so on. As Kafka's usage grew, the team added stream processing features, allowing users to perform SQL-like operations on high-throughput datasets as they moved through the company. The project was open sourced and donated to the Apache Foundation in 2011. As the benefits of combining messaging, storage, and processing in a single system propagated through the tech industry, a new category of data infrastructure became established. Kafka is now one of the most active open source projects and is used by the majority of listed companies the world over.
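To give a flavour of what a SQL-like operation over a stream means, here is a toy Python sketch (not Kafka's actual API, and not Kafka Streams or ksqlDB syntax) of a continuous `GROUP BY` count that updates its result as each event arrives, rather than running once over a finished dataset:

```python
from collections import defaultdict

def streaming_count(events, key_fn):
    """Toy continuous aggregation: like `SELECT key, COUNT(*) GROUP BY key`,
    but evaluated incrementally, emitting an updated result per event."""
    counts = defaultdict(int)
    for event in events:
        counts[key_fn(event)] += 1
        yield dict(counts)  # snapshot of the running result

# Hypothetical page-view events keyed by user
events = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
results = list(streaming_count(events, key_fn=lambda e: e["user"]))
# final running state: {"alice": 2, "bob": 1}
```

The key design point the sketch illustrates is that the query is long-lived and its answer evolves with the stream; real systems add windowing, fault tolerance, and distribution on top of this idea.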

Why did Kafka gain momentum so quickly?

Kafka arrived at the right place at the right time: it has become the technological foundation and digital 'central nervous system' for the always-on world, where businesses are increasingly software-defined and automated, and where the user of software is more software. We only need to look back at the past few years to see that major trends like cloud computing, artificial intelligence, ubiquitous mobile devices, and the Internet of Things have caused an explosion of data. Companies have been struggling to keep pace, collecting, processing, and acting on all this information as quickly as possible, whether to serve their customers faster or to gain an edge on the competition. The result? Whether you shop online, make payments, order a meal, book a hotel, use electricity, or drive a car, it's very likely that, in one form or another, the experience is nowadays powered by Kafka.

What problems do these ‘event streams’ solve for organisations?

At a high level, event streams connect data in different parts of a system or organisation. But a simple explanation like this belies the real impact that event streaming has had on organisations. Event streaming systems provide storage and processing capabilities in addition to high-throughput messaging, properties that prove critical in real-world organisations because of the various ways in which those organisations adapt and evolve. Here, answering simple questions like ‘how do you get System A to share data with System B?’ is rarely enough. Modern digital companies face more nuanced questions: “Where do I get this data from?”, “How do I get both real-time and historical data?” or “Why doesn’t this data have everything I need?”. Event streaming systems like Kafka lay the foundations for answering these trickier questions by combining the storage of historical messages, built-in processing, and real-time data. All in all, it’s a more powerful and nuanced solution than those that preceded it, even though, at a high level, all such systems simply connect different parts of a system or organisation together.
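The reason one system can serve both real-time and historical questions is Kafka's core abstraction: an append-only log in which every record keeps a sequential offset and is retained. The following is a minimal in-memory Python sketch of that idea (a toy, not Kafka's client API or storage engine):

```python
class ToyLog:
    """Toy append-only log sketching Kafka's core abstraction:
    records get sequential offsets and are retained, so the same
    structure serves live tailing and historical replay."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # the new record's offset

    def read_from(self, offset):
        # Each consumer tracks its own offset; replaying history
        # is simply re-reading from an earlier position.
        return self._records[offset:]

log = ToyLog()
for value in ["signup", "page_view", "purchase"]:
    log.append(value)

full_history = log.read_from(0)   # ["signup", "page_view", "purchase"]
latest_only = log.read_from(2)    # ["purchase"]
```

Because consumers choose their own starting offset, "give me everything that ever happened" and "give me only what happens next" are the same read operation against the same data, which is what separates this model from a traditional fire-and-forget message queue.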

Are there still barriers holding Kafka back, or slowing the market’s adoption of it? What challenges lie ahead?

Apache Kafka is at its most powerful when it acts as a central nervous system for all of an organisation's data. This is when all the data in an organisation is instantly available to all applications and people through the platform. With this incredible wealth of data available, new business opportunities can be uncovered, customer expectations exceeded, and new operational efficiencies realised.

However, Kafka is a complex distributed system, and most organisations are built on top of a spaghetti mess of other systems. So without a team of Kafka experts, the bar for getting to a central nervous system is too high for many companies. This is why we at Confluent created Project Metamorphosis. By bringing the attributes of modern cloud computing systems, such as elasticity, self-management, and unlimited storage, to Kafka, we're solving the most pressing issues that organisations run into when making event streaming a pervasive part of their business.

 
