
Making Sense of Event Streaming VS Traditional Messaging



Event Streaming or Messaging? Kafka or RabbitMQ? SQS? Kinesis? Azure Event Hubs? Or Azure Service Bus?


The above are a few variations of a question that has been troubling software developers and architects across organizations for the past ten or so years.


It seems that no day goes by without a discussion on the benefits of event streaming, Kafka, and whether Event Driven Architecture is right for your organization. This is especially true in the context of microservices and distributed systems in general.


Yet, there is so much confusion and ambiguity surrounding this topic, and most struggle to articulate what it is that distinguishes event streaming from the more traditional messaging platforms. What are some of the things one needs to be aware of when making this kind of decision?


Looking for an answer online only adds to the confusion as there seem to be as many definitions of event driven architecture (EDA) and event streaming as there are articles about it. Compounding the problem is that there are very few resources that clearly articulate the exact differences (if any), advantages, and disadvantages of EDA vs. traditional messaging.


After all, with both "events" and "messages" - aren't we just sending payloads between senders and receivers?


Lastly, if there are differences between the two architecture styles, when would you use which? What would that decision tree look like? Stepping back, are event streaming and messaging really two opposing approaches or are they complementary to each other?


This article aims to clarify the differences between event streaming and messaging, as well as how to go about deciding when to use each.


Let's get to it.


What is Messaging


When we talk about messaging or messaging platforms in this context, we are referring to an architectural pattern. In a nutshell, this pattern is based on the idea that we can decouple the communication mechanism between systems. Instead of System A communicating directly with System B, System A sends a message to an intermediary (the message broker), which is then responsible for delivering that message to System B (or C, D, E, etc.).


This style of communication is asynchronous: the sender and the receiver of the message are decoupled from each other. One system (the producer) sends the message, and an intermediary (sometimes called the message broker) handles it and ensures its delivery to the recipient (the consumer).
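
As a minimal sketch of this flow, using Python and the pika client for RabbitMQ (the queue name and payload are invented for illustration, and a broker is assumed to be running locally):

import pika

# Connect to a RabbitMQ broker (assumed to be running on localhost)
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Declare the queue; the broker creates it if it does not already exist
channel.queue_declare(queue="orders")

# The producer hands the message off to the broker and moves on -
# it does not know or care who will consume it
channel.basic_publish(exchange="", routing_key="orders", body=b"order #42 created")

connection.close()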



There are a number of important advantages to this pattern; in fact, it is a foundational aspect of building resilient and scalable distributed applications. It enables systems to scale because it allows senders and receivers of messages to scale independently of each other, based on the demands on each. The message broker middleware takes care of managing, routing, and queuing messages, and in doing so shifts much of the scalability problem onto itself (not entirely, but to a large extent).


Messaging technologies have been around for a while. These are your ActiveMQ, RabbitMQ, and the various JMS implementations (in the Java world). There are also cloud-native offerings such as SQS from AWS, Azure Service Bus from Azure, and Pub/Sub from GCP.


There are a number of popular protocols that are used by message brokers. AMQP, MQTT, and STOMP are a few notable examples.


The Queue


At the core of the messaging paradigm, there is the concept of the Message Queue. The queue is a construct that allows for the processing of messages in a FIFO (first-in, first-out) fashion. In other words, messages are sent out to consumers in the order in which they arrived in the queue.
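
Continuing with the same hypothetical pika setup, the consumer side would look roughly like this - messages are delivered in the order they were queued, and each is acknowledged once processed:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders")

def handle_message(ch, method, properties, body):
    print("received:", body)
    # Acknowledge so the broker can remove the message from the queue
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="orders", on_message_callback=handle_message)
channel.start_consuming()  # blocks, delivering messages in queue order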



The above is a simplified view as different message broker technologies will have a slightly different setup. For example, in RabbitMQ, there is an additional concept of an Exchange, which is the mechanism that producers communicate with and which is responsible for routing messages to queues.
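
A rough pika sketch of that RabbitMQ-specific flow (exchange, queue, and routing key names are all invented): the producer publishes to the exchange, and a binding tells the exchange where to route the message:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Producers publish to the exchange, never to the queue directly
channel.exchange_declare(exchange="shop", exchange_type="direct")
channel.queue_declare(queue="orders")

# The binding tells the exchange which queue gets which messages
channel.queue_bind(queue="orders", exchange="shop", routing_key="order.created")

channel.basic_publish(exchange="shop", routing_key="order.created", body=b"order #42 created")
connection.close()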


Consumer Competition - One Consumer Per Message


There are a couple of things that characterize the architecture of a messaging system. First, consumers "compete" for messages, in a sense: once a message is deemed to have been successfully delivered to a consumer, it disappears from the queue. If multiple consumers are subscribed to the same queue, once one of them consumes a message, the others will not receive it.


There are exceptions to the above - in RabbitMQ, for example, there is the concept of a Fanout Exchange, which delivers a copy of each message to every queue bound to it. Even in this case, though, once a message is delivered to all interested consumers, it is gone.
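
A sketch of that fanout behavior with pika (names invented); note that it is the exchange that fans the message out, with each bound queue getting its own copy:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="broadcasts", exchange_type="fanout")

# Every queue bound to a fanout exchange gets its own copy of each message
for queue_name in ("billing", "shipping", "analytics"):
    channel.queue_declare(queue=queue_name)
    channel.queue_bind(queue=queue_name, exchange="broadcasts")

# The routing key is ignored by fanout exchanges
channel.basic_publish(exchange="broadcasts", routing_key="", body=b"order #42 created")
connection.close()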


This also means that if no consumer ever successfully processes a message, that message can be lost. Now, things are a bit more complex than that, since redelivery policies typically come into effect, and the broker will attempt to redeliver the message. A Dead Letter Queue (DLQ) can also be configured to receive messages that could not be delivered otherwise. The point, though, is that something has to handle the message (whether through redelivery, an alternate queue, or a DLQ) or else it will be lost.
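
In RabbitMQ, for instance, dead-lettering is configured through queue arguments. A minimal sketch (all names invented): messages that are rejected or expire in the main queue get re-routed to a dead letter exchange and land in a dedicated queue:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# The exchange and queue where dead-lettered messages end up
channel.exchange_declare(exchange="dlx", exchange_type="direct")
channel.queue_declare(queue="payments.dead")
channel.queue_bind(queue="payments.dead", exchange="dlx", routing_key="payments")

# Messages rejected or expired in this queue are re-routed to the DLX
channel.queue_declare(
    queue="payments",
    arguments={
        "x-dead-letter-exchange": "dlx",
        "x-dead-letter-routing-key": "payments",
    },
)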


Broker Does the Heavy Lifting


Another important aspect, often misunderstood, is that in the world of messaging, most of the work is done by the broker. The broker, for the most part, is responsible for message routing/distribution, delivery, retrying, message tracking, and more. This means that the broker (or brokers, as there can and typically will be more than one) has a high likelihood of becoming a bottleneck if not properly tuned and managed.


What is Event Streaming


Some History


To understand what event streaming is, we need to understand why it became a thing in the first place and why it is viewed as a concept distinct from traditional messaging. Although we are talking about event streaming as a pattern, discussing it inevitably steers towards Apache Kafka specifically, which is the most popular streaming platform.


In fact, Kafka is the technology that started the whole paradigm, as it was developed to tackle the need for large-scale event processing at LinkedIn. At the time, LinkedIn needed a messaging technology that would enable the processing of very large volumes of operational and log data in real time.


Limitations of Traditional Messaging


Traditional messaging platforms were not sufficient for LinkedIn because of the limitations on the volumes of data these platforms could process in a given time. To a large degree, this limitation was due to the message brokers being responsible for most of the operational aspects of the messaging system. It simply was not possible to scale the broker model to the extent needed by LinkedIn.


The paradigm shift from traditional messaging that came with the advent of event streaming (and Kafka specifically) was that much of the operational responsibility was transferred to the producers/publishers and consumers/subscribers. This removed a lot of the complexity from the Kafka brokers/servers, thus allowing them to scale much more efficiently, quickly, and easily.


Producer-Consumer Communication and the Commit Log


The mechanism by which producers and consumers of messages (events) communicate has also been revamped. Instead of using a queue, producers now write events to a "topic" and consumers subscribe to topics in what is called a "Publish/Subscribe" model. Multiple consumers can subscribe to the same topic.
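
As a minimal sketch of the producer side, using Python and the kafka-python client (one of several Kafka clients; the topic name and payload here are invented for illustration):

from kafka import KafkaProducer

# Connect to a Kafka cluster (assumed to be running locally)
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Append an event to the "sensor-readings" topic; the broker does not
# delete it on read - any number of consumers can read it independently
producer.send("sensor-readings", key=b"deviceId12345", value=b"21C")
producer.flush()  # block until the broker acknowledges the event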


One of the big differences between an event topic and a traditional message queue is that the topic is based on a Commit Log.


The Commit Log is a data structure to which events are written and persisted (much like a log file). Once an event is written to the Commit Log, it cannot be changed, and it will stay in the log for a configurable time frame even if it has already been read by a consumer.
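
Retention is configured per topic. As a sketch (again with kafka-python, and with invented names and values), a topic can be created that keeps events for seven days whether or not they have been read:

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(
        name="sensor-readings",
        num_partitions=6,
        replication_factor=1,
        # keep events for 7 days (in milliseconds), read or not
        topic_configs={"retention.ms": str(7 * 24 * 60 * 60 * 1000)},
    )
])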


Consumers then read from the topic using an "offset", which is the index of an item in that topic (or, more precisely, in one of the topic's partitions, as we will see shortly). This allows consumers to read events from a topic independently of each other, with each consumer reading from a different offset. It is the consumer's responsibility to commit its offsets back to Kafka so that Kafka can keep track of each consumer's position in the topic.
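
A consumer sketch to match (names are again illustrative); note that the events themselves remain in the log regardless of the commits:

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    group_id="dashboard-service",  # instances in a group share the work
    enable_auto_commit=False,      # we will commit offsets ourselves
)

for record in consumer:
    print(record.partition, record.offset, record.key, record.value)
    consumer.commit()  # tell Kafka this consumer is done up to this offset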


Consumers are also grouped into Consumer Groups, which is typically how multiple instances of the same application are organized when reading from a topic.


The view below is simplified. It does not mention things like partitions or multiple topics, for example. However, it roughly shows what the relationship between producers, consumers, and consumer groups looks like.




This model has opened up new avenues for reading messages: unlike the traditional model, where messages were deleted after consumption, new consumers can now subscribe to a topic and read all messages/events that came in prior to that. What this means is that data can be reconstructed and queried, if necessary, as many times as required.
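
Replaying then becomes a question of where a consumer starts reading. For example (a sketch, with invented names), a brand-new consumer group with no stored offsets can be told to start from the oldest retained event:

from kafka import KafkaConsumer

replayer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    group_id="backfill-job",       # a new group with no committed offsets
    auto_offset_reset="earliest",  # so start from the beginning of the log
)

for record in replayer:
    print("replaying:", record.offset, record.value)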


One other aspect that allows an event streaming system to scale is that a topic can be divided into partitions. These partitions allow different subsets of the data to be spread across consumers, and they are what allows Kafka and other event streaming platforms to scale to handling millions of events per second.
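
Partition assignment is typically driven by the event key: the default partitioner hashes the key, so all events with the same key land on the same partition, preserving per-key ordering while spreading different keys across partitions. A sketch (with invented device IDs):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Readings from the same device always hash to the same partition,
# while different devices spread across partitions (and thus across
# the consumers in a consumer group)
for device_id, reading in [(b"deviceId12345", b"21C"), (b"deviceId67890", b"19C")]:
    producer.send("sensor-readings", key=device_id, value=reading)
producer.flush()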


In many organizations, Kafka serves as a central messaging hub/bus as it comes with a large ecosystem of connectors (Kafka Connect) and destinations (Kafka Sinks) as well as technologies for data transformation and querying (Kafka Streams, KSQL).




Why the Confusion?


The reason why there has been so much confusion and ambiguity in understanding whether to use a messaging or an event streaming platform is that both types of technology are responsible for essentially the same thing: enabling asynchronous communication between data producers and data consumers.


This is why there are many use cases and problems that both messaging and event streaming can solve. Thus, often, it may not make a significant difference whether you choose a traditional message broker technology such as RabbitMQ or an event streaming platform such as Kafka. There is a very large degree of overlap between the two architectural paradigms.


To make things even more confusing, much of the terminology being used is very similar across messaging and event streaming platforms. For example, in RabbitMQ and other messaging platforms, it is also possible to use a Publish/Subscribe model to an extent. Similarly, in the Java Message Service (JMS) specification it is possible to leverage both queues and topics.


Analyzing the differences between how RabbitMQ does Pub/Sub vs. how Kafka does it is beyond the scope of this article. The very short version is that these concepts, although similar in name, are implemented differently because the underlying paradigms are different.


Keeping all of the above in mind, there are some fundamental differences and there are use cases where one type of technology can be a better fit for a particular problem space.


Anatomy of a Message vs an Event


Message


One of the things that is typically different between a messaging and an event streaming platform is the idea behind how data is represented within a message vs an event.


The way we reason about a message in a traditional messaging platform is that it is a self-contained unit representing some entity. That entity could be a customer, an order, an item, or any other domain/business-specific object.


If we take user creation as an example, it would look something like the following:


 "userid": "US85TR67CKL33"
 "first_name": "John"
 "last_name": "Doe"
 "operation": "create"
 "uuid":"8c00abda-b9c2-11ec-8422-0242ac120002"
 "created_at": "1649700600" 
 "email": "-encrypted-email-pii-"
 "nonce": "fs65f6fsd67f5dsfdsf"
 "country": "US"

Event


Events, however, are typically key/value pairs and are meant to represent changes in an entity's state, as opposed to representing a whole entity. For example, a sensor that measures temperature might send hourly temperature readings.


The timestamp of when the event was created is available as part of the event's metadata.


key: "deviceId12345" 
value: "21C"

Now, there is nothing stopping us from sending the same kind of payload as the user object in the previous example. There is no rule that the value has to represent only one thing.


We could have put a JSON object in that value, which would be serialized to a binary format and later deserialized back into a string, and subsequently into a JSON object, by our application.
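
For instance (a sketch with kafka-python; the serializer is something we configure ourselves, not something Kafka imposes), the user object from earlier could be sent as the value of a single event:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # serialize any Python dict to JSON bytes on the way out
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("users", value={"userid": "US85TR67CKL33", "operation": "create"})
producer.flush()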


So, although technically nothing is stopping us from putting a large object into an event payload's value, the industry best practice is typically to lean towards smaller payloads representing state changes when it comes to events. That is not a hard rule, but it is what many go by.


That is not to say that there are no exceptions, since each use case is unique. The above is just the conceptual best practice that has become a rule of thumb in the industry. Often, we see messages representing entities of any size, while events are thought of as something smaller, since they often represent things like updates and state changes.


Bringing it all Together and Summary of Differences


The differences below are general rules of thumb between messaging and event streaming platforms. Some platforms, however, may have more specific differences, while others will have more overlaps and may implement paradigms from both messaging and event streaming.

Communication

  • Messaging: Producers publish to message brokers (to exchanges or directly to queues, depending on the messaging product). Brokers push messages to consumers.

  • Event Streaming (Kafka): Producers publish events to topics. Consumers poll topics to get new events.


Architectural Pattern

  • Messaging: Message Queue.

  • Event Streaming (Kafka): Publish/Subscribe.


How Data is Stored

  • Messaging: In queues. Once a message is consumed, it is removed from the queue.

  • Event Streaming (Kafka): In a Commit Log, which is exposed through a topic. Events exist for a configurable timeframe regardless of whether they have been consumed.


Is Data Persistent?

  • Messaging: Generally, no. Queues and messages can be made persistent/durable by writing them to disk, but that is typically done to avoid losing data if the broker restarts. Once the data is consumed, it is still bound to go away.

  • Event Streaming (Kafka): Yes. Whether events are persisted, and for how long, is configurable.


Relationship of Consumer to Message/Event

  • Messaging: A message is consumed only once; once it is consumed, it is typically gone. This is known as "competing consumers".

  • Event Streaming (Kafka): Multiple consumers can consume an event as many times as they want. New consumers can consume past events, and events can be replayed from the "beginning of time" for that topic.


Who Manages Message/Event Routing and Distribution

  • Messaging: The broker.

  • Event Streaming (Kafka): The producer/consumer.


What Happens During Failure

  • Messaging: In general, brokers are responsible for message redelivery (though there is a lot to unpack in how exactly this works). If a DLQ is configured, undeliverable messages are routed to it.

  • Event Streaming (Kafka): It is the application/developer's responsibility to implement redelivery and DLQ topics.


Typical Protocols

  • Messaging: Open protocols such as AMQP, MQTT, and STOMP; sometimes proprietary, as with IBM MQ.

  • Event Streaming (Kafka): Proprietary (Kafka has its own protocol, as does Apache Pulsar).


How Scaling is Achieved

  • Messaging: Vertical scaling of brokers, or adding more brokers, which requires careful design of the broker/exchange/queue topology. Also, tuning various parameters on the brokers themselves as well as on the consumer side.

  • Event Streaming (Kafka): Horizontal scaling by adding more Kafka brokers, plus partitioning/sharding of topics, which allows consumers to read in parallel (one consumer from a consumer group per partition).


Typical Focus On

  • Messaging: Larger pieces of data (messages); complex messaging topologies (through exchanges, routes, and queues) where messages can be routed through different queues based on filtering criteria; most basic asynchronous communication scenarios.

  • Event Streaming (Kafka): Scalability; high availability through replication; performance; processing large volumes of small pieces of data (events); event transformation.


Typical Bottlenecks

  • Messaging: Brokers are more difficult to scale; brokers need to sync data from the OS cache to disk (if persistent queues are used); queues can get too big if consumers can't keep up.

  • Event Streaming (Kafka): Not enough partitions or not enough consumers; consumers cannot keep up with producers.


When to Use What


There are many suggestions online as to best practices and which use cases fit event streaming vs which use cases are a better fit with traditional messaging.


You will often hear the following general advice:


Messaging

  • When complex message routing topologies are required

  • When you need the message brokers to handle things like message redelivery and dead letter queueing

  • When you want FIFO-like processing, where the first consumer that gets the message also consumes it, and the message is removed after consumption

  • Specifically, this may be pertinent to traditional systems where you need an asynchronous communication mechanism between various microservices or parts of an application

Event Streaming

  • When you are looking to process very large volumes (think hundreds of thousands or millions of requests per second)

  • When you need fairly straightforward and/or on-demand horizontal scaling by adding commodity hardware (or cloud infrastructure) as needed

  • Processing real-time or near real-time data

  • Often-mentioned use cases include IoT data coming from a variety of sensors, analytics data, logs, and other operational data

  • When you want the ability to replay your events and the ability to store them indefinitely within your event streaming platform (for audit or other purposes)

  • Kafka is often used as a universal message/event bus in an organization by connecting various data sources, producers, and consumers.


I Still Don't Know What to Use


The above are popular and solid guidelines. However, let's take a slightly different approach for laying down a foundation for how to decide on which technology to use.


More often than not, it is hard to predict ahead of time what volumes of data our applications will process or what our data requirements will look like years down the line.


Traditional messaging has been around for a long time and has evolved to the point where it can satisfy the needs of most small, medium, and even large companies. The reason the need for Kafka and event streaming arose in the first place was to enable scales of data processing that were not possible within traditional messaging platforms.


So the first question we must ask when deciding which platform to go with is: what kind of data volume/traffic can we expect based on the nature of our business?


Say we are building an online store that sells high-end, custom-made PCs for gaming. Now, gaming is definitely very popular. However, high-end custom-made PCs may not necessarily be ordered by the millions each day. You might expect to grow to a few thousand (or, say, tens of thousands of) orders a day within a few years if you are targeting North America only. Using something like Kafka to process, at most, 50K requests daily may be overkill, so traditional messaging will most likely serve you well here.


However, say you were building the next hot thing in website analytics and counting every click and mouse movement a user makes on a website. You would end up generating billions, if not trillions, of events per day if you scaled to the size of the leading traffic analytics platforms. At this scale, event streaming would definitely make sense.


Again, these are not hard rules and there is a lot more that can go into the decision of which messaging/event platform to choose. There are combinations of factors and each business's demands are unique. Hopefully, this piece has provided some guidance on how to navigate the terrain.


For a more in-depth analysis of your organization's specific needs, feel free to contact us.


What We Left Out


There is a plethora of other important concepts that we did not get to cover here but which are important to understand when deciding which messaging/event streaming platform to use. Things like processing semantics (at-least-once, at-most-once, exactly-once), retrying upon failure, batching, optimization, and high availability are all key to get familiar with.


Also, cloud providers such as AWS, Azure, and GCP often combine elements of both messaging and event streaming in their cloud-native offerings such as SQS (AWS), Event Hubs (Azure), and Pub/Sub (GCP). It is important to understand the benefits and challenges of the messaging/event streaming solutions your cloud provides before reaching for non-cloud-native products such as Kafka, ActiveMQ, or RabbitMQ.


For the curious - this article from DoorDash provides a good overview of the company's migration from RabbitMQ to Kafka to support its scaling requirements.

