Learn Apache Kafka: The Simplest Demo

Apache Kafka shows up in a lot of systems but rarely gets explained from the ground up. To make it easier to understand, I built the simplest possible demo of two services talking through a broker.

Michael Movsesov

Apache Kafka can feel intimidating from the outside. It shows up in service diagrams, in deploy configs, in conversations about scale and event-driven architecture, and the vocabulary (brokers, topics, partitions, consumer groups, offsets) lands all at once. It's easy to end up with a rough sense of the shape of it without any of the parts meaning something specific.

So I built the simplest possible version of it. Two tiny Node services, one Kafka broker in Docker between them, and no shortcuts. The point was to make every word mean something.

The Demo

Say we run a small ecommerce store. Every time a customer places an order we want to send them a confirmation email, and probably a few other things eventually: kick off fulfillment, update analytics, ping the warehouse. The demo only covers the first piece, the email, but the shape is the same for the rest.

The setup is deliberately small. The order-service exposes a single endpoint, POST /orders, and every time it gets a request it drops an OrderPlaced event into Kafka. The notification-service does not know that endpoint exists. It listens to Kafka and pretends to send an email whenever an order shows up. Once you get this running, the pieces start to click.

The whole thing lives in this repo:

michaelmov/kafka-demo (JavaScript)

Why Not Just Call the Notification Service Directly?

If the order-service called the notification-service directly over HTTP, the two would be tied together. If the notification-service is down, orders fail. If we want to add a second consumer later (an analytics-service, say), we have to go back and change the order-service. Every new feature ends up dragging a thread through every existing service.

Kafka flips this around. The order-service does not call the notification-service. It does not even know it exists. It connects to a Kafka broker and writes events to a named topic:

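// order-service (producer)
// Assumes the KafkaJS client, whose API these calls match.
const { Kafka } = require('kafkajs');

// Excerpt: the awaits below run inside an async function.
const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['localhost:9092'],
});

const producer = kafka.producer();
await producer.connect();

// The key is the order ID; the value is the event serialized as JSON.
await producer.send({
  topic: 'orders',
  messages: [{ key: event.orderId, value: JSON.stringify(event) }],
});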

Notice that there is no mention of who reads the topic, no second hostname, and no retry for when the consumer is unreachable, because there is no consumer in the producer's world. Anyone who wants to react can subscribe on their own schedule, and the producer is done once producer.send resolves.
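To make the producer side concrete, here is a minimal sketch of what the POST /orders handler might do around producer.send. The helper names (buildOrderPlacedEvent, toProducerRecord, placeOrder) and the type and placedAt fields are assumptions made for illustration, not code from the repo; only orderId, customerId, and the orders topic come from the snippets in this post. The producer is passed in as a parameter, so the logic can be tried with a stand-in object and no broker.

```javascript
// Hypothetical helper: turns a request body into an OrderPlaced event.
// The `type` and `placedAt` fields are illustrative assumptions.
function buildOrderPlacedEvent(body, now = new Date()) {
  if (!body || !body.orderId || !body.customerId) {
    throw new Error('orderId and customerId are required');
  }
  return {
    type: 'OrderPlaced',
    orderId: String(body.orderId),
    customerId: String(body.customerId),
    placedAt: now.toISOString(),
  };
}

// Hypothetical helper: shapes the argument passed to producer.send().
function toProducerRecord(event) {
  return {
    topic: 'orders',
    messages: [{ key: event.orderId, value: JSON.stringify(event) }],
  };
}

// `producer` is anything with an async send(record) method,
// such as the connected producer from the snippet above.
async function placeOrder(producer, body) {
  const event = buildOrderPlacedEvent(body);
  await producer.send(toProducerRecord(event));
  return event;
}

// Stand-in producer that records what it was asked to send (no broker involved).
const sent = [];
const fakeProducer = { send: async (record) => { sent.push(record); } };

placeOrder(fakeProducer, { orderId: 'ORD-1000', customerId: 'cust-42' })
  .then((event) => console.log(event.type, sent[0].topic)); // OrderPlaced orders
```

Nothing in placeOrder names a consumer, which is the decoupling described above: the function ends when the broker accepts the record.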

What the Broker Actually Does

A broker is a Kafka server. Producers do not send messages to each other; they send them to the broker. Consumers do not pull messages from each other either; they pull from the broker. The broker is the thing in the middle that holds messages durably on disk, hands them out to anyone subscribed, and remembers how far each subscriber has gotten. Take the broker away and nothing is wired together anymore.

fig. 1: broker as the thing in the middle. The producer (order-service) writes to the broker, which holds the orders topic; the consumer (notification-service) reads from the broker.

In the demo there is one broker, running in a single Docker container, on localhost:9092. In a real cluster we would run several so a single broker dying does not lose data, but for understanding the system, one is enough. Once the broker is in the picture, the rest of the vocabulary stops feeling like jargon.

The Vocabulary

A topic is a named log of messages that lives on the broker. The one in the demo is called orders. Producers write to a topic and consumers read from one. The mental model that helped most is a named append-only file, not a queue. (More on that in a moment.)

A producer is anything that writes to a topic. In the demo that is the order-service calling producer.send, which we already saw.

A consumer is anything that reads from a topic. The notification-service runs a loop that polls the broker and gets handed messages one at a time:

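// notification-service (consumer)
// Assumes the KafkaJS client, whose API these calls match.
const { Kafka } = require('kafkajs');

// Excerpt: the awaits below run inside an async function.
const kafka = new Kafka({
  clientId: 'notification-service',
  brokers: ['localhost:9092'],
});

const consumer = kafka.consumer({ groupId: 'notification-group' });
await consumer.connect();

// fromBeginning: a group with no stored position starts at the oldest message.
await consumer.subscribe({ topic: 'orders', fromBeginning: true });

await consumer.run({
  // Called once per message; message.value is a Buffer holding the JSON event.
  eachMessage: async ({ message }) => {
    const event = JSON.parse(message.value.toString());
    console.log(`Sending email to ${event.customerId}...`);
  },
});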
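One way to poke at the consumer logic without a broker is to pull the body of eachMessage into a plain function and hand it a fake message. This is only a sketch: handleOrderMessage and the returned to and subject fields are hypothetical names, not part of the repo. The one detail taken from the snippet above is that message.value arrives as a Buffer.

```javascript
// Hypothetical pure handler: takes a Kafka-style message and returns
// a description of the email it would send.
function handleOrderMessage(message) {
  const event = JSON.parse(message.value.toString());
  return { to: event.customerId, subject: `Order ${event.orderId} confirmed` };
}

// A fake message shaped like the one passed to eachMessage: value is a Buffer.
const fakeMessage = {
  value: Buffer.from(JSON.stringify({ orderId: 'ORD-1000', customerId: 'cust-42' })),
};

console.log(handleOrderMessage(fakeMessage));
// { to: 'cust-42', subject: 'Order ORD-1000 confirmed' }
```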

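An offset is the position of a message in the topic. Message zero, message one, message two, and so on, forever. Kafka never renumbers them. (Strictly, offsets are counted per partition; the demo topic has a single partition, so the two are the same thing here.)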

A consumer group is the label Kafka uses to remember how far a particular consumer has read. The notification-service joined the group notification-group, and Kafka kept track of "this group has processed up to offset 14" on its behalf. If the service crashed and restarted, it would pick up at offset 15 without re-doing the work. That last piece is the one most articles skim past, and it turns out to be the whole reason Kafka feels different from other messaging systems.
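The crash-and-resume behavior can be modeled in a few lines. This is a toy model and not how the broker is implemented: the log is an array, an offset is an array index, and the committed map and poll function are hypothetical stand-ins for the bookkeeping Kafka does per group.

```javascript
// Toy model: a topic with 20 messages, where the offset is the array index.
const log = [];
for (let i = 0; i < 20; i++) log.push(`ORD-${1000 + i}`);

// Hypothetical store of committed positions, keyed by consumer group.
// In the real system this lives on the broker, not in the consumer process.
const committed = new Map();

// Process up to `max` messages for a group, committing after each one.
function poll(groupId, max) {
  const handled = [];
  let next = committed.get(groupId) ?? 0; // a group with no history starts at 0 here
  while (next < log.length && handled.length < max) {
    handled.push({ offset: next, value: log[next] });
    next += 1;
    committed.set(groupId, next); // remember the next offset to read
  }
  return handled;
}

const beforeCrash = poll('notification-group', 15); // handles offsets 0 through 14
// ...the service dies here; its memory is gone, but `committed` is not...
const afterRestart = poll('notification-group', 1);

console.log(beforeCrash[beforeCrash.length - 1].offset); // 14
console.log(afterRestart[0].offset);                     // 15
```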

A Topic Is a Log, Not a Queue

In a traditional queue, like RabbitMQ or SQS, a message is gone once it has been consumed and acknowledged. Whoever pulls it off the queue owns it; nobody else will ever see it. That is great when we are dispatching work to a pool of workers, but it is not what is going on in Kafka.

In Kafka, the topic is an append-only log that the broker keeps around for a configurable retention period (the default is a week). Consumers do not "take" messages off it. They read past them, and Kafka quietly remembers, per consumer group, where each reader has gotten to. Two completely different services can subscribe to the same topic, read the same messages, and neither one affects the other. A brand new service can join next month, ask to start from the beginning, and replay everything that ever happened.

fig. 2: topic as an append-only log. Offsets 00 through 07 hold orders ORD-1000 through ORD-1007; the notification-group cursor is at offset 06 and the analytics-group cursor is at offset 03.

Once we have watched two cursors move across the same log independently, "Kafka is a log" stops being a slogan. It is a persistent record that consumers happen to read forward through, at their own pace.
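The queue-versus-log difference is small enough to sketch with plain arrays. TopicLog below is a hypothetical in-memory stand-in, not a Kafka API, and audit-group is an invented name for the service that joins late. The cursor positions match the figure: notification-group at 6, analytics-group at 3.

```javascript
// Queue: reading removes the message, so only one reader ever sees it.
const queue = ['ORD-1000', 'ORD-1001'];
queue.shift();             // 'ORD-1000' is now gone for every other reader
console.log(queue.length); // 1

// Log: reading only advances the cursor of the group that read.
class TopicLog {
  constructor() {
    this.messages = [];       // append-only; nothing is removed on read
    this.cursors = new Map(); // groupId -> next offset to read
  }
  append(value) {
    this.messages.push(value);
    return this.messages.length - 1; // the new message's offset
  }
  read(groupId) {
    const offset = this.cursors.get(groupId) ?? 0; // new groups start at the beginning
    if (offset >= this.messages.length) return null;
    this.cursors.set(groupId, offset + 1);
    return { offset, value: this.messages[offset] };
  }
}

const orders = new TopicLog();
for (let i = 0; i < 8; i++) orders.append(`ORD-${1000 + i}`);

for (let i = 0; i < 6; i++) orders.read('notification-group');
for (let i = 0; i < 3; i++) orders.read('analytics-group');

console.log(orders.cursors.get('notification-group')); // 6
console.log(orders.cursors.get('analytics-group'));    // 3
console.log(orders.read('audit-group'));               // { offset: 0, value: 'ORD-1000' }
console.log(orders.messages.length);                   // 8
```

Retention is left out of the sketch; in Kafka the oldest messages eventually age out, so "replay everything" means everything still inside the retention window.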

What the Demo Did Not Cover

The demo deliberately stayed at the simplest level: one broker, one topic, one partition, one producer, one consumer. Two ideas are worth flagging for next time, because they come up the moment we start thinking about scale.

Partitions

Partitions are how Kafka splits a topic across multiple physical logs so it can be read and written in parallel. They are also the unit at which Kafka guarantees ordering: messages are strictly ordered within a partition, but not across partitions. Once we care about throughput, this is the first knob.
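A rough sketch of why per-partition ordering is usually enough: when messages carry a key, as the producer above does with orderId, the same key always maps to the same partition. The hash below is a simple stand-in chosen for readability; Kafka's default partitioner uses a different hash (murmur2), so the partition numbers here will not match a real cluster. The step field and the paid and shipped events are hypothetical.

```javascript
// Stand-in string hash (NOT the murmur2 hash Kafka's default partitioner uses).
function toyHash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value as unsigned 32-bit
  }
  return h;
}

// Same key in, same partition out.
function partitionFor(key, numPartitions) {
  return toyHash(key) % numPartitions;
}

// Four partitions, each modeled as its own append-only array.
const partitions = [[], [], [], []];

// Hypothetical events for two orders, interleaved in time.
const events = [
  { key: 'ORD-1000', step: 'placed' },
  { key: 'ORD-1001', step: 'placed' },
  { key: 'ORD-1000', step: 'paid' },
  { key: 'ORD-1001', step: 'paid' },
  { key: 'ORD-1000', step: 'shipped' },
];
for (const event of events) {
  partitions[partitionFor(event.key, partitions.length)].push(event);
}

// Every event for ORD-1000 sits in one partition, in the order it was produced.
const home = partitions[partitionFor('ORD-1000', partitions.length)];
console.log(home.filter((e) => e.key === 'ORD-1000').map((e) => e.step));
// [ 'placed', 'paid', 'shipped' ]
```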

Consumer groups

Running multiple consumers in the same group is how Kafka scales reads. If we have a topic with four partitions and four consumers in a group, Kafka hands each consumer one partition's worth of work. Add a fifth consumer and it sits idle. Drop one and the remaining consumers split the leftover work. It is clever, and worth its own demo.
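The three cases in that paragraph fall out of a very small assignment function. This is a simplified round-robin sketch under my own assumptions, not Kafka's actual assignor logic, which handles multiple topics, rebalancing, and stickiness. The names assignPartitions and c1 through c5 are hypothetical.

```javascript
// Simplified round-robin: deal partitions out to consumers like cards.
// Each partition goes to exactly one consumer in the group.
function assignPartitions(partitionIds, consumerIds) {
  const assignment = {};
  for (const id of consumerIds) assignment[id] = [];
  if (consumerIds.length === 0) return assignment;
  partitionIds.forEach((partition, i) => {
    assignment[consumerIds[i % consumerIds.length]].push(partition);
  });
  return assignment;
}

const partitions = [0, 1, 2, 3];

// Four partitions, four consumers: one partition each.
console.log(assignPartitions(partitions, ['c1', 'c2', 'c3', 'c4']));
// { c1: [ 0 ], c2: [ 1 ], c3: [ 2 ], c4: [ 3 ] }

// A fifth consumer has nothing left to take, so it sits idle.
console.log(assignPartitions(partitions, ['c1', 'c2', 'c3', 'c4', 'c5']));
// { c1: [ 0 ], c2: [ 1 ], c3: [ 2 ], c4: [ 3 ], c5: [] }

// Drop to three consumers: one of them picks up the leftover partition.
console.log(assignPartitions(partitions, ['c1', 'c2', 'c3']));
// { c1: [ 0, 3 ], c2: [ 1 ], c3: [ 2 ] }
```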

Both got left out on purpose. The point of the exercise was to be able to point at a broker, a topic, a producer, and a consumer in real code and say "this is that thing." Partitions and group rebalancing make a lot more sense once the bottom layer is solid.