Apache Kafka 101: Topics, Partitions, and Consumer Groups

Apache Kafka is the backbone of many real-time data platforms. It’s often described as a “distributed commit log” or “event streaming platform,” but what does that really mean for developers and data engineers?

Kafka topics

A topic is like a category or feed name. Producers write messages to a topic; consumers read messages from it. Examples:

  • user-signups
  • page-views
  • orders

Topics are append-only logs. New events are added to the end, and Kafka keeps them for a configurable retention period (for example, 7 days).
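To make the "append-only log with retention" idea concrete, here is a toy model in Python. This is purely illustrative and not how Kafka stores data internally (Kafka uses segmented files on disk); the class and method names are invented for this sketch.

```python
import time

# Toy model of a topic: an append-only log with time-based retention.
# (Illustrative only -- real Kafka persists segmented log files on disk.)
class AppendOnlyLog:
    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.events = []  # list of (timestamp, payload); only ever appended

    def append(self, payload, now=None):
        now = time.time() if now is None else now
        self.events.append((now, payload))
        return len(self.events) - 1  # the new event's offset

    def expire(self, now=None):
        # Drop events older than the retention window (e.g. 7 days).
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        self.events = [(t, p) for (t, p) in self.events if t >= cutoff]

log = AppendOnlyLog(retention_seconds=7 * 24 * 3600)  # ~7 days
offset = log.append({"user": "alice"})
```

Note that consumers never delete events; expiry is driven entirely by the retention policy, which is why many independent readers can share one topic.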

Partitions and scalability

Each topic is split into one or more partitions. Partitions are the key to Kafka’s scalability and parallelism:

  • Each partition is an ordered, immutable sequence of messages.
  • Partitions can live on different brokers, so Kafka scales horizontally.
  • Consumers in the same group can read different partitions in parallel.

When you create a topic, you choose how many partitions it should have. More partitions allow higher throughput through parallelism, but they also mean more open files and connections for the brokers to manage.
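A key detail: when a producer sends a keyed message, the partition is chosen by hashing the key, so all events with the same key land in the same partition and stay ordered relative to each other. The sketch below illustrates the idea; Kafka's actual default partitioner uses a murmur2 hash, and `crc32` here is just an illustrative stand-in.

```python
import zlib

# Sketch of key-based partitioning. Kafka's default partitioner hashes
# the key bytes with murmur2; crc32 is used here only as a stand-in.
def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

# Events with the same key always map to the same partition,
# so per-key ordering is preserved.
p1 = partition_for(b"customer-42", 3)
p2 = partition_for(b"customer-42", 3)
```

This is also why repartitioning is disruptive: changing the partition count changes the modulus, so existing keys may map to different partitions afterwards.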

Consumer groups and parallel processing

A consumer group is a set of consumers that coordinate to read from a topic. Kafka guarantees that each partition is read by at most one consumer in a group.

This pattern is powerful:

  • Add consumers to scale horizontally.
  • If a consumer crashes, another in the group takes over its partitions.
  • You can have multiple groups (for example, analytics, billing, monitoring) all reading the same topic independently.
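The assignment logic can be sketched as a small function. This toy version spreads partitions round-robin style; real Kafka ships several assignor strategies (range, round-robin, sticky) and the assignment is coordinated by a broker, but the one-owner-per-partition guarantee is the same.

```python
# Toy sketch of spreading partitions across a consumer group.
# Real Kafka has pluggable assignors; this just shows the invariant
# that each partition is owned by exactly one consumer per group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["orders-0", "orders-1", "orders-2"]

# Two groups read the same topic independently: each group gets
# its own complete assignment over all three partitions.
analytics = assign(partitions, ["a1", "a2"])
billing = assign(partitions, ["b1"])
```

Within the `analytics` group, the three partitions are split between `a1` and `a2`; the single-consumer `billing` group reads all three on its own, without affecting analytics at all.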

A simple example

Create a topic with three partitions:

bin/kafka-topics.sh --create --topic orders \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 1

Then start a few consumers in the same group. Kafka will spread the partitions across them. Add more consumers and they will share the load automatically.
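What happens when a consumer crashes can be simulated with the same round-robin sketch. This is a hypothetical `assign` helper, not Kafka's API: in a real cluster the group coordinator detects the failure (via missed heartbeats) and triggers the rebalance.

```python
# Toy rebalance: when a consumer leaves the group, its partitions are
# redistributed among the survivors. (Hypothetical helper -- real
# rebalancing is driven by Kafka's group coordinator.)
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["orders-0", "orders-1", "orders-2"]
before = assign(partitions, ["c1", "c2", "c3"])  # one partition each
after = assign(partitions, ["c1", "c2"])         # c3 has crashed

# Every partition is still owned by exactly one surviving consumer.
owned = sorted(p for ps in after.values() for p in ps)
```

After the rebalance, `c1` and `c2` between them still cover all three partitions, so no data is skipped; processing of `c3`'s old partition simply resumes on another member from the last committed offset.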

Where to go next

Once you understand topics, partitions, and consumer groups, you are ready to explore more advanced features: exactly-once semantics, stream processing with Kafka Streams, and integrations with tools like Spark and ClickHouse.
