Building a Real-Time Analytics Pipeline with Kafka and ClickHouse
Kafka is great at moving data, and ClickHouse is great at querying it. Together, they form a powerful pair for real-time analytics.
High-level architecture
- Applications publish events (such as page views and purchases) into Kafka topics.
- ClickHouse reads those events from Kafka using a special Kafka engine table.
- A materialized view transforms the raw events into a query-friendly format.
- Dashboards and APIs query ClickHouse in real time.
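Condensed to a single line, the data flows one way, using the object names defined in the rest of this post:

Kafka topic (events) -> Kafka engine table (kafka_events) -> materialized view (mv_events_rt) -> MergeTree table (events_rt) -> dashboards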
Kafka engine table
In ClickHouse, you can define a table that consumes directly from Kafka:
CREATE TABLE kafka_events
(
    event_time DateTime,
    user_id    UInt32,
    event_type String,
    revenue    Float32
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';
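With kafka_format = 'JSONEachRow', each Kafka message carries one JSON object per line, with keys matching the column names. A hypothetical message for this schema (the values are made up for illustration):

{"event_time": "2024-05-01 12:00:00", "user_id": 42, "event_type": "purchase", "revenue": 9.99}

ClickHouse parses event_time from the string form shown here.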
Materialized view into a MergeTree table
A Kafka engine table is a streaming consumer, not storage: each message can be read from it only once, so you don't query it directly. Instead, create a regular MergeTree table for queries and connect the two with a materialized view:
CREATE TABLE events_rt
(
    event_time DateTime,
    user_id    UInt32,
    event_type LowCardinality(String),
    revenue    Float32
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (event_time, user_id);
CREATE MATERIALIZED VIEW mv_events_rt
TO events_rt AS
SELECT *
FROM kafka_events;
As Kafka messages arrive, ClickHouse ingests them through the materialized view, and your queries against events_rt stay up to date.
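A quick way to verify that ingestion works is to watch the row count and the newest event time climb as messages arrive:

SELECT
    count() AS rows_ingested,
    max(event_time) AS latest_event
FROM events_rt;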
Dashboarding
You can connect tools like Grafana or Metabase directly to ClickHouse. Typical widgets include:
- Events per second over time (a sample query follows this list).
- Revenue by country in the last 15 minutes (this assumes the events also carry a country column).
- Active users per application or game (likewise, given an application or game identifier in the schema).
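As a sketch, the first widget could be driven by a per-minute aggregation like the one below; a dashboard tool would normally substitute its own time range for the hard-coded hour:

SELECT
    toStartOfMinute(event_time) AS minute,
    count() / 60 AS events_per_second
FROM events_rt
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;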
Operational tips
- Use a dedicated consumer group for ClickHouse so it doesn't compete with other services.
- Monitor consumer lag to ensure ClickHouse is keeping up with Kafka; if it falls behind, the engine settings sketched below are the usual first knobs.
- Set sensible partition counts and retention on Kafka topics to balance cost and freshness.
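On the ClickHouse side, the Kafka engine exposes settings for throughput and robustness. A minimal sketch extending the SETTINGS clause from earlier; the specific values are illustrative, not recommendations:

ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 4,          -- parallel consumers; keep at or below the topic's partition count
         kafka_skip_broken_messages = 100; -- skip up to 100 unparseable messages per block instead of stalling

Note that kafka_num_consumers only helps when the topic has enough partitions to share among the consumers.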
With a few dozen lines of SQL and some Kafka configuration, you can turn raw Kafka events into a real-time analytics system that feels almost magical to product teams.