Real-Time Analytics Pipeline with Kafka and ClickHouse (Step-by-Step)

Kafka is great at moving data, and ClickHouse is great at querying it. Together, they form a powerful pair for real-time analytics.

High-level architecture

  • Applications publish events (such as page views and purchases) into Kafka topics.
  • ClickHouse reads those events from Kafka using a special Kafka engine table.
  • A materialized view transforms the raw events into a query-friendly format.
  • Dashboards and APIs query ClickHouse in real time.
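Each message on the events topic is a single JSON object per line, matching the JSONEachRow format that the Kafka engine table below expects. A hypothetical event (field names follow the kafka_events schema; the values are made up) might look like:

```json
{"event_time": "2024-05-01 12:34:56", "user_id": 42, "event_type": "purchase", "revenue": 9.99}
```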

Kafka engine table

In ClickHouse, you can define a table that consumes directly from Kafka:

CREATE TABLE kafka_events
(
  event_time DateTime,
  user_id    UInt32,
  event_type String,
  revenue    Float32
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'clickhouse-consumer',
         kafka_format      = 'JSONEachRow';
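One thing to know while debugging: by default, ClickHouse refuses direct SELECTs against Kafka engine tables, because reading from them consumes the messages. For a one-off peek at the stream, you can allow it per query (the setting name assumes a reasonably recent ClickHouse version):

```sql
-- Consumes the rows it reads: use only for ad-hoc debugging,
-- never while a materialized view is attached in production.
SELECT *
FROM kafka_events
LIMIT 10
SETTINGS stream_like_engine_allow_direct_select = 1;
```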

Materialized view into a MergeTree table

Next, create a regular MergeTree table to serve queries, and connect it to the Kafka table with a materialized view:

CREATE TABLE events_rt
(
  event_time DateTime,
  user_id    UInt32,
  event_type LowCardinality(String),
  revenue    Float32
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (event_time, user_id);

CREATE MATERIALIZED VIEW mv_events_rt
TO events_rt AS
SELECT *
FROM kafka_events;

As Kafka messages arrive, ClickHouse ingests them through the materialized view, and your queries against events_rt stay up to date.
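To confirm the pipeline is flowing end to end, a quick freshness check against events_rt helps (a sketch; adjust the window to your traffic volume):

```sql
-- Rows ingested in the last minute, and the newest event seen.
SELECT
    count()         AS rows_last_minute,
    max(event_time) AS newest_event
FROM events_rt
WHERE event_time >= now() - INTERVAL 1 MINUTE;
```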

Dashboarding

You can connect tools like Grafana or Metabase directly to ClickHouse. Typical widgets include:

  • Events per second over time.
  • Revenue by event type in the last 15 minutes.
  • Active users per application or game.
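The first widget above, for example, maps to a simple per-second aggregation over events_rt (a sketch; a dashboarding tool like Grafana would substitute its own time-range macros for the WHERE clause):

```sql
-- Events per second over the last 15 minutes.
-- event_time is a DateTime with one-second resolution,
-- so grouping by it directly yields per-second buckets.
SELECT
    event_time AS ts,
    count()    AS events
FROM events_rt
WHERE event_time >= now() - INTERVAL 15 MINUTE
GROUP BY ts
ORDER BY ts;
```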

Operational tips

  • Use dedicated consumer groups for ClickHouse so it doesn’t compete with other services.
  • Monitor consumer lag to ensure ClickHouse is keeping up with Kafka.
  • Set sensible partitions and retention on Kafka topics to balance cost and freshness.
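For the lag check, recent ClickHouse versions (23.8+) expose a system.kafka_consumers table; a sketch of a lag inspection query (column names are from recent documentation and may differ by version) is:

```sql
-- Per-consumer view of topic/partition assignments and current offsets;
-- compare current_offset against the broker's end offsets to get lag.
SELECT
    database,
    table,
    assignments.topic          AS topics,
    assignments.partition_id   AS partitions,
    assignments.current_offset AS offsets
FROM system.kafka_consumers
WHERE table = 'kafka_events';
```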

With a few dozen lines of SQL and configuration, you can turn raw Kafka events into a real-time analytics system that feels almost magical to product teams.
