Building a Real-Time Analytics Pipeline with Kafka and ClickHouse
Kafka is great at moving data, and ClickHouse is great at querying it. Together, they form a powerful pair for real-time analytics.
High-level architecture
- Applications publish events (such as page views and purchases) into Kafka topics.
- ClickHouse reads those events from Kafka using a special Kafka engine table.
- A materialized view transforms the raw events into a query-friendly format.
- Dashboards and APIs query ClickHouse in real time.
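Condensed to a single line, the data flows one way, using the object names defined in the rest of this post:

Kafka topic (events) -> Kafka engine table (kafka_events) -> materialized view (mv_events_rt) -> MergeTree table (events_rt) -> dashboards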
Kafka engine table
In ClickHouse, you can define a table that consumes directly from Kafka:
CREATE TABLE kafka_events
(
    event_time DateTime,
    user_id    UInt32,
    event_type String,
    revenue    Float32
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';
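With kafka_format = 'JSONEachRow', each Kafka message carries one JSON object per line, with keys matching the column names. A hypothetical message for this schema (the values are made up for illustration):

{"event_time": "2024-05-01 12:00:00", "user_id": 42, "event_type": "purchase", "revenue": 9.99}

ClickHouse parses event_time from the string form shown here.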
Materialized view into a MergeTree table
A Kafka engine table is a streaming consumer, not storage: each message can be read from it only once, so you don't query it directly. Instead, create a regular MergeTree table for queries and connect the two with a materialized view:
CREATE TABLE events_rt
(
    event_time DateTime,
    user_id    UInt32,
    event_type LowCardinality(String),
    revenue    Float32
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (event_time, user_id);
CREATE MATERIALIZED VIEW mv_events_rt
TO events_rt AS
SELECT *
FROM kafka_events;
As Kafka messages arrive, ClickHouse ingests them through the materialized view, and your queries against events_rt stay up to date.
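A quick way to verify that ingestion works is to watch the row count and the newest event time climb as messages arrive:

SELECT
    count() AS rows_ingested,
    max(event_time) AS latest_event
FROM events_rt;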
Dashboarding
You can connect tools like Grafana or Metabase directly to ClickHouse. Typical widgets include:
- Events per second over time (a sample query follows this list).
- Revenue by country in the last 15 minutes (this assumes the events also carry a country column).
- Active users per application or game (likewise, given an application or game identifier in the schema).
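As a sketch, the first widget could be driven by a per-minute aggregation like the one below; a dashboard tool would normally substitute its own time range for the hard-coded hour:

SELECT
    toStartOfMinute(event_time) AS minute,
    count() / 60 AS events_per_second
FROM events_rt
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;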
Operational tips
- Use a dedicated consumer group for ClickHouse so it doesn't compete with other services.
- Monitor consumer lag to ensure ClickHouse is keeping up with Kafka; if it falls behind, the engine settings sketched below are the usual first knobs.
- Set sensible partition counts and retention on Kafka topics to balance cost and freshness.
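On the ClickHouse side, the Kafka engine exposes settings for throughput and robustness. A minimal sketch extending the SETTINGS clause from earlier; the specific values are illustrative, not recommendations:

ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 4,          -- parallel consumers; keep at or below the topic's partition count
         kafka_skip_broken_messages = 100; -- skip up to 100 unparseable messages per block instead of stalling

Note that kafka_num_consumers only helps when the topic has enough partitions to share among the consumers.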
With a few dozen lines of SQL and some Kafka configuration, you can turn raw Kafka events into a real-time analytics system that feels almost magical to product teams.