
TrStream

A distributed real-time transaction processing pipeline

About

TrStream is a distributed data pipeline designed to simulate and process high-throughput financial transaction streams in real time.

The project explores the architectural patterns behind modern fintech and data engineering systems, including:

  • event-driven ingestion
  • scalable stream processing
  • lakehouse-style storage
  • analytical querying

It was developed as a personal project to apply concepts from Designing Data-Intensive Applications (Martin Kleppmann) in a realistic and reproducible environment.

Motivation

Modern financial platforms ingest millions of events per day and must:

  • process data reliably
  • retain raw events for auditing
  • optimize data layout for analytics
  • remain horizontally scalable

TrStream models this workflow locally using open-source technologies, focusing on system design and data flow.

Overview

At a high level, the pipeline:

  1. Generates transaction events from multiple producers
  2. Streams the events through Kafka
  3. Stores raw data in a data lake (Parquet on S3-compatible storage)
  4. Reorganizes and compacts data for analytical workloads
  5. Exposes a SQL query layer on optimized data

The system is fully containerized and can be scaled horizontally via Docker Compose.
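
As a rough illustration of steps 1 and 2, a single producer boils down to a loop that generates synthetic events and publishes them to a Kafka topic. The sketch below assumes the kafka-python client, a broker at localhost:9092 and a topic named transactions; the event fields are illustrative and not necessarily the project's actual schema.

    import json
    import random
    import time
    import uuid
    from datetime import datetime, timezone

    from kafka import KafkaProducer  # kafka-python; the client choice is an assumption

    def make_event():
        # Hypothetical event shape; field names and distributions are illustrative.
        return {
            "transaction_id": str(uuid.uuid4()),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "type": random.choice(["payment", "refund", "transfer"]),
            "amount": round(random.expovariate(1 / 50.0), 2),
            "currency": "EUR",
        }

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    while True:
        producer.send("transactions", make_event())  # topic name is an assumption
        time.sleep(0.01)  # ~100 events/s; the real producers expose the rate as a knob

In the pipeline itself, several producer containers run this kind of loop in parallel, and Kafka partitioning spreads the resulting load across consumers.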

Architecture

(Architecture diagram)

Data flow

  • Producers simulate transaction events with configurable value distributions and event rates
  • Kafka ensures decoupling, buffering and fault tolerance
  • Consumers persist immutable raw data in Parquet format (see the sketch after this list)
  • The partitioner organizes data by date and transaction type
  • The compacter merges small files to improve query performance
  • The query layer enables SQL queries directly on optimized Parquet data
  • A Streamlit SQL editor is provided to conveniently query data
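
A minimal sketch of the consumer and raw-storage stages above, assuming the kafka-python client, a MinIO endpoint at localhost:9000 with default credentials, and a bucket named raw; the actual services presumably read these settings from the Compose environment:

    import io
    import json
    import uuid

    import boto3
    import pyarrow as pa
    import pyarrow.parquet as pq
    from kafka import KafkaConsumer

    # S3 client pointed at MinIO; endpoint, credentials and bucket are assumptions.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
    )

    consumer = KafkaConsumer(
        "transactions",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    batch = []
    for message in consumer:
        batch.append(message.value)
        if len(batch) >= 1000:  # flush in fixed-size batches; the threshold is arbitrary
            table = pa.Table.from_pylist(batch)
            buf = io.BytesIO()
            pq.write_table(table, buf)
            s3.put_object(
                Bucket="raw",
                Key=f"transactions/{uuid.uuid4()}.parquet",
                Body=buf.getvalue(),
            )
            batch.clear()

Each flush produces a small, immutable Parquet object; turning many of these into an analytics-friendly layout is the job of the partitioner and compacter.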

Key features

  • Real-time ingestion with Kafka-based buffering and backpressure
  • Horizontally scalable producers and consumers via Kafka partitioning
  • End-to-end observability of message flow through Kafka UI
  • Immutable Parquet storage designed for downstream analytics and auditing
  • Explicit data lifecycle stages: ingestion, partitioning and compaction
  • SQL querying on lake data via DuckDB (Athena-like experience)
  • Lightweight SQL editor implemented with Streamlit
  • Fully containerized local environment with Docker Compose orchestration and explicit health checks
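
The partitioning and compaction stages can be approximated with PyArrow's dataset API, which rewrites many small raw files into fewer, larger Parquet files organized by partition. Bucket names, MinIO credentials and the exact column names are assumptions of this sketch, not necessarily the project's layout:

    import pyarrow as pa
    import pyarrow.compute as pc
    import pyarrow.dataset as ds
    from pyarrow import fs

    # MinIO-backed filesystem; endpoint and credentials are assumptions.
    minio = fs.S3FileSystem(
        endpoint_override="localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        scheme="http",
    )

    # Read the small raw files written by the consumers. Materializing the whole
    # dataset in memory keeps the sketch short; a real compacter would work in chunks.
    raw = ds.dataset("raw/transactions", filesystem=minio, format="parquet").to_table()

    # Derive a date partition column from the ISO-8601 timestamp field.
    raw = raw.append_column("date", pc.utf8_slice_codeunits(raw["timestamp"], 0, 10))

    # Rewrite as larger files laid out by date and transaction type (Hive-style).
    ds.write_dataset(
        raw,
        base_dir="lake/transactions",  # target bucket/prefix is an assumption
        filesystem=minio,
        format="parquet",
        partitioning=ds.partitioning(
            pa.schema([("date", pa.string()), ("type", pa.string())]),
            flavor="hive",
        ),
        existing_data_behavior="overwrite_or_ignore",
        max_rows_per_file=1_000_000,
        max_rows_per_group=1_000_000,
    )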

Tech stack

Component       Technology
Messaging       Kafka (Bitnami legacy image)
Storage         MinIO (S3-compatible)
Processing      Python, PyArrow, Boto3
Orchestration   Docker Compose
Monitoring      Kafka UI (Provectus Labs)
Querying        DuckDB, FastAPI
Visualization   Streamlit
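
Conceptually, the query layer amounts to DuckDB reading Parquet straight out of MinIO through its httpfs extension, with FastAPI exposing it as an HTTP API and Streamlit providing the editor on top. A standalone sketch, reusing the bucket layout and column names assumed in the earlier examples:

    import duckdb

    con = duckdb.connect()

    # Point DuckDB's httpfs extension at MinIO; endpoint and credentials are assumptions.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute("SET s3_endpoint='localhost:9000'")
    con.execute("SET s3_url_style='path'")
    con.execute("SET s3_use_ssl=false")
    con.execute("SET s3_access_key_id='minioadmin'")
    con.execute("SET s3_secret_access_key='minioadmin'")

    # Athena-style aggregation directly over the partitioned Parquet files.
    rows = con.execute("""
        SELECT type, count(*) AS events, sum(amount) AS total_amount
        FROM read_parquet('s3://lake/transactions/*/*/*.parquet', hive_partitioning=true)
        GROUP BY type
        ORDER BY total_amount DESC
    """).fetchall()
    print(rows)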

Running locally

Helper scripts are provided to simplify common workflows.

  • Build all images:
bash scripts-cli/build.sh 
  • Start the core pipeline:
bash scripts-cli/run.sh 
  • Start all services (including query layer and dashboards):
bash scripts-cli/run_all.sh 
  • Optionally scale producers and consumers:
bash scripts-cli/run.sh producer=3 consumer=4 

See scripts-cli/README.md for more details.

Access Points

Roadmap

Planned extensions focus on real-world relevance rather than feature completeness:

  • Integration with real transaction APIs (e.g. Stripe, Revolut)
  • Metrics and observability improvements
  • Fraud detection and analytics use cases
