DuckLake Data Lakehouse

Lakehouse Storage,
Real-Time Speed

Stream from servers (Kafka), browsers (HTTP/2), or any DuckDB client (FlightRPC) to our Rust-based DuckDB server.
Query in real time via the PostgreSQL interface.

~1s
Hot Tier Commit
Configurable, can go much faster
1–10 min
Cold Tier (S3)
Durable storage

DuckLake Two-Tier Architecture

Built on DuckLake with a PostgreSQL catalog and data inlining. DuckDB clients with the ducklake extension simply query tables: fresh data becomes visible in ~1s, and recent data is served at in-instance speed. The boilstream extension configures DuckLake with temporary credentials for secure access.

Clients
BI Tools
Power BI, Tableau
dbt
Transformations
DuckDB
Extension
Apps
Any PG client
PostgreSQL Interface (pgwire)
Ingestion
Kafka
HTTP/2
FlightRPC
BoilStream Server
Single Binary
Hot Tier (~1s visibility)
DuckDB Inlined Data + PostgreSQL Catalog
Cold Tier
S3 / Azure / GCS
DuckLake Parquet
INSTALL boilstream FROM community;
LOAD boilstream;

-- Login with email, password, and MFA code
PRAGMA boilstream_login('https://your-server.com/user@example.com', 'password', '123456');

-- List and use your ducklakes
FROM boilstream_ducklakes();
USE my_catalog;
SELECT * FROM events;

Built for Streaming

Everything you need for a real-time data lakehouse.

🦆

DuckDB Extension

The boilstream extension manages DuckLake for you and vends temporary credentials for seamless hot and cold tier access. Secure OPAQUE PAKE authentication with MFA support.

K

Kafka Protocol + JIT Avro

Confluent Schema Registry-compatible Avro with a JIT-compiled decoder – 3–5x faster than the Apache Arrow Rust decoder published in October 2025. Use standard Kafka clients to stream data.

S3

DuckLake Cold Storage

Automatic S3 Parquet snapshots with DuckLake catalog registration. Remote DuckDB clients with the DuckLake extension work seamlessly.
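As a sketch of what "work seamlessly" means for a stock DuckDB client: the ducklake extension can attach a DuckLake catalog directly, here assuming a PostgreSQL catalog and an S3 data path (the host, database, bucket, and table names below are placeholders, not BoilStream defaults).

```sql
-- Attach the DuckLake catalog from any remote DuckDB client.
-- Placeholder connection values; credentials are vended separately.
INSTALL ducklake;
INSTALL postgres;
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=your-server.com' AS lake
    (DATA_PATH 's3://your-bucket/ducklake/');

-- Query cold-tier Parquet through the catalog like any other table.
SELECT * FROM lake.events LIMIT 10;
```

With the boilstream extension, the login and catalog listing shown earlier handle this attachment and credential setup for you.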

🛡

Enterprise Ready

SSO with Entra ID (XML metadata upload/download) with automated user provisioning (SCIM), role-based access control (RBAC), audit trails, and user/admin dashboards. Built-in registration as an alternative. Prometheus monitoring. Configure multiple cloud backends and assign BoilStream roles to users.

0

Zero-Copy Pipeline

Envelope recycling and zero-copy Arrow processing eliminate memory allocations. 2.5+ GB/s throughput (16 vCPU) with 10,000 concurrent sessions.

SQL

Materialized Views

DuckLake VIEWs are materialized by continuously running DuckDB queries that transform data in real time. Standard SQL syntax, continuous execution.
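A minimal sketch of the idea: you write an ordinary view, and BoilStream keeps its query running so the materialization stays fresh. The table and column names here are hypothetical, not part of the product's schema.

```sql
-- Standard SQL view; BoilStream runs the underlying query continuously
-- so the materialized result tracks the hot tier in real time.
-- (events, event_time: hypothetical example names)
CREATE VIEW events_per_minute AS
SELECT date_trunc('minute', event_time) AS minute,
       count(*)                         AS event_count
FROM events
GROUP BY minute;
```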

Start querying in minutes

Download the binary, point to your S3 bucket, and start ingesting. No complex setup required.