Comparison with Other Systems
How JDBIN relates to Parquet, DuckDB, ClickHouse, SSTables, and Lucene.
JDBIN is not a "copy" of any single existing system. It has traits from several systems, but its architectural starting point is different.
Short answer
JDBIN is not a new Parquet and not a new DuckDB. It borrows ideas such as segments, indexes, immutable structure, and selective reads, but combines them around Cloudflare R2, HTTP byte-range fetches, and a Worker-based query engine.
1. Parquet
Parquet is a file format.
CSV
-> Parquet file
-> Spark / DuckDB / Athena / BigQuery
Parquet includes things like:
- columnar storage
- metadata
- pages
- compression
- statistics
But it is not a query engine. It only says: "this is how the data is stored."
JDBIN differs because it includes not only a format, but also a Worker-based query path.
2. Apache ORC
ORC follows a very similar idea to Parquet.
It adds things like:
- bloom filters
- indexes
- statistics
- predicate pushdown
But, like Parquet, ORC is still not a query engine by itself.
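The statistics and predicate pushdown listed above can be illustrated with a minimal sketch. This is not the Parquet or ORC API: the `ChunkStats` shape and the `pruneChunks` function are hypothetical names, and only a simple numeric range predicate is handled.

```typescript
// Hypothetical per-chunk statistics, similar in spirit to the min/max
// statistics that columnar formats keep for each block of rows.
interface ChunkStats {
  id: number;
  min: number; // smallest value of the column in this chunk
  max: number; // largest value of the column in this chunk
}

// Keep only chunks whose [min, max] interval can overlap [lo, hi].
// Chunks that cannot contain a match are skipped without being read.
function pruneChunks(chunks: ChunkStats[], lo: number, hi: number): number[] {
  return chunks
    .filter((c) => c.max >= lo && c.min <= hi)
    .map((c) => c.id);
}

const chunks: ChunkStats[] = [
  { id: 0, min: 1, max: 100 },
  { id: 1, min: 101, max: 200 },
  { id: 2, min: 201, max: 300 },
];

// Predicate: value BETWEEN 150 AND 210
const survivors = pruneChunks(chunks, 150, 210);
console.log(survivors); // [1, 2]
```

The format only supplies the statistics; deciding which chunks to skip is the job of whichever engine reads the file.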
3. DuckDB storage engine
DuckDB is a complete database engine that runs in-process inside its host application.
SQL
-> DuckDB parser
-> optimizer
-> execution engine
-> storage engine
-> Parquet / disk
DuckDB includes:
- a SQL parser
- an optimizer
- an execution engine
- a storage engine
The JDBIN path looks more like this:
Worker
-> planner
-> range reader
-> JDBIN
-> R2
The key difference is that, in its current scope, JDBIN does not have a general SQL parser. The documentation instead emphasizes a Worker-side query planner that performs only the byte-range fetches actually needed from R2.
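The planner and range reader steps can be sketched roughly as follows. The real JDBIN manifest layout is not described here, so `SegmentEntry`, `planLookup`, and `readRange` are hypothetical names, and the range reader slices an in-memory buffer instead of calling R2.

```typescript
// Hypothetical manifest entry; the field names are assumptions made
// for illustration, not the actual JDBIN manifest layout.
interface SegmentEntry {
  minKey: string; // smallest key stored in the segment
  maxKey: string; // largest key stored in the segment
  offset: number; // byte offset of the segment inside the object
  length: number; // segment size in bytes
}

interface ByteRange {
  offset: number;
  length: number;
}

// The "planner": decide which byte ranges a key lookup needs,
// using only the small manifest and no data reads.
function planLookup(manifest: SegmentEntry[], key: string): ByteRange[] {
  return manifest
    .filter((s) => s.minKey <= key && key <= s.maxKey)
    .map((s) => ({ offset: s.offset, length: s.length }));
}

// The "range reader": a stand-in that slices an in-memory buffer
// where a Worker would issue a ranged read against object storage.
function readRange(data: Uint8Array, r: ByteRange): Uint8Array {
  return data.subarray(r.offset, r.offset + r.length);
}

const manifest: SegmentEntry[] = [
  { minKey: "a", maxKey: "f", offset: 0, length: 4 },
  { minKey: "g", maxKey: "m", offset: 4, length: 4 },
  { minKey: "n", maxKey: "z", offset: 8, length: 4 },
];
const storedObject = Uint8Array.from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);

const plan = planLookup(manifest, "h");
const slices = plan.map((r) => readRange(storedObject, r));
console.log(plan);   // [ { offset: 4, length: 4 } ]
console.log(slices); // one slice holding bytes 4, 5, 6, 7
```

In a deployed Worker, `readRange` would presumably become a ranged read through the R2 binding (its `get` call accepts a `range` option with `offset` and `length`), so only the planned bytes leave storage.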
4. ClickHouse MergeTree
This is one of the closest comparisons.
MergeTree uses things like:
- parts
- indexes
- marks
- granules
so that only a small part of the file needs to be read.
JDBIN follows a similar idea:
Worker
-> planner
-> range read
-> segment
-> string pool
-> manifest
The major difference is that MergeTree works on its own disk and inside its own database process. JDBIN uses object storage, here specifically R2.
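The idea behind marks and granules is a sparse index: one entry per block of rows rather than one per row. The sketch below is a simplified illustration, not ClickHouse's on-disk format and not the JDBIN layout; `Mark`, `findGranule`, and `granuleSpan` are hypothetical names.

```typescript
// Hypothetical sparse index: one mark per granule (block of rows).
interface Mark {
  firstKey: number; // key of the first row in the granule
  offset: number;   // byte offset where the granule starts
}

// Binary search for the last mark whose firstKey <= key.
// Returns -1 when the key is smaller than every indexed key.
function findGranule(marks: Mark[], key: number): number {
  let lo = 0;
  let hi = marks.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (marks[mid].firstKey <= key) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Byte span [start, end) of the only granule that may hold the key.
function granuleSpan(
  marks: Mark[],
  fileSize: number,
  key: number
): [number, number] | null {
  const i = findGranule(marks, key);
  if (i < 0) return null;
  const end = i + 1 < marks.length ? marks[i + 1].offset : fileSize;
  return [marks[i].offset, end];
}

const marks: Mark[] = [
  { firstKey: 0, offset: 0 },
  { firstKey: 8192, offset: 65536 },
  { firstKey: 16384, offset: 131072 },
];
console.log(granuleSpan(marks, 200000, 9000)); // [ 65536, 131072 ]
```

The same lookup works whether the resulting span is read from a local disk or requested as a byte range from object storage; only the reader behind it changes.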
5. SSTable
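The SSTable (sorted string table), the on-disk file type behind LSM-tree stores such as LevelDB, RocksDB, and Cassandra, is a useful comparison for the write path.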
Its model includes:
- sorted key
- immutable structure
- index
- lookup path
When new data is written, a new SSTable is created instead of overwriting the old one.
The JDBIN write path:
delta
-> manifest
-> active pointer
-> new view
This write path is similar in the sense that a new view is built without directly overwriting old data.
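A minimal sketch of this delta, manifest, and active-pointer pattern follows. It uses an in-memory map as a stand-in for object storage; `ManifestDoc`, `ViewStore`, and the key naming scheme are hypothetical and are not taken from the JDBIN documentation.

```typescript
// Hypothetical manifest document: a version number plus the keys of
// the immutable segment objects that make up one view of the data.
interface ManifestDoc {
  version: number;
  segments: string[];
}

class ViewStore {
  private objects = new Map<string, ManifestDoc>();
  private active = ""; // key of the manifest that readers should use

  // Publish a delta: write a NEW manifest that references the old
  // segments plus the new one, then move the active pointer.
  publish(deltaSegmentKey: string): string {
    const current = this.objects.get(this.active);
    const next: ManifestDoc = {
      version: current ? current.version + 1 : 1,
      segments: current
        ? current.segments.concat(deltaSegmentKey)
        : [deltaSegmentKey],
    };
    const key = `manifest-v${next.version}.json`;
    this.objects.set(key, next); // older manifests stay untouched
    this.active = key;           // the only mutation is a pointer swap
    return key;
  }

  // Read a specific manifest, or the active one when no key is given.
  view(manifestKey?: string): ManifestDoc | undefined {
    return this.objects.get(manifestKey === undefined ? this.active : manifestKey);
  }
}

const store = new ViewStore();
const v1 = store.publish("seg-001.bin");
const v2 = store.publish("seg-002.bin");
console.log(store.view());   // version 2, both segments
console.log(store.view(v1)); // version 1, still intact
```

Because old manifests are never rewritten, a reader that started on version 1 keeps a consistent view while version 2 becomes active.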
6. Lucene
Lucene is not a database. It is a search library built around an inverted index.
It has things like:
- an inverted index
- segments
- immutable segments
JDBIN is not an inverted index, but the segment-oriented thinking is comparable.
The real architectural difference
According to the documentation, the JDBIN architecture looks like this:
Client
-> Cloudflare Worker
-> query planner
-> range reader
-> R2 object
-> JDBIN
The Worker fetches only the required byte ranges from R2 without a separate database server.
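At the HTTP level, a byte-range fetch comes down to a `Range` request header and a `Content-Range` response header. The helpers below sketch both sides using the standard header syntax; the function names `rangeHeader` and `parseContentRange` are hypothetical and are not part of JDBIN or any Cloudflare API.

```typescript
// Build the value of an HTTP Range header for `length` bytes starting
// at `offset`. Both ends of an HTTP byte range are inclusive.
function rangeHeader(offset: number, length: number): string {
  if (offset < 0 || length <= 0) throw new RangeError("invalid byte range");
  return `bytes=${offset}-${offset + length - 1}`;
}

// Parse a Content-Range response header such as
// "bytes 4096-8191/1048576". Returns null for forms this sketch does
// not handle (for example "bytes */1234").
function parseContentRange(
  value: string
): { start: number; end: number; total: number | null } | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!m) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]),
  };
}

console.log(rangeHeader(4096, 4096)); // "bytes=4096-8191"
console.log(parseContentRange("bytes 4096-8191/1048576"));
```

A server that honors the request answers with status 206 (Partial Content) and a `Content-Range` header describing the bytes it actually returned.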
Most traditional systems look more like this:
Disk
-> database process
-> query engine
-> client
In other words, the database process owns the disk and the query engine.
This model instead looks like:
Object Storage (R2)
-> HTTP Range GET
-> Cloudflare Worker
-> planner
-> JSON
That is a different architectural starting point.
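The last hop of that pipeline, turning fetched bytes into JSON, might look like the sketch below. The fixed-width record layout is invented purely for illustration and is not the real JDBIN format; `decodeRows` and `encodeRows` are hypothetical helpers.

```typescript
// Hypothetical fixed-width record layout (NOT the real JDBIN format):
//   bytes 0-3   id     (uint32, little-endian)
//   bytes 4-11  value  (float64, little-endian)
const RECORD_SIZE = 12;

interface Row {
  id: number;
  value: number;
}

// Decode every complete record contained in a fetched byte range.
function decodeRows(bytes: Uint8Array): Row[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const rows: Row[] = [];
  for (let pos = 0; pos + RECORD_SIZE <= bytes.byteLength; pos += RECORD_SIZE) {
    rows.push({
      id: view.getUint32(pos, true),
      value: view.getFloat64(pos + 4, true),
    });
  }
  return rows;
}

// Build a small buffer standing in for bytes returned by a Range GET.
function encodeRows(rows: Row[]): Uint8Array {
  const out = new Uint8Array(rows.length * RECORD_SIZE);
  const view = new DataView(out.buffer);
  rows.forEach((r, i) => {
    view.setUint32(i * RECORD_SIZE, r.id, true);
    view.setFloat64(i * RECORD_SIZE + 4, r.value, true);
  });
  return out;
}

const fetched = encodeRows([
  { id: 7, value: 1.5 },
  { id: 8, value: -2.25 },
]);
const json = JSON.stringify(decodeRows(fetched));
console.log(json); // [{"id":7,"value":1.5},{"id":8,"value":-2.25}]
```

Everything between the Range GET and the JSON response runs inside the Worker, which is what lets this model do without a separate database process.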
Summary
| Property | Parquet | DuckDB | ClickHouse | SSTable | JDBIN |
|---|---|---|---|---|---|
| Binary format | Yes | Yes | Yes | Yes | Yes |
| Own query planner | No | Yes | Yes | Partly | Yes |
| Object storage native | No | No | No | No | Yes |
| Byte-range reads as main mechanism | Partly | No | Partly | No | Yes |
| Separate DB server required | Practically yes | No (in-process) | Yes | Yes | No |
Safe closing formulation
JDBIN is not a new Parquet and not a new DuckDB. It combines segment, index, and immutable ideas into an object-storage-native model where the Cloudflare Worker acts as the query engine and R2 acts as the canonical storage layer.