QA Knowledge Hub

Comparison with Other Systems

How JDBIN relates to Parquet, ORC, DuckDB, ClickHouse MergeTree, SSTables, and Lucene.

JDBIN is not a "copy" of any single existing system. It has traits from several systems, but its architectural starting point is different.

Short answer

JDBIN is not a new Parquet and not a new DuckDB. It borrows ideas such as segments, indexes, immutable structure, and selective reads, but combines them around Cloudflare R2, HTTP byte-range fetches, and a Worker-based query engine.

1. Parquet

Parquet is a file format.

CSV
  -> Parquet file
  -> Spark / DuckDB / Athena / BigQuery

Parquet includes things like:

  • columnar storage
  • metadata
  • pages
  • compression
  • statistics

But it is not a query engine. It only says: "this is how the data is stored."

JDBIN differs because it includes not only a format, but also a Worker-based query path.

2. Apache ORC

ORC follows a very similar idea to Parquet.

Its feature set includes things like:

  • bloom filters
  • indexes
  • statistics
  • predicate pushdown

But it still is not a query engine by itself.
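
The statistics that Parquet and ORC store are what let a separate engine skip data. The following is a minimal sketch of that idea, assuming a simplified per-chunk min/max record. The `ChunkStats` shape and the `pruneChunks` function are hypothetical and do not mirror either format's real metadata layout.

```typescript
// Hypothetical per-chunk statistics; real Parquet/ORC metadata is richer.
interface ChunkStats {
  offset: number; // byte offset of the chunk in the file
  length: number; // byte length of the chunk
  min: number;    // smallest value stored in the chunk
  max: number;    // largest value stored in the chunk
}

// Keep only the chunks whose [min, max] interval overlaps the query interval [lo, hi].
function pruneChunks(chunks: ChunkStats[], lo: number, hi: number): ChunkStats[] {
  return chunks.filter((c) => c.max >= lo && c.min <= hi);
}

const chunks: ChunkStats[] = [
  { offset: 0, length: 100, min: 1, max: 50 },
  { offset: 100, length: 100, min: 51, max: 120 },
  { offset: 200, length: 100, min: 121, max: 300 },
];

// A filter such as "value between 60 and 100" only needs the second chunk.
const needed = pruneChunks(chunks, 60, 100);
```

The format only supplies the statistics. Deciding which chunks to skip, as `pruneChunks` does here, is the job of whatever engine reads the file.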

3. DuckDB storage engine

DuckDB is a real database.

SQL
  -> DuckDB parser
  -> optimizer
  -> execution engine
  -> storage engine
  -> Parquet / disk

DuckDB includes:

  • a SQL parser
  • an optimizer
  • an execution engine
  • a storage engine

The JDBIN path looks more like this:

Worker
  -> planner
  -> range reader
  -> JDBIN
  -> R2

The key difference is that, within its current scope, JDBIN does not have a general SQL parser. The documentation instead emphasizes a Worker-side query planner that issues only the byte-range fetches a query actually needs from R2.
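
To make "only the byte-range fetches actually needed" concrete, here is a minimal sketch of how a planner could turn byte spans into HTTP `Range` header values. The `ByteSpan` type and the `coalesce` and `toRangeHeader` helpers are assumptions made for illustration, not the documented JDBIN planner. Only the `bytes=start-end` header syntax (both ends inclusive) comes from the HTTP standard.

```typescript
// Hypothetical descriptor of a byte span inside one object; not JDBIN's actual layout.
interface ByteSpan {
  start: number; // first byte, inclusive
  end: number;   // last byte, inclusive (matches HTTP Range semantics)
}

// Merge spans that overlap or sit within `maxGap` bytes of each other,
// trading a few wasted bytes for fewer round trips.
function coalesce(spans: ByteSpan[], maxGap: number): ByteSpan[] {
  const sorted = spans.slice().sort((a, b) => a.start - b.start);
  const merged: ByteSpan[] = [];
  for (const span of sorted) {
    const last = merged.length > 0 ? merged[merged.length - 1] : undefined;
    if (last !== undefined && span.start <= last.end + 1 + maxGap) {
      last.end = Math.max(last.end, span.end);
    } else {
      merged.push({ start: span.start, end: span.end });
    }
  }
  return merged;
}

// Format one span as an HTTP Range header value, e.g. "bytes=0-99".
function toRangeHeader(span: ByteSpan): string {
  return "bytes=" + span.start + "-" + span.end;
}

const plannedSpans: ByteSpan[] = [
  { start: 4096, end: 8191 },
  { start: 0, end: 99 },
  { start: 8192, end: 12287 },
];

// The two adjacent spans collapse into one request:
// ["bytes=0-99", "bytes=4096-12287"]
const rangeHeaders = coalesce(plannedSpans, 0).map(toRangeHeader);
```

Raising `maxGap` would merge nearby spans as well, which can be worthwhile when each request carries a fixed latency cost.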

4. ClickHouse MergeTree

This is one of the closest comparisons.

MergeTree uses things like:

  • parts
  • indexes
  • marks
  • granules

so that only a small part of the file needs to be read.

JDBIN follows a similar idea:

Worker
  -> planner
  -> range read
  -> segment
  -> string pool
  -> manifest

The major difference is that MergeTree works on its own disk and inside its own database process. JDBIN uses object storage, here specifically R2.
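
The shared idea, a sparse index that points into larger blocks of sorted data, can be sketched as follows. This is a simplified illustration and not MergeTree's or JDBIN's real on-disk structure. `GranuleMark`, `findGranule`, and `granuleSpan` are hypothetical names.

```typescript
// Hypothetical sparse index entry: one per granule (block of sorted rows).
interface GranuleMark {
  firstKey: number; // smallest key in the granule
  offset: number;   // byte offset where the granule starts
}

// Binary search for the last mark whose firstKey <= key.
// Returns -1 when the key is smaller than every indexed key.
function findGranule(marks: GranuleMark[], key: number): number {
  let lo = 0;
  let hi = marks.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (marks[mid].firstKey <= key) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Byte span of granule i: from its own offset up to the next mark's offset (or file end).
function granuleSpan(marks: GranuleMark[], i: number, fileSize: number): [number, number] {
  const start = marks[i].offset;
  const end = (i + 1 < marks.length ? marks[i + 1].offset : fileSize) - 1;
  return [start, end];
}

const marks: GranuleMark[] = [
  { firstKey: 0, offset: 0 },
  { firstKey: 1000, offset: 65536 },
  { firstKey: 2000, offset: 131072 },
];

// Key 1500 can only live in the second granule, bytes 65536..131071.
const granuleIndex = findGranule(marks, 1500);
const span = granuleSpan(marks, granuleIndex, 200000);
```

On a local disk the resulting span becomes a seek and a read. Over object storage the same span would become one byte-range request.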

5. SSTable

The SSTable (sorted string table) is another useful comparison.

Its model includes:

  • sorted keys
  • immutable structure
  • index
  • lookup path

When new data is written, a new SSTable is created instead of overwriting the old one.

The JDBIN write path:

delta
  -> manifest
  -> active pointer
  -> new view

is similar in the sense that a new view is built without directly overwriting old data.
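
The "new view without overwriting" idea can be sketched like this, assuming each delta produces a new immutable manifest and a single active pointer is switched last. The `ViewStore` class, the `ViewManifest` shape, and the key names are hypothetical. An in-memory object stands in for object storage, and this is not the documented JDBIN write protocol.

```typescript
// Hypothetical manifest: an immutable list of segment object keys.
interface ViewManifest {
  version: number;
  segments: string[];
}

// In-memory stand-in for an object store plus an "active" pointer.
class ViewStore {
  private manifests: { [key: string]: ViewManifest } = {};
  private active = "";

  // Publish a delta: write a NEW manifest, then flip the pointer.
  // Earlier manifests and segments are never modified.
  publish(deltaSegment: string): string {
    const current = this.active !== "" ? this.manifests[this.active] : undefined;
    const next: ViewManifest = {
      version: current ? current.version + 1 : 1,
      segments: (current ? current.segments : []).concat([deltaSegment]),
    };
    const key = "manifest-" + next.version;
    this.manifests[key] = next;
    this.active = key; // the only piece of state that changes in place
    return key;
  }

  // Read a specific manifest, or the active one when no key is given.
  read(key?: string): ViewManifest | undefined {
    return this.manifests[key !== undefined ? key : this.active];
  }
}

const store = new ViewStore();
const v1 = store.publish("seg-a");
store.publish("seg-b");
```

After the second publish, `store.read()` returns the view with both segments, while `store.read(v1)` still returns the original one-segment view. A reader that started on the old manifest is therefore not disturbed by the write.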

6. Lucene

Lucene is not a database. It is a search indexing library.

It has things like:

  • an inverted index
  • segments
  • immutable segments

JDBIN is not an inverted index, but the segment-oriented thinking is comparable.

The real architectural difference

According to the documentation, the JDBIN architecture looks like this:

Client
  -> Cloudflare Worker
  -> query planner
  -> range reader
  -> R2 object
  -> JDBIN

The Worker fetches only the required byte ranges from R2 without a separate database server.

Most traditional systems look more like this:

Disk
  -> database process
  -> query engine
  -> client

In other words, the database process owns the disk and the query engine.

The JDBIN model instead looks like:

Object Storage (R2)
  -> HTTP Range GET
  -> Cloudflare Worker
  -> planner
  -> JSON

That is a different architectural starting point.
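
The flow from object storage to JSON can be sketched end to end. This is a minimal illustration under stated assumptions: an in-memory byte array stands in for the R2 object, `rangeGet` emulates an HTTP Range GET, and the fixed-width record layout is invented for the example and is not the JDBIN format. In a real Worker the reader would presumably be backed by an R2 binding or a `fetch` call with a `Range` header.

```typescript
// Stand-in for an object in R2: 4 fixed-width records of one uint32 each.
// The layout is invented for this sketch and is not the JDBIN format.
const RECORD_SIZE = 4;
const objectBytes = new Uint8Array(16);
const writer = new DataView(objectBytes.buffer);
[10, 20, 30, 40].forEach((value, i) => writer.setUint32(i * RECORD_SIZE, value));

// Counts how many bytes the "Worker" actually pulled from storage.
let bytesFetched = 0;

// Emulates "Range: bytes=start-end" (both ends inclusive) against the object.
function rangeGet(start: number, end: number): Uint8Array {
  bytesFetched += end - start + 1;
  return objectBytes.slice(start, end + 1);
}

// Planner: map record indexes to byte spans. Reader: fetch only those spans.
function queryRecords(indexes: number[]): string {
  const rows = indexes.map((i) => {
    const bytes = rangeGet(i * RECORD_SIZE, (i + 1) * RECORD_SIZE - 1);
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return { index: i, value: view.getUint32(0) };
  });
  return JSON.stringify(rows);
}

// Reads records 1 and 3 only: 8 of the 16 bytes are fetched.
const json = queryRecords([1, 3]);
```

No process in this sketch owns the storage. The query code sees the object only through ranged reads, which is the property the comparison above rests on.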

Summary

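| Property | Parquet | DuckDB | ClickHouse | SSTable | JDBIN |
|---|---|---|---|---|---|
| Binary format | Yes | Yes | Yes | Yes | Yes |
| Own query planner | No | Yes | Yes | Partly | Yes |
| Object storage native | No | No | No | No | Yes |
| Byte-range reads as main mechanism | Partly | No | Partly | No | Yes |
| Separate DB server required | Practically yes | Yes | Yes | Yes | No |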

Safe closing formulation

JDBIN is not a new Parquet and not a new DuckDB. It combines segment, index, and immutable ideas into an object-storage-native model where the Cloudflare Worker acts as the query engine and R2 acts as the canonical storage layer.
