Difficult Questions
Questions that require precise scoping and technically defensible answers.
Why don’t you use SQLite?
Likely question
Why don’t you do this with SQLite?
Short answer
Because the research question is not whether SQLite can do this, but whether object storage itself can act as a queryable read layer without a database server.
Longer answer
SQLite would be a strong default if the goal were a general embedded SQL solution. But that would be a different research setup. The interesting part here is specifically whether immutable object storage can be used efficiently as a queryable artifact.
Why don’t you use D1?
Likely question
Why don’t you use Cloudflare D1?
Short answer
D1 solves a different problem.
Longer answer
In this research, the interesting combination is immutable object + Worker + byte-range read + query planner without a database server and without a traditional relational layer.
Safe formulation
D1 is the right answer when you want a database. The point of this research is to determine when object storage itself can be enough as a read layer.
Why is this not SQL?
Likely question
Why don’t you just call this a database?
Short answer
Because this is not a general SQL engine.
Longer answer
The queries are known in advance, the planner is deterministic, and the model does not cover arbitrary SQL, joins, or an OLTP-style mutation load. The correct description is a read-optimized publish and briefing architecture.
What problem does this solve?
Likely question
What problem does this actually solve?
Short answer
It solves two different problems: the editorial problem and the data delivery problem.
Longer answer
The editorial problem is that there is too much news flow. You need a pipeline that fetches multiple sources, removes duplicates, clusters the same story together, separates source facts from editorial copy, and shows confidence.
The data delivery problem is that once content is built ahead of time, it may not make sense to serve it through a full database server. Then the question becomes whether mostly-read briefing data can be delivered directly from object storage without a database server.
How does AI relate to this?
Likely question
Is this just an AI news site?
Short answer
No. AI is one part of the pipeline, but it is not in the public request path.
Longer answer
In the current model, Claude produces the shortlist and the final selection, OpenAI produces the publishable editorial fields, and the grounding step and the publish gate try to block weak signals. The public /api/data and /api/news/history paths do not call AI models on every request.
How does the AI know what happened?
Likely question
Does the AI roam the web freely?
Short answer
No. The AI works on a bounded research corpus.
Longer answer
First, a corpus is built:
- raw articles from RSS sources
- story clusters
- source metadata
- audit snapshots
Then the editorial stage works on that material. The AI does not guess the world from nothing; it uses collected and auditable corpus material.
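Turning raw RSS articles into deduplicated story clusters can be sketched as follows. This is a minimal illustration under assumptions, not the project's pipeline: the document does not say how duplicates or clusters are detected, so URL normalization and greedy title-token Jaccard similarity (threshold 0.5) are stand-ins, and `RawArticle`, `normalizeUrl`, `buildCorpus`, and the other helpers are hypothetical names.

```typescript
// Hypothetical corpus record; field names are illustrative, not the project's schema.
interface RawArticle {
  source: string;
  url: string;
  title: string;
}

// Assumed dedupe key: host + path + query, without the fragment, utm_* params, or trailing slashes.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  for (const key of Array.from(u.searchParams.keys())) {
    if (key.startsWith("utm_")) u.searchParams.delete(key);
  }
  const path = u.pathname.replace(/\/+$/, "");
  const query = u.searchParams.toString();
  return `${u.hostname}${path}${query ? "?" + query : ""}`;
}

// Lowercased title words longer than two characters.
function tokens(title: string): Set<string> {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  a.forEach((t) => {
    if (b.has(t)) inter++;
  });
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Drop exact duplicates, then attach each article to the first cluster whose seed title is similar enough.
function buildCorpus(articles: RawArticle[], threshold = 0.5): RawArticle[][] {
  const seen = new Set<string>();
  const clusters: { seed: Set<string>; members: RawArticle[] }[] = [];
  for (const a of articles) {
    const key = normalizeUrl(a.url);
    if (seen.has(key)) continue; // same story URL already collected
    seen.add(key);
    const t = tokens(a.title);
    const hit = clusters.find((c) => jaccard(c.seed, t) >= threshold);
    if (hit) hit.members.push(a);
    else clusters.push({ seed: t, members: [a] });
  }
  return clusters.map((c) => c.members);
}
```

Greedy single-pass clustering is order-dependent; it is used here only because it is short. The point is that dedupe and clustering run over the collected corpus before any editorial model sees it.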
Why is AI not in the request path?
Likely question
Why not just generate the answer live?
Short answer
Because the public interface needs stable latency, reproducible content, and an audit path.
Longer answer
If AI were in the request path, latency, reproducibility, versioning, and auditability would all become weaker. In this model, AI does corpus interpretation, shortlist generation, editorial field generation, and grounding assistance before publication.
What is the publish gate?
Likely question
Does AI publish everything automatically?
Short answer
No. The publish gate separates AI output from a publishable signal.
Longer answer
The publish gate is meant to block weak or incomplete signals, lower conviction in low-certainty cases, and keep the public publish path controlled.
Safe formulation
The publish gate does not turn the system into a truth machine, but it makes it more transparent and more controlled than a plain AI-summary pipeline.
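One way to picture a gate that blocks incomplete or weakly grounded items and lowers conviction in low-certainty cases is a pure function over each candidate. Everything here is hypothetical: the `Candidate` fields, the 0.8 grounding ratio, and the rule that single-source stories cannot stay at high conviction are illustrative assumptions, since the actual gate rules are not specified in this document.

```typescript
// Hypothetical shape of an AI-produced editorial item; not the project's actual schema.
interface Candidate {
  headline: string;
  summary: string;
  sourceCount: number; // distinct sources in the story cluster
  groundedClaims: number; // claims the grounding step matched to corpus text
  totalClaims: number;
  conviction: "high" | "medium" | "low";
}

type GateResult = { publish: false; reason: string } | { publish: true; item: Candidate };

// Illustrative thresholds (assumptions, not documented values).
const MIN_GROUNDED_RATIO = 0.8;
const MIN_SOURCES_FOR_HIGH = 2;

function publishGate(c: Candidate): GateResult {
  // Block incomplete editorial output outright.
  if (!c.headline.trim() || !c.summary.trim()) {
    return { publish: false, reason: "incomplete editorial fields" };
  }
  if (c.sourceCount === 0 || c.totalClaims === 0) {
    return { publish: false, reason: "no supporting material" };
  }
  // Block items whose claims are not sufficiently backed by the corpus.
  const ratio = c.groundedClaims / c.totalClaims;
  if (ratio < MIN_GROUNDED_RATIO) {
    return { publish: false, reason: `grounding ratio ${ratio.toFixed(2)} below threshold` };
  }
  // Single-source stories may pass, but never at high conviction.
  const conviction =
    c.sourceCount < MIN_SOURCES_FOR_HIGH && c.conviction === "high" ? "medium" : c.conviction;
  return { publish: true, item: { ...c, conviction } };
}
```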
Why immutable?
Likely question
Why did you make this an immutable model?
Short answer
Because it fits object storage and makes publication and audit paths easier.
Longer answer
In an immutable model, you can version, roll back, publish, and audit without in-place writes into the active base object. That makes object storage a natural publication surface.
What is the pointer?
Likely question
Why do you need a pointer?
Short answer
The pointer determines which manifest is active right now.
Longer answer
The new base and manifest are built first, and the pointer is switched only at the end. That allows publication to happen atomically while preserving the old chain for rollback.
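The build-then-switch sequence, and rollback by re-pointing, can be sketched against an in-memory stand-in for object storage. The key layout (`base/<n>.jdbin`, `manifests/<n>.json`, `pointer.json`) and the `publish` / `rollback` helpers are hypothetical; concurrency control is left out here and would sit on the pointer write.

```typescript
// In-memory stand-in for an object store (hypothetical, synchronous for brevity).
class MemoryStore {
  private objects = new Map<string, string>();
  put(key: string, body: string): void {
    this.objects.set(key, body);
  }
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
  has(key: string): boolean {
    return this.objects.has(key);
  }
}

interface Manifest {
  generation: number;
  baseKey: string;
  deltaKeys: string[];
  previous: string | null; // older manifest, kept for rollback
}
interface Pointer {
  manifestKey: string;
  generation: number; // monotonic publish counter
}

const POINTER_KEY = "pointer.json"; // hypothetical key

function readPointer(store: MemoryStore): Pointer | null {
  const raw = store.get(POINTER_KEY);
  return raw ? (JSON.parse(raw) as Pointer) : null;
}

// Base and manifest go to new, never-overwritten keys; the pointer write is the only visible switch.
function publish(store: MemoryStore, baseBody: string): Pointer {
  const current = readPointer(store);
  const generation = (current ? current.generation : 0) + 1;
  const baseKey = `base/${generation}.jdbin`;
  const manifestKey = `manifests/${generation}.json`;
  if (store.has(baseKey) || store.has(manifestKey)) throw new Error("immutable key already exists");
  store.put(baseKey, baseBody);
  const manifest: Manifest = {
    generation,
    baseKey,
    deltaKeys: [],
    previous: current ? current.manifestKey : null,
  };
  store.put(manifestKey, JSON.stringify(manifest));
  const next: Pointer = { manifestKey, generation };
  store.put(POINTER_KEY, JSON.stringify(next)); // switched last
  return next;
}

// Rollback re-points at the previous manifest; the newer objects stay in place.
function rollback(store: MemoryStore): Pointer {
  const current = readPointer(store);
  if (!current) throw new Error("nothing published");
  const manifest = JSON.parse(store.get(current.manifestKey)!) as Manifest;
  if (!manifest.previous || !store.has(manifest.previous)) throw new Error("no previous manifest");
  const next: Pointer = { manifestKey: manifest.previous, generation: current.generation + 1 };
  store.put(POINTER_KEY, JSON.stringify(next));
  return next;
}
```

Because base and manifest keys are never overwritten, rollback is just another pointer write, and the generation counter keeps increasing so that a publish after a rollback cannot collide with retained objects.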
What is the delta engine?
Likely question
How do changes happen if the base is immutable?
Short answer
The change does not overwrite the base; it becomes an append-only delta object.
Longer answer
The typical lifecycle is:
- ingest
- write delta
- update manifest
- publish pointer
- query live state
- compaction
- rollback / retention when needed
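A rough sketch of how an immutable base and append-only deltas combine into live state, and how compaction folds them into a new base, follows. The row and delta shapes (`Row`, `DeltaOp`, `Delta`) and the helper names are assumptions for illustration; the actual JDBIN/JDBON delta format is not described here.

```typescript
// Hypothetical record and delta shapes.
type Row = { id: string; title: string };
type DeltaOp = { op: "upsert"; row: Row } | { op: "delete"; id: string };
interface Delta {
  seq: number;
  ops: DeltaOp[];
}

// Appending returns a new list with a strictly increasing sequence number; nothing is rewritten.
function appendDelta(deltas: readonly Delta[], ops: DeltaOp[]): Delta[] {
  const seq = deltas.length ? deltas[deltas.length - 1].seq + 1 : 1;
  return [...deltas, { seq, ops }];
}

// Live state = copy of the immutable base with every delta applied in append order.
function liveState(base: ReadonlyMap<string, Row>, deltas: readonly Delta[]): Map<string, Row> {
  const state = new Map(base);
  for (const d of deltas) {
    for (const op of d.ops) {
      if (op.op === "upsert") state.set(op.row.id, op.row);
      else state.delete(op.id);
    }
  }
  return state;
}

// Compaction produces a NEW base; the old base and deltas remain for rollback until retention removes them.
function compact(base: ReadonlyMap<string, Row>, deltas: readonly Delta[]): { base: Map<string, Row>; deltas: Delta[] } {
  return { base: liveState(base, deltas), deltas: [] };
}
```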
How is publication safety handled?
Likely question
What if two publishers write at the same time?
Short answer
Publish is protected by a CAS / ETag model.
Longer answer
Pointer publish does not rely on “last write wins.” The controls used are:
- pointerKey
- expectedGeneration
- ETag guard
- conditional write
- health check
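The controls above can be combined roughly as follows: reject a publisher whose `expectedGeneration` is stale, health-check the target manifest, then write the pointer only if its ETag is unchanged since it was read. `CasStore` is an in-memory stand-in, not the R2 API; in a Worker the conditional write would presumably map to R2's conditional `put` (`onlyIf` with `etagMatches`). `publishPointer`, `manifestHealthy`, and the outcome names are hypothetical.

```typescript
interface StoredObject {
  body: string;
  etag: string;
}

// In-memory store with compare-and-swap writes (hypothetical, not the R2 binding).
class CasStore {
  private objects = new Map<string, StoredObject>();
  private counter = 0;
  get(key: string): StoredObject | undefined {
    return this.objects.get(key);
  }
  put(key: string, body: string): void {
    this.objects.set(key, { body, etag: `"${++this.counter}"` });
  }
  // Writes only if the current ETag equals expectedEtag (null = key must not exist yet).
  putIfMatch(key: string, body: string, expectedEtag: string | null): string | null {
    const cur = this.objects.get(key);
    if ((cur ? cur.etag : null) !== expectedEtag) return null;
    const etag = `"${++this.counter}"`;
    this.objects.set(key, { body, etag });
    return etag;
  }
}

interface PointerDoc {
  manifestKey: string;
  generation: number;
}

type PublishOutcome = "published" | "stale-generation" | "etag-conflict" | "unhealthy-manifest";

// Hypothetical health check: the target manifest must exist and parse.
function manifestHealthy(store: CasStore, manifestKey: string): boolean {
  const m = store.get(manifestKey);
  if (!m) return false;
  try {
    JSON.parse(m.body);
    return true;
  } catch {
    return false;
  }
}

function publishPointer(store: CasStore, pointerKey: string, manifestKey: string, expectedGeneration: number): PublishOutcome {
  const cur = store.get(pointerKey);
  const curDoc: PointerDoc | null = cur ? JSON.parse(cur.body) : null;
  if ((curDoc ? curDoc.generation : 0) !== expectedGeneration) return "stale-generation";
  if (!manifestHealthy(store, manifestKey)) return "unhealthy-manifest";
  const next: PointerDoc = { manifestKey, generation: expectedGeneration + 1 };
  // Conditional write: fails if another publisher changed the pointer after we read it.
  const etag = store.putIfMatch(pointerKey, JSON.stringify(next), cur ? cur.etag : null);
  return etag ? "published" : "etag-conflict";
}
```

A publisher working from an outdated generation is turned away instead of overwriting the pointer, and a publisher that passed the generation check but lost the race to the write gets `etag-conflict`; neither case degrades into "last write wins".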
Why snapshots if JDBIN already works?
Likely question
If JDBIN works, why does the UI use snapshots?
Short answer
Because snapshots and the live JDBIN path solve different problems.
Longer answer
JDBIN is the canonical storage / read layer. A snapshot is the public delivery artifact. The snapshot path keeps the public UI fast and stable. The live JDBIN path makes the canonical storage layer auditable, versioned, and measurable.
Safe formulation
The default frontend request path is not a live JDBIN query, but snapshot-first JSON delivery. JDBIN/JDBON functions as a canonical store + publish engine.
Has this actually been run in production?
Likely question
Is this just a local demo?
Short answer
No. The implementation has actually been run on Cloudflare.
Longer answer
At minimum, the following has been verified:
- Worker + R2 + pointer + manifest + base/delta chain works
- the active live chain has been published into a compacted base
- the benchmark endpoint works against a real R2 object
- snapshot paths are used in the public UI
What is realistically finished right now?
Likely question
What is actually complete here?
Short answer
Several core parts are already working, even if they are not final.
Longer answer
Working now:
- Worker + R2 live
- research corpus pipeline
- AI editorial pipeline
- JDBIN/JDBON write path
- public snapshot path
- archive snapshot path
- consensus v1.1 product layer
What do the benchmarks actually prove?
Likely question
What do these benchmarks prove?
Short answer
They prove that a large immutable object can be kept in R2 and queried without loading the entire object.
Longer answer
The benchmarks show that the Worker can resolve known query shapes with targeted byte-range reads, so query cost stays tied to the query shape rather than to the total object size.
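A point lookup over a known query shape can be sketched with a layout that puts a sorted fixed-width index in front of the payloads: read the header, read the index, binary-search it, then fetch exactly one payload slice. The layout (a little-endian uint32 count, then 12-byte entries of id/offset/length) and the `RangeReader`, `lookup`, and `buildObject` helpers are hypothetical; this is not the JDBIN format, and `read` stands in for a ranged GET.

```typescript
// Counts requests and bytes, the two metrics discussed in this document.
class RangeReader {
  rangeReads = 0;
  bytesRead = 0;
  constructor(private readonly object: Uint8Array) {}
  // Stand-in for a ranged GET against object storage.
  read(offset: number, length: number): Uint8Array {
    this.rangeReads++;
    this.bytesRead += length;
    return this.object.subarray(offset, offset + length);
  }
}

const ENTRY = 12; // id, offset, length as little-endian uint32 (hypothetical layout)

// Point lookup: header -> index -> one payload slice. Returns null if the id is absent.
function lookup(r: RangeReader, id: number): Uint8Array | null {
  const head = r.read(0, 4);
  const count = new DataView(head.buffer, head.byteOffset, 4).getUint32(0, true);
  const idx = r.read(4, count * ENTRY);
  const view = new DataView(idx.buffer, idx.byteOffset, idx.byteLength);
  let lo = 0;
  let hi = count - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const key = view.getUint32(mid * ENTRY, true);
    if (key === id) {
      const off = view.getUint32(mid * ENTRY + 4, true);
      const len = view.getUint32(mid * ENTRY + 8, true);
      return r.read(off, len);
    }
    if (key < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

// Builds an object in the hypothetical layout, sorting rows by id.
function buildObject(rows: { id: number; payload: Uint8Array }[]): Uint8Array {
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  const headerLen = 4 + sorted.length * ENTRY;
  const total = headerLen + sorted.reduce((n, row) => n + row.payload.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  view.setUint32(0, sorted.length, true);
  let cursor = headerLen;
  sorted.forEach((row, i) => {
    view.setUint32(4 + i * ENTRY, row.id, true);
    view.setUint32(4 + i * ENTRY + 4, cursor, true);
    view.setUint32(4 + i * ENTRY + 8, row.payload.length, true);
    out.set(row.payload, cursor);
    cursor += row.payload.length;
  });
  return out;
}
```

For 1,000 rows of 100-byte payloads (an object of about 112 KB), a hit costs three range reads and about 12 KB, most of it the index. The payload bytes read do not grow with the number of rows, although the index read does, which is one reason a real layout might page or cache its index.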
What should not be claimed
- a general SQL replacement
- a finished query engine for every problem
- proven 100M / 5GB production readiness
Why is rangeReads such an important metric?
Likely question
Isn’t bytesRead enough?
Short answer
No, not by itself.
Longer answer
R2 latency accumulates not only from the number of bytes read, but also from the number of requests. That is why rangeReads is often as important as, and sometimes more important than, bytesRead.
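To make the request-count argument concrete, here is a toy cost model together with range coalescing. The per-request latency (20 ms) and throughput (50,000 bytes/ms) are illustrative assumptions, not measured R2 figures, and `coalesce` and `estimateMs` are hypothetical helpers.

```typescript
interface ByteRange {
  offset: number;
  length: number;
}

// Merges ranges whose gap is at most maxGap bytes: a few wasted bytes in exchange for fewer requests.
function coalesce(ranges: ByteRange[], maxGap: number): ByteRange[] {
  const sorted = [...ranges].sort((a, b) => a.offset - b.offset);
  const out: ByteRange[] = [];
  for (const r of sorted) {
    const last = out.length ? out[out.length - 1] : undefined;
    if (last && r.offset <= last.offset + last.length + maxGap) {
      const end = Math.max(last.offset + last.length, r.offset + r.length);
      last.length = end - last.offset;
    } else {
      out.push({ offset: r.offset, length: r.length });
    }
  }
  return out;
}

// Toy latency model: fixed cost per request plus transfer time. Constants are assumptions.
function estimateMs(ranges: ByteRange[], perRequestMs = 20, bytesPerMs = 50000): number {
  const bytes = ranges.reduce((n, r) => n + r.length, 0);
  return ranges.length * perRequestMs + bytes / bytesPerMs;
}
```

With four 100-byte ranges where three sit within 64 bytes of each other, coalescing turns four requests into two and roughly halves the estimate under this model, even though 100 more bytes are read.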
What is not finished yet?
Likely question
What are the biggest gaps?
Short answer
The limitations should be stated directly.
Longer answer
The main unfinished areas are:
- not a general SQL engine
- not a broad query VM
- not broad compression benchmarking
- not a formalized consensus model
- not yet a light enough live UI hot path
- not yet editorial quality at the final target level
- not yet a proven 100M / 5GB production path
- not yet unified week-long p50/p95 telemetry
What should not be claimed about the project?
- do not claim that it replaces the general SQL ecosystem
- do not claim that all query types behave with equal efficiency
- do not claim that the architecture removes data-modeling tradeoffs
- do not claim that this is a new SQLite or a general database
- do not claim that the editorial product is already finished
- do not claim that consensus is already a formalized research model