SQLite as a Package Index: Architecture Deep Dive
Why we chose SQLite with FTS5 over a REST API or git repository for package metadata — and how a 3MB database replaces 700MB of git history.
Homebrew stores package metadata in a git repository with over 170,000 formulae, 700MB of history, and a JSON API that requires parsing megabytes of data for every search. stout replaces all of this with a single SQLite database — 3MB compressed with zstd, containing pre-indexed formula data, dependency graphs, and full-text search. This article explains the schema design, query patterns, and the concrete performance gains.
The problem with git-based metadata
Homebrew’s homebrew-core repository is a git repo where each formula is a Ruby file. Over 15 years, this repo has accumulated hundreds of thousands of commits. The consequences:
brew updatedownloads git objects. Even a small update means fetching, decompressing, and applying git deltas. A week-stale clone requires 10-30 seconds to update.- Search requires full scan. Finding a formula means either querying the JSON API (multi-megabyte download, parsed in Ruby) or scanning the local filesystem.
- Disk usage is permanent. Git never forgets. The
.gitdirectory alone is several hundred megabytes.
A package manager needs three things from its metadata store: fast lookup by name, fast full-text search, and fast dependency resolution. All three are database problems, not version control problems.
Why SQLite
We considered several alternatives before settling on SQLite:
A REST API (like Homebrew’s formulae.brew.sh). This requires a network round-trip for every query and a server to maintain. It adds latency to offline workflows and creates a single point of failure. A local database eliminates network dependency for search, info, and dependency resolution.
A flat file (JSON or MessagePack). This is what many simple package managers do — download a single JSON file with all formulae. It works for small registries but scales poorly. Searching 170,000 formulae in a 15MB JSON file means deserializing the entire structure and scanning linearly. With SQLite, the same search uses a B-tree index and completes in microseconds.
A key-value store (RocksDB, sled). These are fast for point lookups but lack SQL’s query flexibility. Dependency resolution requires joins and recursive queries that are natural in SQL and awkward in key-value stores.
SQLite won because it is a single file, requires no server, supports full-text search natively through FTS5, handles concurrent reads from multiple processes, and has been in production everywhere from iPhones to aircraft for over two decades.
Schema design
The stout package index has four tables:
CREATE TABLE formulae (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
full_name TEXT NOT NULL,
version TEXT NOT NULL,
revision INTEGER DEFAULT 0,
description TEXT,
homepage TEXT,
license TEXT,
bottle_stable_url TEXT,
bottle_sha256 TEXT,
bottle_size INTEGER,
updated_at INTEGER NOT NULL
);
CREATE TABLE dependencies (
formula_id INTEGER NOT NULL REFERENCES formulae(id),
dependency_id INTEGER NOT NULL REFERENCES formulae(id),
dep_type TEXT NOT NULL DEFAULT 'runtime', -- runtime, build, test, optional
PRIMARY KEY (formula_id, dependency_id)
);
CREATE TABLE bottle_tags (
formula_id INTEGER NOT NULL REFERENCES formulae(id),
tag TEXT NOT NULL, -- e.g., 'arm64_sonoma', 'ventura', 'x86_64_linux'
url TEXT NOT NULL,
sha256 TEXT NOT NULL,
size INTEGER,
PRIMARY KEY (formula_id, tag)
);
-- FTS5 virtual table for full-text search
CREATE VIRTUAL TABLE formulae_fts USING fts5(
name,
full_name,
description,
content=formulae,
content_rowid=id,
tokenize='porter unicode61'
);
The design is intentionally denormalized. bottle_stable_url and bottle_sha256 are duplicated in the formulae table for the common case (current platform), while bottle_tags holds all platform variants. This avoids a join for the most frequent query pattern: “give me the bottle URL for this formula on my platform.”
FTS5 for search
Full-text search is the feature that makes the SQLite approach dramatically faster than Homebrew’s JSON scanning. FTS5 builds an inverted index at database creation time, so queries execute against pre-built data structures rather than scanning every row.
-- Homebrew equivalent: download 5MB JSON, parse in Ruby, loop and string match
-- Time: ~2800ms
-- stout: FTS5 query
SELECT f.name, f.version, f.description
FROM formulae_fts
JOIN formulae f ON formulae_fts.rowid = f.id
WHERE formulae_fts MATCH ?
ORDER BY rank
LIMIT 20;
-- Time: ~0.3ms
The porter tokenizer applies Porter stemming, so searching for “compile” also matches “compiler” and “compilation.” The unicode61 tokenizer handles non-ASCII package names correctly.
We also support prefix queries for autocomplete:
SELECT name FROM formulae_fts WHERE formulae_fts MATCH ? || '*' LIMIT 10;
This returns results in under 1ms, enabling real-time search-as-you-type in terminal UIs.
Dependency resolution with recursive CTEs
Resolving a formula’s full dependency tree is a recursive operation. In Homebrew, this is done in Ruby by loading each formula file and walking the dependency graph one node at a time. In stout, it is a single SQL query:
WITH RECURSIVE dep_tree(id, name, depth) AS (
-- Base case: the requested formula
SELECT id, name, 0
FROM formulae
WHERE name = ?
UNION ALL
-- Recursive case: all runtime dependencies
SELECT f.id, f.name, dt.depth + 1
FROM dep_tree dt
JOIN dependencies d ON d.formula_id = dt.id
JOIN formulae f ON f.id = d.dependency_id
WHERE d.dep_type = 'runtime'
)
SELECT DISTINCT id, name FROM dep_tree ORDER BY depth DESC;
The ORDER BY depth DESC ensures dependencies are installed before the packages that need them. For ffmpeg with 25 transitive dependencies, this query completes in under 2ms. The equivalent operation in Homebrew takes 500ms-1s because each dependency requires loading and evaluating a Ruby file.
Index generation and distribution
The package index is generated server-side by a scheduled job that scrapes Homebrew’s API and builds the SQLite database:
fn build_index(formulae: &[FormulaJson]) -> Result<()> {
let conn = Connection::open("stout-index.db")?;
conn.execute_batch("PRAGMA journal_mode=OFF; PRAGMA synchronous=OFF;")?;
// Batch insert for performance
let mut stmt = conn.prepare(
"INSERT INTO formulae (name, full_name, version, description, homepage,
license, bottle_stable_url, bottle_sha256, updated_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)"
)?;
for f in formulae {
stmt.execute(params![
f.name, f.full_name, f.versions.stable, f.desc,
f.homepage, f.license, f.bottle_url(), f.bottle_sha256(),
f.updated_at
])?;
}
// Populate FTS index
conn.execute_batch(
"INSERT INTO formulae_fts(rowid, name, full_name, description)
SELECT id, name, full_name, description FROM formulae"
)?;
Ok(())
}
The raw database is approximately 18MB. After zstd compression at level 19, it shrinks to 3.1MB. The compression ratio is excellent because SQLite databases contain highly repetitive page structures and text data.
Distribution uses a CDN with content-addressed URLs:
https://index.stout.dev/v1/stout-index-20260330-a1b2c3d4.db.zst
stout checks for updates with an HTTP HEAD request against a manifest file. If the local index is current, no download occurs. If it is stale, the 3MB file downloads in 1-2 seconds on a typical connection.
Delta updates
Full 3MB downloads are acceptable but not ideal for users who update frequently. stout supports delta updates using SQLite’s session extension. The server generates changesets between consecutive index versions:
use rusqlite::session::Session;
fn generate_delta(old_db: &Path, new_db: &Path) -> Result<Vec<u8>> {
let old = Connection::open(old_db)?;
let new_conn = Connection::open(new_db)?;
let mut session = Session::new(&new_conn)?;
session.attach(None)?; // track all tables
session.diff("old", "formulae")?;
session.diff("old", "dependencies")?;
let changeset = session.changeset()?;
Ok(zstd::encode_all(&changeset[..], 19)?)
}
A typical daily delta is 20-50KB compressed — updating 50-200 formulae out of 170,000. The client applies the changeset and rebuilds the FTS index incrementally.
Concurrency and safety
SQLite’s WAL (Write-Ahead Logging) mode allows concurrent readers. Multiple stout processes can read the index simultaneously — common in CI environments where parallel jobs run stout install. Writes (index updates) take an exclusive lock, but they complete in under a second, so contention is minimal.
rusqlite provides a safe Rust API over SQLite’s C library. We use the bundled feature to statically compile SQLite 3.45 into the stout binary, so there is no dependency on the system SQLite version:
[dependencies]
rusqlite = { version = "0.31", features = ["bundled", "fts5"] }
Performance comparison
| Operation | Homebrew | stout (SQLite) | Speedup |
|---|---|---|---|
| Search (“json”) | 2,800ms | 0.3ms | 9,300x |
| Info (single formula) | 1,200ms | 0.1ms | 12,000x |
| Dependency resolve (ffmpeg) | 850ms | 1.8ms | 470x |
| Index update (fresh) | 15-60s (git clone) | 1.2s (3MB download) | 12-50x |
| Index update (daily) | 10-30s (git pull) | 0.3s (delta) | 30-100x |
| Disk usage | 700MB+ | 18MB (uncompressed) | 39x |
The speedups for search and info are the most dramatic because they go from “parse megabytes of JSON in an interpreter” to “B-tree lookup in a memory-mapped database.” These are the operations developers run most often, and the difference between 2.8 seconds and 0.3 milliseconds changes how the tool feels.
SQLite is not a fashionable technology choice. It does not generate blog posts about distributed consensus or eventual consistency. But for a local package index — a read-heavy, single-writer, structured data problem — it is the right tool, and the numbers confirm it.
Need Rust performance engineering or AI agent expertise?
Neul Labs — the team behind stout — consults on Rust development, performance optimization, CLI tool design, and AI agent infrastructure. We build fast, reliable systems that ship.