stout
Article

Instant Package Search with SQLite FTS5

How stout's pre-computed SQLite index with FTS5 full-text search delivers sub-50ms package lookups — offline, with no API calls.

Neul Labs ·
#stout#sqlite#fts5#search#architecture

Package search is one of the most frequent operations a developer performs with a package manager. You half-remember a library name, you want to find all packages related to “json”, or you need to check if a specific version exists. In Homebrew, this operation takes 2-4 seconds — it requires fetching and parsing a multi-megabyte JSON payload from the formulae API. stout delivers the same results in under 50 milliseconds, entirely offline, using a pre-computed SQLite database with FTS5 full-text search indexes.

When you run brew search, Homebrew needs to query its package metadata. Since the migration to the JSON API in Homebrew 4.x, this means hitting formulae.brew.sh, downloading the full formulae index (several megabytes of JSON), parsing it in Ruby, and then iterating over every entry to find matches:

time brew search json
# ==> Formulae
# jo, jq, json-c, json-glib, json5, jsonlint, ...
# real    0m2.8s

Nearly three seconds to find packages matching a four-letter string. This latency comes from multiple sources: DNS resolution, TLS handshake, response download, JSON deserialization in Ruby, and linear string scanning. If you are offline — on a plane, behind a restrictive firewall, or in an air-gapped environment — the search fails entirely.

SQLite as a package index

stout replaces the git repository and JSON API with a single SQLite database file. This database is pre-computed on the server side and distributed as a zstd-compressed file of roughly 3MB. It contains every formula and cask in the Homebrew ecosystem — names, descriptions, versions, dependencies, bottle URLs, and checksums.

The database schema is purpose-built for package manager operations:

CREATE TABLE formulae (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    full_name TEXT NOT NULL,
    description TEXT,
    homepage TEXT,
    version TEXT NOT NULL,
    revision INTEGER DEFAULT 0,
    license TEXT,
    dependencies TEXT,  -- JSON array of dependency names
    bottle_urls TEXT,   -- JSON object keyed by platform
    bottle_sha256 TEXT  -- JSON object keyed by platform
);

CREATE TABLE casks (
    id INTEGER PRIMARY KEY,
    token TEXT NOT NULL UNIQUE,
    name TEXT,
    description TEXT,
    homepage TEXT,
    version TEXT NOT NULL,
    url TEXT,
    sha256 TEXT,
    appcast TEXT
);

SQLite reads from this database are essentially memory-mapped file reads with B-tree lookups. There is no interpreter startup, no network round-trip, no JSON parsing. A direct lookup by name is a single index seek:

time stout info jq
# jq 1.7.1 — Lightweight and flexible command-line JSON processor
# real    0m0.006s

Six milliseconds, including process startup.

FTS5: Full-text search in SQLite

For fuzzy and keyword searches, stout uses SQLite’s FTS5 extension — a full-text search engine built directly into SQLite. FTS5 creates an inverted index that maps every word in the indexed columns to the rows containing that word. This is the same fundamental data structure used by Elasticsearch and Lucene, but embedded in a single-file database with zero operational overhead.

stout creates FTS5 virtual tables at index build time:

CREATE VIRTUAL TABLE formulae_fts USING fts5(
    name,
    full_name,
    description,
    content=formulae,
    content_rowid=id,
    tokenize='porter unicode61'
);

The porter tokenizer applies Porter stemming, which means searching for “compress” also matches “compression”, “compressed”, and “compressor”. The unicode61 tokenizer handles Unicode normalization so searches work correctly with non-ASCII package names.

When you search, stout translates your query into an FTS5 MATCH expression:

stout search json
# Executes: SELECT name, description FROM formulae_fts WHERE formulae_fts MATCH 'json'

FTS5 looks up “json” in the inverted index and returns matching row IDs directly — no table scan, no string comparison loop. For 170,000+ formulae, this completes in single-digit milliseconds.

Ranked results with BM25

FTS5 includes a built-in BM25 ranking function, which stout uses to order search results by relevance. BM25 (Best Matching 25) is a probabilistic ranking algorithm that considers term frequency, inverse document frequency, and field length normalization. In practice, this means:

  • A package named json-c ranks higher than one that merely mentions JSON in its description
  • Exact name matches always appear first
  • Rare terms produce more specific results than common ones
stout search "lightweight json parser"
# json-c          — A JSON implementation in C
# jq              — Lightweight and flexible command-line JSON processor
# simdjson        — SIMD-accelerated JSON parser
# rapidjson       — Fast JSON parser/generator for C++
# ...

stout weights the name column higher than description in the ranking formula so that name matches are prioritized:

SELECT name, description,
       bm25(formulae_fts, 10.0, 5.0, 1.0) as rank
FROM formulae_fts
WHERE formulae_fts MATCH ?
ORDER BY rank
LIMIT 20;

The weights 10.0, 5.0, 1.0 correspond to the name, full_name, and description columns respectively.

Prefix queries and autocomplete

FTS5 supports prefix queries natively, which stout uses for tab-completion and interactive search. When you type stout search post, FTS5 can match “postgres”, “postgresql”, “postfix”, and “postmark” without scanning the full index:

stout search "post*"
# postgresql@16   — Object-relational database system
# postgresql@15   — Object-relational database system
# postfix         — High-performance mail transport agent
# postgis         — Adds support for geographic objects to PostgreSQL

The asterisk triggers a prefix query in FTS5, which traverses the inverted index trie to find all terms starting with the given prefix. This is an O(k) operation where k is the number of matching terms, not the number of rows in the table.

Offline by design

Because the entire index lives in a local SQLite file, search works with no network connection at all. The index is updated when you run stout update, but between updates, every query is purely local:

# Works on an airplane
stout search ffmpeg

# Works behind a corporate firewall
stout info --json ripgrep

# Works in an air-gapped server room
stout list --versions

This is particularly valuable in CI environments where minimizing external network calls improves reliability and reproducibility. The SQLite index can be baked into a container image or cached between CI runs, and all metadata queries will work without touching the network.

Index update mechanism

stout checks for index staleness using an HTTP HEAD request against the index URL, comparing the ETag or Last-Modified header with the locally stored value. This check takes 20-50ms and avoids downloading the full index when nothing has changed:

stout update
# Checking index freshness... current (last updated 2h ago)
# real    0m0.05s

When the index is stale, stout downloads the new version as a zstd-compressed file. The compression ratio is roughly 4:1, so the 12MB uncompressed database transfers as approximately 3MB. The download is atomic — stout writes to a temporary file and renames it into place, so a failed download never corrupts the existing index.

stout update
# Downloading index update (3.1 MB)... done
# Index updated: 172,431 formulae, 8,294 casks
# real    0m1.3s

Comparing search performance

Here is a direct comparison of search operations between Homebrew and stout, measured on an M2 MacBook Pro with a 200 Mbps connection:

OperationHomebrewstoutSpeedup
search json2.8s0.03s93x
search "video encoder"3.1s0.04s78x
info jq1.4s0.006s233x
info --json ripgrep1.6s0.008s200x

The speedup for info commands is even larger than for search because Homebrew must fetch individual formula JSON from the API, while stout performs a single indexed lookup.

Why SQLite

The choice of SQLite over alternatives like RocksDB, LMDB, or a custom format was deliberate. SQLite is the most widely deployed database engine in the world, battle-tested across billions of devices. It requires no server process, no configuration, and no background daemon. It is a single file that can be copied, backed up, and inspected with standard tools. And with FTS5, it provides search capabilities that would otherwise require a separate search engine.

For stout, this means the entire package index — metadata, search index, dependency graph, and bottle URLs for 170,000+ packages — fits in a single portable file that any SQLite client can open and query directly.

Need Rust performance engineering or AI agent expertise?

Neul Labs — the team behind stout — consults on Rust development, performance optimization, CLI tool design, and AI agent infrastructure. We build fast, reliable systems that ship.