Parallel Package Downloads with stout
How stout uses Tokio's async runtime to download multiple packages simultaneously — saturating your network instead of waiting sequentially.
Installing a package with a deep dependency tree on Homebrew is an exercise in patience. Each bottle downloads one at a time — fetch, verify, extract, then move on to the next. On a gigabit connection, you’ll watch your network sit nearly idle while packages trickle in sequentially. stout takes a fundamentally different approach: resolve the dependency graph first, then download every bottle concurrently using Tokio’s async runtime. The result is that installing 10 packages takes roughly the same wall-clock time as installing one.
Why Homebrew downloads sequentially
Homebrew’s download pipeline is synchronous Ruby code. When you run brew install ffmpeg, the process looks like this:
- Resolve ffmpeg’s dependencies (libx264, libx265, opus, lame, fdk-aac, etc.)
- For each dependency, in order: download the bottle, verify the SHA256, extract it, link it
- Only after the previous dependency finishes does the next one begin
This sequential approach exists because of Ruby’s threading model. MRI Ruby (the standard interpreter Homebrew uses) has a Global VM Lock (GVL, commonly called the GIL) that prevents true parallel execution of Ruby threads. While Ruby can spawn native threads, only one of them executes Ruby code at a time. For I/O-bound work like downloads, Ruby threads can help somewhat, since the lock is released during blocking I/O — but Homebrew’s codebase was never architected to use them for fetching.
The practical impact is brutal. Consider installing ffmpeg, which pulls in roughly 25 dependencies:
brew install ffmpeg
# Downloads each dependency one at a time
# 25 bottles × ~2s each = ~50 seconds of download time
# on a connection that could fetch them all in ~4 seconds
Your 500 Mbps connection is running at roughly 10% utilization for the entire duration of the install.
How Tokio enables concurrent downloads
stout is built on Tokio, the most widely used async runtime in the Rust ecosystem. Tokio provides a multi-threaded scheduler that can drive thousands of concurrent I/O operations across a small pool of OS threads. This is not threading in the Ruby sense — there is no global lock, and the runtime can genuinely execute multiple network requests at the same time.
When stout needs to download bottles, it spawns a Tokio task for each one. Each task independently opens an HTTPS connection, streams the response body to disk, and verifies the checksum — all without blocking any other task. The Tokio scheduler multiplexes these tasks across available CPU cores, and the OS kernel handles the concurrent socket I/O.
The core download loop in stout conceptually works like this:
// Simplified illustration of stout's parallel download approach
let download_futures: Vec<_> = bottles
    .iter()
    .map(|bottle| {
        let client = client.clone(); // reqwest::Client clones are cheap handle copies
        let bottle = bottle.clone(); // spawned tasks need owned ('static) data
        tokio::spawn(async move {
            let response = client.get(&bottle.url).send().await?;
            let bytes = stream_to_file(response, &bottle.path).await?;
            verify_sha256(&bottle.path, &bottle.expected_hash)?;
            Ok::<_, Error>(bytes)
        })
    })
    .collect();
// All downloads run concurrently — join waits for all to finish
let results = futures::future::join_all(download_futures).await;
Each tokio::spawn creates a lightweight task (not an OS thread). Tokio’s work-stealing scheduler distributes these tasks across a thread pool sized to match your CPU core count. Since downloads are I/O-bound, even a 4-core machine can comfortably drive 50+ concurrent downloads.
Resolve first, download second
stout’s download pipeline separates dependency resolution from fetching. This two-phase approach is what makes parallel downloads possible.
Phase 1: Graph resolution. stout reads the pre-computed dependency graph from its SQLite index. Because dependencies are stored as adjacency lists in the database, resolving the full transitive dependency set for any package is a matter of recursive SQL queries — typically completing in under 10ms. The result is a flat list of every bottle that needs to be fetched, with their URLs and expected checksums.
stout install ffmpeg --dry-run
# Would install 25 packages:
# aom, dav1d, fdk-aac, fontconfig, freetype, frei0r,
# gmp, gnutls, lame, libass, libbluray, librist,
# libsoxr, libvidstab, libvmaf, libvorbis, libvpx,
# libx264, libx265, opus, rav1e, rubberband, sdl2,
# snappy, ffmpeg
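The resolution walk itself is a plain transitive closure over adjacency lists. As a sketch (an in-memory `HashMap` stands in for stout's SQLite index, and `resolve` is a hypothetical helper, not stout's API), phase 1 amounts to:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Sketch of phase 1: a breadth-first walk over adjacency lists to collect
// the transitive dependency set. stout stores the graph in SQLite; an
// in-memory HashMap stands in for it here.
fn resolve<'a>(graph: &HashMap<&'a str, Vec<&'a str>>, root: &'a str) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([root]);
    let mut order = Vec::new();
    while let Some(pkg) = queue.pop_front() {
        if seen.insert(pkg) {
            order.push(pkg);
            if let Some(deps) = graph.get(pkg) {
                queue.extend(deps.iter().copied());
            }
        }
    }
    order
}

fn main() {
    let graph = HashMap::from([
        ("ffmpeg", vec!["libx264", "opus"]),
        ("libx264", vec![]),
        ("opus", vec![]),
    ]);
    println!("{:?}", resolve(&graph, "ffmpeg"));
}
```

The output of this walk is exactly the flat list the fetch phase needs: every package, deduplicated, with no ordering constraints imposed on the downloads themselves.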
Phase 2: Parallel fetch. With the full list of bottles known upfront, stout issues all download requests concurrently. There is no need to wait for libx264 to finish downloading before starting opus — they have no runtime dependency on each other at the download stage. Dependencies only matter at link time, and stout handles that after all bottles are on disk.
stout also applies a configurable concurrency limit to avoid overwhelming the server or local network:
# Default: 8 concurrent downloads
stout install ffmpeg
# Override for faster connections
stout install ffmpeg --max-concurrent-downloads 20
# Or set it globally
stout config set max_concurrent_downloads 20
Connection reuse and HTTP/2 multiplexing
stout uses the reqwest HTTP client (built on hyper), which supports HTTP/2 by default. With HTTP/2, multiple downloads to the same host share a single TCP connection through stream multiplexing. This eliminates the overhead of establishing separate TLS handshakes for each bottle — a significant win when downloading 25+ packages from the same CDN.
The connection pool is managed by the HTTP client and persists across the download phase. For a typical ffmpeg installation where all bottles come from ghcr.io (GitHub’s container registry), stout establishes one or two TCP connections and multiplexes all 25 downloads over them.
Streaming verification
stout does not wait for a download to complete before starting verification. Each bottle is verified as it streams to disk using an incremental SHA-256 hasher. The download task feeds each chunk to both the output file and the hasher simultaneously. When the last byte arrives, the hash is already computed — there is no second pass over the file:
// Simplified: stream, write, and hash in a single pass
// (write_all comes from tokio::io::AsyncWriteExt; hasher is an
// incremental SHA-256 hasher, e.g. sha2::Sha256)
while let Some(chunk) = response.chunk().await? {
    file.write_all(&chunk).await?;
    hasher.update(&chunk);
}
let computed = hasher.finalize();
assert_eq!(computed, expected_hash);
This means verification adds zero additional I/O time to the download phase.
Real-world impact
The difference is measurable and dramatic. Here are benchmarks on a 500 Mbps connection installing ffmpeg (25 dependencies):
| Step | Homebrew | stout |
|---|---|---|
| Dependency resolution | 1.2s | 0.008s |
| Download (25 bottles) | 48s | 4.8s |
| Extraction + linking | 6s | 3.2s |
| Total | 55.2s | 8.0s |
The download phase alone is 10x faster because stout saturates the available bandwidth instead of using it one bottle at a time.
For CI environments, the improvement is even more pronounced. CI runners often install 50-100 packages from a Brewfile at the start of each job. Sequential downloads on Homebrew can take several minutes; stout finishes in 15-20 seconds.
# CI pipeline example
time stout bundle install --file=Brewfile
# Installing 87 packages (34 already cached)...
# Downloaded 53 bottles (412 MB total) in 6.8s
# Extracted and linked in 4.1s
# real 0m10.9s
Controlling bandwidth usage
If you need to limit bandwidth consumption — for instance on a shared office connection or a metered CI runner — stout supports rate limiting:
# Limit total download bandwidth to 50 MB/s
stout install ffmpeg --rate-limit 50M
# Or globally
stout config set download_rate_limit "50M"
The rate limit is applied across all concurrent downloads collectively, not per-download. This gives you precise control over total bandwidth usage while still benefiting from concurrency.
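A collective limit like this is typically a single token bucket shared by every download task. The sketch below is a hypothetical stand-in for stout's limiter (the `TokenBucket` type and its refill math are assumptions, not stout's implementation):

```rust
use std::time::{Duration, Instant};

// Sketch of a collective rate limit: one token bucket shared by all
// download tasks, so the cap applies to total throughput. The refill
// math is illustrative; stout's real limiter may differ.
struct TokenBucket {
    capacity: f64,       // burst size in bytes
    tokens: f64,         // bytes currently available
    rate: f64,           // refill rate in bytes per second
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate: f64) -> Self {
        Self { capacity: rate, tokens: rate, rate, last_refill: Instant::now() }
    }

    // Returns how long the caller should wait before sending `bytes`.
    fn acquire(&mut self, bytes: f64) -> Duration {
        let now = Instant::now();
        // Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = (self.tokens
            + now.duration_since(self.last_refill).as_secs_f64() * self.rate)
            .min(self.capacity);
        self.last_refill = now;
        if self.tokens >= bytes {
            self.tokens -= bytes;
            Duration::ZERO
        } else {
            let deficit = bytes - self.tokens;
            self.tokens = 0.0;
            Duration::from_secs_f64(deficit / self.rate)
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(50.0 * 1024.0 * 1024.0); // 50 MB/s
    let wait = bucket.acquire(8192.0);
    println!("first 8 KiB chunk waits {wait:?}");
}
```

In an async pipeline, each download task would hold the bucket behind an `Arc<Mutex<_>>`, call `acquire` with the size of each chunk, and `tokio::time::sleep` for the returned duration before writing — so the budget is spent wherever the bytes actually arrive.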
Summary
Parallel downloading is not a minor optimization — it is an architectural transformation. Homebrew’s sequential pipeline was designed for an era of slower connections and smaller dependency trees. stout’s Tokio-powered concurrent downloader treats your network connection as the shared resource it is and uses it efficiently. For any package with more than a couple of dependencies, the difference is the gap between waiting and not waiting.
Need Rust performance engineering or AI agent expertise?
Neul Labs — the team behind stout — consults on Rust development, performance optimization, CLI tool design, and AI agent infrastructure. We build fast, reliable systems that ship.