Using Tokio for Parallel Package Downloads
How stout uses Tokio's async runtime to download, verify, and extract multiple packages concurrently — a practical guide to async Rust patterns.
When you run stout install ffmpeg, stout resolves 25+ dependencies, downloads their bottles in parallel, verifies cryptographic hashes, and extracts them to the Cellar — all in about 3 seconds on a typical connection. Homebrew does the same work sequentially, taking 30-60 seconds. The difference is Tokio.
This article walks through the async patterns stout uses for parallel package downloads: the runtime configuration, concurrency control, progress reporting, error handling, and the specific pitfalls we encountered building this on top of Tokio.
Runtime setup
stout uses Tokio’s multi-threaded runtime with a custom configuration. The default thread count (one per CPU core) is excessive for a CLI tool that is mostly waiting on network I/O. We limit the runtime to 4 worker threads, which is enough to saturate a gigabit connection while keeping CPU and memory overhead low:
fn main() -> Result<()> {
let runtime = tokio::runtime::Builder::new_multi_thread()
.worker_threads(4)
.enable_all()
.build()?;
runtime.block_on(async { cli::run().await })
}
We use block_on at the top level rather than #[tokio::main] because we need to configure the runtime before entering async context. This also makes it explicit that there is exactly one runtime — a common source of bugs in applications that accidentally create multiple.
The download pipeline
A package install in stout follows this pipeline:
- Resolve the dependency graph (synchronous, SQLite queries)
- Plan the download order (topological sort, identify what is already installed)
- Download all missing bottles concurrently
- Extract each bottle as its download completes
- Link binaries into the PATH
Steps 3 and 4 overlap — extraction begins as soon as a download finishes, while other downloads continue. This is where Tokio’s value shows.
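The overlap between steps 3 and 4 can be sketched without any async machinery, using std threads and a channel: each downloader thread sends its finished bottle into the channel, and the receiving loop "extracts" bottles in completion order while other downloads are still in flight. `fetch`, `run_pipeline`, and the bottle names are invented for illustration, not stout's API:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for a download: returns the bottle name when "done".
fn fetch(name: &'static str) -> String {
    name.to_string()
}

// Downloads run on their own threads; extraction (the receive loop)
// starts as soon as the first download lands.
fn run_pipeline(bottles: &[&'static str]) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = bottles
        .iter()
        .map(|&name| {
            let tx = tx.clone();
            thread::spawn(move || tx.send(fetch(name)).unwrap())
        })
        .collect();
    drop(tx); // close the channel once all download threads finish

    // "Extract" each bottle as it completes, in completion order.
    let extracted: Vec<String> = rx.into_iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    extracted
}
```

Tokio replaces the threads with tasks and the channel with `JoinSet`, but the shape of the pipeline is the same.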
Bounded concurrency with Semaphore
Spawning all 25 downloads at once with no bound is wasteful. It opens too many TCP connections, causes congestion, and can trigger rate limiting from the bottle CDN. stout uses a tokio::sync::Semaphore to limit concurrent downloads:
use anyhow::{anyhow, Result};
use std::path::Path;
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;
const MAX_CONCURRENT_DOWNLOADS: usize = 8;
pub async fn download_all(bottles: Vec<BottleSpec>, cellar: &Path) -> Result<()> {
let semaphore = Arc::new(Semaphore::new(MAX_CONCURRENT_DOWNLOADS));
let mut tasks = JoinSet::new();
for spec in bottles {
let permit = semaphore.clone().acquire_owned().await?;
let cellar = cellar.to_path_buf();
tasks.spawn(async move {
let result = download_and_extract(&spec, &cellar).await;
drop(permit); // release the permit so the next queued download can start
(spec.name.clone(), result)
});
}
let mut errors = Vec::new();
while let Some(join_result) = tasks.join_next().await {
let (name, result) = join_result?;
if let Err(e) = result {
errors.push(format!("{name}: {e}"));
}
}
if errors.is_empty() {
Ok(())
} else {
Err(anyhow!("Failed to install: {}", errors.join(", ")))
}
}
The semaphore ensures at most 8 downloads run at once. When a download finishes and drops its permit, the next waiting task proceeds. This gives us natural backpressure without manual queue management.
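The backpressure idea does not require Tokio to demonstrate. A std-only analogue uses a bounded channel pre-filled with N tokens as a permit pool (recv to acquire, send to release); the helper below, with invented names, checks that observed concurrency never exceeds the limit:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

// Runs `jobs` fake downloads through a permit pool of size `limit`
// and returns the highest number of workers seen running at once.
fn max_observed_concurrency(jobs: usize, limit: usize) -> usize {
    // A bounded channel pre-filled with `limit` tokens acts as a semaphore:
    // recv() = acquire, send() = release.
    let (permit_tx, permit_rx) = mpsc::sync_channel::<()>(limit);
    for _ in 0..limit {
        permit_tx.send(()).unwrap();
    }
    let permit_rx = Arc::new(Mutex::new(permit_rx));
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let permit_rx = Arc::clone(&permit_rx);
            let permit_tx = permit_tx.clone();
            let active = Arc::clone(&active);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                permit_rx.lock().unwrap().recv().unwrap(); // acquire a permit
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(10)); // the "download"
                active.fetch_sub(1, Ordering::SeqCst);
                permit_tx.send(()).unwrap(); // release the permit
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Tokio's Semaphore provides the same guarantee without tying up a thread per waiter, which is why the real code uses it.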
Streaming downloads with checksum verification
stout does not buffer entire bottles in memory before verifying them. For large packages (GCC is ~100MB, Qt is ~60MB), this would spike memory usage unnecessarily. Instead, we stream the response body through a SHA-256 hasher and write to a temporary file simultaneously:
use anyhow::{anyhow, Result};
use futures_util::TryStreamExt; // provides try_next() on the byte stream
use sha2::{Digest, Sha256};
use std::path::Path;
use tokio::io::AsyncWriteExt;
async fn download_and_extract(spec: &BottleSpec, cellar: &Path) -> Result<()> {
let response = reqwest::get(&spec.url)
.await?
.error_for_status()?;
let content_length = response.content_length();
let mut stream = response.bytes_stream();
let mut hasher = Sha256::new();
let tmp_path = cellar.join(format!(".{}.tar.zst.tmp", spec.name));
let mut file = tokio::fs::File::create(&tmp_path).await?;
let mut downloaded: u64 = 0;
while let Some(chunk) = stream.try_next().await? {
hasher.update(&chunk);
file.write_all(&chunk).await?;
downloaded += chunk.len() as u64;
report_progress(&spec.name, downloaded, content_length);
}
file.flush().await?;
let hash = format!("{:x}", hasher.finalize());
if hash != spec.sha256 {
tokio::fs::remove_file(&tmp_path).await?;
return Err(anyhow!(
"Checksum mismatch for {}: expected {}, got {}",
spec.name, spec.sha256, hash
));
}
// Extract synchronously on a blocking thread to avoid starving the runtime
let tmp = tmp_path.clone();
let dest = cellar.join(&spec.name).join(&spec.version);
tokio::task::spawn_blocking(move || extract_tar_zstd(&tmp, &dest)).await??;
tokio::fs::remove_file(&tmp_path).await?;
Ok(())
}
Two details matter here:
bytes_stream() instead of bytes(). The bytes() method buffers the entire response body before returning. bytes_stream() returns a Stream of chunks, typically 8-16KB each, which we process incrementally. For a 100MB bottle, this keeps memory usage at the chunk size rather than 100MB.
spawn_blocking for extraction. Tar decompression is CPU-bound. Running it on a Tokio worker thread would block that thread from servicing other downloads. spawn_blocking moves the work to a separate thread pool designed for blocking operations.
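The incremental-hashing property that makes streaming verification work is easy to demonstrate without the sha2 crate: any hasher with an update/finalize API yields the same digest whether it sees one large buffer or many small chunks. Here FNV-1a stands in for SHA-256 so the example stays dependency-free:

```rust
// FNV-1a as a stand-in for SHA-256: an incremental `update` API gives
// the same digest whether fed one buffer or many chunks.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
    fn finalize(self) -> u64 {
        self.0
    }
}

fn hash_whole(data: &[u8]) -> u64 {
    let mut h = Fnv1a::new();
    h.update(data);
    h.finalize()
}

fn hash_chunked(data: &[u8], chunk: usize) -> u64 {
    let mut h = Fnv1a::new();
    for c in data.chunks(chunk) {
        h.update(c); // same digest, but only `chunk` bytes resident at a time
    }
    h.finalize()
}
```

This is exactly why the download loop can feed each network chunk to `hasher.update` and forget it: memory stays bounded by the chunk size, and the final digest matches what hashing the whole file would produce.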
Progress reporting across concurrent tasks
Showing download progress for 8 simultaneous downloads requires thread-safe state. stout uses indicatif::MultiProgress with one progress bar per active download:
use indicatif::{MultiProgress, ProgressBar, ProgressStyle};
use std::sync::Arc;
fn create_progress(multi: &MultiProgress, name: &str, total: Option<u64>) -> ProgressBar {
let pb = multi.add(ProgressBar::new(total.unwrap_or(0)));
pb.set_style(
ProgressStyle::default_bar()
.template("{prefix:.cyan} [{bar:20}] {bytes}/{total_bytes} {bytes_per_sec}")
.unwrap()
.progress_chars("=> "),
);
pb.set_prefix(name.to_string());
pb
}
The MultiProgress handle is Send + Sync and cheap to clone (it is reference-counted internally), so each Tokio task can hold its own clone. Each task updates its own ProgressBar, and MultiProgress handles the terminal rendering, redrawing all bars on each update without flickering.
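The aggregation pattern underneath, minus the rendering, can be shown with std primitives alone: a shared atomic counter that every worker thread bumps as chunks arrive, readable at any moment without locks. This is a sketch, not indicatif's API; `aggregate_progress` is a name invented for the example:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Each "download" thread adds its bytes to a shared counter; a renderer
// could read the total at any time without locking.
fn aggregate_progress(per_task_bytes: Vec<u64>) -> u64 {
    let total = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = per_task_bytes
        .into_iter()
        .map(|bytes| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                // Simulate chunked arrival: report in two installments.
                total.fetch_add(bytes / 2, Ordering::Relaxed);
                total.fetch_add(bytes - bytes / 2, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    total.load(Ordering::Relaxed)
}
```

indicatif does the equivalent bookkeeping per bar and adds the terminal drawing on top.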
Retry logic with exponential backoff
Network requests fail. CDN edge nodes return 503s under load. Connections reset. stout retries failed downloads up to 3 times with exponential backoff:
use std::time::Duration;
async fn download_with_retry(url: &str, max_retries: u32) -> Result<reqwest::Response> {
let mut last_error = None;
for attempt in 0..=max_retries {
if attempt > 0 {
let delay = Duration::from_millis(500 * 2u64.pow(attempt - 1));
tokio::time::sleep(delay).await;
}
match reqwest::get(url).await.and_then(|r| r.error_for_status()) {
Ok(response) => return Ok(response),
Err(e) => {
last_error = Some(e);
}
}
}
Err(last_error.unwrap().into())
}
The backoff delays are 500ms, 1s, and 2s. These are short because the user is watching a terminal — long backoffs feel broken. If all retries fail, the error propagates to the JoinSet collector, which reports it without aborting other downloads.
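Under that schedule, the delay before retry attempt n (1-based) is 500ms * 2^(n-1). A tiny standalone helper, hypothetical rather than stout's actual code, makes the schedule easy to unit-test:

```rust
use std::time::Duration;

// Delay before retry `attempt` (1-based): 500ms, 1s, 2s, ...
// Callers must pass attempt >= 1; attempt 0 is the initial try, with no delay.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(500 * 2u64.pow(attempt - 1))
}
```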
Cancellation with tokio::select
When a user presses Ctrl+C during a parallel download, stout needs to clean up temporary files and release partial downloads. Tokio’s select! macro enables cooperative cancellation:
use std::path::Path;
use tokio::signal;
pub async fn install_with_cancellation(bottles: Vec<BottleSpec>, cellar: &Path) -> Result<()> {
tokio::select! {
result = download_all(bottles, cellar) => result,
_ = signal::ctrl_c() => {
eprintln!("\nInstallation cancelled. Cleaning up...");
cleanup_temp_files(cellar).await?;
std::process::exit(130) // 128 + SIGINT, the conventional exit code
}
}
}
When ctrl_c() resolves, Tokio drops the download_all future, which cancels all in-flight downloads. The temporary files are then cleaned up before exit.
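The cleanup-on-drop behavior generalizes: any state that must be released when a future is cancelled can live in a guard whose Drop impl does the work, since dropping a future runs the destructors of everything it owns. A minimal std-only sketch, with a hypothetical `TempFile` guard that is not stout's actual type:

```rust
use std::path::PathBuf;

// RAII guard: removing the temp file in Drop means cleanup runs even when
// the enclosing scope (or future) is torn down early.
struct TempFile {
    path: PathBuf,
}

impl Drop for TempFile {
    fn drop(&mut self) {
        let _ = std::fs::remove_file(&self.path); // best-effort cleanup
    }
}

// Returns whether the temp file still exists after the guard is dropped.
fn demo() -> bool {
    let path = std::env::temp_dir().join(format!("stout-demo-{}.tmp", std::process::id()));
    std::fs::write(&path, b"partial download").unwrap();
    {
        let _guard = TempFile { path: path.clone() };
        // ... the download would happen here; dropping the guard cleans up
    }
    path.exists() // false: the guard removed the file
}
```

Guards like this make the explicit cleanup_temp_files pass a backstop rather than the only line of defense.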
Performance results
On a 100 Mbps connection installing ffmpeg (25 dependencies, ~180MB total):
| Approach | Wall-clock time |
|---|---|
| Sequential (Homebrew) | 42s |
| Unbounded parallel | 6.1s (with CDN throttling) |
| Bounded parallel (8) | 4.8s |
| Bounded + streaming extract | 3.2s |
The bounded approach is faster than unbounded because it avoids connection congestion and CDN rate limits. Streaming extraction overlaps I/O with decompression, shaving another 1.6 seconds.
Lessons learned
Do not block the Tokio runtime. This is the most common async Rust mistake. Any CPU-bound work over a few microseconds should go to spawn_blocking. We initially ran zstd decompression on worker threads and saw download throughput drop by 60% because decompression was starving the executor.
Mind permit lifetimes. An owned semaphore permit is released when it is dropped, and because Tokio catches panics in spawned tasks and runs destructors, a panicking task does not leak its permit under the default unwinding panic strategy. The subtler risk is holding a permit longer than intended: in the code above, the permit spans extraction as well as the download, so the limit of 8 bounds both stages. We drop permits explicitly to make the release point visible, and wrap downloads in std::panic::AssertUnwindSafe where we run our own cleanup after a panic.
JoinSet is better than join_all for error collection. futures::future::join_all waits for all tasks and returns results in order. JoinSet::join_next returns results as they complete, letting stout report progress incrementally and fail fast when appropriate.
Tokio adds complexity compared to synchronous code. But for a package manager — where network I/O dominates wall-clock time and users are waiting — the 10x speedup from parallel downloads justifies every line of async machinery.
Need Rust performance engineering or AI agent expertise?
Neul Labs — the team behind stout — consults on Rust development, performance optimization, CLI tool design, and AI agent infrastructure. We build fast, reliable systems that ship.