RepoPilot

bheisler/criterion.rs

Statistics-driven benchmarking library for Rust

Healthy

Healthy across the board (weakest axis: Healthy)

Use as dependency — Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify — Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 2w ago
  • 44+ active contributors
  • Distributed ownership (top contributor 27% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant: "RepoPilot: Healthy"
[![RepoPilot: Healthy](https://repopilot.app/api/badge/bheisler/criterion.rs)](https://repopilot.app/r/bheisler/criterion.rs)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/bheisler/criterion.rs on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: bheisler/criterion.rs

Generated by RepoPilot · 2026-05-09

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/bheisler/criterion.rs shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 2w ago
  • 44+ active contributors
  • Distributed ownership (top contributor 27% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live bheisler/criterion.rs repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/bheisler/criterion.rs.

What it runs against: a local clone of bheisler/criterion.rs — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in bheisler/criterion.rs | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches a relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 45 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>bheisler/criterion.rs</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of bheisler/criterion.rs. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/bheisler/criterion.rs.git
#   cd criterion.rs
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of bheisler/criterion.rs and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "bheisler/criterion\.rs(\.git)?\b" \
  && ok "origin remote is bheisler/criterion.rs" \
  || miss "origin remote is not bheisler/criterion.rs (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The repo is dual-licensed Apache/MIT,
#    so check the usual filenames plus the Cargo.toml license field.
(grep -qi "Apache License" LICENSE-APACHE 2>/dev/null \
   || grep -qi "Apache License" LICENSE 2>/dev/null \
   || grep -qE "license *= *\"Apache-2\.0" Cargo.toml 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in src/lib.rs src/measurement.rs src/analysis.rs src/benchmark.rs src/stats/mod.rs; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 45 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~15d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/bheisler/criterion.rs"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Criterion.rs is a statistics-driven microbenchmarking library for Rust that automates rigorous performance measurement with built-in statistical analysis, automated regression detection, and publication-quality plot generation. It eliminates noisy one-shot measurements by running benchmarks many times, analyzing variance, and detecting performance regressions with statistical confidence. Monorepo structure: core library in the root src/, benchmark examples in benches/benchmarks/ (async_measurement_overhead.rs, compare_functions.rs, iter_with_setup.rs), plot generation in the plot/ subdirectory, a compatibility shim in bencher_compat/, and mdBook documentation in book/. The crate exposes a fluent builder API (BenchmarkId, Criterion::bench_function) that wraps the measurement harness.

👥Who it's for

Rust library maintainers and systems engineers who need to track performance characteristics of their code across commits and releases, detect unintended slowdowns early, and publish benchmark results with statistical rigor rather than anecdotal timing.

🌱Maturity & risk

Production-ready and actively maintained. Version 0.7.0 with a workspace setup, comprehensive CI in .github/workflows/ci.yaml, dual Apache/MIT licensing, and recent development activity. However, note that the README indicates active development has moved to the criterion-rs organization; this repository is the legacy upstream.

Moderate risk: project ownership transitioned from Brook Heisler (largely absent) to criterion-rs organization, creating a split focus. MSRV is Rust 1.80 (2024), so older projects may have compatibility issues. Heavy optional dependency chain (rayon, tokio, async-std, smol, plotters) increases build surface; ensure you enable only needed features. No dependency lock constraints in top-level Cargo.toml beyond semver ranges.

Active areas of work

Few recent commits are visible in the captured file metadata, but the project status note indicates an active transition to the criterion-rs/criterion.rs organization. The original repo accepts PRs slowly; new development should target the new org. Version 0.7.0 is the current stable release, with MSRV 1.80.

🚀Get running

```bash
git clone https://github.com/bheisler/criterion.rs
cd criterion.rs
cargo test
cargo bench --bench bench_main
```

Daily commands: Criterion is a library, not a standalone tool. Run benchmarks via: cargo bench --bench <name> (e.g., cargo bench --bench bench_main). Results appear in target/criterion/ with HTML reports. For the example suite: cargo bench --all in benches/ directory runs all benchmarks in benches/benchmarks/*.rs.
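
For projects that consume Criterion (rather than hack on this repo), the usual Cargo wiring is a dev-dependency plus a bench target with the default libtest harness disabled. A sketch using the 0.7 version line this artifact reports; the exact feature set you need may differ:

```toml
# Cargo.toml of a crate that wants Criterion benchmarks.
[dev-dependencies]
criterion = { version = "0.7", features = ["html_reports"] }

[[bench]]
name = "bench_main"   # expects benches/bench_main.rs
harness = false       # disable libtest so Criterion's harness runs
```

With this in place, `cargo bench --bench bench_main` invokes Criterion's own entry point, and reports land in target/criterion/.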

🗺️Map of the codebase

  • src/lib.rs — Main entry point and public API of the criterion benchmarking library; defines the Criterion struct and core benchmarking interface.
  • src/measurement.rs — Defines the Measurement trait and timing infrastructure that underpins all benchmark execution and statistics collection.
  • src/analysis.rs — Core statistical analysis engine that performs regression, outlier detection, and generates performance reports from raw measurements.
  • src/benchmark.rs — Implements the Benchmark struct that orchestrates test execution, sample collection, and result aggregation.
  • src/stats/mod.rs — Statistics module providing mean, variance, confidence intervals, and outlier detection algorithms essential for benchmark analysis.
  • Cargo.toml — Workspace configuration defining dependencies (plotters, serde, rayon) and feature flags that control optional functionality.
  • README.md — High-level overview and migration notice pointing to the new criterion-rs organization; critical context for understanding repository status.

🛠️How to make changes

Add a new Measurement type (e.g., memory, cache misses)

  1. Create a new struct implementing the Measurement trait, either in src/measurement.rs or in a new module alongside it (src/measurement.rs)
  2. Implement start(), end(), and to_f64() methods to capture and convert the metric (src/measurement.rs)
  3. Update Criterion to accept the measurement type via generic parameter or builder pattern (src/criterion.rs)
  4. Add statistical analysis support in analysis.rs if custom aggregation is needed (src/analysis.rs)
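
The trait shape from step 2 can be sketched standalone. This is a simplified stand-in, not the crate's exact API: the real trait in src/measurement.rs also requires methods such as zero(), add(), and a value formatter.

```rust
use std::time::{Duration, Instant};

// Simplified stand-in for Criterion's Measurement trait: capture a start
// marker, turn it into a value at the end, and convert that value to f64
// for statistical analysis.
trait Measurement {
    type Intermediate;
    type Value;
    fn start(&self) -> Self::Intermediate;
    fn end(&self, started: Self::Intermediate) -> Self::Value;
    fn to_f64(&self, value: &Self::Value) -> f64;
}

// Wall-clock time, the default measurement in Criterion.
struct WallTime;

impl Measurement for WallTime {
    type Intermediate = Instant;
    type Value = Duration;
    fn start(&self) -> Instant {
        Instant::now()
    }
    fn end(&self, started: Instant) -> Duration {
        started.elapsed()
    }
    fn to_f64(&self, value: &Duration) -> f64 {
        value.as_secs_f64()
    }
}

fn main() {
    let m = WallTime;
    let t = m.start();
    let _work: u64 = (0..10_000u64).sum(); // something to time
    let elapsed = m.end(t);
    println!("measured {} s", m.to_f64(&elapsed));
}
```

A custom measurement (memory, cache misses) swaps the Intermediate/Value types for counters read in start() and end().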

Add a new statistical analysis or outlier detection method

  1. Implement the algorithm as a function or struct in src/stats/mod.rs (src/stats/mod.rs)
  2. Integrate into the Analysis struct's processing pipeline (src/analysis.rs)
  3. Add corresponding fields to BenchmarkResult or Estimate struct for result storage (src/benchmark.rs)
  4. Update HTML template rendering to display the new metric in src/html/mod.rs (src/html/mod.rs)

Add support for a new output format (e.g., JSON, YAML)

  1. Create a new module src/json_report.rs (or similar) mirroring csv_report.rs (src/csv_report.rs)
  2. Implement serialization logic using serde for BenchmarkResult and Estimate (src/benchmark.rs)
  3. Register the new reporter in Report::write_data() or Report::generate() (src/report.rs)
  4. Add configuration option in src/config.rs to enable/disable the new format (src/config.rs)

Add a new command-line option or configuration flag

  1. Define the option in the Config struct in src/config.rs with a builder method (src/config.rs)
  2. Parse the flag from environment or command-line args in Criterion::configure_from_args() (src/criterion.rs)
  3. Thread the config value through Benchmark execution in src/benchmark.rs (src/benchmark.rs)
  4. Update documentation in CONTRIBUTING.md or README.md (CONTRIBUTING.md)

🔧Why these technologies

  • Rust + benchmarking macros (criterion_group!, criterion_main!) and the black_box barrier — zero-cost abstractions and a compile-time optimization barrier; prevents LLVM from eliding benchmarked code
  • Rayon (parallel iterators) — Enables cross-platform multi-threaded sampling for faster benchmark collection without explicit threading
  • Plotters (SVG charts) — Pure Rust plotting without external dependencies; self-contained HTML reports with embedded SVG and JSON data
  • Serde (serialization) — Flexible format support (JSON, CSV) for storing benchmarks, baselines, and enabling tool integration
  • Criterion.js (browser-side analysis) — Interactive HTML reports with client-side filtering and comparison without server overhead
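
The black_box barrier mentioned above also exists in the standard library as std::hint::black_box (stable since Rust 1.66). A sketch of why benchmarks need one; the function name sum_of_squares is illustrative, not from the repo:

```rust
use std::hint::black_box;

// Without a barrier, the optimizer may constant-fold or delete a benchmarked
// computation whose result is unused. black_box makes its argument opaque to
// the optimizer, so the work is actually performed and timed.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|i| i * i).sum()
}

fn main() {
    // In a Criterion closure this would look like:
    //   b.iter(|| sum_of_squares(black_box(1000)))
    let result = sum_of_squares(black_box(1000));
    println!("{}", black_box(result));
}
```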

⚖️Trade-offs already made

  • Statistics-first design over single-run timing

    • Why: CPU throttling, OS scheduling, and cache effects make single measurements unreliable
    • Consequence: Benchmarks are slower (default 100 samples) but results are reproducible and statistically valid; users trade wall-clock time for confidence
  • File-based baseline storage instead of central database

    • Why: Simplifies deployment, avoids external service dependencies, and keeps results in version control
    • Consequence: Baseline comparison is local-only; no cross-machine or CI/CD aggregation without custom tooling
  • Tukey's IQR outlier detection (1.5×IQR fence) vs robust M-estimators

    • Why: Tukey is fast, well-understood, and appropriate for moderately-sized samples (100–1000 iterations)
    • Consequence: May over-smooth when outliers are legitimate (e.g., CPU frequency scaling); trade-off: simplicity for occasional false negatives
  • HTML + embedded JSON reports instead of live dashboard

    • Why: Static reports are portable, archivable, and integrate seamlessly with CI/CD (GitHub Pages, artifact storage)
    • Consequence: No real-time monitoring; users must manually review reports or script custom aggregation
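
The 1.5×IQR fence named in the trade-off above can be sketched in a few lines. This is an illustration only, not Criterion's implementation (which lives under src/stats/ and also distinguishes mild from severe outliers):

```rust
// Tukey's fence: samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are outliers.
// Quartiles are taken with a simple nearest-rank rule on a sorted copy.
fn tukey_outliers(samples: &[f64]) -> Vec<f64> {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let q = |p: f64| sorted[((sorted.len() - 1) as f64 * p).round() as usize];
    let (q1, q3) = (q(0.25), q(0.75));
    let iqr = q3 - q1;
    let (lo, hi) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    samples.iter().copied().filter(|&x| x < lo || x > hi).collect()
}

fn main() {
    // Nine tight timing samples plus one spike (e.g. a scheduler preemption).
    let samples = [10.0, 10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 10.0, 9.9, 50.0];
    println!("{:?}", tukey_outliers(&samples)); // only the 50.0 spike is flagged
}
```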

🚫Non-goals (don't propose these)

  • Does not provide real-time performance monitoring or continuous profiling
  • Does not handle distributed benchmark aggregation across machines or CI workflows natively
  • Does not support flame graphs or CPU profiling (delegates to perf, cargo-flamegraph)
  • Not a database for historical benchmark tracking (stores only current vs. prior baseline)
  • Does not provide built-in regression testing alerts or pass/fail thresholds beyond statistical significance

🪤Traps & gotchas

  1. Feature flags are semi-required: the plotters feature defaults to off, so benchmarks won't generate HTML plots without it. Enable with cargo bench --all-features.
  2. MSRV 1.80 is hard; older Rust toolchains will fail silently on async tests.
  3. The repository is superseded: PRs to bheisler/criterion.rs move slowly, and new issues should go to criterion-rs/criterion.rs.
  4. Measurement overhead varies by platform; run benches/benchmarks/measurement_overhead.rs locally to tune noise floors.
  5. Benchmark output includes absolute timing data; sanitize it before publishing.

🏗️Architecture

💡Concepts to learn

  • Welford's algorithm for online variance — Criterion computes running statistics without storing all samples; understanding this enables debugging unexpected confidence intervals and variance underestimation
  • Bootstrapping (resampling statistics) — Criterion uses bootstrap resampling to derive confidence intervals on benchmark slopes and detect regressions; critical for interpreting generated reports
  • Linear regression on iteration timing — Criterion fits linear models to (iterations, total_time) pairs to isolate per-iteration cost from setup overhead; visible in benches/benchmarks/measurement_overhead.rs
  • Outlier detection (MAD/IQR filtering) — Benchmark results spike on system noise; Criterion filters outliers before statistical analysis to avoid false regressions
  • Effect size and Cohen's d — Regression reports use effect size (not just p-value) to judge practical significance of timing changes; prevents false positives on tiny real-world impacts
  • Burn-in and JIT stabilization — Microbenchmarks require warming up the JIT and CPU caches; Criterion's harness design (warm_up_time config) accounts for this to avoid measuring initialization cost
  • Throughput vs. wall-clock latency modes — Criterion supports both iter() timing (total time / iterations) and custom measurement units; understanding the distinction prevents incorrect benchmark interpretation
  • bheisler/bencher — Criterion's predecessor; bencher_compat/ shim provides backwards compatibility migration path
  • criterion-rs/criterion.rs — Active fork and current canonical repository; new development and PRs should target here instead of bheisler/criterion.rs
  • bheisler/cargo-criterion — Companion CLI tool that wraps Criterion for easier integration into CI/CD pipelines and regression tracking
  • rust-lang/rust-analyzer — Heavily benchmarked with Criterion for performance regression detection in compiler tooling
  • tikv/iai — Alternative Rust benchmarking framework (instruction-level, deterministic) documented in book/src/iai/ for comparison

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive tests for custom_measurements module

The book contains documentation for custom measurements (book/src/user_guide/custom_measurements.md) and there's a benchmark example (benches/benchmarks/custom_measurement.rs), but the main crate likely lacks unit tests for the custom measurement API. This is a critical feature that needs test coverage to prevent regressions, especially given the recent org migration mentioned in the README.

  • [ ] Locate the custom measurement module in src/ (likely src/measurement.rs or similar)
  • [ ] Add unit tests for custom Measurement trait implementations
  • [ ] Add tests for measurement configuration and integration with the main Criterion struct
  • [ ] Ensure tests cover error cases and edge cases for custom measurement types

Add missing CI workflow for MSRV (Minimum Supported Rust Version) verification

The Cargo.toml specifies rust-version = '1.80' and CONTRIBUTING.md mentions updating .github/workflows/ci.yml when changing MSRV, but there's no dedicated workflow visible testing against the minimum supported version. This prevents catching accidental MSRV breakage in PRs.

  • [ ] Create .github/workflows/msrv.yml to test against rust-version 1.80
  • [ ] Use actions-rs/toolchain or rustup to install the MSRV version
  • [ ] Run cargo check and cargo test with the MSRV toolchain
  • [ ] Consider testing all workspace members (main crate, plot, bencher_compat)

Add documentation for async benchmarking best practices in book/src/user_guide/

The crate has async runtime support (tokio, async-std, smol dependencies and benches/benchmarks/async_measurement_overhead.rs), and book/src/user_guide/benchmarking_async.md exists, but there's no documented guide for troubleshooting common async benchmarking issues or performance considerations specific to different runtimes.

  • [ ] Review benches/benchmarks/async_measurement_overhead.rs to understand async overhead patterns
  • [ ] Create book/src/user_guide/async_benchmarking_troubleshooting.md covering runtime selection tradeoffs
  • [ ] Document measurement overhead differences between tokio, async-std, and smol runtimes
  • [ ] Add examples for benchmarking concurrent code and spawned tasks

🌿Good first issues

  • Add missing documentation examples to book/src/user_guide/ for the custom measurement API (seen in benches/benchmarks/custom_measurement.rs but not documented in SUMMARY.md)
  • Extend benches/benchmarks/with_inputs.rs parametric benchmark example to cover rayon parallel iterator cases and document in the getting_started guide
  • Create a new benchmark in benches/benchmarks/ demonstrating the external_process.rs pattern (infrastructure exists but lacks a complete runnable example alongside the Python script)

📝Recent commits

  • 3dbc6c6 — Redirect new contributions to Criterion-rs organization (#899) (berkus)
  • af5cc00 — Extract shared package attributes into workspace (#879) (berkus)
  • 567405d — release: bump criterion and criterion-plot versions (#878) (lemmih)
  • ccccbcc — fix: deal with throughput in bits (#861) (lemmih)
  • deb0eb0 — feat: support throughput reports in bits (#833) (birneee)
  • d4fd7cc — Add CI job checking library builds with oldest allowed dependencies (#854) (faern)
  • 43bf90a — release version 0.6.0 (#860) (lemmih)
  • 92696e4 — deps: unpin clap (#858) (lemmih)
  • 5756a5d — chore: bump MSRV to 1.80 (#859) (lemmih)
  • 9d887c0 — Fixed typo in faq.md (#852) (kienmarkdo)

🔒Security observations

The criterion.rs benchmarking library demonstrates good security practices overall. It is a development-only tool with limited exposure to untrusted input, which inherently reduces attack surface. The codebase explicitly manages dependency features and includes an audit workflow. Main concerns are: (1) the regex dependency is outdated but not critically vulnerable, (2) serialization of benchmark data via serde/ciborium could warrant additional input validation, and (3) presence of external Python scripts requires verification of their security posture. No hardcoded credentials, injection vulnerabilities, or critical misconfigurations were identified. The project would benefit from explicit feature declaration in all dependencies and supply chain attestation mechanisms.

  • Low · Outdated Dependency: regex — Cargo.toml - dependencies.regex. The regex crate version 1.5.1 is specified, which is significantly outdated. Current stable versions are 1.10+. While regex DoS vulnerabilities have been largely mitigated in modern versions, using outdated versions may miss important performance and stability improvements. Fix: Update regex to the latest stable version (currently 1.10+). Run 'cargo update' and verify compatibility with the codebase.
  • Low · Permissive Default Features in Dependencies — Cargo.toml - dependencies.serde, serde_json, ciborium. Several dependencies have explicitly disabled default features (clap, regex, plotters, num-traits, futures), which is good practice. However, the ciborium dependency (0.2.0) does not explicitly specify features, and serde/serde_json with derive features could potentially expose serialization vulnerabilities if untrusted data is processed without validation. Fix: Review the usage of serde/serde_json to ensure untrusted input is validated before deserialization. Consider explicitly specifying required features for ciborium and testing with security-focused fuzzing.
  • Low · Python Script in Benchmarks — benches/benchmarks/external_process.py. The file 'benches/benchmarks/external_process.py' exists alongside Rust benchmark code. Python scripts can introduce dependencies and potential security issues if not properly sandboxed or if they invoke external commands without proper validation. Fix: Verify that the Python script is only used in development/test environments and does not process untrusted input. Ensure proper input sanitization if it spawns external processes. Consider documenting its purpose and limitations.
  • Low · No SBOM or Supply Chain Visibility — .github/workflows/. While the codebase includes a CI audit workflow (.github/workflows/audit.yml), there is no Software Bill of Materials (SBOM) generation visible in the configuration. This reduces supply chain transparency. Fix: Implement automated SBOM generation in CI/CD pipeline using tools like cargo-sbom or cyclonedx. Include dependency verification and attestation in the release process.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
