
rust-lang/hashbrown

Rust port of Google's SwissTable hash map

Healthy

Healthy across the board

Use as dependency: Healthy (weakest axis)

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 2w ago
  • 11 active contributors
  • Distributed ownership (top contributor 34% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/rust-lang/hashbrown)](https://repopilot.app/r/rust-lang/hashbrown)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/rust-lang/hashbrown on X, Slack, or LinkedIn.


Onboarding: rust-lang/hashbrown

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale; STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rust-lang/hashbrown shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 2w ago
  • 11 active contributors
  • Distributed ownership (top contributor 34% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live rust-lang/hashbrown repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/rust-lang/hashbrown.

What it runs against: a local clone of rust-lang/hashbrown — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rust-lang/hashbrown | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 47 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>rust-lang/hashbrown</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rust-lang/hashbrown. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rust-lang/hashbrown.git
#   cd hashbrown
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rust-lang/hashbrown and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rust-lang/hashbrown(\.git)?\b" \
  && ok "origin remote is rust-lang/hashbrown" \
  || miss "origin remote is not rust-lang/hashbrown (artifact may be from a fork)"

# 2. License matches what RepoPilot saw (a Rust crate declares it in Cargo.toml)
grep -qE '^license[[:space:]]*=.*Apache-2\.0' Cargo.toml 2>/dev/null \
  && ok "license includes Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in src/lib.rs src/raw.rs src/map.rs src/control/mod.rs src/control/group/mod.rs; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 47 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~17d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rust-lang/hashbrown"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

hashbrown is a high-performance Rust port of Google's SwissTable hash map. It provides drop-in replacements for std::collections::HashMap and HashSet that are roughly 2x faster than the standard library's pre-1.36 HashMap, use only 1 byte of metadata overhead per entry (instead of 8), and scan keys with SIMD. It works in no_std environments (given a global allocator through the alloc crate) and has backed the standard library's HashMap since Rust 1.36.

The crate is a single package with a modular layout: src/raw.rs (core SwissTable implementation), src/map.rs (HashMap wrapper), and src/set.rs (HashSet wrapper); src/control/ contains the SIMD group abstractions (sse2.rs, neon.rs, lsx.rs, and generic.rs as the portable fallback); src/external_trait_impls/ adds optional serde and rayon support; and benches/ includes micro-benchmarks for insert_unique_unchecked, set_ops, with_capacity, and general_ops.
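To make the "1 byte per entry" idea concrete, the sketch below shows how a SwissTable-style table can split one 64-bit hash into two parts: the top 7 bits become the tag stored in the bucket's control byte, and the low bits (masked by the power-of-two bucket count) pick the first bucket to probe. It is modeled on the general approach, not copied from hashbrown; `split_hash`, `Split`, and the 64-bit hash width are assumptions for illustration.

```rust
/// Result of splitting a hash for a table with a power-of-two bucket count.
#[derive(Debug, PartialEq)]
struct Split {
    /// Index of the first bucket to probe (low bits of the hash).
    start: usize,
    /// 7-bit tag stored in the bucket's control byte (top bits of the hash).
    tag: u8,
}

/// Hypothetical helper: split a 64-bit hash for a table with `buckets` slots.
fn split_hash(hash: u64, buckets: usize) -> Split {
    assert!(buckets.is_power_of_two(), "bucket count must be a power of two");
    let bucket_mask = buckets - 1;
    Split {
        // Masking keeps the low bits, which equals `hash % buckets` here.
        start: (hash as usize) & bucket_mask,
        // The top 7 bits. A full bucket's control byte therefore has its
        // high bit clear, leaving values >= 0x80 free for EMPTY/DELETED.
        tag: (hash >> 57) as u8,
    }
}

fn main() {
    let s = split_hash(0xFE00_0000_0000_0013, 16);
    println!("start = {}, tag = {:#04x}", s.start, s.tag);
}
```

Using different bits for the tag and the position means two keys that collide on their starting bucket usually still have different tags, so most non-matching buckets are rejected by the 1-byte comparison alone.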

👥Who it's for

Rust developers building performance-critical applications (embedded systems, kernels, high-throughput services) who want a faster default hasher than std's SipHash or APIs that std's HashMap does not expose (such as HashTable), and library authors maintaining Rust collections that must support both std and no_std targets.

🌱Maturity & risk

Production-ready and actively maintained. The codebase is well established (adopted into Rust std since 1.36) and runs CI on GitHub Actions (rust.yml, release-plz.yml, Miri testing). It contains roughly 823K bytes of Rust source, enables strict lints (missing_docs and unsafe_op_in_unsafe_fn warnings), keeps an MSRV of 1.85.0, and tests multiple platforms through cross (Cross.toml). Release automation and changelog maintenance show recent activity.

Low risk for production use. The dependency surface is minimal (foldhash, rayon, and serde are optional; allocator-api2 supports custom allocators), and the upstream crates are stable and well maintained. The main implicit risk: as a core data structure built on unsafe code and ported from C++, any memory-safety bug could affect many dependent crates. The crate uses edition 2024, which requires a Rust 1.85 or newer toolchain to build; downstream crates written in older editions can still depend on it, but older toolchains cannot.

Active areas of work

Active maintenance with release automation (release-plz.yml orchestrates version bumping and publishing). Recent focus on CI stability (rust.yml runs clippy, miri, tests across platforms via Cross.toml). No specific in-progress features visible, but the codebase tracks Rust language evolution (2024 edition, nightly feature flag for may_dangle attribute).

🚀Get running

git clone https://github.com/rust-lang/hashbrown
cd hashbrown
cargo test
cargo bench --bench general_ops

Daily commands: Development: cargo test (full test suite), cargo bench (all benchmarks), cargo clippy (linting). CI: ci/run.sh (main test matrix), ci/miri.sh (undefined behavior detection). To run a specific benchmark: cargo bench --bench insert_unique_unchecked -- --verbose.

🗺️Map of the codebase

  • src/lib.rs — Entry point and public API facade that re-exports HashMap, HashSet, and core abstractions; required reading to understand the crate's public surface.
  • src/raw.rs — Core RawTable implementation containing the low-level SwissTable algorithm, hash bucket management, and probe logic; foundational to all hash operations.
  • src/map.rs — HashMap wrapper around RawTable that implements Rust's HashMap API; primary user-facing API after lib.rs.
  • src/control/mod.rs — Control byte metadata system that tracks bucket occupancy and hash tag information; critical performance optimization in SwissTable.
  • src/control/group/mod.rs — SIMD group abstraction (SSE2, NEON, LSX, generic) for efficient batch bucket scanning; architecture-specific performance critical path.
  • src/table.rs: HashTable, a lower-level table type built on RawTable in which the caller supplies hashes and equality checks; it also provides its own iterator and entry adapters.
  • src/alloc.rs — Memory allocation abstractions supporting no_std environments; enables core functionality without stdlib.
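As a rough model of the control-byte array that src/control/mod.rs manages, the sketch below keeps one control byte per bucket plus GROUP_WIDTH trailing bytes that copy the leading control bytes, so a group-sized load starting near the end of the table never reads out of bounds and effectively wraps around. The mirror-index formula is modeled on the approach hashbrown's set_ctrl takes, but the `CtrlBytes` type, the GROUP_WIDTH of 8, and the marker values are assumptions for illustration.

```rust
const GROUP_WIDTH: usize = 8; // assumed; e.g. 16 with SSE2, 8 for a 64-bit SWAR group
const EMPTY: u8 = 0xFF; // 0b1111_1111
const DELETED: u8 = 0x80; // 0b1000_0000, a tombstone

/// Hypothetical model: `buckets` control bytes followed by GROUP_WIDTH trailing bytes.
struct CtrlBytes {
    bucket_mask: usize,
    bytes: Vec<u8>,
}

impl CtrlBytes {
    fn new(buckets: usize) -> Self {
        assert!(buckets.is_power_of_two());
        CtrlBytes { bucket_mask: buckets - 1, bytes: vec![EMPTY; buckets + GROUP_WIDTH] }
    }

    /// Writes a control byte and its mirror. For indexes below GROUP_WIDTH the
    /// second write lands in the trailing bytes; otherwise it rewrites `index`.
    fn set(&mut self, index: usize, value: u8) {
        let mirror = (index.wrapping_sub(GROUP_WIDTH) & self.bucket_mask) + GROUP_WIDTH;
        self.bytes[index] = value;
        self.bytes[mirror] = value;
    }

    /// Loads GROUP_WIDTH consecutive control bytes starting at any bucket index.
    fn group_at(&self, index: usize) -> &[u8] {
        &self.bytes[index..index + GROUP_WIDTH]
    }
}

fn main() {
    let mut ctrl = CtrlBytes::new(16);
    ctrl.set(2, 0x15); // a full bucket with tag 0x15
    ctrl.set(14, DELETED);
    // A group loaded at bucket 12 sees buckets 12..16 and then copies of 0..4.
    println!("{:02x?}", ctrl.group_at(12));
}
```

The cost of this design is GROUP_WIDTH extra bytes per table and a second store on every control-byte update; the benefit is that probing never needs a wrap-around branch.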

🛠️How to make changes

Add a new hash map method

  1. Implement the method logic in src/raw.rs on RawTable if it requires low-level bucket access (src/raw.rs)
  2. Add the public-facing wrapper method to src/map.rs HashMap<K, V, S> (src/map.rs)
  3. Add corresponding method to src/set.rs HashSet<T, S> if applicable (src/set.rs)
  4. Add doc comments following the existing style; if the change introduces a new public type (methods need nothing extra), re-export it from src/lib.rs (src/lib.rs)
  5. Add test cases to tests/ to validate behavior matches std HashMap (tests/equivalent_trait.rs)
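The "matches std HashMap" check in step 5 can be automated as a differential test: apply the same pseudo-random sequence of operations to the map under test and to a trusted reference, comparing results after every step. In the sketch below std::collections::HashMap stands in for the map under test so it compiles without dependencies (in the repo you would use hashbrown::HashMap), BTreeMap is the reference, and the generator constants, key range, and operation mix are arbitrary assumptions.

```rust
use std::collections::{BTreeMap, HashMap};

/// Tiny linear congruential generator so the operation sequence is reproducible.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        // Multiplier and increment from Knuth's MMIX LCG.
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Runs `steps` random insert/remove/get ops on both maps; returns the final length.
fn differential_check(seed: u64, steps: usize) -> usize {
    let mut rng = Lcg(seed);
    let mut under_test: HashMap<u64, u64> = HashMap::new(); // swap in hashbrown::HashMap
    let mut reference: BTreeMap<u64, u64> = BTreeMap::new();

    for step in 0..steps {
        let key = rng.next() % 64; // small key space forces reuse of slots and tombstones
        match rng.next() % 3 {
            0 => assert_eq!(under_test.insert(key, step as u64), reference.insert(key, step as u64)),
            1 => assert_eq!(under_test.remove(&key), reference.remove(&key)),
            _ => assert_eq!(under_test.get(&key), reference.get(&key)),
        }
        assert_eq!(under_test.len(), reference.len(), "length diverged at step {step}");
    }
    // Final contents must agree regardless of iteration order.
    let mut entries: Vec<_> = under_test.into_iter().collect();
    entries.sort_unstable();
    assert!(entries.iter().copied().eq(reference.into_iter()));
    entries.len()
}

fn main() {
    println!("final size: {}", differential_check(42, 10_000));
}
```

Fixing the seed keeps any failure reproducible; a real test would likely loop over several seeds.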

Optimize probe sequence for a new CPU architecture

  1. Create new SIMD group implementation file (e.g., src/control/group/avx512.rs) with batch bucket matching (src/control/group/avx512.rs)
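  2. Give the new Group type the same methods and signatures as the existing backends (sse2.rs, neon.rs, generic.rs); mod.rs selects one backend's Group type at compile time rather than dispatching through a shared trait (src/control/group/mod.rs)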
  3. Add target_arch cfg gate in src/control/group/mod.rs to select your implementation (src/control/group/mod.rs)
  4. Benchmark the new path using benches/bench.rs with target CPU architecture (benches/bench.rs)
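Whatever backend is added presumably has to compute the same bitmasks as the existing ones: which bytes in a group match a tag, which are EMPTY, and which are EMPTY or DELETED. The sketch below computes them portably with SWAR bit tricks on a u64, in the spirit of a generic fallback; the encodings EMPTY = 0xFF and DELETED = 0x80 and all function names are assumptions, and a SIMD backend would produce equivalent masks with vector compares instead.

```rust
const EMPTY: u8 = 0xFF; // 0b1111_1111
const DELETED: u8 = 0x80; // 0b1000_0000 (tombstone)

/// Broadcast one byte into all 8 lanes of a u64.
fn repeat(byte: u8) -> u64 {
    u64::from_ne_bytes([byte; 8])
}

/// High bit set in each lane whose byte equals `tag`. Can report a false
/// positive for a lane equal to `tag ^ 0x01` directly above a real match,
/// so callers must still compare keys (which a hash table does anyway).
fn match_byte(group: u64, tag: u8) -> u64 {
    let cmp = group ^ repeat(tag); // matching lanes become 0x00
    cmp.wrapping_sub(repeat(0x01)) & !cmp & repeat(0x80)
}

/// High bit set in each EMPTY lane: only 0xFF has both bit 7 and bit 6 set.
fn match_empty(group: u64) -> u64 {
    group & (group << 1) & repeat(0x80)
}

/// High bit set in each EMPTY or DELETED lane (full buckets have bit 7 clear).
fn match_empty_or_deleted(group: u64) -> u64 {
    group & repeat(0x80)
}

/// Convert a lane bitmask into bucket offsets within the group.
fn lanes(mut mask: u64) -> Vec<usize> {
    let mut out = Vec::new();
    while mask != 0 {
        out.push(mask.trailing_zeros() as usize / 8);
        mask &= mask - 1; // clear the lowest set bit
    }
    out
}

fn main() {
    // Load as little-endian so lane i holds control byte i.
    let ctrl = [0x12, EMPTY, 0x34, 0x12, DELETED, EMPTY, 0x12, 0x7F];
    let group = u64::from_le_bytes(ctrl);
    println!("tag 0x12 at {:?}", lanes(match_byte(group, 0x12)));
    println!("empty at {:?}", lanes(match_empty(group)));
    println!("empty or deleted at {:?}", lanes(match_empty_or_deleted(group)));
}
```

A vector backend would typically replace `match_byte` with a byte-wise equality compare plus a movemask-style extraction, which has no false positives.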

Add a custom hasher integration test

  1. Create test file in tests/ (e.g., tests/custom_hasher.rs) that instantiates HashMap<K, V, MyHasher> (tests/hasher.rs)
  2. Make sure the hasher type implements core::hash::BuildHasher (src/hasher.rs holds the crate's hasher plumbing) (src/hasher.rs)
  3. Construct the map in the test with HashMap::with_hasher() or HashMap::with_capacity_and_hasher() from src/map.rs; RawTable itself takes no hasher (src/map.rs)
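For step 3, a minimal custom hasher could look like the sketch below: a 64-bit FNV-1a Hasher plus a BuildHasher that creates it, passed to with_capacity_and_hasher. It uses std::collections::HashMap so it runs without dependencies; hashbrown's HashMap offers constructors with the same names and argument order, so switching the import should be the main change, assuming hashbrown is a dependency. FNV-1a is chosen only for brevity and, like foldhash, offers no HashDoS resistance; `Fnv1a` and `FnvBuildHasher` are illustrative names.

```rust
use std::collections::HashMap; // hashbrown::HashMap has the same constructors
use std::hash::{BuildHasher, Hasher};

/// 64-bit FNV-1a: simple and deterministic, but not HashDoS-resistant.
struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01B3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Builds a fresh hasher per key, seeded with the FNV offset basis.
#[derive(Clone, Default)]
struct FnvBuildHasher;

impl BuildHasher for FnvBuildHasher {
    type Hasher = Fnv1a;

    fn build_hasher(&self) -> Fnv1a {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

fn main() {
    let mut map: HashMap<&str, u32, FnvBuildHasher> =
        HashMap::with_capacity_and_hasher(16, FnvBuildHasher);
    map.insert("alpha", 1);
    map.insert("beta", 2);
    assert_eq!(map.get("alpha"), Some(&1));
    println!("{} entries, capacity >= 16: {}", map.len(), map.capacity() >= 16);
}
```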

Enable serde support for a downstream type

  1. Enable the optional serde feature for hashbrown in Cargo.toml; it is not enabled by default (Cargo.toml)
  2. Check src/external_trait_impls/serde.rs for Serialize/Deserialize impls on HashMap/HashSet (src/external_trait_impls/serde.rs)
  3. Add serde integration test to tests/serde.rs following existing patterns (tests/serde.rs)

🔧Why these technologies

  • SwissTable algorithm (SIMD-accelerated probing): with SSE2 on x86_64, a single instruction compares 16 control bytes against a tag, so each probe step examines a whole group of buckets, and lookups reject most candidates in the compact control array before touching key data, which improves cache behavior over one-slot-at-a-time probing.
  • SIMD Group abstraction (SSE2, NEON, LSX, generic): encapsulates architecture-specific intrinsics behind one Group interface, with a portable fallback for other targets, so modern CPUs get full speed without giving up portability.
  • Control byte metadata (occupancy + tag): separates metadata from values for cheaper rejection tests and enables faster group-level scanning without touching actual data.
  • no_std with optional std/alloc: enables deployment in embedded, kernel, and WASM environments where std is unavailable.
  • Entry APIs (entry, raw_entry, rustc_entry): one lookup returns a handle that can insert or mutate in place, matching the std::collections entry pattern without a second hash or probe and without sacrificing safety.
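In practice, the entry pattern looks like the sketch below: one hash-and-probe per key, followed by an in-place update or insert. It uses std::collections::HashMap, whose entry API hashbrown's HashMap mirrors; the word-counting task and the `word_counts` name are only illustrations.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Counts words with a single lookup per word.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // `entry` hashes and probes once; `or_insert` fills a vacant slot.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let mut counts = word_counts("the quick fox jumps over the lazy fox the end");
    // Matching on the Entry enum lets you branch without a second lookup.
    match counts.entry("fox") {
        Entry::Occupied(mut e) => *e.get_mut() *= 10,
        Entry::Vacant(e) => {
            e.insert(1);
        }
    }
    println!("the = {}, fox = {}", counts["the"], counts["fox"]);
}
```

Without an entry API, the same update would need a `get` followed by an `insert`, hashing and probing the key twice.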

⚖️Trade-offs already made

  • SwissTable vs. standard open addressing

    • Why: SwissTable keeps 1 control byte per bucket (plus one group's width of trailing mirror bytes) in a separate array, trading that small per-bucket cost for better cache locality and SIMD parallelism, which improves the constant factors of its amortized O(1) operations.
    • Consequence: Metadata costs 1 byte per bucket, so the relative overhead depends on entry size (12.5% for 8-byte entries, about 1.6% for 64-byte entries); the previous std HashMap used 8 bytes of overhead per entry.
  • SIMD groups (16 buckets/scan on SSE2) vs. 1-at-a-time probing

    • Why: Batch scanning reduces probe count in collision cases and leverages CPU vector units optimally.
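    • Consequence: Code requires platform-specific intrinsics and fallbacks; targets without a dedicated backend use the slower generic group, which applies word-sized bit tricks (SWAR) to scan 8 control bytes at a time with a u64 word (4 with a u32 word on some 32-bit targets) instead of SSE2's 16.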
  • Optional std/alloc features vs. always-require std

    • Why: Supports no_std embedded use cases without penalizing downstream std users.
    • Consequence: Allocation abstraction in src/alloc.rs adds minor indirection; Cargo feature matrix increases testing surface area.
  • Unsafe for memory layout and SIMD intrinsics

    • Why: Performance-critical paths need unchecked raw-pointer arithmetic and vendor SIMD intrinsics, which safe Rust cannot express without extra bounds checks and layout restrictions.
    • Consequence: Requires careful safety audits; Miri testing in CI is essential, and any unsoundness is a critical security bug.
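To put numbers on the memory trade-off, the sketch below estimates how many buckets a table needs for a requested capacity under a 7/8 maximum load factor, how many items that bucket count can hold, and how large the control bytes are relative to the data. The rounding follows the general shape of hashbrown's capacity_to_buckets (small tables special-cased, otherwise capacity * 8 / 7 rounded up to a power of two), but the exact small-table thresholds, GROUP_WIDTH = 16, and the function names are assumptions, and alignment padding is ignored.

```rust
const GROUP_WIDTH: usize = 16; // assumed SSE2 group width

/// Buckets needed so `cap` items fit under a 7/8 load factor (sketch).
fn buckets_for_capacity(cap: usize) -> usize {
    if cap < 8 {
        // Small tables: keep at least one bucket free so probing terminates.
        return if cap < 4 { 4 } else { 8 };
    }
    (cap * 8 / 7).next_power_of_two()
}

/// Items a table with `buckets` slots can hold before it must grow.
fn capacity_of(buckets: usize) -> usize {
    if buckets <= 8 { buckets - 1 } else { buckets / 8 * 7 }
}

/// Control bytes (one per bucket plus GROUP_WIDTH trailing bytes) divided by
/// the bytes spent on entries of `entry_size` bytes.
fn ctrl_overhead_ratio(buckets: usize, entry_size: usize) -> f64 {
    let ctrl = buckets + GROUP_WIDTH;
    let data = buckets * entry_size;
    ctrl as f64 / data as f64
}

fn main() {
    for cap in [3, 7, 8, 100, 1000] {
        let b = buckets_for_capacity(cap);
        println!(
            "cap {cap}: {b} buckets, holds {}, ctrl overhead for 8-byte entries {:.1}%",
            capacity_of(b),
            100.0 * ctrl_overhead_ratio(b, 8)
        );
    }
}
```

For large tables the ratio approaches 1 / entry_size, which is where the 12.5% figure for 8-byte entries comes from; for tiny tables the fixed GROUP_WIDTH tail dominates.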

🚫Non-goals (don't propose these)

  • Does not provide a stable iteration order; order is arbitrary and, with the default randomly seeded hasher, can change between runs
  • Does not guarantee protection against hash collision attacks without a custom hasher
  • Does not implement linked hash maps or LRU eviction
  • Does not provide real-time (bounded latency) guarantees; rehashing can cause O(n) stalls
  • Does not support removing keys while iterating with a plain iterator; the borrow checker rejects that at compile time (it is not undefined behavior), and retain covers the use case
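For the last point, the supported way to drop entries while walking a map is retain, which visits each entry once and removes those for which the closure returns false. The sketch uses std::collections::HashMap; hashbrown's HashMap provides a retain method with the same shape. The `drop_odd_values` helper is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: removes every entry whose value is odd, in one pass.
fn drop_odd_values(map: &mut HashMap<String, u32>) {
    // Calling `map.remove` inside a `for (k, v) in &*map` loop would not compile:
    // the loop holds a shared borrow of the map. `retain` does the removal safely.
    map.retain(|_key, value| *value % 2 == 0);
}

fn main() {
    let mut map: HashMap<String, u32> = (1..=6).map(|i| (format!("k{i}"), i)).collect();
    drop_odd_values(&mut map);
    let mut keys: Vec<_> = map.keys().cloned().collect();
    keys.sort();
    println!("{keys:?}");
}
```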

🪤Traps & gotchas

(1) Edition 2024: requires Rust 1.85.0+; older toolchains reject the manifest with an edition error rather than building. (2) SIMD backends are selected at compile time through cfg on target_arch/target_feature, so testing non-native targets (ARM NEON, LoongArch LSX) requires cross-compilation via Cross.toml. (3) unsafe code is heavily concentrated in src/raw.rs and the control/ modules; run the test suite under Miri locally (cargo +nightly miri test) to catch UB during development. (4) The default hasher (foldhash) does NOT provide the HashDoS resistance of SipHash; users who need that resistance must supply a different hasher (see README). (5) Memory layout assumptions: buckets are stored in reverse order immediately before the control bytes, and unsafe pointer arithmetic in src/raw.rs relies on that layout and its alignment, so changing it means auditing every such computation.


💡Concepts to learn

  • SwissTable Probing — Core algorithm hashbrown implements: uses control byte metadata to find empty slots/matches in ~1-2 cache lines instead of linear probing's worst-case scan; understanding this explains the 2x speedup claim
  • SIMD Group Scanning — Multiple entries checked in parallel (SSE2: 16 bytes, NEON: 16 bytes) reducing branch mispredictions; why src/control/group/ exists and why platform-specific implementations (sse2.rs, neon.rs, lsx.rs) matter for performance
  • Control Byte Encoding — Each bucket stores a 1-byte hash signature instead of full hash; enables small memory footprint (1 byte vs. 8) and vectorized searches—critical to understanding src/control/tag.rs and bitmask operations
  • Quadratic Probing with Deletion — SwissTable uses tombstones (deleted marker) rather than reclaiming slots immediately; affects resize logic in src/raw.rs and why iteration must handle marked-deleted entries
  • no_std Compatibility — Crate requires explicit alloc crate dependency (src/alloc.rs) and avoids std library calls; critical for embedded/kernel usage and why allocator-api2 optional feature exists
  • Cross-Platform SIMD Dispatch — Compile-time detection (src/control/group/mod.rs uses cfg! and #[cfg]) selects SSE2/NEON/LSX/generic; why Cross.toml and ci/run.sh test multiple architectures and why portable builds must use generic fallback
  • Load Factor and Resizing — SwissTable resizes at 87.5% occupancy to maintain performance guarantees; explains why src/raw.rs triggers resize and why benches/ measure insertion at different densities
  • abseil/abseil-cpp — Original C++ SwissTable implementation; architectural reference and algorithm source
  • rust-lang/rust — Standard library consumer of hashbrown; HashMap adoption since 1.36 means std lib perf depends on this crate's improvements
  • orlp/foldhash — Default hasher dependency; fast but weak against intentional collisions—used by hashbrown if default-hasher feature enabled
  • rayon-rs/rayon — Optional rayon integration (src/external_trait_impls/rayon/) provides parallel iterator support for HashMap/HashSet
  • serde-rs/serde — Optional serialization support (src/external_trait_impls/serde.rs) for persistence and data interchange
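The probing concept above can be checked with a few lines of arithmetic. In the sketch below the probe position advances by GROUP_WIDTH, then 2 * GROUP_WIDTH, and so on (triangular steps), wrapping with the bucket mask; when the bucket count is a power of two, the first buckets / GROUP_WIDTH loads are group-sized windows that together cover every bucket exactly once. This follows the general shape of a SwissTable triangular probe sequence, but `probe_starts` and the 8-byte group width are assumptions for illustration.

```rust
const GROUP_WIDTH: usize = 8; // assumed group width

/// Start bucket of each group loaded while probing (triangular probing sketch).
fn probe_starts(hash: u64, buckets: usize) -> Vec<usize> {
    assert!(buckets.is_power_of_two() && buckets >= GROUP_WIDTH);
    let mask = buckets - 1;
    let mut pos = (hash as usize) & mask;
    let mut stride = 0;
    let mut starts = Vec::new();
    // After buckets / GROUP_WIDTH loads, every bucket has been examined once.
    for _ in 0..buckets / GROUP_WIDTH {
        starts.push(pos);
        stride += GROUP_WIDTH; // strides 8, 16, 24, ... give triangular offsets
        pos = (pos + stride) & mask;
    }
    starts
}

fn main() {
    let starts = probe_starts(0x1234_5678, 64);
    println!("{starts:?}");
}
```

The coverage guarantee comes from triangular numbers k(k+1)/2 taking every residue modulo a power of two, which is why the bucket count must stay a power of two.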

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive SIMD group implementation tests in tests/

The repo has multiple SIMD implementations (SSE2, NEON, LSX, and generic) in src/control/group/ but there are no dedicated unit tests verifying correctness of each SIMD backend. Current tests in tests/ don't isolate group-level SIMD behavior. Adding targeted tests would catch regressions in platform-specific code paths and ensure consistency across implementations.

  • [ ] Create tests/simd_groups.rs with tests for each SIMD backend (sse2, neon, lsx, generic)
  • [ ] Add test cases for group::Group methods: match_empty(), match_byte(), match_h2(), match_byte_and_null()
  • [ ] Verify tests run on CI for available platforms (add to .github/workflows/rust.yml if needed)
  • [ ] Test both correctness and bit-pattern matching for hash control bytes across architectures
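One way to approach the first two checklist items is to test each backend against a trivially correct scalar reference over many generated groups. The self-contained sketch below does this for a SWAR match_empty; in the repo the function under test would be each backend's Group method, so the names here (`match_empty_swar`, `match_empty_scalar`, `groups`) and the byte encodings (EMPTY = 0xFF, DELETED = 0x80) are assumptions.

```rust
const EMPTY: u8 = 0xFF;
const DELETED: u8 = 0x80;

/// Reference: set bit i if control byte i is EMPTY.
fn match_empty_scalar(ctrl: [u8; 8]) -> u8 {
    let mut bits = 0u8;
    for (i, &b) in ctrl.iter().enumerate() {
        if b == EMPTY {
            bits |= 1 << i;
        }
    }
    bits
}

/// Under test: SWAR version, compressed to one bit per lane for comparison.
fn match_empty_swar(ctrl: [u8; 8]) -> u8 {
    let g = u64::from_le_bytes(ctrl);
    let mask = g & (g << 1) & 0x8080_8080_8080_8080;
    let mut bits = 0u8;
    for lane in 0..8 {
        if mask & (0x80 << (lane * 8)) != 0 {
            bits |= 1 << lane;
        }
    }
    bits
}

/// Generates groups mixing EMPTY, DELETED and full (0x00..=0x7F) bytes.
fn groups(count: usize) -> impl Iterator<Item = [u8; 8]> {
    let mut state = 0x9E37_79B9_7F4A_7C15u64; // arbitrary nonzero seed
    (0..count).map(move |_| {
        let mut ctrl = [0u8; 8];
        for b in ctrl.iter_mut() {
            state ^= state << 13; // xorshift64 step
            state ^= state >> 7;
            state ^= state << 17;
            *b = match state % 3 {
                0 => EMPTY,
                1 => DELETED,
                _ => (state >> 8) as u8 & 0x7F,
            };
        }
        ctrl
    })
}

fn main() {
    let mismatches = groups(100_000)
        .filter(|&g| match_empty_swar(g) != match_empty_scalar(g))
        .count();
    println!("mismatches: {mismatches}");
}
```

The same harness shape extends to tag matching, where the comparison should allow the false positives a SWAR implementation is permitted to report.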

Add missing safety documentation for unsafe blocks in src/raw.rs and src/table.rs

The lints config has missing_safety_doc = "allow" but the codebase contains numerous unsafe blocks with complex invariants (especially in raw.rs and table.rs for memory layout, bucket indexing, and probe sequences). Documenting safety preconditions would improve maintainability and prevent future bugs. This aligns with unsafe_op_in_unsafe_fn lint compliance.

  • [ ] Review all unsafe fn and unsafe blocks in src/raw.rs (particularly RawTable methods)
  • [ ] Review all unsafe blocks in src/table.rs related to bucket access and iteration
  • [ ] Document invariants: valid bucket indices, control byte semantics, growth conditions, and probing guarantees
  • [ ] Add SAFETY comments to each unsafe block explaining why it's safe given the preconditions
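  • [ ] Once documentation is complete, consider enabling clippy::missing_safety_doc (for # Safety sections on unsafe fns) and clippy::undocumented_unsafe_blocks (for SAFETY comments on unsafe blocks)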
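The documentation this item asks for usually takes two forms: a `# Safety` section on each unsafe fn stating the caller's obligations, and a `// SAFETY:` comment on each unsafe block explaining why those obligations hold. The sketch below shows both on a hypothetical control-byte accessor; `ctrl_unchecked` and `first_group_byte` are illustrations, not functions from src/raw.rs.

```rust
/// Reads control byte `index` without bounds checking.
///
/// # Safety
///
/// `index` must be less than `ctrl.len()`.
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn ctrl_unchecked(ctrl: &[u8], index: usize) -> u8 {
    // SAFETY: the caller guarantees `index < ctrl.len()` (see `# Safety`).
    // With `unsafe_op_in_unsafe_fn` denied, this block is required even
    // inside an `unsafe fn`, which keeps each unsafe operation visible.
    unsafe { *ctrl.get_unchecked(index) }
}

/// Safe wrapper: returns the first control byte, or None for an empty slice.
fn first_group_byte(ctrl: &[u8]) -> Option<u8> {
    if ctrl.is_empty() {
        return None;
    }
    // SAFETY: `ctrl` is non-empty, so index 0 is in bounds.
    Some(unsafe { ctrl_unchecked(ctrl, 0) })
}

fn main() {
    println!("{:?} {:?}", first_group_byte(&[0xFF, 0x12]), first_group_byte(&[]));
}
```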

Add benchmarks for worst-case probe sequence behavior and collision scenarios

The benches/ directory has general_ops.rs and set_ops.rs but lacks benchmarks for pathological cases that stress the probing algorithm and load factor management. SwissTable's performance depends heavily on collision handling. Adding benchmarks for high-load-factor insertions, worst-case hash functions, and quadratic probing behavior would help detect performance regressions.

  • [ ] Create benches/probe_sequences.rs with benchmarks for maps at 80%+ load factors
  • [ ] Add benchmark for worst-case hash functions (colliding hashes) to measure probe depth
  • [ ] Add benchmark comparing iteration performance at various load factors
  • [ ] Run benchmarks before/after changes to src/raw.rs probing logic
  • [ ] Document expected behavior in benchmark comments (reference SwissTable paper if applicable)
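For the worst-case-hash item, a degenerate BuildHasher that maps every key to the same hash makes probe chains as long as possible. The rough timing harness below uses std::collections::HashMap and std::time::Instant so it runs anywhere; a real benchmark in benches/ would use the repo's existing harness and hashbrown::HashMap instead, and `ConstantHasher`, `time_inserts`, and the sizes are assumptions. Expect the colliding run to degrade roughly quadratically in n, so keep n small.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};
use std::time::{Duration, Instant};

/// Degenerate hasher: every key hashes to 0, so all keys share one probe chain.
#[derive(Default)]
struct ConstantHasher;

impl Hasher for ConstantHasher {
    fn write(&mut self, _bytes: &[u8]) {}

    fn finish(&self) -> u64 {
        0
    }
}

/// Inserts then looks up `n` keys; returns elapsed time and final length.
fn time_inserts<S: BuildHasher + Default>(n: u64) -> (Duration, usize) {
    let start = Instant::now();
    let mut map: HashMap<u64, u64, S> = HashMap::with_hasher(S::default());
    for k in 0..n {
        map.insert(k, k);
    }
    for k in 0..n {
        assert_eq!(map.get(&k), Some(&k));
    }
    (start.elapsed(), map.len())
}

fn main() {
    let n = 2_000;
    let (collide, _) = time_inserts::<BuildHasherDefault<ConstantHasher>>(n);
    let (normal, _) = time_inserts::<std::collections::hash_map::RandomState>(n);
    println!("colliding: {collide:?}, default hasher: {normal:?}");
}
```

Because every key carries the same 7-bit tag, each probe step reports all full buckets as candidates, so this also measures the cost of key comparisons, not just probe length.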

🌿Good first issues

  • Add missing doc comments to src/control/tag.rs and src/util.rs—currently lack #![warn(missing_docs)] coverage and are small surface area for learning control byte semantics without touching hot paths
  • Extend benches/with_capacity.rs to include benchmark variants for different load factors (25%, 50%, 75%, 87.5% occupancy) to expose tuning opportunities for pre-allocation patterns
  • Add a no_miri test variant in ci/miri.sh that skips SIMD group implementations and forces src/control/group/generic.rs to validate scalar fallback path under UB detection


📝Recent commits

  • 420e83b — Merge pull request #722 from cuviper/rustc_try_insert (cuviper)
  • 51cecbd — Move the RustcOccupiedError note as requested in review (cuviper)
  • 16d0f37 — Add HashMap::rustc_try_insert (cuviper)
  • 18a04c5 — Merge pull request #721 from clarfonthey/branch-rename (marcoieni)
  • ee8a0ee — Rename master to main in release-plz workflow (clarfonthey)
  • 147df65 — Merge pull request #720 from xtqqczze/authors (clarfonthey)
  • 64a0acb — Remove package.authors field from Cargo metadata (xtqqczze)
  • 867db72 — Merge pull request #716 from atouchet/rdm (Amanieu)
  • 57b760b — Update Readme (atouchet)
  • 7564121 — Merge pull request #715 from heiher/lsx-vori (clarfonthey)

🔒Security observations

The hashbrown codebase demonstrates strong security practices overall. It is a well-maintained Rust library with strict compiler lints enabled, no obvious injection vulnerabilities, no hardcoded secrets, and no exposed credentials. The configuration findings below (edition field, dependency declaration) appear to be analyzer artifacts rather than real problems: edition 2024 is valid, and a truncated dependency entry would break every build. Dependencies are appropriately optional and versioned. The repository follows Rust security best practices with unsafe_op_in_unsafe_fn checks and comprehensive code quality linting. No Docker/infrastructure security issues are present.

  • Info · Possibly Truncated Dependency Declaration: Cargo.toml - equivalent dependency. The analyzer saw the 'equivalent' dependency's version cut off ('ve' instead of a complete version string). A truncated entry like this would be a TOML parse error that fails every build, so it most likely reflects a truncated read during analysis rather than the real file. Fix: inspect Cargo.toml and run 'cargo check' and 'cargo tree'; no change is needed if they succeed.
  • Info · Edition Field (false positive): Cargo.toml - edition field. The Cargo.toml specifies 'edition = "2024"', which is a valid Rust edition (stabilized in Rust 1.85.0, matching the crate's MSRV of 1.85.0). Fix: none needed; note only that toolchains older than 1.85 cannot build the crate.
  • Low · Strict Lint Configuration May Impact Maintenance — Cargo.toml - [lints] section. Multiple strict lints are enabled as warnings (unsafe_op_in_unsafe_fn, missing_docs, unreachable_pub, etc.). While beneficial for code quality, this increases maintenance burden and may cause CI failures if not consistently addressed. Fix: Maintain consistent adherence to lint standards. Consider documenting lint policies in CONTRIBUTING.md. For missing_docs warnings, ensure all public APIs have proper documentation.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.