dgraph-io/badger
Fast key-value DB in Go.
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 1d ago
- ✓ 32+ active contributors
- ✓ Distributed ownership (top contributor 19% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[https://repopilot.app/r/dgraph-io/badger](https://repopilot.app/r/dgraph-io/badger) — paste the badge snippet at the top of your README.md; it renders inline like a shields.io badge.
Preview: social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/dgraph-io/badger on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: dgraph-io/badger
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/dgraph-io/badger shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 1d ago
- 32+ active contributors
- Distributed ownership (top contributor 19% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live dgraph-io/badger
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/dgraph-io/badger.
What it runs against: a local clone of dgraph-io/badger — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in dgraph-io/badger | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of dgraph-io/badger. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/dgraph-io/badger.git
#   cd badger
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of dgraph-io/badger and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "dgraph-io/badger(\.git)?\b" \
  && ok "origin remote is dgraph-io/badger" \
  || miss "origin remote is not dgraph-io/badger (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "apache license|Apache-2\.0" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in db.go memtable.go levels.go manifest.go iterator.go; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/dgraph-io/badger"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Badger is a fast, embeddable key-value database written in pure Go that provides ACID transactions with serializable snapshot isolation (SSI). It's designed as a performant alternative to RocksDB, handling hundreds of terabytes of data and serving as the underlying database for Dgraph. The codebase implements a log-structured merge (LSM) tree with value-log separation, supporting concurrent transactions, compression (via klauspost/compress), and persistence guarantees verified by nightly Jepsen-style bank tests.

The repository is a single Go module: core DB logic lives in root-level .go files (backup.go, batch.go, …) alongside a CLI tool in badger/cmd/ (backup, restore, stream, bench commands). Value storage uses the LSM tree with a separate value log; the y/ subdirectory likely contains shared utilities. Tests are colocated (_test.go files). The CLI wraps the core DB with commands like flatten, rotate, and bank (transaction testing).
👥Who it's for
Go backend engineers building distributed systems, graph databases, or embedded applications needing a durable key-value store with transactional guarantees. Users of Dgraph, Jaeger Tracing, and UsenetExpress rely on Badger. Contributors are typically database or systems engineers optimizing for throughput, latency, and consistency.
🌱Maturity & risk
Badger is production-ready and battle-tested. It's used at scale (hundreds of terabytes), runs nightly 8-hour bank tests with --race flag, has been tested against filesystem-level anomalies, and is built on Go 1.23+ with a detailed CHANGELOG.md. The project has comprehensive CI/CD workflows (ci-badger-tests.yml, ci-badger-bank-tests-nightly.yml) and clear versioning policy (VERSIONING.md), indicating active maintenance and stability guarantees.
Low risk for production use, but note: the codebase has retracted versions (v4.0.0, v4.3.0 — see CHANGELOG for reasons), indicating past breaking changes. Dependencies are minimal and carefully selected (ristretto v2 for caching, xxhash for checksums, flatbuffers for serialization), all mature libraries. The main risk is Go version coupling (built with Go 1.23.0, toolchain go1.25.0), which may impact downstream projects on older Go versions. No obvious single-maintainer risk given org backing (dgraph-io).
Active areas of work
Active development with recent toolchain updates (Go 1.25.0 toolchain specified). CI pipelines validate functionality across nightly bank tests, dgraph integration tests, and standard unit tests. Renovate automation is enabled (.github/renovate.json) for dependency updates. The repo maintains backward compatibility while fixing issues (retracted versions show corrective releases).
🚀Get running
git clone https://github.com/dgraph-io/badger.git && cd badger && go mod download && go build ./badger (the CLI's main package lives in the badger/ directory, with subcommands in badger/cmd/). Or use the Makefile: make build. Check the Makefile in the root and badger/ subdirectory for targets.
Daily commands: For library use: import github.com/dgraph-io/badger/v4 in your Go code, open a DB with badger.Open(), and use transactions. For the CLI: make build in the badger/ directory, then ./badger <subcommand> (e.g., ./badger backup, ./badger restore). Dev server: none — Badger is embedded. Run tests: go test ./... or make test.
🗺️Map of the codebase
- db.go — Core database engine implementing the main KV store API (Open, Get, Set, Delete) and transaction handling — the primary entry point for all DB operations.
- memtable.go — In-memory write buffer that batches keys before they are compacted to disk; critical for write performance and consistency.
- levels.go — Implements LSM tree level management and the compaction strategy that determines read/write performance characteristics.
- manifest.go — Persists metadata about SSTable versions and compaction state; corruption here causes data loss or recovery failures.
- iterator.go — Enables sequential key iteration across memtables and SSTables; fundamental for range queries and cleanup operations.
- options.go — Configuration struct and defaults that affect all performance knobs (cache size, compression, compaction levels).
- backup.go — Handles consistent snapshots and restore logic; essential for understanding durability guarantees and backup workflows.
🧩Components & responsibilities
- db.go (DB struct) — Orchestrates Get, Set, Delete, transactions, and compaction. Maintains level invari
🛠️How to make changes
Add a new compression algorithm
- Define compression codec constants and a factory in options.go alongside the existing CompressionType enum (options.go)
- Update compaction.go to call the new codec in the compactionFilterFn where SSTable data is rewritten (compaction.go)
- Ensure backward compatibility by handling codec IDs in manifest.go when reading table versions (manifest.go)
Implement a custom garbage collection policy
- Extend the DiscardPolicy interface and register it in discard.go (discard.go)
- Update db.go's RunValueLogGC() method to apply your custom discard policy when scanning keys (db.go)
- Add configuration in options.go to allow users to select the GC policy (options.go)
Add a new CLI command (e.g., validate)
- Create a new file badger/cmd/validate.go following the pattern of badger/cmd/backup.go (badger/cmd/validate.go)
- Register the command in badger/cmd/root.go by adding it to the RootCmd.AddCommand() list (badger/cmd/root.go)
- Implement your logic using db.go's public API or manifest.go for offline inspection (db.go)
Add encryption at rest for keys
- Extend options.go to include an EncryptionKey field and encryption state (options.go)
- Modify batch.go and db.go write/read paths to encrypt before memtable insert and decrypt on retrieval (batch.go)
- Update compaction.go to transparently re-encrypt when rewriting SSTables during compaction (compaction.go)
🔧Why these technologies
- LSM Tree (Log-Structured Merge Tree) — Optimizes write throughput via sequential disk I/O and deferred compaction; separates fast in-memory writes from slower disk merges.
- SkipList (memtable) — Lock-free concurrent reads and writes without stopping writers; enables high-concurrency inserts in the write buffer.
- Flatbuffers (fb/) — Zero-copy serialization for SSTable metadata and indices; reduces CPU and memory overhead in compaction.
- Go's sync primitives (RWMutex, channels) — Simplifies concurrent access to levels and manifest without external locks; integrates with Go scheduler.
- Ristretto cache (dgraph-io/ristretto) — Probabilistic admission control reduces cache pollution and memory footprint vs. naive LRU.
⚖️Trade-offs already made
- LSM compaction runs in background goroutines rather than on the write path
  - Why: minimizes write latency and jitter; allows batching small writes into large disk I/Os.
  - Consequence: delayed space reclamation; stale keys occupy disk until compaction; read amplification during heavy writes.
- Memtable flush to disk is blocking; transaction commits wait for the manifest update
  - Why: ensures durability — once Commit() returns, data is on disk.
  - Consequence: write latency is higher than in-memory DBs; writes serialize on the manifest fsync.
- Iterators are scoped to a transaction's read timestamp (MVCC snapshot)
  - Why: gives consistent reads without blocking writers or copying data for an explicit snapshot.
  - Consequence: long-lived iterators keep old value versions reachable, delaying value-log garbage collection; keep iteration short.
- Value log (separate file for large values) is optional but default-on
  - Why: avoids bloating SSTables with large blobs; keeps indices lean.
  - Consequence: adds per-value indirection; two I/Os for large values (metadata from the SSTable, payload from the value log).
- Encryption at rest is built in but opt-in (off by default)
  - Why: keeps the default path simple; users who need it enable it via DB options (see docs/encryption-at-rest.md).
  - Consequence: unencrypted deployments leave keys and values in plaintext on disk; enabling encryption requires managing key rotation.
🚫Non-goals (don't propose these)
- Does not provide distributed consensus or replication (single-node only).
- Does not offer relaxed, non-ACID isolation modes; transactions are serializable-snapshot-isolated (see TL;DR), so don't propose weakening consistency for speed.
- Does not include built-in authentication or role-based access control.
- Does not support SQL or secondary indices (KV-only API).
- Does not provide point-in-time recovery; backups are snapshots, not write-ahead logs for PITR.
🪤Traps & gotchas
- No serverless/containerized server story: Badger is embedded only; each process has its own instance (no shared server mode).
- Filesystem dependencies: uses memory-mapped I/O and sync syscalls; behavior may differ across filesystems (especially important for Docker/VM testing).
- LSM tree compaction: background compactions can cause latency spikes; tuning requires understanding Options.CompactL0OnClose and memtable settings.
- Version compatibility: v4.0.0 and v4.3.0 are retracted — always pin a verified released version in go.mod.
- Go version lock: built with Go 1.23.0 (toolchain go1.25.0); older projects may face compatibility issues.
🏗️Architecture
💡Concepts to learn
- Log-Structured Merge (LSM) Tree — Badger's core data structure; understanding LSM (memtable → WAL → sorted runs → compaction) is essential to grasp why writes are fast but reads require bloom filters and why compaction tuning matters.
- Serializable Snapshot Isolation (SSI) — Badger's transaction consistency model (ACID with SSI); it prevents phantom reads and write skew without locking, critical for understanding why concurrent transactions don't deadlock.
- Memory-Mapped I/O (mmap) — Badger uses mmap for large value logs and SSTs; understanding mmap's zero-copy semantics, page cache interactions, and fsync behavior is vital for debugging performance issues.
- Bloom Filters — Badger uses bloom filters in ristretto and SST metadata to avoid disk I/O on negative lookups; understanding false positives vs. filter size trade-off is crucial for performance tuning.
- Write-Ahead Logging (WAL) — Badger's crash recovery mechanism; transactions are logged before application; understanding WAL guarantees explains why Badger survives power loss and filesystem anomalies.
- MVCC (Multi-Version Concurrency Control) — Badger's mechanism for supporting concurrent reads and writes; snapshots isolate readers from writers via version timestamps, fundamental to understanding transaction semantics.
- Jepsen Testing — Badger's nightly 8h bank tests validate consistency under adversarial conditions (network partitions, crash simulation); understanding the model verifies you can trust transactional guarantees.
🔗Related repos
- dgraph-io/dgraph — Dgraph is the primary consumer of Badger and the reason Badger was built; understanding Dgraph's query execution helps contextualize Badger's ACID transactional needs.
- etcd-io/bbolt — Bolt/bbolt is an alternative Go KV store built on a B+tree (not LSM); useful for comparison when weighing transaction overhead vs. write throughput.
- dgraph-io/ristretto — Ristretto (v2.2.0) is Badger's in-memory cache layer; understanding its Doorkeeper bloom filter and TinyLFU eviction policy is essential for tuning Badger's caching behavior.
- cockroachdb/cockroach — CockroachDB drew on RocksDB and shares similar LSM patterns; its Jepsen-style consistency testing methodology parallels Badger's bank tests.
- jaegertracing/jaeger — Jaeger uses Badger as one of its storage backends; real-world usage patterns (span storage, query optimization) reveal Badger's strengths in time-series-like workloads.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive encryption-at-rest integration tests
The repo has encryption-at-rest documentation (docs/encryption-at-rest.md) but lacks dedicated test coverage. The file structure shows backup_test.go, batch_test.go, db_test.go, etc., but no encryption_test.go. This is critical for a key-value store claiming encryption support to have explicit test coverage validating encryption workflows, key rotation, and encrypted backup/restore scenarios.
- [ ] Create encryption_test.go with tests for enabling/disabling encryption on DB initialization
- [ ] Add integration tests for encrypted backup and restore using badger/cmd/backup.go and badger/cmd/restore.go
- [ ] Test key rotation scenarios referenced in docs/encryption-at-rest.md
- [ ] Add tests validating that unencrypted data cannot be read by encrypted DB instances and vice versa
- [ ] Reference db.go OpenOptions for EncryptionKey parameter in tests
Implement missing CLI command tests for badger utility
The badger CLI tool has multiple commands (backup, restore, rotate, stream, flatten, info, bench) in badger/cmd/ with only rotate_test.go present. Other commands lack test coverage. Adding tests for these commands would improve reliability of the backup/restore pipeline and data integrity tools that users depend on.
- [ ] Create badger/cmd/backup_test.go to test backup creation with various options
- [ ] Create badger/cmd/restore_test.go to test data restoration and integrity verification
- [ ] Create badger/cmd/flatten_test.go to test value log flattening operations
- [ ] Create badger/cmd/stream_test.go to test streaming operations
- [ ] Add integration test validating backup → restore workflow maintains data integrity
Add platform-specific directory handling tests for all OS variants
The repo contains platform-specific implementations (dir_unix.go, dir_windows.go, dir_plan9.go, dir_aix.go) but likely lacks comprehensive tests validating behavior across these platforms. The test files don't show dir_test.go. This is important for a storage engine to ensure consistent behavior across OS boundaries.
- [ ] Create dir_test.go with platform-agnostic tests for directory operations
- [ ] Add tests for file sync behavior across Unix, Windows, Plan9, and AIX implementations
- [ ] Test edge cases: missing directories, permission errors, concurrent access patterns
- [ ] Add build tags to test Windows-specific behavior (dir_windows.go) on Windows CI
- [ ] Validate that LockFile and DirLock implementations work correctly per platform
🌿Good first issues
- Add missing unit tests for badger/cmd/rotate.go and badger/cmd/rotate_test.go: currently only rotate_test.go exists with minimal coverage; expand to test edge cases (empty DB, malformed rotations, concurrent rotations).
- Document LSM tree structure and value-log separation in architecture docs: README mentions 'log-structured merge' but provides no detailed explanation of how data flows through levels, compaction strategy, or when value separation occurs. Create docs/architecture/lsm-tree.md with diagrams.
- Implement missing backup/restore validation command: badger/cmd/validate.go should verify backup integrity (checksum validation, consistency checks, transaction boundary validation) before restore, improving reliability for large-scale deployments.
⭐Top contributors
Click to expand
Top contributors
- @renovate[bot] — 19 commits
- @ryanfoxtyler — 18 commits
- @dependabot[bot] — 16 commits
- @matthewmcneely — 9 commits
- @mangalaman93 — 9 commits
📝Recent commits
Click to expand
Recent commits
- 7427d56 — perf: skip lsm lookup for expired entries during value log rewrite (#2269) (lamb007)
- 773a835 — fix: Prevent NPE on sync with inmemory DB (#2264) (alpe)
- abbb9a5 — fix(cd): upload build artifacts to GitHub Release (#2273) (matthewmcneely)
- 796cb85 — chore(core): remove unused event log (#2257) (xqqp)
- 3290a60 — chore: Update changelog (#2256) (matthewmcneely)
- d3b6b86 — fix(aix): add aix directory synchronization support (#2115) (pmur)
- 24bb0c1 — test: add checksum tests for package y (#2246) (miladev95)
- 9460713 — fix: correct the comment on value size in skl.node (#2250) (ahrtr)
- 10068c8 — chore(ci): update arm runner label (#2248) (matthewmcneely)
- a700dc3 — chore: prepare for v4.9.0 release (#2247) (matthewmcneely)
🔒Security observations
BadgerDB demonstrates good security practices with a responsible disclosure policy and no obvious hardcoded secrets detected in the file structure. The codebase uses well-maintained dependencies with recent updates (Go 1.25.0, recent version pins). However, security posture is impacted by: (1) retracted versions without fully documented vulnerabilities, (2) incomplete security documentation for version support timelines, and (3) lack of visible dependency vulnerability scanning in the CI/CD pipeline (based on provided workflows). The key-value database nature of the project suggests encryption-at-rest and access control are in scope but not fully evaluated here. Recommend enhanced security documentation, regular dependency audits, and explicit code owner assignments for security-critical paths.
- Medium · Retracted Versions in Module — go.mod (retract directives). The go.mod file contains retracted versions (v4.0.0 and v4.3.0) due to issues #1888, #1889, #2113, and #2121. While retraction is a security measure, users on these versions may be vulnerable to the underlying issues, and the specific vulnerabilities are not documented in the provided context. Fix: document the specific security issues for retracted versions in SECURITY.md or CHANGELOG.md, ensure users understand the risks and upgrade paths, and consider publishing security advisories for v4.0.0 and v4.3.0.
- Low · Incomplete Security Vulnerability Reporting Details — SECURITY.md. The SECURITY.md file provides a responsible disclosure policy, but lacks specific details about supported versions for security patches, security update cadence, and known limitations (e.g., which versions are actively supported). Fix: enhance SECURITY.md with a version support matrix, security patch timeline, acknowledgment process, and details on any known limitations regarding encryption or access controls.
- Low · Missing Dependency Pinning in Critical Path — go.mod (golang.org/x/sys dependency). While go.mod uses semantic versioning, there are no pre-release pins for security-critical dependencies like golang.org/x/sys (v0.35.0), which handles OS-level operations and could impact the database's security posture. Fix: implement regular dependency audits using 'go list -json -m all | nancy sleuth' or 'trivy' to detect vulnerable transitive dependencies, and consider pinning golang.org/x/sys to specific patch versions in CI/CD.
- Low · No CODEOWNERS Enforcement Visible — .github/CODEOWNERS. The .github/CODEOWNERS file exists, but without visibility into its contents it's unclear whether critical security-related files (backup.go, db.go, encryption features) have mandatory reviewers assigned. Fix: ensure CODEOWNERS covers security-related files — backup.go, db.go, encryption modules, and cryptographic operations — and require approval from security-aware maintainers.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.