jankotek/mapdb
MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
Stale — last commit 2y ago
Weakest axis: last commit was 2y ago; top contributor handles 98% of recent commits
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓3 active contributors
- ✓Apache-2.0 licensed
- ✓CI configured
- ✓Tests present
- ⚠Stale — last commit 2y ago
- ⚠Small team — 3 contributors active in recent commits
- ⚠Single-maintainer risk — top contributor 98% of recent commits
What would change the summary?
- →Use as dependency Mixed → Healthy if: 1 commit in the last 365 days
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: jankotek/mapdb
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/jankotek/mapdb shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Stale — last commit 2y ago
- 3 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Small team — 3 contributors active in recent commits
- ⚠ Single-maintainer risk — top contributor 98% of recent commits
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live jankotek/mapdb
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/jankotek/mapdb.
What it runs against: a local clone of jankotek/mapdb — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in jankotek/mapdb | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 733 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of jankotek/mapdb. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/jankotek/mapdb.git
#   cd mapdb
#
# Then paste this script. Every check is read-only; no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of jankotek/mapdb and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "jankotek/mapdb(\.git)?\b" \
  && ok "origin remote is jankotek/mapdb" \
  || miss "origin remote is not jankotek/mapdb (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. Accept the SPDX id or the standard
#    Apache text, whose appendix contains "Apache License, Version 2.0".
(grep -qiE "Apache-2\.0|Apache License, Version 2\.0" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift (was Apache-2.0 at generation time)"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in \
  src/main/java/org/mapdb/db/DB.java \
  src/main/java/org/mapdb/store/Store.java \
  src/main/java/org/mapdb/ser/Serializer.java \
  src/main/java/org/mapdb/io/DataIO.java \
  build.gradle; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 733 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~703d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures): safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/jankotek/mapdb"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
MapDB is an embedded Java database engine that provides concurrent Maps, Sets, and Queues backed by disk storage or off-heap memory. It combines the familiar Java Collections API with persistent storage and supports transactions, MVCC, incremental backups, and compression, which lets Java applications efficiently handle data that exceeds heap memory without adding GC pressure.
The package structure under src/main/java/org/mapdb/ is modular:
- db/: core engine (DB.java)
- ser/: serialization layer with specialized serializers (ArrayDeltaSerializer, BigDecimalSerializer)
- io/: I/O abstraction (DataInput2, DataOutput2, with ByteBuffer and ByteArray implementations)
- list/: collections (KernelList, MonolithList)
Marker and record classes are generated at build time by buildSrc/ (GenMarkers.kt, GenRecords.kt).
👥Who it's for
Java developers building systems with large datasets, embedded databases, or multi-level caching needs, particularly those who want to avoid GC overhead or who need drop-in replacements for standard Java collections with persistence guarantees. The engine is Apache-2.0 licensed and, per recent commit history, maintained almost entirely by its author (jankotek).
🌱Maturity & risk
Mature, but not actively developed: the library is published to Maven Central, CI is configured (GitHub Actions replaced Travis in commit 5631733), and the test suite is comprehensive (a small default run; over 1 million test cases with -Dmdbtest=1). The code mixes Kotlin and Java (reported as 89.5k LOC of Kotlin and 818k LOC of Java). The last commit is roughly two years old, so the CI configuration alone does not indicate ongoing maintenance.
Low immediate risk: dependencies are widely used, if dated (Guava 28.2, Eclipse Collections 10.4, Kotlin 1.4.10; see Security observations), and the large test suite mitigates regressions. The main risk is a single maintainer (jankotek, 98% of recent commits) on a complex storage engine: performance bugs in memory-mapped I/O or serialization could affect production, and with no commits in about two years, fixes may be slow to arrive.
Active areas of work
No recent feature work is visible. The latest commits (see Recent commits) are housekeeping: moving to org.lz4:lz4-java, replacing Travis with GitHub Actions, and updating Gradle and Kotlin. buildSrc contains code-generation infrastructure (GenRecords.kt, MDBCodeGen.kt) that runs as part of the build. The build targets JDK 1.8 bytecode while using Kotlin 1.4.10.
🚀Get running
```bash
git clone https://github.com/jankotek/mapdb.git
cd mapdb
./gradlew build
# For the full test suite (hours to days):
./gradlew test -Dmdbtest=1 -DtestArgLine="-Xmx3G" -DtestThreadCount=3
```
Daily commands:
Development workflow: ./gradlew build for a quick build (about 10 min), or ./gradlew test -Dmdbtest=1 for the full suite. For memory-constrained CI, run ./gradlew test -DtestReuseForks=false so forked test JVMs are not reused; this lowers per-process memory at the cost of much longer test runs. There is no server: this is an embedded library. Use it as a Maven dependency via org.mapdb:mapdb:VERSION after building locally.
🗺️Map of the codebase
- src/main/java/org/mapdb/db/DB.java: core database interface and entry point; all data access flows through this class
- src/main/java/org/mapdb/store/Store.java: abstract storage layer defining read/write contracts for all storage backends (heap, file, off-heap)
- src/main/java/org/mapdb/ser/Serializer.java: serialization abstraction used across all data structures; custom serializers are built on this interface
- src/main/java/org/mapdb/io/DataIO.java: low-level utilities for reading and writing binary data; the foundation for all persistence
- build.gradle: build configuration with Kotlin code-generation tasks (GenMarkers, GenRecords) that drive the codebase structure
- src/main/java/org/mapdb/store/ConcMapStore.java: concurrent, thread-safe storage implementation; critical for multi-threaded access patterns
- src/main/java/org/mapdb/CC.java: configuration constants and feature flags controlling engine behavior and optimization levels
🛠️How to make changes
Add a Custom Serializer
- Create a new class extending org.mapdb.ser.Serializer<T> with serialize() and deserialize() methods (src/main/java/org/mapdb/ser/Serializer.java).
- Register the instance in the Serializers factory, or pass it directly to DB.hashMap()/treeMap() via the serializer parameter (src/main/java/org/mapdb/ser/Serializers.java).
- Test the serialization round-trip using the existing test-harness patterns in src/test/java.
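The steps above can be illustrated with a self-contained sketch that does not depend on MapDB. `SimpleSerializer<T>` is a hypothetical stand-in for org.mapdb.ser.Serializer<T> (the real interface's method names and parameter types, such as DataOutput2/DataInput2, may differ), and `Point`/`PointSerializer` are invented for the example. It shows the contract a custom serializer must satisfy: reading back what was written must yield an equal value, which is exactly what a round-trip unit test checks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class SerializerRoundTrip {
    // Serialize to bytes, then read them back: the shape of a round-trip unit test.
    static <T> T roundTrip(SimpleSerializer<T> ser, T value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ser.serialize(new DataOutputStream(bytes), value);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return ser.deserialize(in);
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point(3, -4);
        Point back = roundTrip(new PointSerializer(), p);
        System.out.println(p.equals(back)); // prints true
    }
}

// Hypothetical stand-in for org.mapdb.ser.Serializer<T>; not MapDB's real signatures.
interface SimpleSerializer<T> {
    void serialize(DataOutput out, T value) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

// Example value type used only for this sketch.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return other.x == x && other.y == y;
    }

    @Override public int hashCode() { return 31 * x + y; }
}

// Fixed-width encoding: two 4-byte ints, so every Point takes exactly 8 bytes.
final class PointSerializer implements SimpleSerializer<Point> {
    @Override public void serialize(DataOutput out, Point p) throws IOException {
        out.writeInt(p.x);
        out.writeInt(p.y);
    }

    @Override public Point deserialize(DataInput in) throws IOException {
        int x = in.readInt(); // read order must match write order
        int y = in.readInt();
        return new Point(x, y);
    }
}
```

A fixed-width layout keeps the format trivial to read; packed or delta encodings trade that simplicity for fewer bytes.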
Add a New Collection Type
- Implement the interface contract (Map/Set/List) using Store operations for read/write/delete (src/main/java/org/mapdb/store/Store.java).
- Add a factory method to the DB class that creates an instance with its configuration (src/main/java/org/mapdb/db/DB.java).
- Reuse the existing serialization framework by passing Serializer instances to the Store (src/main/java/org/mapdb/ser/Serializer.java).
Add a New Storage Backend
- Extend the org.mapdb.store.Store abstract class and implement the put/get/delete/close methods (src/main/java/org/mapdb/store/Store.java).
- Use the DataOutput2/DataInput2 abstractions for binary I/O with the new storage medium (src/main/java/org/mapdb/io/DataIO.java).
- Register the store in DB initialization and expose it via configuration constants in CC.java (src/main/java/org/mapdb/CC.java).
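A minimal sketch of the backend contract described above, using only the JDK. `MiniStore` and `HeapMiniStore` are hypothetical names, and org.mapdb.store.Store's real method signatures are not reproduced here. The sketch shows the general shape: records are opaque byte arrays addressed by a long record id, and a closed store rejects further access. A file or off-heap backend would keep the same interface but write the bytes through a FileChannel or a direct ByteBuffer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class StoreDemo {
    public static void main(String[] args) {
        try (MiniStore store = new HeapMiniStore()) {
            long recid = store.put(new byte[] {1, 2, 3});
            System.out.println(store.get(recid).length); // 3
            store.delete(recid);
            System.out.println(store.get(recid) == null); // true
        }
    }
}

// Hypothetical minimal backend contract; not org.mapdb.store.Store's real API.
interface MiniStore extends AutoCloseable {
    long put(byte[] record);   // stores a copy and returns a new record id (recid)
    byte[] get(long recid);    // returns a copy, or null if the recid is absent
    void delete(long recid);
    @Override void close();    // narrowed: throws no checked exception
}

// Heap-backed implementation; thread-safe via ConcurrentHashMap and AtomicLong.
final class HeapMiniStore implements MiniStore {
    private final Map<Long, byte[]> records = new ConcurrentHashMap<>();
    private final AtomicLong nextRecid = new AtomicLong(1);
    private volatile boolean closed;

    private void checkOpen() {
        if (closed) throw new IllegalStateException("store is closed");
    }

    @Override public long put(byte[] record) {
        checkOpen();
        long recid = nextRecid.getAndIncrement();
        records.put(recid, record.clone()); // defensive copy: callers may reuse buffers
        return recid;
    }

    @Override public byte[] get(long recid) {
        checkOpen();
        byte[] stored = records.get(recid);
        return stored == null ? null : stored.clone();
    }

    @Override public void delete(long recid) {
        checkOpen();
        records.remove(recid);
    }

    @Override public void close() {
        closed = true;
        records.clear();
    }
}
```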
Optimize Serialization for a Data Type
- Create a specialized Serializer class (e.g., StringDeltaSerializer, IntegerPackedSerializer) for delta/packed encoding (src/main/java/org/mapdb/ser).
- Use the DataIO packed integer/string utilities to minimize the byte footprint (src/main/java/org/mapdb/io/DataIO.java).
- Register it in Serializers.java and enable it via a feature flag in CC.java (src/main/java/org/mapdb/ser/Serializers.java).
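The sketch below combines the two techniques named above on a sorted array of longs: each value is stored as the difference from its predecessor (delta encoding), and each difference is written as a variable-length integer carrying 7 data bits per byte (packing). It is plain JDK code; the byte layout is an assumption for illustration and is not claimed to match DataIO's packed format or ArrayDeltaSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class DeltaPacked {
    // Varint: 7 data bits per byte, high bit set means "more bytes follow".
    // Illustrative only; not necessarily byte-compatible with DataIO.
    static void writePackedLong(ByteArrayOutputStream out, long v) {
        if (v < 0) throw new IllegalArgumentException("negative: " + v);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Reads one varint starting at pos[0] and advances pos[0] past it.
    static long readPackedLong(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Layout: count, then first value, then successive differences.
    // Input must be sorted ascending and non-negative, so every delta is >= 0.
    static byte[] encodeSorted(long[] sorted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePackedLong(out, sorted.length);
        long prev = 0;
        for (long v : sorted) {
            writePackedLong(out, v - prev);
            prev = v;
        }
        return out.toByteArray();
    }

    static long[] decodeSorted(byte[] buf) {
        int[] pos = {0};
        int n = (int) readPackedLong(buf, pos);
        long[] result = new long[n];
        long prev = 0;
        for (int i = 0; i < n; i++) {
            prev += readPackedLong(buf, pos); // undo the delta
            result[i] = prev;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ids = {1_000_000L, 1_000_003L, 1_000_010L, 1_000_200L};
        byte[] encoded = encodeSorted(ids);
        System.out.println(encoded.length + " bytes vs " + ids.length * 8 + " raw"); // 8 bytes vs 32 raw
        System.out.println(Arrays.equals(ids, decodeSorted(encoded)));               // true
    }
}
```

Small gaps between neighbours shrink to one or two bytes, which is why delta encoding pays off for sorted keys; unsorted or negative input would need a signed (for example zig-zag) variant.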
🔧Why these technologies
- Kotlin: used in buildSrc for compile-time code generation of markers and records and, since commit 54a1d82 ("Gradle: enable kotlin in production"), in production sources alongside Java
- ByteBuffer API: efficient zero-copy memory management for both heap and off-heap storage; enables FileChannel integration
- Pluggable Serializer pattern: supports mixed object graphs (custom plus standard types) and optimizations (delta, packed, compression) without modifying the core
- Transactional Store (StoreTx): enables ACID guarantees and MVCC without an external transaction coordinator
⚖️Trade-offs already made
- Embedded database (no client-server)
  - Why: simplifies deployment, eliminates network latency, and reduces operational complexity.
  - Consequence: each JVM process has its own isolated database instance; there is no built-in cross-process replication.
- Serializer choice left to the user
  - Why: avoids framework lock-in and enables application-specific optimizations (e.g., domain compression).
  - Consequence: falls back to slow Java serialization if no custom serializer is provided; users must tune for performance.
- Heap + file stores, no network I/O abstraction
  - Why: covers the most common use cases (cache, local DB, overflow) with minimal code.
  - Consequence: Cannot transparently add S3/network backends without
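To make the serializer trade-off concrete, the sketch below compares the size of a single Long written with the JDK's built-in java.io serialization against DataOutput.writeLong. It uses only JDK classes and does not call MapDB; it assumes, as described above, that an unconfigured collection falls back to Java serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class SerializationSize {
    // Bytes produced by the JDK's built-in serialization:
    // stream header + class descriptors + the value itself.
    static int javaSerializedSize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } // close() flushes ObjectOutputStream's internal block buffer
        return bytes.size();
    }

    // Bytes produced by a hand-written encoding: just the 8-byte big-endian long.
    static int dataOutputSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io serialization: " + javaSerializedSize(42L) + " bytes");
        System.out.println("DataOutput.writeLong:  " + dataOutputSize(42L) + " bytes"); // 8 bytes
    }
}
```

Java serialization also records class descriptors (for Long and its superclass Number), so the gap grows for richer object graphs and costs CPU on every read.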
🪤Traps & gotchas
- Code generation required: buildSrc/ runs during the build; importing into an IDE before running the Gradle build leaves the generated sources in srcGen/ missing.
- Test scalability: the default test run takes ~10 min; -Dmdbtest=1 requires 3+ hours and a 3 GB heap, so a CI farm is recommended.
- JDK version: compiled to JDK 1.8 bytecode; newer JDK versions may have different memory-mapping semantics.
- Fork mode for CI: -DtestReuseForks=false drastically increases test time but reduces per-process memory; it is needed in resource-constrained environments.
- Kotlin/Java mixing: the Gradle sourceSet config (src/main/java contains both .kt and .java files) requires an IDE that recognizes both; the README strongly recommends IntelliJ.
💡Concepts to learn
- Memory-Mapped I/O: MapDB's core mechanism for off-heap storage. Understanding how DataInput2ByteBuffer maps file regions into JVM memory explains zero-copy reads and GC bypass.
- MVCC (Multi-Version Concurrency Control): listed in the README as a key feature; it lets reads proceed concurrently with writes without blocking, which is central to MapDB's transaction model.
- Delta Encoding: ArrayDeltaSerializer.java saves space by storing only the differences between sequential values; core to MapDB's storage efficiency for large datasets.
- Serialization Strategies: the ser/ package uses polymorphic serializers for different types (Array, BigDecimal, ArrayList); understanding the pattern is essential for adding new types or optimizing storage.
- Off-Heap Memory Management: MapDB keeps data outside the Java heap (via ByteArray/DirectByteBuffer) to avoid garbage-collection overhead, which requires careful lifecycle management to prevent memory leaks.
- Code Generation (Gradle buildSrc): GenRecords.kt and GenMarkers.kt generate boilerplate for record types and markers at build time, reducing reflection overhead in hot paths while keeping type safety.
- LZ4 Compression: integrated via the lz4-java dependency; provides fast, configurable compression for disk-backed storage without JVM GC overhead.
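The memory-mapped I/O and off-heap ideas above can be seen with the plain JDK. This sketch maps a region of a temporary file with FileChannel.map, writes a long at a fixed offset through the MappedByteBuffer, then re-reads the file with ordinary I/O to confirm the bytes reached the file. It does not use MapDB's DataInput2ByteBuffer; the 4096-byte region and the offset are arbitrary choices for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    static final int REGION_SIZE = 4096; // arbitrary region length for this example

    // Writes `value` at `offset` through a memory mapping, then re-reads the file
    // with ordinary I/O to show the bytes landed in the file itself.
    static long writeThenReadFromFile(Path file, int offset, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping past the end of the file grows the file to REGION_SIZE bytes.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, REGION_SIZE);
            buf.putLong(offset, value); // absolute write into the mapped page, no heap byte[] involved
            buf.force();                // flush dirty pages to the storage device
        } // closing the channel does not unmap; the mapping lives until the buffer is GC'd
        byte[] onDisk = Files.readAllBytes(file);
        return ByteBuffer.wrap(onDisk).getLong(offset); // both buffers default to big-endian
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".bin");
        file.toFile().deleteOnExit(); // best-effort cleanup at JVM exit
        long back = writeThenReadFromFile(file, 128, 0xCAFEBABEL);
        System.out.println(Long.toHexString(back) + ", file size " + Files.size(file));
    }
}
```

The mapped buffer lives outside the Java heap, so the GC never copies its contents; the flip side is that the JDK offers no public unmap call, so the mapping is released only when the buffer object is garbage-collected, which is the lifecycle issue noted under Off-Heap Memory Management.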
🔗Related repos
- h2database/h2database: alternative embedded SQL database for Java; MapDB offers similar persistence without SQL overhead, suited to high-volume Map operations
- google/leveldb: key-value storage library covering similar ground; MapDB exposes comparable concepts through the Java Collections API
- lmdbjava/lmdbjava: Java bindings to LMDB (Lightning Memory-Mapped Database); a direct alternative to MapDB for off-heap concurrent key-value stores
- ben-manes/caffeine: high-performance on-heap cache for Java; MapDB complements it as an overflow/persistence layer in multi-level caching
- jankotek/mapdb-site: documentation and examples repo (referenced in the README as gh-pages); contains the full quickstart and API docs
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for serializers in src/main/java/org/mapdb/ser/
The serializer package contains 30+ serializer implementations (ByteSerializer, IntArraySerializer, BigDecimalSerializer, etc.) but there's no visible test coverage listed in the file structure. These are critical components for data persistence. A new contributor could create a test suite in src/test/java/org/mapdb/ser/ covering serialization/deserialization round-trips, edge cases (null values, empty arrays, max/min values), and cross-version compatibility for each serializer class.
- [ ] Create src/test/java/org/mapdb/ser/ directory structure
- [ ] Add unit tests for primitive serializers (ByteSerializer, IntArraySerializer, DoubleArraySerializer, etc.)
- [ ] Add unit tests for collection serializers (ArrayListSerializer, ArrayTupleSerializer)
- [ ] Add unit tests for special serializers (BigDecimalSerializer, DateSerializer, ClassSerializer)
- [ ] Test edge cases: null values, empty collections, boundary values
- [ ] Ensure tests run with JUnit 5 (already configured in build.gradle)
Add GitHub Actions workflow for cross-platform and JDK version testing
The repo has .github/workflows/ci.yml but the build.gradle specifies jvmTarget = '1.8' without visible testing across multiple JDK versions (8, 11, 17+). Given MapDB is an embedded database engine used in production, contributors should add a matrix workflow to test across Java 8, 11, 17, and 21 LTS versions. This ensures backward compatibility and forward compatibility as Java evolves.
- [ ] Create or extend .github/workflows/ci.yml to include java-version matrix [8, 11, 17, 21]
- [ ] Add matrix for multiple OS (ubuntu-latest, windows-latest, macos-latest) to catch platform-specific issues
- [ ] Include gradle test task with maxParallelForks and maxHeapSize flags from build.gradle
- [ ] Add artifact upload for test reports to identify failures across JDK/OS combinations
- [ ] Document results in README.md showing tested JDK versions
Create integration tests for src/main/java/org/mapdb/cli/ (Export and Import tools)
The CLI package contains Export.java and Import.java for database migration/backup, but no visible test coverage in the file structure. These are user-facing tools critical for data integrity. A contributor should create integration tests that verify export/import round-trips maintain data consistency, test various data types, handle corrupted exports gracefully, and validate command-line argument parsing.
- [ ] Create src/test/java/org/mapdb/cli/ExportImportIntegrationTest.java
- [ ] Create test database with various data structures (Maps, Sets, Lists using org.mapdb.db.DB)
- [ ] Test Export functionality: verify exported file format, completeness, and correctness
- [ ] Test Import functionality: verify data is correctly restored, including nested structures
- [ ] Add round-trip tests: create DB → export → import → verify original data matches
- [ ] Test edge cases: empty database, large datasets, special characters in keys/values
- [ ] Test error handling: corrupted export files, missing files, permission errors
🌿Good first issues
- Add serializers for common Java 8+ types (java.time.LocalDate, Optional<T>) in src/main/java/org/mapdb/ser/, following the ArraySerializer pattern; currently only BigDecimal, Array, and collection serializers are visible. Write tests in src/test/java/org/mapdb/ser/. Why: expands MapDB's usability without touching the core engine; a good way to learn the serialization abstraction.
- Document the three I/O backend paths (ByteBuffer memory-mapped, ByteArray off-heap, stream) with code examples by creating integration tests in src/test/java/ that show their performance trade-offs; none are currently visible in the file list. Why: closes a critical knowledge gap for users choosing storage modes and demonstrates the I/O abstraction design.
- Implement missing collection variants: add a SortedSet wrapper in src/main/java/org/mapdb/ backed by the existing TreeMap (visible in list/ but not exposed as public API), with test coverage. Reference the existing MonolithList.java and KernelList.java patterns. Why: completes standard Collections interface coverage; low-risk because the underlying storage already exists.
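For the first issue, a java.time.LocalDate serializer could be as small as the sketch below: a presence flag followed by the date's epoch day as a long. `ValueSerializer<T>` is a hypothetical stand-in for org.mapdb.ser.Serializer<T>, and the null flag is a design choice for illustration; MapDB's own serializers may handle nulls differently. The round trip covers the boundary values LocalDate.MIN and LocalDate.MAX, one of the edge cases the PR ideas list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.time.LocalDate;

class LocalDateSerializerDemo {
    static LocalDate roundTrip(LocalDate value) throws IOException {
        LocalDateSerializer ser = new LocalDateSerializer();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ser.serialize(new DataOutputStream(bytes), value);
        return ser.deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        LocalDate[] samples = {LocalDate.MIN, LocalDate.of(2024, 2, 29), LocalDate.MAX, null};
        for (LocalDate d : samples) {
            System.out.println(d + " -> " + roundTrip(d));
        }
    }
}

// Hypothetical serializer contract, standing in for org.mapdb.ser.Serializer<T>.
interface ValueSerializer<T> {
    void serialize(DataOutput out, T value) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

// Presence flag + epoch day (days since 1970-01-01). The epoch day fits in a long
// and round-trips every date from LocalDate.MIN to LocalDate.MAX.
final class LocalDateSerializer implements ValueSerializer<LocalDate> {
    @Override public void serialize(DataOutput out, LocalDate value) throws IOException {
        out.writeBoolean(value != null); // explicit null marker
        if (value != null) out.writeLong(value.toEpochDay());
    }

    @Override public LocalDate deserialize(DataInput in) throws IOException {
        return in.readBoolean() ? LocalDate.ofEpochDay(in.readLong()) : null;
    }
}
```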
⭐Top contributors
- @jankotek — 98 commits
- @harpocrates — 1 commit
- @HanSolo — 1 commit
📝Recent commits
- 8721c0e: Merge pull request #992 from harpocrates/alec/follow-moved-lz4 (jankotek)
- 6b6780f: Move from net.jpountz.lz4:lz4 to org.lz4:lz4-java (harpocrates)
- 7aabec8: Merge pull request #984 from HanSolo/github-actions (jankotek)
- 5631733: Added github actions to replace travis (HanSolo)
- a49275d: Create FUNDING.yml (jankotek)
- 7b58dc0: Gradle: update gradle (jankotek)
- b02cfb3: Travis: fix gradle versions, add wrapper (jankotek)
- 54a1d82: Gradle: enable kotlin in production (jankotek)
- 0a38225: Fix tests after migrating to Kotlin 1.4 (jankotek)
- 0cbfe10: Gradle: reintroduce Kotlin, update deps (jankotek)
🔒Security observations
- High · Outdated Kotlin Dependency (build.gradle: ext.kotlin_version = '1.4.10'). Kotlin 1.4.10 is outdated and may contain known security vulnerabilities; the current stable release is significantly newer (1.9+), and older versions may have unpatched issues in the compiler and runtime. Fix: update Kotlin to the latest stable version (1.9.x or newer) and review the release notes for security fixes between 1.4.10 and the target version.
- High · Outdated Guava Dependency (build.gradle: ext.guava_version = '28.2-jre'). Guava 28.2-jre is outdated (current versions are 32.x+) and lacks security fixes shipped in later releases. Fix: update Guava to 32.1.2-jre or later and run a dependency-check tool to confirm the new version has no known CVEs.
- High · Outdated Eclipse Collections Dependency (build.gradle: ext.ec_version = '10.4.0'). Eclipse Collections 10.4.0 is outdated (released in 2020; current versions are 11.x+) and may contain unpatched vulnerabilities. Fix: update to 11.1.0 or later and check the release notes for security-related fixes.
- High · Outdated JUnit Dependency (build.gradle: ext.junit_version = '5.7.0'). JUnit 5.7.0 is outdated (released in 2020; current versions are 5.9.x+); older test framework versions may have vulnerabilities affecting test execution. Fix: update JUnit to 5.9.2 or later to receive security patches and bug fixes.
- Medium · Outdated LZ4 Compression Library (build.gradle: 'org.lz4:lz4-java:1.7.1'). LZ4 Java 1.7.1 is outdated (released in 2020). Compression libraries can be attack vectors if their parsing or decompression logic is vulnerable. Fix: update to the latest LZ4 Java version (1.8.x or newer) and verify no CVEs are associated with the target version.
- Medium · Deprecated KotlinTest Dependency (build.gradle: 'io.kotlintest:kotlintest-runner-junit5:3.4.2'). KotlinTest 3.4.2 is deprecated and was superseded by Kotest; deprecated testing libraries no longer receive fixes. Fix: migrate to Kotest (current version 5.x) and update the test code to the new API.
- Medium · Missing Dependency Vulnerability Scanning (build.gradle: no scanning plugin). The build does not include OWASP Dependency-Check or a similar tool, so known vulnerabilities in dependencies are not detected automatically. Fix: add the OWASP Dependency-Check Gradle plugin ('org.owasp.dependencycheck') or an equivalent scanner to check for known CVEs during builds.
- Low · Missing Security Policy (repository root: no SECURITY.md or .well-known/security.txt). Without one, security researchers have no documented channel for responsible disclosure. Fix: add a SECURITY.md to the repository root describing the disclosure procedure and a security contact.
- Low · Serialization Without Validation (src/main/java). The codebase contains numerous serializer implementations (JavaSerializer, ClassSerializer, etc.) that may be used to deserialize untrusted data; without validation, this could enable deserialization attacks.
LLM-derived; treat as a starting point, not a security audit.
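One JDK-level mitigation for the last item, assuming a serializer such as JavaSerializer wraps java.io.ObjectInputStream (not verified against the source), is an ObjectInputFilter that allow-lists the classes a stream may contain. This is not an existing MapDB feature, and it requires JDK 9 or later, whereas the project targets 1.8 bytecode. The pattern "java.lang.*;!*" is only an example allow-list: classes in java.lang are accepted and everything else is rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class FilteredDeserialization {
    // Example allow-list: accept classes in java.lang (Long, Number, String, ...),
    // reject every other class found in the stream.
    static final ObjectInputFilter ALLOW_LIST =
            ObjectInputFilter.Config.createFilter("java.lang.*;!*");

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(ALLOW_LIST); // install before the first readObject()
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deserialize(serialize(42L))); // allowed: java.lang.Long
        try {
            deserialize(serialize(new ArrayList<String>()));
        } catch (InvalidClassException e) {
            // java.util.ArrayList is not on the allow-list, so the filter rejects it
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In practice the allow-list would name the application's own value classes rather than a whole JDK package.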
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.