cozodb/cozo
A transactional, relational-graph-vector database that uses Datalog for query. The hippocampus for AI!
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained; safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓15 active contributors
- ✓MPL-2.0 licensed
- ✓CI configured
- ✓Tests present
- ⚠Stale — last commit 1y ago
- ⚠Concentrated ownership — top contributor handles 72% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: cozodb/cozo
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/cozodb/cozo shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live cozodb/cozo
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/cozodb/cozo.
What it runs against: a local clone of cozodb/cozo — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in cozodb/cozo | Confirms the artifact applies here, not a fork |
| 2 | License is still MPL-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 550 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of cozodb/cozo. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/cozodb/cozo.git
#   cd cozo
#
# Then paste this script. Every check is read-only; no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of cozodb/cozo and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE 'cozodb/cozo(\.git)?\b' \
  && ok "origin remote is cozodb/cozo" \
  || miss "origin remote is not cozodb/cozo (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The MPL-2.0 text begins with
#    "Mozilla Public License Version 2.0", so accept that or the SPDX id.
(grep -qiE 'Mozilla Public License,? Version 2\.0|MPL-2\.0' LICENSE* 2>/dev/null \
  || grep -qiE '"license"[[:space:]]*:[[:space:]]*"MPL-2\.0"' package.json 2>/dev/null) \
  && ok "license is MPL-2.0" \
  || miss "license drift (was MPL-2.0 at generation time)"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in \
  cozo-core/src/lib.rs \
  cozo-core/src/data/value.rs \
  cozo-core/src/data/program.rs \
  cozo-core/src/cozoscript.pest \
  cozo-core/src/fixed_rule/mod.rs
do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 550 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~520d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures); safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/cozodb/cozo"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
CozoDB is a transactional relational-graph-vector database that executes Datalog queries natively on a unified data model, so relational queries, recursive graph traversal, vector similarity search (HNSW), and near-duplicate detection (MinHash-LSH) all run in a single query engine. It is written primarily in Rust and supports AI workloads through vector indices and full-text search.

Monorepo structure: `cozo-core/` contains the Datalog engine with its PEG grammar (`cozoscript.pest`), data structures (`src/data/`), and test suites; `cozo-bin/` wraps it as a REPL/server; `cozorocks/` is the RocksDB storage backend; language bindings live as sibling members (`cozo-lib-python`, `cozo-lib-wasm`, etc.). Everything builds as one Cargo workspace with a single `Cargo.lock`.
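Recursive graph traversal in Datalog means evaluating rules until no new facts appear (a fixpoint). As a hedged illustration of that technique only, and not Cozo's engine, the std-only Rust sketch below runs semi-naive evaluation of the classic reachability program `reach(a, c) :- edge(a, c)` and `reach(a, c) :- reach(a, b), edge(b, c)`. The function name `transitive_closure` is hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Semi-naive fixpoint for:
///   reach(a, c) :- edge(a, c).
///   reach(a, c) :- reach(a, b), edge(b, c).
/// Hypothetical helper for illustration; not part of cozo-core.
pub fn transitive_closure(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    // Index edges by source so the join reach(a, b), edge(b, c) is a lookup.
    let mut by_src: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(s, d) in edges {
        by_src.entry(s).or_default().push(d);
    }

    // Base rule: every edge is a reachable pair.
    let mut reach: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    // Delta holds only the facts derived in the previous round.
    let mut delta: BTreeSet<(u32, u32)> = reach.clone();

    while !delta.is_empty() {
        let mut next = BTreeSet::new();
        for &(a, b) in &delta {
            if let Some(dsts) = by_src.get(&b) {
                for &c in dsts {
                    // Keep only facts not already known.
                    if !reach.contains(&(a, c)) {
                        next.insert((a, c));
                    }
                }
            }
        }
        reach.extend(next.iter().copied());
        delta = next;
    }
    reach
}
```

Joining only the previous round's new facts against `edge` is what makes the evaluation "semi-naive": each round avoids recomputing pairs that were already derived.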
👥Who it's for
AI/ML engineers building knowledge-intensive applications who need to combine structured relational data, graph traversal, and vector embeddings without managing separate systems; backend developers adopting Datalog for complex recursive queries; and database researchers exploring logic programming at scale.
🌱Maturity & risk
Feature development reached v0.7, which shipped MinHash-LSH and full-text search (v0.6 brought HNSW vector search). Bindings exist for 8+ languages (Node.js, Python, Java, Swift, Go, C, Clojure, WebAssembly), with packages on npm, PyPI, and Maven Central; that signals usable releases rather than proven production stability, and the last commit is over a year old (see the staleness signal above). CI is configured via GitHub Actions (`.github/workflows/build.yml`).
The codebase is heavily Rust-centric, with a critical dependency on RocksDB (via the `cozorocks/` member) whose stability is not documented here; no explicit dependency audit or SBOM is visible. Ownership is concentrated (one contributor handles 72% of recent commits), a bus-factor risk typical of emerging databases. The monorepo spans 11 Cargo members, which increases coordination cost. Breaking-change velocity could not be assessed from the available data, but changes to Datalog semantics would directly affect users.
Active areas of work
v0.7 shipped MinHash-LSH for near-duplicate detection, full JSON value support, and expanded full-text search, building on the HNSW vector indexing added in v0.6. The most recent merged work (see Recent commits) includes an experimental rust-rocksdb storage engine (`newrocks.rs`), a fix for `prefix_join` on stored relations, and exposing the parsed AST. Given the commit recency, ongoing work beyond that is not confirmed.
🚀Get running
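```bash
git clone https://github.com/cozodb/cozo.git
cd cozo
cargo build --release
cargo run --bin cozo -- --help   # cozo-bin is the entry point
```
Or run the example binary at `cozo-core-examples/src/bin/run.rs` (files under `src/bin/` are bin targets, not examples): `cargo run --bin run --manifest-path cozo-core-examples/Cargo.toml`

Daily commands (binary name and flags are unverified here; confirm against `cozo-bin/Cargo.toml` and `--help`):
- REPL: `cargo run --bin cozo` (enters an interactive shell).
- Server mode: `cargo run --bin cozo -- --server <port>` (starts an HTTP server with a web UI).
- Programmatic: see `cozo-core-examples/src/bin/run.rs` for Rust API usage.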
🗺️Map of the codebase
- `cozo-core/src/lib.rs`: main library entry point; exposes the database API and orchestrates query execution with the Datalog engine.
- `cozo-core/src/data/value.rs`: core value type system for all Datalog tuples and expressions; fundamental to serialization and query semantics.
- `cozo-core/src/data/program.rs`: represents parsed Datalog programs and rules; critical for understanding query structure and rule execution.
- `cozo-core/src/cozoscript.pest`: PEG grammar for Cozo's Datalog-based query language; defines all syntax rules for parsing.
- `cozo-core/src/fixed_rule/mod.rs`: dispatch system for built-in algorithmic rules (BFS, Dijkstra, PageRank, etc.); bridges Datalog with graph algorithms.
- `cozo-core/src/fts/mod.rs`: full-text search implementation with tokenization and indexing; integrates text search with relational queries.
- `cozo-bin/src/main.rs`: CLI and REPL entry point; demonstrates how to instantiate and interact with the database.
🛠️How to make changes
Add a new graph algorithm
- Create a new file in `cozo-core/src/fixed_rule/algos/` (e.g., `my_algorithm.rs`) that implements the algorithm over the input edges and parameters, following the interface used by the existing algorithms in that directory.
- Register it in `cozo-core/src/fixed_rule/mod.rs` so the rule name (e.g., `"my_algo"`) resolves to your implementation; check how an existing rule such as Dijkstra is registered there before assuming a particular dispatch pattern.
- Write tests in the module and document the expected input schema (edge list with src/dst/weight columns) in comments (`cozo-core/src/fixed_rule/algos/my_algorithm.rs`).
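For orientation, the core of a shortest-path rule is an ordinary graph algorithm over a src/dst/weight edge list. The std-only sketch below is a standalone Dijkstra; it deliberately does not use Cozo's fixed-rule trait or relation types (mirror an existing file in `fixed_rule/algos/` for the real interface), and `dijkstra` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Single-source shortest paths over (src, dst, weight) edges.
/// Weights must be non-negative, as Dijkstra's algorithm requires.
/// Hypothetical standalone helper; not cozo-core's implementation.
pub fn dijkstra(edges: &[(u32, u32, u64)], start: u32) -> HashMap<u32, u64> {
    let mut adj: HashMap<u32, Vec<(u32, u64)>> = HashMap::new();
    for &(s, d, w) in edges {
        adj.entry(s).or_default().push((d, w));
    }

    let mut dist: HashMap<u32, u64> = HashMap::new();
    // Min-heap of (distance, node) via Reverse.
    let mut heap = BinaryHeap::new();
    dist.insert(start, 0);
    heap.push(Reverse((0u64, start)));

    while let Some(Reverse((d, node))) = heap.pop() {
        // Skip stale heap entries superseded by a shorter path.
        if d > dist[&node] {
            continue;
        }
        if let Some(neighbors) = adj.get(&node) {
            for &(next, w) in neighbors {
                let cand = d + w;
                let better = dist.get(&next).map_or(true, |&cur| cand < cur);
                if better {
                    dist.insert(next, cand);
                    heap.push(Reverse((cand, next)));
                }
            }
        }
    }
    // Unreachable nodes are simply absent from the map.
    dist
}
```

Edge cases worth covering in the tests suggested above include an empty edge list (only the start node appears) and nodes with no path from the start.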
Add a new built-in function or aggregation
- For scalar functions, add the function in `cozo-core/src/data/functions.rs` (as an enum variant or match arm, whichever pattern the file actually uses) and implement its evaluation logic by calling `Value` methods.
- For aggregations, add a new case to `cozo-core/src/data/aggr.rs` with init/step/finalize logic matching the aggregation trait(s) defined there.
- Update the grammar in `cozo-core/src/cozoscript.pest` only if the syntax is new (e.g., for special aggregate functions).
- Add tests in `cozo-core/src/data/tests/functions.rs` or `aggrs.rs` to verify evaluation with various input types.
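The init/step/finalize shape can be illustrated with a small trait. The `Aggregation` trait, `Mean` type, and `aggregate` driver below are assumptions for illustration only; Cozo's real aggregation traits in `aggr.rs` may be shaped differently, so copy an existing aggregation there when implementing.

```rust
/// Hypothetical three-phase aggregation interface (not cozo-core's trait).
pub trait Aggregation: Sized {
    type Input;
    type Output;
    /// Fresh accumulator state before any rows are seen.
    fn init() -> Self;
    /// Fold one input value into the state.
    fn step(&mut self, value: Self::Input);
    /// Produce the final result from the accumulated state.
    fn finalize(self) -> Self::Output;
}

/// Arithmetic mean; returns None for an empty group.
pub struct Mean {
    sum: f64,
    count: u64,
}

impl Aggregation for Mean {
    type Input = f64;
    type Output = Option<f64>;

    fn init() -> Self {
        Mean { sum: 0.0, count: 0 }
    }

    fn step(&mut self, value: f64) {
        self.sum += value;
        self.count += 1;
    }

    fn finalize(self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Drive any aggregation over one group of values.
pub fn aggregate<A: Aggregation, I: IntoIterator<Item = A::Input>>(values: I) -> A::Output {
    let mut state = A::init();
    for v in values {
        state.step(v);
    }
    state.finalize()
}
```

Testing the empty-group case matters as much as the happy path, since `finalize` must decide what an aggregation over zero rows returns.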
Extend the query language with a new construct
- Define the new syntax rule in `cozo-core/src/cozoscript.pest` (e.g., a new clause type or operator).
- Add a corresponding variant to the AST in `cozo-core/src/data/program.rs` to represent the parsed construct, and convert the pest parse tree into it wherever existing rules are converted.
- Implement compilation/execution logic in the query planner (reached from `lib.rs`, or in a new evaluation module) so the new AST node becomes executable steps; trace where existing `program.rs` nodes are consumed to find the right place.
- Add integration tests under `cozo-core-examples/src/bin/` (see `run.rs`), or test manually in the REPL (`cozo-bin/src/repl.rs`).
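Because `cozoscript.pest` is a PEG, alternatives are tried in order and the first one that matches wins: there is no ambiguity, but a shorter alternative listed first can shadow a longer one. This std-only sketch hand-codes that ordered-choice behaviour for a toy operator rule; it does not use the pest crate, and `ordered_choice` is a hypothetical name.

```rust
/// PEG ordered choice: try each alternative in sequence and commit to the
/// first one that matches as a prefix. Returns (matched, remaining input).
/// Hypothetical helper for illustration; not pest's API.
pub fn ordered_choice<'a>(input: &'a str, alternatives: &[&str]) -> Option<(&'a str, &'a str)> {
    for alt in alternatives {
        if let Some(rest) = input.strip_prefix(*alt) {
            let matched = &input[..alt.len()];
            return Some((matched, rest));
        }
    }
    None
}
```

With `["<", "<="]`, the input `"<= 3"` matches `"<"` and leaves `"= 3"` unconsumed; reordering to `["<=", "<"]` consumes `"<="`. That ordering pitfall is a good candidate for the grammar regression tests proposed later in this document.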
Add a new tokenizer or FTS pipeline variant
- Create a new file in `cozo-core/src/fts/tokenizer/` (e.g., `my_tokenizer.rs`) implementing the `Tokenizer` trait with a `tokenize()` method.
- Register it in `cozo-core/src/fts/tokenizer/mod.rs` as a public variant or factory function so it can be selected in FTS options.
- Update `cozo-core/src/fts/ast.rs` if new options or AST nodes are needed to configure the tokenizer.
- Test the tokenizer with various languages and edge cases in a test module or integration test (`cozo-core/src/fts/tokenizer/my_tokenizer.rs`).
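A hypothetical shape for the tokenizer step, plus the inverted index an FTS pipeline builds from its tokens, is sketched below. `Tokenizer`, `LowercaseWords`, and `build_index` here are illustrative assumptions written with std only; cozo-core's actual `Tokenizer` trait and index layout may differ, so follow an existing tokenizer in `fts/tokenizer/` when implementing.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical tokenizer interface (not cozo-core's trait).
pub trait Tokenizer {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Splits on non-alphanumeric characters and lowercases each token.
pub struct LowercaseWords;

impl Tokenizer for LowercaseWords {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
            .collect()
    }
}

/// Inverted index: token -> set of document ids containing it.
pub fn build_index(tok: &dyn Tokenizer, docs: &[(u32, &str)]) -> BTreeMap<String, BTreeSet<u32>> {
    let mut index: BTreeMap<String, BTreeSet<u32>> = BTreeMap::new();
    for &(id, text) in docs {
        for token in tok.tokenize(text) {
            index.entry(token).or_default().insert(id);
        }
    }
    index
}
```

Keeping the tokenizer behind a trait object lets the index builder stay unchanged when a new tokenizer (stemming, CJK segmentation, etc.) is added.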
🔧Why these technologies
- Rust — Memory safety without GC; enables high-performance storage engine and FFI bindings across Python/Java/WASM/Swift/Node.js/C.
- Datalog with Relational-Graph-Vector model — Declarative query language suited for recursive graph traversal, aggregations, and fixpoint computation; naturally expresses join-heavy queries.
- RocksDB (embedded key-value store) — Durable, ACID-compliant transactional storage with efficient range queries; allows local persistence without external database.
🪤Traps & gotchas
- Toolchain version: check `cozo-core/Cargo.toml` (and the workspace root) for a `rust-version` (MSRV) field; the required toolchain, including whether nightly is needed, is not confirmed here.
- RocksDB build complexity: `cozorocks/` may require a C++ toolchain (CMake appears in the file list); macOS/Windows builds may need additional setup.
- Datalog semantics are non-obvious: negation-as-failure and stratification rules aren't apparent from file names alone, so study the examples first.
- Web UI baked into the binary: `cozo-bin/src/index.html` is embedded, so modifying the UI requires a recompile.
- Language binding FFI: each binding (Python, Node.js, Swift) has its own calling convention; changes to the core FFI boundary require coordinated updates across all bindings.
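Stratification can be made concrete: build a dependency graph from rule heads to body predicates, mark the edges that go through negation, and reject the program if any negated edge lies on a cycle. The std-only sketch below checks exactly that condition (negation only; aggregation is ignored). It is not Cozo's implementation, and `Dep`, `reaches`, and `is_stratifiable` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// A rule dependency: `head` uses `body`, optionally under negation.
pub struct Dep<'a> {
    pub head: &'a str,
    pub body: &'a str,
    pub negated: bool,
}

/// Is `to` reachable from `from` along head -> body edges?
fn reaches(graph: &HashMap<&str, Vec<&str>>, from: &str, to: &str) -> bool {
    let mut stack = vec![from];
    let mut seen = HashSet::new();
    while let Some(n) = stack.pop() {
        if n == to {
            return true;
        }
        if seen.insert(n) {
            if let Some(next) = graph.get(n) {
                stack.extend(next.iter().copied());
            }
        }
    }
    false
}

/// True if no negated dependency lies on a cycle, i.e. the rules can be
/// split into strata evaluated bottom-up.
pub fn is_stratifiable(deps: &[Dep]) -> bool {
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    for d in deps {
        graph.entry(d.head).or_default().push(d.body);
    }
    // A negated edge head -> body is on a cycle iff body reaches head.
    deps.iter()
        .filter(|d| d.negated)
        .all(|d| !reaches(&graph, d.body, d.head))
}
```

For example, `p :- q, not r` with `r :- s` is stratifiable, while `p :- not q` with `q :- p` is rejected because `p` would depend negatively on itself.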
💡Concepts to learn
- Datalog: core query language of CozoDB; understanding Horn clauses, unification, negation-as-failure, and stratification is essential to write correct queries and contribute to the query planner.
- HNSW (Hierarchical Navigable Small World): approximate nearest-neighbor graph structure for vector search, integrated in v0.6; understanding its layer structure and search algorithm is required to tune vector indices in CozoDB.
- MinHash and Locality-Sensitive Hashing (LSH): v0.7 feature for near-duplicate detection; understanding hash sketches and probability bounds is needed to use and tune it for document deduplication.
- Stratification in Logic Programming: CozoDB enforces stratification so that rules using negation have a unique, well-defined result; understanding fixed-point iteration and stratified evaluation is critical to debug query semantics.
- RocksDB (LSM tree): persistent storage engine used by the `cozorocks/` member; understanding log-structured merges and write amplification helps optimize storage layout and transaction handling.
- PEG (Parsing Expression Grammar): query syntax is defined in `cozoscript.pest` using the pest parser generator; understanding PEG semantics (ordered choice, hence no ambiguity) is needed to modify or extend the grammar.
- Full-Text Indexing (Inverted Index): v0.7 introduces integrated full-text search; understanding tokenization, stemming, and ranking is needed to index and query text fields alongside vectors.
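MinHash estimates Jaccard similarity |A ∩ B| / |A ∪ B| by keeping, for each of k hash functions, the minimum hash over a set's elements; the fraction of positions where two signatures agree approximates the Jaccard index, and LSH then buckets bands of these signatures to find candidate pairs. The std-only sketch below covers only the signature and estimate (no banding) and uses seeded `DefaultHasher` instances as a stand-in for independent hash functions; it is not Cozo's implementation, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash `item` under `seed`; each seed acts as a separate hash function.
fn seeded_hash<T: Hash>(seed: u64, item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature: for each of `k` seeded hashes, the minimum over the set.
pub fn signature<T: Hash>(set: &HashSet<T>, k: u64) -> Vec<u64> {
    (0..k)
        .map(|seed| set.iter().map(|x| seeded_hash(seed, x)).min().unwrap_or(u64::MAX))
        .collect()
}

/// Estimated Jaccard similarity: fraction of positions where signatures agree.
pub fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "signatures must use the same k");
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}
```

The estimate's standard error shrinks roughly as 1/√k, which is the main tuning trade-off: longer signatures cost more storage and hashing but give tighter similarity estimates.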
🔗Related repos
- `apache/arrow-datafusion`: vectorized SQL query engine written in Rust; shares execution-optimization goals but uses relational algebra instead of Datalog.
- `meilisearch/meilisearch`: full-text and vector search engine; overlaps with CozoDB's search capabilities but lacks relational/graph features and Datalog support.
- `typesense/typesense`: open-source search/vector database; another option for embedding search, without CozoDB's transactional guarantees and graph recursion.
- `SWI-Prolog/swipl-devel`: mature Prolog implementation (with tabling); a useful comparison point for logic-programming semantics such as negation and the stratification rules CozoDB enforces.
- `qdrant/qdrant`: dedicated vector database with HNSW; can run alongside CozoDB for heavy vector workloads but lacks Datalog queries and graph traversal.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive integration tests for fixed_rule algorithms
The cozo-core/src/fixed_rule/algos directory contains 14 sophisticated graph algorithms (PageRank, Louvain, A*, Dijkstra, etc.) but there are no visible integration tests in cozo-core/src/data/tests/ that specifically validate these algorithms end-to-end with real Datalog queries. This is high-value because graph algorithms are core to Cozo's value proposition, and comprehensive tests would catch regressions early and serve as executable documentation.
- [ ] Create cozo-core/src/fixed_rule/tests/ directory
- [ ] Add integration test file for each algorithm category (e.g., tests/shortest_path.rs for Dijkstra/BFS variants, tests/centrality.rs for PageRank/degree_centrality)
- [ ] Write tests that execute actual Datalog queries using the fixed rules with known input graphs and validate output correctness
- [ ] Reference test cases from benchmarks (cozo-core/benches/wiki_pagerank.rs, etc.) as inspiration for realistic data
- [ ] Ensure tests cover edge cases (empty graphs, disconnected components, cycles) for each algorithm
Add GitHub Actions workflow for multi-platform WASM/Node.js builds
The workspace includes cozo-lib-wasm and cozo-lib-nodejs packages, but .github/workflows/build.yml likely focuses on core Rust builds. WASM and Node.js have platform-specific compilation quirks (wasm-pack versions, Node native addons) that should be validated on every PR. This prevents silent failures when contributors update dependencies or APIs that affect these high-visibility language bindings.
- [ ] Review current .github/workflows/build.yml to understand existing CI structure
- [ ] Create new .github/workflows/wasm-nodejs.yml targeting Node 18+ and wasm-pack latest
- [ ] Add build steps: `cd cozo-lib-wasm && wasm-pack build` and `cd cozo-lib-nodejs && npm install && npm test`
- [ ] Include test runs for cozo-lib-wasm to validate WASM functionality in a browser environment (e.g., using wasm-bindgen-test)
- [ ] Add badges to README.md for the new workflow status
Add Datalog query validation tests and examples for the grammar
cozo-core/src/cozoscript.pest defines the grammar, but there are no visible comprehensive tests validating valid/invalid query parsing in cozo-core/src/data/tests/. With a Datalog-based query language, contributors need clear test examples of what parses correctly vs. error cases. This reduces onboarding friction and prevents grammar regressions.
- [ ] Create cozo-core/src/data/tests/parsing.rs with a test module for grammar validation
- [ ] Add positive test cases covering: basic rules, aggregations, recursion, graph patterns, vector operations, and advanced features visible in cozo-core-examples/src/bin/
- [ ] Add negative test cases for common syntax errors with expected error messages
- [ ] Reference the grammar file comments and any documentation in README.md to extract test scenarios
- [ ] Ensure tests validate both AST structure and error recovery for partial/malformed queries
🌿Good first issues
- Add documentation and examples for MinHash-LSH near-duplicate search in `cozo-core-examples/`, with a worked example showing how to index documents and query for near-duplicates.
- Write integration tests for JSON value support (added in v0.7) in `cozo-core/src/data/tests/json.rs`, where coverage is currently minimal; test nested objects, arrays, type coercion, and query filtering on JSON paths.
- Create a Python quickstart guide and runnable example in `cozo-lib-python/` demonstrating a realistic AI use case (e.g., semantic search over embeddings joined with metadata), similar to what the embedded web UI in `cozo-bin/src/index.html` showcases.
⭐Top contributors
- @zh217 — 72 commits
- @creatorrr — 6 commits
- @github-actions[bot] — 4 commits
- @mateusvmv — 3 commits
- @liangxianzhe — 3 commits
📝Recent commits
- `481af05`: Merge pull request #286 from preludeorg/fix-stored-prefix-join (zh217)
- `d1c5355`: Merge pull request #290 from cozo-community/main (zh217)
- `faf89ef`: fix: Fix regression in newrocks.rs (creatorrr)
- `ff9a4fc`: Fix stored relation prefix_join on key range (Keith Robertson)
- `b0549df`: Merge pull request #284 from cozo-community/f/newrocks (zh217)
- `b16a552`: Fix cozo-core/src/storage/newrocks.rs (creatorrr)
- `115d76e`: wip: Add support for rust-rocksdb storage engine (creatorrr)
- `695d028`: Merge pull request #282 from andrewbaxter/expose-ast (zh217)
- `2b1f12d`: More doc (Andrew Baxter)
- `e7c56a5`: Expose AST (Andrew Baxter)
🔒Security observations
Cozo demonstrates a moderately secure codebase architecture with some concerns. Strengths include: use of Rust (memory-safe), no obvious hardcoded credentials, and modular design. Key concerns are: potential injection risks in query processing, XSS risks in embedded web server, lack of explicit
- Medium · Potential SQL/Datalog Injection in Query Processing (`cozo-core/src/data/expr.rs`, `cozo-core/src/data/functions.rs`, `cozo-core/src/cozoscript.pest`). The codebase implements a Datalog query engine (cozo-core) that parses user input (cozoscript.pest). While Datalog is safer than SQL, expression evaluation (expr.rs) and function execution (functions.rs) could enable injection if untrusted input is concatenated into query text. Fix: validate user-supplied queries, pass untrusted values as query parameters rather than splicing them into script text, and add fuzzing tests for expression parsing and evaluation.
- Medium · Embedded Web Server with Potential XSS Risk (`cozo-bin/src/index.html`, `cozo-bin/src/server.rs`). The cozo-bin server serves an embedded index.html. Without content security policies and output encoding, serving dynamic or user-controlled data could introduce XSS. Fix: set strict Content Security Policy (CSP) headers, HTML-escape all dynamic content, validate and sanitize data before rendering in the UI, and prefer template engines that auto-escape by default.
- Medium · Insufficient Security Documentation (repository root, `cozo-bin/src/security.txt`). A security.txt file exists in cozo-bin/src/, but there is no visible SECURITY.md or other documentation of security practices, threat models, or guidelines at the repository root. Fix: create a SECURITY.md at the root documenting security policies, reporting procedures, known limitations, best practices for users, and maintenance windows.
- Low · Workspace with Multiple Language Bindings (`cozo-lib-python`, `cozo-lib-java`, `cozo-lib-nodejs`, `cozo-lib-swift`, `cozo-lib-wasm`, `cozo-lib-c`). Each binding (Python, Java, JavaScript/Node, Swift, WASM, C) adds maintenance surface, and a vulnerability in the shared core or FFI layer could affect every platform. Fix: establish a security maintenance schedule for all bindings, keep FFI layers minimal and audited, and enable automated dependency scanning for each language ecosystem.
- Low · CSV/JSON Data Import Without Input Validation (`cozo-core/src/fixed_rule/utilities/csv.rs`, `cozo-core/src/fixed_rule/utilities/jlines.rs`). The CSV and JSON-lines import utilities could behave unexpectedly or exhaust resources on malformed or malicious files if input is not validated. Fix: apply strict input validation and size limits, use well-vetted parsing libraries, and add file-size checks and parsing timeouts.
- Low · Benchmarking Code in Repository (`cozo-core/benches/`). Benchmarks that process untrusted or example data should not reach production. Cargo builds `benches/` targets only for `cargo bench`, so they are not part of the library or binary artifacts; the remaining action is to never distribute benchmark binaries to end users.
- Low · Git Submodules Used (`.gitmodules`). Submodules can introduce supply-chain risk if their sources are not audited. The superproject records each submodule at a specific commit, but `git submodule update --remote` can move it to a branch tip. Fix: document each submodule and its purpose, review any change to the recorded commits, audit submodule sources regularly, and validate submodule integrity in CI.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.