RepoPilot

HelixDB/helix-db

HelixDB is an open-source graph-vector database built from scratch in Rust.

Mixed

Single-maintainer risk — review before adopting

Use as dependency: Concerns (weakest axis)

Copyleft license (AGPL-3.0), so review compatibility; the top contributor handles 93% of recent commits.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit today
  • 4 active contributors
  • AGPL-3.0 licensed
  • CI configured
  • Tests present
  • Small team — 4 contributors active in recent commits
  • Single-maintainer risk — top contributor 93% of recent commits
  • AGPL-3.0 is copyleft — check downstream compatibility
What would change the summary?
  • Use as dependency would move from Concerns to Mixed if the project were relicensed under MIT or Apache-2.0 (rare for established libraries).

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Forkable](https://repopilot.app/api/badge/helixdb/helix-db?axis=fork)](https://repopilot.app/r/helixdb/helix-db)

Paste at the top of your README.md — renders inline like a shields.io badge.



Onboarding: HelixDB/helix-db

Generated by RepoPilot · 2026-05-09

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat sections marked "AI · unverified" as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/HelixDB/helix-db shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Single-maintainer risk — review before adopting


Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that your local clone of HelixDB/helix-db still matches what RepoPilot saw. If any check fails, the artifact is stale; regenerate it at repopilot.app/r/HelixDB/helix-db.

What it runs against: a local clone of HelixDB/helix-db — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in HelixDB/helix-db | Confirms the artifact applies here, not a fork |
| 2 | License is still AGPL-3.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical paths (4 files, 1 directory) still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>HelixDB/helix-db</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of HelixDB/helix-db. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/HelixDB/helix-db.git
#   cd helix-db
#
# Then paste this script. Every check is read-only; nothing is modified.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of HelixDB/helix-db and re-run."
  exit 2
fi

# 1. Repo identity (case-insensitive, because GitHub owner/repo names are)
git remote get-url origin 2>/dev/null | grep -qiE "HelixDB/helix-db(\.git)?/?$" \
  && ok "origin remote is HelixDB/helix-db" \
  || miss "origin remote is not HelixDB/helix-db (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The AGPL text starts with its full
#    title, not the SPDX id, so accept either; Rust crates declare the SPDX
#    id in Cargo.toml (package.json does not apply to this repo).
(grep -qiE "GNU AFFERO GENERAL PUBLIC LICENSE|AGPL-3\.0" LICENSE 2>/dev/null \
   || grep -qE "^license *= *\"AGPL-3\.0" Cargo.toml helix-db/Cargo.toml 2>/dev/null) \
  && ok "license is AGPL-3.0" \
  || miss "license drift: was AGPL-3.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical paths exist (helix_engine is a directory, so test -f would always fail)
test -d "helix-db/src/helix_engine" \
  && ok "helix-db/src/helix_engine/" \
  || miss "missing critical directory: helix-db/src/helix_engine"
for f in helix-db/src/grammar.pest helix-db/Cargo.toml \
         helix-cli/src/main.rs helix-cli/src/commands/mod.rs; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures): safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/HelixDB/helix-db"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits 1 if any check failed (and 2 if it is not run inside a git repository), so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

HelixDB is a graph-vector database built from scratch in Rust that unifies graph, vector, KV, document, and relational data models in a single platform. It aims to remove the need for separate databases in AI/RAG workloads. Graph structure and vector embeddings are stored together, so semantic search and knowledge-graph operations run without an external storage layer.

The repo is a Cargo workspace (monorepo) with these members: helix-db (core engine), helix-container (containerization), helix-macros (procedural macros), helix-cli (command-line tooling), plus the hql-tests and metrics subprojects.

The CLI under helix-cli/src/commands/ has one module per command (add, auth, backup, build, check, compile, dashboard, delete, feedback, init). It also has integration subcommands for Docker Hub, ECR, Fly, and GHCR.

👥Who it's for

AI/ML engineers and backend developers building RAG systems, knowledge graphs, and intelligent applications who need unified storage for both semantic vectors and relational/graph data without maintaining multiple database systems.

🌱Maturity & risk

Actively developed and gaining traction (YC S25 launch visible in README), with comprehensive CI/CD setup across 16+ GitHub workflows for testing (cli_tests, db_tests, hql_tests, production_db_tests). The project has a CLI, container support, and enterprise features in progress, but ecosystem completeness and long-term stability remain to be proven.

Core risk: this is a greenfield database built from scratch in Rust with significant architectural complexity, so a single bug in the query planner or indexing layer could cause data loss or corruption. The quoted sizes (3.9M for Rust, 444K for Haxe) look like GitHub language-statistics byte counts rather than lines of code. The "Haxe" share is most likely HQL source: HelixDB schema and query files use the .hx extension, which GitHub attributes to Haxe. The monorepo also requires all workspace members to compile together. Watch for breaking changes in the graph/vector APIs as the project stabilizes.

Active areas of work

Active feature development visible through multiple specialized workflows: dashboard support (dashboard_check.yml), HQL (Helix Query Language) testing, S3 integration/backups, and enterprise CLI testing (ENTERPRISE_CLI_TEST_PLAN.md in helix-cli). The recent addition of integrations/ subcommands and the prod_instance_tests workflow suggest push toward managed hosting and cloud deployments.

🚀Get running

```bash
git clone https://github.com/HelixDB/helix-db.git
cd helix-db
cargo build --release
./helix-cli/build.sh  # Build the CLI if working on tooling
cargo test            # Run the test suite
```

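Daily commands: For the database server, the exact entry point is not obvious from the file list. helix-container/src/main.rs is the container runtime entry point, so cargo run --release -p helix-container is a plausible start (unverified; it assumes the package is named helix-container). For the CLI: run ./helix-cli/build.sh, then run the resulting helix binary. Cargo workspaces share one target/ directory at the workspace root, so a plain cargo build places the binary under ./target/release/ rather than ./helix-cli/target/release/, unless build.sh copies it elsewhere. See helix-cli/README.md and helix-cli/TESTING.md for detailed CLI dev setup.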

🗺️Map of the codebase

  • helix-db/src/helix_engine — Core database engine directory containing the primary graph-vector query processing and execution logic that all features depend on
  • helix-db/src/grammar.pest — PEG grammar definition for HQL (Helix Query Language) parser—defines the syntax all queries must conform to
  • helix-db/Cargo.toml — Core database crate manifest specifying all Rust dependencies and compilation profiles for the graph-vector engine
  • helix-cli/src/main.rs — CLI entry point orchestrating all user-facing commands (init, build, deploy, logs) that interact with the database
  • helix-cli/src/commands/mod.rs — Command routing and registration module defining the CLI's available operations and their dispatch logic
  • helix-container/src/main.rs — Container runtime entry point that initializes and runs HelixDB as a standalone service with network exposure
  • Cargo.toml — Workspace-level manifest declaring all member crates (helix-db, helix-cli, helix-container, etc.) and shared build profiles

🛠️How to make changes

Add a new HQL query operation

  1. Define the new grammar rule in the PEG syntax for the operation (helix-db/src/grammar.pest)
  2. Implement the operation handler in the helix_engine module with execution logic (helix-db/src/helix_engine)
  3. Add unit tests validating the parsing and execution in a test module (helix-db/src/helix_engine (new test file))
  4. Write an HQL integration test in the hql-tests workspace member (hql-tests (new test file))
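The four steps above amount to a pipeline: parse, build an AST, execute. The std-only Rust sketch below shows that shape under stated assumptions. A hand-written parse function stands in for the pest-generated parser. Step, Value, execute, and the new Count operation are hypothetical names, not helix_engine's real types or HQL's real syntax. Adding an operation then touches one grammar rule, one AST variant, and one executor arm, and Rust's exhaustive match flags any executor arm you forget.

```rust
use std::collections::HashMap;

/// Hypothetical AST for a chain of steps such as "Out<Follows>::Count".
#[derive(Debug, PartialEq)]
enum Step {
    Out(String), // follow outgoing edges carrying this label
    Count,       // the "new" operation: collapse the current node set to its size
}

/// Result of running a query: a set of node ids or a scalar.
#[derive(Debug, PartialEq)]
enum Value {
    Nodes(Vec<u64>),
    Count(usize),
}

/// Stand-in for the pest-generated parser: split on "::" and match each step.
fn parse(query: &str) -> Result<Vec<Step>, String> {
    query
        .split("::")
        .map(|tok| {
            if tok == "Count" {
                Ok(Step::Count)
            } else if let Some(label) = tok.strip_prefix("Out<").and_then(|t| t.strip_suffix('>')) {
                Ok(Step::Out(label.to_string()))
            } else {
                Err(format!("unknown step: {tok}"))
            }
        })
        .collect()
}

/// Edges keyed by (source node, edge label).
type Graph = HashMap<(u64, String), Vec<u64>>;

/// Executor: one match arm per (operation, input kind) pair.
fn execute(graph: &Graph, start: Vec<u64>, steps: &[Step]) -> Result<Value, String> {
    let mut current = Value::Nodes(start);
    for step in steps {
        current = match (step, current) {
            (Step::Out(label), Value::Nodes(nodes)) => {
                let mut next = Vec::new();
                for n in nodes {
                    if let Some(targets) = graph.get(&(n, label.clone())) {
                        next.extend(targets);
                    }
                }
                Value::Nodes(next)
            }
            (Step::Count, Value::Nodes(nodes)) => Value::Count(nodes.len()),
            (step, Value::Count(_)) => return Err(format!("{step:?} cannot follow Count")),
        };
    }
    Ok(current)
}
```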

Add a new CLI command

  1. Create a new command handler module following the pattern of existing commands (helix-cli/src/commands/new_command.rs)
  2. Register the command in the command module router (helix-cli/src/commands/mod.rs)
  3. Add the command argument parser following the project's clap configuration (helix-cli/src/main.rs)
  4. Write integration tests in the tests directory (helix-cli/src/tests/utility_tests.rs)
  5. Add CI workflow validation if the command has external dependencies (.github/workflows/cli_tests.yml)

Add a new database subsystem or index type

  1. Create a new module directory under helix_engine following the bm25 pattern (helix-db/src/helix_engine/new_subsystem/mod.rs)
  2. Implement the core algorithm in a primary file (e.g., algorithm.rs) (helix-db/src/helix_engine/new_subsystem/algorithm.rs)
  3. Add test cases in a *_tests.rs file in the same module (helix-db/src/helix_engine/new_subsystem/new_subsystem_tests.rs)
  4. Add performance benchmarks in the benches directory (helix-db/benches/new_subsystem_benches.rs)
  5. Add the subsystem to the grammar if it exposes new HQL syntax (helix-db/src/grammar.pest)

Add a new integration provider or deployment target

  1. Create a new integration module following existing patterns (docker_hub, ecr, fly, ghcr) (helix-cli/src/commands/integrations/new_provider.rs)
  2. Register the integration in the integrations module router (helix-cli/src/commands/integrations/mod.rs)
  3. Implement authentication and deployment logic with error handling (helix-cli/src/commands/integrations/new_provider.rs)
  4. Add integration tests with mock API responses (helix-cli/src/tests/utility_tests.rs)
  5. Add CI workflow if external services are needed (.github/workflows/cli_tests.yml)

🔧Why these technologies

  • Rust — Memory-safe systems language enabling high-performance graph-vector operations without garbage collection; avoiding GC pauses helps keep query latency low and predictable on large datasets
  • PEG (Parsing Expression Grammar) via pest — Declarative, readable grammar definition for HQL language that serves as single source of truth for query syntax and enables easy language evolution
  • Docker & Container Runtime — Enables reproducible, isolated database deployments across heterogeneous infrastructure; critical for enterprise adoption and DevOps workflows
  • BM25 + Vector Indexing (HNSW implied) — Hybrid search combining full-text relevance ranking with approximate nearest neighbor vector search; addresses both keyword and semantic queries simultaneously
  • CLI-first tooling (via clap) — Provides accessible entry point for database initialization, schema compilation, and lifecycle management without requiring GUI; supports infrastructure-as-code workflows
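The BM25 + vector bullet describes hybrid search. The sketch below scores a tiny pre-tokenised corpus with BM25, using k1 = 1.2, b = 0.75 (common defaults) and the Lucene-style IDF. It then merges the result with a vector-search ranking using reciprocal rank fusion (RRF). RRF is one common fusion method, chosen here for illustration; this document does not say how HelixDB combines the two signals. All function names are hypothetical.

```rust
use std::collections::HashMap;

const K1: f64 = 1.2;
const B: f64 = 0.75;

/// BM25 score of every document for `query`; documents are pre-tokenised.
fn bm25_scores(docs: &[Vec<&str>], query: &[&str]) -> Vec<f64> {
    if docs.is_empty() {
        return Vec::new();
    }
    let n = docs.len() as f64;
    let avgdl = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n;
    docs.iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query
                .iter()
                .map(|term| {
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    let df = docs.iter().filter(|d| d.contains(term)).count() as f64;
                    // Lucene-style IDF: the +1 keeps it positive for very common terms.
                    let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * len / avgdl))
                })
                .sum::<f64>()
        })
        .collect()
}

/// Document indices sorted best-first by score.
fn rank(scores: &[f64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx
}

/// Reciprocal rank fusion: each ranking adds 1 / (k + rank), ranks starting at 1.
fn rrf(rankings: &[Vec<usize>], k: f64) -> Vec<usize> {
    let mut fused: HashMap<usize, f64> = HashMap::new();
    for ranking in rankings {
        for (pos, &doc) in ranking.iter().enumerate() {
            *fused.entry(doc).or_insert(0.0) += 1.0 / (k + pos as f64 + 1.0);
        }
    }
    let mut out: Vec<(usize, f64)> = fused.into_iter().collect();
    // Highest fused score first; ties broken by document index for determinism.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out.into_iter().map(|(doc, _)| doc).collect()
}
```

RRF needs only ranks, not comparable scores. That is why it is a popular default when BM25 scores and cosine similarities live on different scales.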

⚖️Trade-offs already made

  • Monorepo workspace structure with separate crates for db, cli, and container
    • Why: Enables shared version management, unified testing, and tight coupling needed for database + tooling coherence
    • Consequence: Larger repository surface area and pressure toward tangled inter-crate dependencies (Cargo rejects outright cycles); requires careful module boundaries between helix-db, helix-cli, and helix-container

🪤Traps & gotchas

  • No explicit .env.example: environment variables for auth, S3, and container registries are likely required but are not documented in the file list. Check helix-cli/README.md and helix-cli/TESTING.md.
  • Monorepo coupling: changing helix-macros forces a rebuild of every workspace crate that depends on it, directly or transitively (helix-db and anything built on it). This comes from the dependency graph, not from resolver = "2". Expect long compile times.
  • Haxe in the language stats (444K): HelixDB schema and query files use the .hx extension, which GitHub attributes to Haxe, so this is most likely HQL source rather than legacy Haxe code. List the .hx files to confirm before treating them as anything else.
  • LTO + single codegen unit in release: release builds are well optimized but slow to compile (intentional); dev mode uses incremental = true for speed.
  • HQL syntax: the grammar is defined in helix-db/src/grammar.pest (see the map above). For prose documentation of the language, check docs.helix-db.com before implementing queries.


💡Concepts to learn

  • Graph-Vector Index — Core differentiator of HelixDB; you must understand how graph traversal and vector similarity search are combined for semantic queries.
  • Vector Embeddings & Semantic Search — Enables RAG systems to find contextually relevant data; critical to HelixDB's AI/ML use case.
  • Procedural Macros (Rust) — HelixDB ships a helix-macros crate, likely used for compile-time code generation such as schema definitions; understanding derive and attribute macros is essential before extending it.
  • Memory-Mapped I/O & Lock-Free Data Structures — Rust database built from scratch likely uses mmap and atomic operations for performance; critical for understanding data durability and concurrency.
  • LSM Trees / B-Trees for Storage Engines — Whichever structure HelixDB's storage layer uses for indexing and persistence determines its write amplification and read performance; confirm which one it is in the source before predicting performance characteristics.
  • HQL (Helix Query Language) — Domain-specific language for HelixDB queries; must learn syntax and semantics to write tests and optimize query planning.
  • Container Registry Integration & Image Deployment — helix-cli integrations with Docker Hub, ECR, Fly, GHCR are primary deployment paths; understanding container publishing is essential for ops features.

🔗Related projects

  • milvus-io/milvus — Open-source vector database; direct competitor solving the vector search piece (HelixDB unifies graph + vector).
  • neo4j/neo4j — Leading graph database; HelixDB combines a graph model with vector search, reducing the need for dual databases.
  • qdrant/qdrant — Rust-native vector database with a similar performance-focused design; a useful point of comparison for HelixDB's vector layer.
  • apache/cassandra — Distributed wide-column/KV store; HelixDB supports KV and document models, so similar partitioning and consistency concerns apply.
  • HelixDB/helix-docs — Official documentation repo (inferred); necessary for understanding HQL syntax, API design, and deployment guides.
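The Graph-Vector Index concept above combines traversal with similarity search. One simple way to do that is pre-filtering, sketched here in std-only Rust with hypothetical names. Breadth-first search collects the nodes within max_hops of a seed, and only those nodes are ranked by cosine similarity to the query embedding. A real engine may instead use an ANN index or interleave the two phases. This illustrates the idea; it is not HelixDB's query plan.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Nodes reachable from `seed` in at most `max_hops` edges (seed excluded), via BFS.
fn within_hops(adj: &HashMap<u32, Vec<u32>>, seed: u32, max_hops: usize) -> Vec<u32> {
    let mut seen = HashSet::from([seed]);
    let mut queue = VecDeque::from([(seed, 0usize)]);
    let mut found = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_hops {
            continue; // do not expand past the hop limit
        }
        for &next in adj.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
            if seen.insert(next) {
                found.push(next);
                queue.push_back((next, depth + 1));
            }
        }
    }
    found
}

/// Graph-constrained semantic search: the neighbourhood of `seed`, ranked by similarity.
fn graph_vector_search(
    adj: &HashMap<u32, Vec<u32>>,
    embeddings: &HashMap<u32, Vec<f32>>,
    seed: u32,
    max_hops: usize,
    query: &[f32],
    k: usize,
) -> Vec<u32> {
    let mut scored: Vec<(f32, u32)> = within_hops(adj, seed, max_hops)
        .into_iter()
        .filter_map(|n| embeddings.get(&n).map(|e| (cosine(query, e), n)))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, n)| n).collect()
}
```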

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for helix-cli commands with Docker/container workflows

The repo has extensive CLI commands (auth, backup, build, dashboard, migrate, push, sync, etc.) in helix-cli/src/commands/ but no dedicated integration test suite that validates end-to-end CLI workflows. ENTERPRISE_CLI_TEST_PLAN.md exists but testing infrastructure is incomplete. This would catch regressions in critical user-facing commands and leverage the existing docker.rs module.

  • [ ] Create helix-cli/tests/integration/ directory with test modules for each command category
  • [ ] Add integration tests for workspace_flow.rs commands (init → build → push → sync cycle)
  • [ ] Create helix-cli/tests/fixtures/ with sample HQL files and test data
  • [ ] Add GitHub Action workflow (.github/workflows/cli_integration_tests.yml) that runs against a containerized HelixDB instance
  • [ ] Document test setup in helix-cli/TESTING.md with reproduction steps

Add helix-db library API documentation and examples module

The workspace includes helix-db core library but README and docs only focus on CLI/dashboard usage. There's no helix-db/examples/ directory with sample code showing how to use HelixDB as an embedded library. This blocks new contributors from understanding programmatic APIs and reduces adoption for library users.

  • [ ] Create helix-db/examples/ directory with 3-5 focused examples (e.g., basic_connection.rs, vector_insert_query.rs, graph_traversal.rs)
  • [ ] Add doc comments to key public structs/functions in helix-db/src/lib.rs with /// examples blocks
  • [ ] Create helix-db/LIBRARY_USAGE.md with setup instructions, architecture overview, and links to examples
  • [ ] Add example build/run instructions to main README.md under new 'As a Library' section
  • [ ] Verify examples compile and run in CI (add to existing clippy_check.sh or create new workflow)

Add production database benchmarking workflow and baseline metrics

The repo has production_db_tests.yml workflow and a metrics/ crate in workspace, but no documented performance benchmarks or baseline comparisons. New contributors don't know performance expectations and can't detect regressions. This is critical for a database project competing on performance.

  • [ ] Create metrics/benches/ directory with criterion.rs benchmarks for key operations (vector insert/query, graph traversal, HQL execution)
  • [ ] Add GitHub Action workflow (.github/workflows/benchmarks.yml) that runs benchmarks on PR and compares against main branch baseline
  • [ ] Create metrics/BENCHMARKS.md documenting benchmark results, methodology, and hardware specs
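  • [ ] Declare the benchmarks as [[bench]] targets (with harness = false, which Criterion requires) in metrics/Cargo.toml; bench targets belong to the member crate, not to the [workspace] section of the root Cargo.toml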
  • [ ] Add benchmark result tracking (e.g., using GitHub comments or artifacts) to show performance impact of PRs
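The benchmark workflow above needs a concrete rule for when a PR counts as a regression. Criterion computes its own statistics, so in practice you would compare its reports. The std-only sketch below illustrates one simple rule instead: compare median timings of the baseline and candidate builds, with a hypothetical 10% tolerance. sample, median, and is_regression are hypothetical helpers, not code that exists in metrics/.

```rust
use std::time::{Duration, Instant};

/// Time `f` `runs` times and return the per-run durations.
fn sample<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Median is less sensitive than the mean to one-off scheduler hiccups on CI runners.
fn median(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    let mid = samples.len() / 2;
    if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2
    } else {
        samples[mid]
    }
}

/// True when `current` is slower than `baseline` by more than `tolerance` (0.10 = 10%).
fn is_regression(baseline: Duration, current: Duration, tolerance: f64) -> bool {
    current.as_secs_f64() > baseline.as_secs_f64() * (1.0 + tolerance)
}
```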

🌿Good first issues

  • Add integration test coverage for the init command (helix-cli/src/commands/init.rs) and document the scenarios in helix-cli/TESTING.md, e.g. project scaffolding with different graph templates.
  • Create a contributor guide to HQL syntax with examples. The grammar lives in helix-db/src/grammar.pest and the hql_tests workflow exercises it, but there is no prose guide in the repo. Extract examples from the hql-tests and db_tests suites and write them up in helix-db/docs/HQL_GUIDE.md.
  • Implement a new container registry integration (e.g., Azure Container Registry following the pattern of helix-cli/src/commands/integrations/docker_hub.rs and ecr.rs) to extend cloud deployment options.


📝Recent commits

  • b776c7b — feat(cli) : enterprise cli (#898) (xav-db)
  • dae5d71 — Update helix-cli version to 2.3.4 in Cargo.toml and Cargo.lock for consistency with recent changes. (xav-db)
  • add8690 — feat(cli): Enterprise cli (#889) (xav-db)
  • 2e93de3 — Update dependencies in Cargo.lock for helix-dsl-macros, helix-enterprise-ql, and sonic-rs to latest versions with checks (xav-db)
  • 731e0de — Update helix-enterprise-ql dependency version in Cargo.toml from Git URL to version 0.1.1 for improved stability. (xav-db)
  • 9b1d2e0 — changes (xav-db)
  • 86f4623 — Merge branch 'dev' of https://github.com/HelixDB/helix-db into enterprise-cli (xav-db)
  • 2d82e1a — Enhance enterprise cluster configuration and request handling (xav-db)
  • 467a145 — Add TypeScript Client Generation Plan and Enhance Enterprise Cluster Configuration (xav-db)
  • 662b2a9 — fix (core): perf improvements + brute force fix (#885) (xav-db)

🔒Security observations

HelixDB has a reasonable security foundation for a Rust project. Release builds enable LTO and strip debug info, and the workspace is split into crates with clear responsibilities. There are two gaps. First, there is no visible SECURITY.md for vulnerability reporting. Second, this review covered only the workspace Cargo.toml, not the Rust sources in helix-db/, helix-cli/, and the other crates. Recommended next steps: publish a security reporting policy, review the release profile settings below, and run cargo audit to catch vulnerable transitive dependencies that the top-level manifest does not show.

  • Medium · Panic Strategy Set to abort in Both Profiles — Cargo.toml - [profile.dev] and [profile.release]. Both profiles set panic = "abort". That is a common release choice: binaries are smaller and there is no unwinding machinery. The panic message is still printed, but the whole process then terminates without unwinding or running destructors, so a single bug in one request path takes the server down instead of failing that request. Cargo ignores the panic setting for tests and benchmarks, so cargo test still unwinds. Fix: Consider panic = "unwind" in the dev profile, and in release too if per-request isolation matters more than binary size. Make sure production code paths return errors rather than panicking before relying on abort.
  • Low · Release Profile Uses opt-level = 2 — Cargo.toml - [profile.release]. The release profile sets opt-level = 2 instead of Cargo's release default of 3. This shortens compile times but may leave some runtime performance on the table; the optimization level has no direct bearing on security. Fix: Benchmark opt-level = 3 against the current setting and switch only if it measurably helps the workloads you care about.
  • Low · Debug Information Stripped Without Verification — Cargo.toml - [profile.release]. The release profile includes strip = "debuginfo" which removes debugging symbols. While this reduces binary size, ensure that symbol information is preserved separately for production debugging and crash analysis. Fix: Maintain separate debug symbol files for production deployments to enable proper crash analysis and error tracking without increasing binary size.
  • Low · No Visible Security Policy — Repository root. No SECURITY.md or security policy file is evident in the repository structure for reporting security vulnerabilities responsibly. Fix: Create a SECURITY.md file with responsible disclosure guidelines and contact information for security researchers to report vulnerabilities privately.
  • Low · Workspace Resolver Setting — Cargo.toml - resolver setting. resolver = "2" is an explicit choice of Cargo's feature resolver, not an unpinned version, so this is not a vulnerability in itself. Reproducible dependency resolution across environments comes from the committed Cargo.lock, which recent commits update alongside version bumps. Fix: Build with cargo build --locked in CI so that a stale Cargo.lock fails the build instead of being silently updated.

LLM-derived; treat as a starting point, not a security audit.
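The panic finding above matters most for a long-running server that should survive a bug in one request. std::panic::catch_unwind can turn a panicking handler into an error, but only when the binary is built with panic = "unwind". Under panic = "abort", the setting the finding describes for both profiles, the process exits before catch_unwind can return. The handler below is hypothetical, and the document does not say whether HelixDB isolates requests this way.

```rust
use std::panic;

/// Hypothetical request handler that panics on malformed input.
fn handle(input: &str) -> String {
    let n: u32 = input.parse().expect("input must be a number");
    format!("doubled: {}", n * 2)
}

/// Run a handler, turning a panic into an error instead of taking the process down.
/// Only effective with panic = "unwind"; with panic = "abort" the process exits first.
fn isolate(input: &str) -> Result<String, String> {
    panic::catch_unwind(|| handle(input)).map_err(|payload| {
        // Panic payloads are usually String (formatted) or &'static str (literal).
        payload
            .downcast_ref::<String>()
            .cloned()
            .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string()))
            .unwrap_or_else(|| "unknown panic".to_string())
    })
}
```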


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
