apache/cassandra
Open source transactional distributed database. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure without compromising performance.
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 1d ago
- ✓35+ active contributors
- ✓Distributed ownership (top contributor 26% of recent commits)
- ✓Apache-2.0 licensed
- ✓CI configured
- ✓Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/apache/cassandra)
Paste at the top of your README.md; it renders inline like a shields.io badge.
Onboarding doc
Onboarding: apache/cassandra
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/apache/cassandra shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 1d ago
- 35+ active contributors
- Distributed ownership (top contributor 26% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live apache/cassandra
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/apache/cassandra.
What it runs against: a local clone of apache/cassandra — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in apache/cassandra | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch trunk exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of apache/cassandra. If you don't
# have one yet, run these first:
#
# git clone https://github.com/apache/cassandra.git
# cd cassandra
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of apache/cassandra and re-run."
  exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "apache/cassandra(\.git)?\b" \
  && ok "origin remote is apache/cassandra" \
  || miss "origin remote is not apache/cassandra (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
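# Accepts the SPDX id or the "Version 2.0" line of the stock Apache license text.
# Assumption: LICENSE.txt is also tried in case the file carries an extension.
(cat LICENSE LICENSE.txt 2>/dev/null | grep -qiE "^[[:space:]]*(Apache-2\.0|Version 2\.0)" \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"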
# 3. Default branch
git rev-parse --verify trunk >/dev/null 2>&1 \
  && ok "default branch trunk exists" \
  || miss "default branch trunk no longer exists"
# 4. Critical files exist
test -f "bin/cassandra" \
  && ok "bin/cassandra" \
  || miss "missing critical file: bin/cassandra"
test -f ".build/cassandra-build-maven-pom.xml" \
  && ok ".build/cassandra-build-maven-pom.xml" \
  || miss "missing critical file: .build/cassandra-build-maven-pom.xml"
test -f "CONTRIBUTING.md" \
  && ok "CONTRIBUTING.md" \
  || miss "missing critical file: CONTRIBUTING.md"
test -f ".build/checkstyle.xml" \
  && ok ".build/checkstyle.xml" \
  || miss "missing critical file: .build/checkstyle.xml"
test -f ".circleci/config.yml" \
  && ok ".circleci/config.yml" \
  || miss "missing critical file: .circleci/config.yml"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/apache/cassandra"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
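As a rough illustration of that composition, the sketch below maps the script's exit status to a next step. The function name `next_action` and the action labels are hypothetical and not part of RepoPilot; the exit codes (0 verified, 1 stale claims, 2 not inside a git repository) are the ones the script above uses.

```shell
#!/usr/bin/env bash
# Sketch only: next_action and its output labels are illustrative names.

# Run a verification script and translate its exit status into a next step.
# Exit codes assumed from the script above: 0 = verified, 1 = stale, 2 = not a git repo.
next_action() {
  bash "$1" >/dev/null 2>&1
  case $? in
    0) echo "proceed" ;;
    1) echo "regenerate" ;;
    2) echo "wrong-directory" ;;
    *) echo "unknown" ;;
  esac
}
```

An agent loop could call `next_action ./verify.sh` and only continue editing when the result is `proceed`.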
⚡TL;DR
Apache Cassandra is a distributed NoSQL database that scales horizontally across commodity hardware by partitioning data across multiple nodes. It prioritizes availability and partition tolerance (AP in the CAP theorem), offering linear scalability, tunable consistency, and masterless replication without compromising read/write performance even as the cluster grows.
The repository is a Java-centric monorepo: core database engine in src/ (likely), CQL query language support (.build/build-cqlsh.xml), separate Accord subsystem (.build/build-accord.xml), comprehensive build automation in .build/ with Maven POMs (cassandra-build-maven-pom.xml, cassandra-deps-maven-pom.xml), Python tooling for cqlsh in bin/, and Docker containerization templates for multiple Linux distributions.
👥Who it's for
DevOps engineers, database architects, and backend developers building large-scale applications that require high availability, horizontal scalability, and low-latency reads/writes across geographically distributed data centers (e.g., time-series data, IoT sensors, real-time analytics).
🌱Maturity & risk
Production-ready and highly mature—Cassandra is Apache's longstanding distributed database (originated at Facebook, open-sourced 2008), with extensive CI/CD pipelines (.build/ci/ directory), Docker support (.build/docker/), comprehensive test coverage, and active development visible in build orchestration files. Actively maintained with recent infrastructure updates.
Low operational risk for stable deployments, but complexity risk for new users: distributed systems require careful tuning of replication factor, consistency levels, and partition key design. The large Java codebase (55MB) and multiple build systems (Maven, Ant) introduce a learning curve. Breaking changes between major versions require careful migration planning.
Active areas of work
Active development on multiple fronts: Accord transactional layer integration (build-accord.xml), dependency management enhancements (cassandra-deps-maven-pom.xml), security scanning via OWASP (build-owasp.xml) and Snyk, codebase refactoring tracked via Checkstyle (checkstyle.xml with suppressions), and multi-platform Docker builds (almalinux, debian, redhat).
🚀Get running
git clone https://github.com/apache/cassandra.git
cd cassandra
# Install Java (see supported versions in build.xml)
# Install Python for cqlsh
ant build # or mvn clean install via .build/cassandra-build-maven-pom.xml
bin/cassandra -f # start in foreground
bin/cqlsh # in another terminal to connect
Daily commands:
ant build # compile via .build/build-jars.sh or direct Ant
./bin/cassandra -f # foreground mode with stdout logging
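# Or containerized (cassandra-debian-build is an arbitrary local tag; without -t,
# "docker run cassandra" would not run the image built here):
docker build -t cassandra-debian-build -f .build/docker/debian-build.docker .
docker run -it cassandra-debian-build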
🗺️Map of the codebase
- bin/cassandra — Entry point script that launches the Cassandra daemon; understanding startup sequence is essential for runtime behavior and debugging.
- .build/cassandra-build-maven-pom.xml — Primary Maven build configuration defining all dependencies, plugins, and build lifecycle; required reading for build system understanding.
- CONTRIBUTING.md — Contribution guidelines and development workflow; every contributor must follow these standards for patch submission and code quality.
- .build/checkstyle.xml — Code style enforcement rules applied to all Java sources; violations block CI/CD and pull request acceptance.
- .circleci/config.yml — CircleCI pipeline definition controlling automated testing, builds, and deployment; critical for understanding CI/CD constraints and test execution.
- TESTING.md — Testing methodology and test execution framework documentation; required for running and writing tests correctly.
- .gitmodules — Git submodule configuration for external dependencies (notably Accord protocol); essential for proper repository initialization.
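Because .gitmodules is listed as essential for repository initialization, a quick way to see whether submodules are in a usable state is to read the one-character prefix that `git submodule status` prints on each line. The sketch below is an illustration, not tooling from the repository: `classify_submodules` and its labels are hypothetical, and paths containing whitespace are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: classify_submodules and its labels are illustrative.

# Read `git submodule status` output on stdin and label each submodule.
# Prefix meanings (from git): '-' not initialized, '+' checked-out commit differs
# from the one recorded in the superproject, 'U' merge conflicts, ' ' in sync.
classify_submodules() (
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    # Strip the prefix character; the remainder is "<sha> <path> [(describe)]".
    path=$(printf '%s\n' "$line" | sed 's/^[-+U ]//' | awk '{print $2}')
    case "$line" in
      -*) echo "$path: uninitialized" ;;
      +*) echo "$path: out-of-sync" ;;
      U*) echo "$path: conflict" ;;
      *)  echo "$path: ok" ;;
    esac
  done
)
```

Typical use would be `git submodule status | classify_submodules`; an `uninitialized` entry is usually resolved with `git submodule update --init --recursive`.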
🛠️How to make changes
Add a new build artifact type (RPM, Deb, etc.)
- Define artifact format and packaging rules (.build/cassandra-build-maven-pom.xml)
- Create Docker build environment for the target platform (.build/docker/build-redhat.sh)
- Add Docker image definition (.build/docker/almalinux-build.docker)
- Register artifact in main build pipeline (.build/build-artifacts.sh)
- Add CircleCI job to produce and store artifact (.circleci/config.yml)
Add a new code quality check or style rule
- Define checkstyle rules (or SonarQube rules for semantic checks) (.build/checkstyle.xml)
- If needed, suppress rule for legacy code (.build/checkstyle_suppressions.xml)
- Add verification step to pre-commit script (.build/ci/precommit_check.sh)
- Update quality gate thresholds if adding metrics (.build/sonar/sonar-quality-gate.json)
Add a new test suite or testing mode
- Implement test executor and integration logic (.build/run-tests.sh)
- Create or update test Docker environment if containerized (.build/docker/ubuntu-test.docker)
- Add test job and orchestration to CircleCI (.circleci/config.yml)
- Document test execution in contributor guide (TESTING.md)
Integrate a new external dependency (protocol, client library, etc.)
- Add dependency version to Maven POM (.build/cassandra-build-maven-pom.xml)
- If major dependency, consider git submodule for source (.gitmodules)
- Update OWASP dependency-check suppressions if known vulnerabilities are accepted (.build/owasp/dependency-check-suppressions.xml)
- Validate license compatibility in build and compliance checks (.build/build-rat.xml)
🔧Why these technologies
- Maven (Apache) — Standardized Java build system with dependency management; widely adopted in enterprise ecosystems; supports plugin-based extensibility for custom build phases.
- Java (primary implementation) — Core language for distributed database engine; JVM provides memory management, concurrency primitives, and GC tuning required for low-latency data serving.
- Python (dtests, CQLsh) — Functional and integration testing via dtests with CCM (Cassandra Cluster Manager); CQLsh provides interactive query shell for client interaction.
- CircleCI + Jenkins — CircleCI for lightweight CI/CD with artifact caching; Jenkins for Kubernetes-based distributed testing at scale to handle complex multi-node scenarios.
- Docker — Containerized build environments (Debian, RedHat, Ubuntu) ensure reproducible artifacts and test isolation; enables cross-platform artifact generation.
- Accord Protocol (git submodule) — Transactional consistency protocol integrated for ACID guarantees; managed as separate submodule to allow independent versioning and development.
⚖️Trade-offs already made
- Monolithic Maven POM instead of multi-module Maven reactor
  - Why: Simpler dependency resolution and build ordering for a tightly-coupled database engine; avoids cross-module coupling complexity.
  - Consequence: Single large POM requires careful version management; incremental builds across modules cannot be parallelized as effectively.
- Dual CI/CD: CircleCI (fast feedback) + Jenkins (distributed testing)
  - Why: CircleCI provides rapid artifact generation and unit test feedback; Jenkins handles distributed dtests requiring multi-node clusters.
  - Consequence: Operational overhead managing two CI systems; potential for divergent artifact definitions or test coverage between platforms.
- Pre-commit hooks enforced locally before push
  - Why: Fail-fast developer feedback loop; prevents broken commits from reaching repository and blocking CI jobs.
  - Consequence: Developers must maintain git hook installation; new clones require setup step; some developers disable hooks, bypassing checks.
- Checkstyle for formatting rules; SonarQube for semantic quality
  - Why: Checkstyle is lightweight and fast; SonarQube provides deeper semantic analysis (bug patterns, security vulnerabilities) without slowing pre-commit.
  - Consequence: Two separate quality tools to configure and maintain; potential for conflicting rules or redundant checks; SonarQube requires external service.
- Submodules for Accord protocol instead of JAR dependency
  - Why: Accord is tightly
🪤Traps & gotchas
- Java version constraints: build.xml specifies supported Java versions (search for the 'java.supported' property); using the wrong version causes build failure.
- Python for cqlsh: bin/cqlsh requires a specific Python 3.x version (check the function 'is_supported_version' in the script).
- Git submodules: .build/git/git-hooks/pre-commit/100-verify-submodules-pushed.sh enforces submodule state; failing to commit submodule changes causes pre-commit rejection.
- Multi-build complexity: Both Maven (POMs in .build/) and Ant coexist; mixing build tools can cause inconsistencies.
- Test environment setup: Docker builds expect specific Linux distributions; host OS matters for some tests (.build/docker/_docker_init_tests.sh has platform-specific logic).
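For the Java version gotcha, a small lookup can save a failed build. The sketch below simply greps an Ant build file for the 'java.supported' property named above; `show_supported_java` is a hypothetical helper, and the exact shape of the property's value in build.xml is not assumed here.

```shell
#!/usr/bin/env bash
# Sketch only: show_supported_java is an illustrative helper name.

# Print, with line numbers, every line of an Ant build file that mentions
# the java.supported property. Defaults to build.xml in the current directory.
show_supported_java() {
  grep -n 'java\.supported' "${1:-build.xml}"
}
```

Run it from the repository root as `show_supported_java`, then compare the result against `java -version` before invoking Ant.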
🏗️Architecture
💡Concepts to learn
- Consistent Hashing & Token Ring — Cassandra's partitioning strategy uses token ranges to assign data to nodes; understanding tokens explains how data distributes and why rebalancing works during node addition/removal.
- Quorum Consistency — Cassandra allows tuning read/write consistency (ONE, QUORUM, ALL); tunable consistency is core to its availability vs. strong-consistency tradeoff and must be understood for correct application design.
- SSTable (Sorted String Table) & LSM Tree — Cassandra stores immutable SSTables on disk organized as a log-structured merge tree; this architecture enables fast writes and explains compaction strategy tuning.
- Gossip Protocol — Cassandra uses epidemic gossip for peer discovery and cluster state dissemination without a master; understanding gossip explains partition detection and node failure handling.
- Write-Ahead Logging (CommitLog) — Cassandra logs writes before applying them to memtable; this durability mechanism is critical for understanding crash recovery and consistency guarantees.
- Bloom Filters & Index Structures — Cassandra uses Bloom filters on the read path to skip SSTables that cannot contain the requested partition; this affects query performance and storage overhead.
- Accord Consensus — New transactional layer (.build/build-accord.xml) brings ACID guarantees to Cassandra; critical for understanding the shift from eventual to strong consistency in specific use cases.
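The quorum arithmetic behind tunable consistency is simple enough to check by hand. The sketch below uses the standard definitions: a quorum for replication factor RF is floor(RF / 2) + 1, and a read at R replicas is guaranteed to overlap a write at W replicas when R + W > RF. The function names `quorum` and `overlaps` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: quorum and overlaps are illustrative helper names.

# Quorum size for a replication factor: floor(RF / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Given RF, read replicas R and write replicas W, report whether every read
# is guaranteed to touch at least one replica that saw the write (R + W > RF).
overlaps() {
  if [ $(( $2 + $3 )) -gt "$1" ]; then echo "yes"; else echo "no"; fi
}
```

With RF = 3, `quorum 3` gives 2, so QUORUM reads plus QUORUM writes overlap (`overlaps 3 2 2`), while ONE plus ONE does not (`overlaps 3 1 1`).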
🔗Related repos
- apache/cassandra-accord — Separate repository containing the Accord transactional consensus protocol being integrated into Cassandra (referenced in .build/build-accord.xml).
- apache/cassandra-goclient — Official Go client library for Cassandra, enabling polyglot development against this database.
- scylladb/scylla — Alternative distributed database that reimplements the Cassandra wire protocol in C++, targeting the same use cases with different performance tradeoffs.
- apache/cassandra-python-driver — Official Python driver for Cassandra; works alongside cqlsh (bin/cqlsh) for programmatic data access.
- apache/cassandra-dtest — Cassandra distributed test suite (Python-based) covering multi-node cluster behavior; likely called by .build/ci/ pipeline.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive CI/CD workflow validation tests for Docker build matrix
The repo has multiple Dockerfile configurations (.build/docker/*.docker) and build scripts for different platforms (debian, redhat, ubuntu, almalinux) but no automated tests validating that all Docker build configurations successfully build and pass basic smoke tests. This would catch cross-platform build failures early. New contributors can create a GitHub Actions workflow that builds each Dockerfile variant and runs basic health checks (e.g., cassandra --version in each container).
- [ ] Review existing Dockerfiles in .build/docker/ (almalinux-build.docker, debian-build.docker, ubuntu-test.docker)
- [ ] Create a new GitHub Actions workflow file (.github/workflows/docker-build-matrix.yml) that builds each Dockerfile variant
- [ ] Add smoke test steps (e.g., ENTRYPOINT validation, basic command execution) for each built image
- [ ] Document the workflow in .build/docker/README.md (currently missing) explaining when Docker images are built and tested
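One way to prototype this idea locally before writing the workflow is a loop over the Dockerfile variants. The sketch below is an assumption-laden illustration: `build_matrix`, the `DRY_RUN` variable, and the `cassandra-<name>` tag scheme are hypothetical, and by default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: build_matrix, DRY_RUN and the cassandra-<name> tags are illustrative.

# Emit one `docker build` per *.docker file in a directory (default: .build/docker).
# With DRY_RUN=1 (the default) the commands are printed; with DRY_RUN=0 they run.
build_matrix() (
  dir="${1:-.build/docker}"
  for f in "$dir"/*.docker; do
    [ -e "$f" ] || continue            # directory had no matching files
    name=$(basename "$f" .docker)
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker build -t cassandra-$name -f $f ."
    else
      docker build -t "cassandra-$name" -f "$f" . || exit 1
    fi
  done
)
```

A CI job could call `DRY_RUN=0 build_matrix` and follow each successful build with whatever smoke test the image supports.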
Create integration tests for submodule management and git hooks validation
The repo uses git hooks (.build/git/git-hooks/) for submodule management and has multiple submodule-related shell scripts (.build/sh/change-submodule*.sh, .build/sh/bump-accord.sh) but there are no automated tests validating these hooks work correctly. This is critical since broken git hooks can silently fail during development. A new contributor can create tests in .build/git/ that validate hook execution, submodule state, and error handling.
- [ ] Review existing git hook scripts in .build/git/git-hooks/ and submodule management in .build/sh/
- [ ] Create a test suite (.build/git/test-git-hooks.sh or similar) that validates: hook execution on checkout/switch, submodule state after operations, error cases
- [ ] Test the accord submodule bump workflow (referenced in .build/sh/bump-accord.sh) to ensure it doesn't break the build
- [ ] Document expected git hook behavior in .build/git/README.md (currently missing)
Add Python dependency security scanning and update automation for .build/ scripts
The repo contains Python scripts with pinned dependencies (beautifulsoup4==4.12.3, jinja2==3.1.5 in requirements files) and multiple Python test utilities (.build/ci/junit_helpers.py, .build/run-ci.d/run_ci.py, .build/ci/ci_parser.py) but lacks automated dependency vulnerability scanning and update management. This is a security gap for a critical infrastructure project. A contributor can add a Dependabot configuration and/or a CI step that scans Python dependencies in .build/ for known vulnerabilities.
- [ ] Audit all Python files in .build/ci/, .build/run-ci.d/, and other directories for dependency declarations
- [ ] Create or enhance .github/dependabot.yml (the location current Dependabot versions read) to include Python dependency scanning for requirements files in .build/
- [ ] Add a CI workflow step (.github/workflows/security-scan.yml) that runs pip-audit or similar on all .build/ Python scripts
- [ ] Document the dependency update process in .build/README.md, including which versions are tested against which Python releases
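The scanning step in this checklist could start from something as small as the sketch below. It assumes requirements files match `*requirements*.txt` somewhere under .build/ (the actual file names are not confirmed here) and that paths contain no whitespace; `list_requirements` and `audit_requirements` are hypothetical helpers, while `pip-audit -r <file>` is the tool's own requirements-file mode.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the *requirements*.txt pattern are assumptions.

# List candidate requirements files under a directory (default: .build).
list_requirements() {
  find "${1:-.build}" -type f -name '*requirements*.txt' | sort
}

# Run pip-audit against each file; exit non-zero if any audit reports a problem.
audit_requirements() (
  command -v pip-audit >/dev/null 2>&1 || { echo "pip-audit not installed" >&2; exit 127; }
  status=0
  for req in $(list_requirements "$1"); do   # assumes paths without whitespace
    pip-audit -r "$req" || status=1
  done
  exit "$status"
)
```

Running `list_requirements` first is a cheap way to confirm which files the audit would actually cover.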
🌿Good first issues
- Add Checkstyle rule enforcement documentation in CONTRIBUTING.md—.build/checkstyle.xml and checkstyle_suppressions.xml exist but most new contributors don't know why suppressions exist or how to add new rules.
- Create a quick-start shell script (.build/dev-setup.sh) that automates pre-commit hook installation, Java version verification, and first build—currently scattered across .build/git/install-git-defaults.sh and build.xml with no single entry point.
- Expand .build/ci/logging_helper.py documentation with examples—Python CI helpers (logging.sh, logging_helper.py, junit_helpers.py) lack docstrings explaining how to add new CI output formatters for test reports.
⭐Top contributors
- @smiklosovic — 26 commits
- @netudima — 16 commits
- @dcapwell — 5 commits
- @maedhroz — 5 commits
- @michaelsembwever — 5 commits
📝Recent commits
- bf79036 — Merge branch 'cassandra-6.0' into trunk (netudima)
- f06770f — Reduce memory allocations in SelectStatement.getQuery (netudima)
- 9394991 — Merge branch 'cassandra-6.0' into trunk (netudima)
- bfc4b0b — Avoid allocation by getFunctions in SelectStatement.authorize (netudima)
- 3742994 — Merge branch 'cassandra-6.0' into trunk (netudima)
- e64e119 — Add no-build-accord Ant option to be able to skip Accord module rebuild (netudima)
- fe9f6c1 — Merge branch 'cassandra-6.0' into trunk (smiklosovic)
- 3df9dc2 — Merge branch 'cassandra-5.0' into cassandra-6.0 (smiklosovic)
- 1c718ab — Fix failing select on system_views.settings for non-string keys (JwahoonKim)
- aa61536 — Merge branch 'cassandra-6.0' into trunk (netudima)
🔒Security observations
- Medium · Outdated Jinja2 Dependency — Dependencies/requirements.txt (jinja2==3.1.5). Jinja2 version 3.1.5 is used, which may contain known vulnerabilities. Jinja2 3.1.5 was released in late 2024 and newer patch versions may be available with security fixes. Fix: Update to the latest stable version of Jinja2 (3.1.6 or later). Run 'pip install --upgrade jinja2' and test thoroughly before deployment.
- Medium · Outdated BeautifulSoup4 Dependency — Dependencies/requirements.txt (beautifulsoup4==4.12.3). BeautifulSoup4 version 4.12.3 is used. While relatively recent, security advisories may exist for this or earlier versions. Ensure this version is actively maintained. Fix: Check for known CVEs against BeautifulSoup4 4.12.3. Consider upgrading to a newer release if security patches are available. Use 'pip check' to verify that installed dependencies are mutually compatible.
- Low · Missing Dependency Pinning Strategy — Dependencies/requirements.txt. Dependencies are pinned to specific versions, but there's no hash validation or lock file (like requirements.lock or poetry.lock) visible. This could allow for supply chain attacks if a dependency repository is compromised. Fix: Implement hash checking using pip's --require-hashes option or use a dependency lock file tool like pip-tools, Poetry, or Pipenv to generate and maintain hash checksums for all dependencies.
- Low · Potential XSS Risk in CI/Build Configuration — .build/ci/generate-ci-summary.sh, .build/ci/generate-test-report.sh, .circleci/. Jinja2 is used in the codebase (as seen in requirements.txt). If template files are processed with untrusted input without proper escaping, XSS vulnerabilities could arise. This is particularly relevant in build automation and CI/CD pipelines that generate HTML reports. Fix: Ensure all Jinja2 templates use auto-escaping (autoescape=True) by default. Validate and sanitize any user-provided input before passing to templates. Review template files in .build/ci/ and .circleci/ directories.
- Low · Git Hooks Not Cryptographically Signed — .build/git/git-hooks/pre-commit/100-verify-submodules-pushed.sh, .build/git/git-hooks/post-checkout/100-update-submodules.sh. Git hooks in .build/git/git-hooks/ are shell scripts without apparent signature verification. Malicious modifications could go undetected. Fix: Implement git hook signing and verification. Use git config core.hooksPath and consider using tools like Husky with signature verification to ensure hooks haven't been tampered with.
- Low · Docker Build Scripts May Have Hardcoded Paths — .build/docker/build-jars.sh, .build/docker/build-artifacts.sh, .build/docker/build-debian.sh, .build/docker/build-redhat.sh. Multiple Docker build scripts exist (.build/docker/*.sh) that may contain hardcoded paths, credentials, or configurations that could be exposed in build logs or image layers. Fix: Review Docker build scripts for hardcoded secrets, API keys, or credentials. Use Docker build arguments (ARG) and secrets (--secret flag in BuildKit) instead of embedding sensitive data. Implement multi-stage builds to minimize layer exposure.
- Low · CI Configuration Files May Expose Secrets — .circleci/config.yml, .circleci/config.yml.PAID, .circleci/config.yml.FREE. CircleCI configuration files (.circleci/config.yml, .circleci/config.yml.FREE, .circleci/config.yml.PAID) may contain environment variables or secrets in plaintext if not properly managed. Fix: Never commit secrets to version control. Use CircleCI's environment variables and context management. Rotate any exposed credentials immediately. Use the CircleCI web UI for sensitive values like API tokens and credentials.
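For the hash-validation observation, a first step is finding out which pinned entries lack a `--hash=` option at all. The sketch below is a rough check under stated assumptions: `unhashed_requirements` is a hypothetical helper, it joins backslash-continued lines and ignores comments and blank lines, and it does not verify that the hashes are correct (pip does that when installing with `--require-hashes`).

```shell
#!/usr/bin/env bash
# Sketch only: unhashed_requirements is an illustrative helper name.

# Print requirement entries that carry no --hash option.
# Exit status: 0 when every entry has a hash, 1 when at least one is missing.
unhashed_requirements() (
  missing=$(awk '
    { line = line $0 }                          # accumulate the logical line
    /\\$/ { sub(/\\$/, "", line); next }        # continued: drop the backslash, read on
    { if (line !~ /^[ \t]*(#|$)/ && line !~ /--hash=/) print line; line = "" }
  ' "$1")
  [ -z "$missing" ] && exit 0
  printf '%s\n' "$missing"
  exit 1
)
```

Entries it reports are candidates for regeneration with a tool such as pip-tools (`pip-compile --generate-hashes`), after which `pip install --require-hashes -r <file>` enforces the check at install time.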
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.