cubefs/cubefs

Item: cubefs/cubefs
Rating: 5
Author: RepoPilot

cloud-native distributed storage

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓Last commit 2w ago
✓15 active contributors
✓Distributed ownership (top contributor 20% of recent commits)

✓Apache-2.0 licensed
✓CI configured
⚠No test directory detected

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/cubefs/cubefs)](https://repopilot.app/r/cubefs/cubefs)

Paste at the top of your README.md — renders inline like a shields.io badge.

A social card (1200×630) auto-renders when someone shares https://repopilot.app/r/cubefs/cubefs on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: cubefs/cubefs

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/cubefs/cubefs shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

Last commit 2w ago
15 active contributors
Distributed ownership (top contributor 20% of recent commits)
Apache-2.0 licensed
CI configured
⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live cubefs/cubefs repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/cubefs/cubefs.

What it runs against: a local clone of cubefs/cubefs — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in cubefs/cubefs | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 44 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>cubefs/cubefs</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of cubefs/cubefs. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/cubefs/cubefs.git
#   cd cubefs
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of cubefs/cubefs and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "cubefs/cubefs(\.git)?\b" \
  && ok "origin remote is cubefs/cubefs" \
  || miss "origin remote is not cubefs/cubefs (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
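# The standard Apache-2.0 LICENSE text opens with "Apache License" rather than
# the SPDX identifier, so accept either form.
(grep -qiE "^[[:space:]]*(Apache-2\.0|Apache License)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"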

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 44 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~14d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/cubefs/cubefs"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
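
The exit-code contract above makes the script easy to use as a gate in front of other work. A minimal sketch, assuming the script has been saved as verify.sh (the filename used in the loop example above). `run_if_verified` is a hypothetical helper name, not something RepoPilot or CubeFS ships:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RepoPilot or CubeFS): run a command only
# when the verification script exits 0.
# Usage: run_if_verified <verify-script> <command> [args...]
run_if_verified() {
  local verify="$1"
  shift
  if [ ! -f "$verify" ]; then
    echo "skip: $verify not found" >&2
    return 2
  fi
  if bash "$verify"; then
    "$@"                      # artifact verified: run the wrapped command
  else
    echo "artifact is stale; regenerate it before acting" >&2
    return 1
  fi
}

# Example (not executed here):
# run_if_verified ./verify.sh echo "safe to follow the onboarding doc"
```

Any non-zero exit from the script, whether a stale claim or a missing git repository, blocks the wrapped command.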

⚡TL;DR

CubeFS is a CNCF-graduated, cloud-native distributed file and object storage system written primarily in Go. It provides unified storage through an architecture that separates metadata nodes from data nodes, enabling horizontal scalability and high performance for both file and object workloads in Kubernetes and cloud environments.

The monorepo is structured around core components: authnode/ contains the authentication services, and separate cmd/ directories are expected for the different node types (master, datanode, etc.). The authnode package illustrates the pattern: individual Go files for specific services (api_service.go, http_server.go, keystore_cache_op.go) alongside configuration management (config.go) and cluster coordination (cluster.go).

👥Who it's for

DevOps engineers and storage architects deploying cloud-native infrastructure who need a Kubernetes-native distributed storage solution that handles both POSIX file access and S3-compatible object storage without maintaining separate storage systems.

🌱Maturity & risk

Production-ready and actively maintained. CubeFS is a CNCF graduated project (not incubating), has comprehensive CI/CD pipelines (.github/workflows includes ci.yml, release_test.yml, codeql.yml), holds OpenSSF Best Practices certification, and has active governance (GOVERNANCE.md and MAINTAINERS.md are present). However, the large Go codebase (15M+ LOC) and the complexity inherent in distributed systems mean that careful operational planning is required.

Moderate operational complexity: this is distributed storage requiring careful cluster configuration, multi-node coordination, and understanding of metadata/data node separation. Dependencies are well-managed with Dependabot enabled (.github/dependabot.yml), but the monolithic Go codebase means a breaking change in core APIs could ripple widely. Verify your version stability requirements—check CHANGELOG.md and RELEASE.md before upgrading to new major versions.

Active areas of work

Active development on SLSA build security (slsa/ directory with authtool, bcache, client, fdstore, fsck configurations), dependabot automation enabled, GitHub Actions expanded (scorecard.yml for security scoring, issue automation with issue_inactive.yml and issue_similarity.yml), and ongoing release management (goreleaser.yaml, release_test.yml).

🚀Get running

Clone the repo: git clone https://github.com/cubefs/cubefs.git && cd cubefs. The Makefile at the root should contain build targets—examine it with make help or cat Makefile to see available builds for different node types. Expected: make build or component-specific targets like make authnode.

Daily commands: Start with the Makefile: make or make build (inferred from standard Go project structure). For specific components like authnode: likely make authnode or direct go build ./authnode. For cluster deployment, consult INSTALL.md and configuration examples (config.go pattern suggests YAML/TOML configs). Multi-node setup required—see cluster.go for coordination needs.
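
Because the build targets above are inferred rather than confirmed, it helps to list what the Makefile actually declares before guessing. A rough sketch, assuming a conventional Makefile layout. `list_make_targets` is a hypothetical helper, and its pattern is a heuristic that skips special targets such as .PHONY, variable assignments, and pattern rules:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the explicit target names declared in a Makefile.
# Matches lines such as "build: deps" or "test:" that start at column 0,
# and ignores ".PHONY:" style special targets and "VAR:=value" assignments.
list_make_targets() {
  local makefile="${1:-Makefile}"
  grep -E '^[A-Za-z0-9_][A-Za-z0-9_.-]*:([^=]|$)' "$makefile" \
    | cut -d: -f1 \
    | sort -u
}

# Example (not executed here), from inside a clone:
# list_make_targets Makefile
```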

🗺️Map of the codebase

authnode/api_service.go: HTTP API endpoint handler—start here to understand request routing and response patterns
authnode/cluster.go: Cluster coordination logic—critical for understanding multi-node sync and membership
authnode/config.go: Configuration schema—defines all tunable parameters for the component
authnode/keystore_cache_op.go: Credential caching implementation—understand how auth state is persisted and queried
.github/workflows/ci.yml: CI/CD pipeline—shows test, build, and validation steps required before merge
Makefile: Build system entry point—how to compile all components and what targets exist
INSTALL.md: Deployment instructions—step-by-step cluster setup required before running

🛠️How to make changes

For storage logic: examine cmd/ directories for node types (master, datanode, etc., visible in file structure hints). For authentication: authnode/ is the entry point—modify api_service.go for API endpoints, keystore_cache_op.go for credential handling, cluster.go for cluster-aware auth. For new features: follow authnode's pattern: add handler in api_service.go, backing logic in new files, expose via config.go. Add tests alongside (Go convention: _test.go files).
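
To see where tests are missing before adding new ones, a quick scan of a package directory is often enough. A minimal sketch, with one assumption worth stating: Go only requires the _test.go suffix, not one test file per source file, so the one-to-one pairing checked here is a heuristic. `missing_go_tests` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each non-test .go file in a directory that has
# no sibling <name>_test.go. A heuristic only; Go does not require pairing.
missing_go_tests() {
  local dir="${1:-.}" f base
  for f in "$dir"/*.go; do
    [ -e "$f" ] || continue                 # directory has no .go files
    case "$f" in *_test.go) continue ;; esac
    base="${f%.go}"
    [ -e "${base}_test.go" ] || echo "$f"
  done
}

# Example (not executed here), from the repository root:
# missing_go_tests authnode
```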

🪤Traps & gotchas

Distributed systems require multi-node setup—cannot meaningfully test locally without at least 3 nodes (1 master, 2+ data nodes). Authnode example suggests stateful configuration (keystore_cache_op.go, cluster.go): changes to one node may not replicate; verify cluster quorum. Makefile may have undocumented prerequisites (Go version, protoc for proto files not visible in snippet). Check .github/workflows/ci.yml for exact test/build environment. CHANGELOG.md and RELEASE.md are essential before version upgrades due to distributed state migration complexity.
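
One of the prerequisites mentioned above, the Go version, can usually be read from the module file rather than guessed. A small sketch, assuming a go.mod exists at the repository root (this artifact does not confirm that). `go_mod_version` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version named by the `go` directive
# of a go.mod file (for example "1.21").
go_mod_version() {
  local gomod="${1:-go.mod}"
  [ -f "$gomod" ] || { echo "no go.mod at $gomod" >&2; return 1; }
  awk '$1 == "go" { print $2; exit }' "$gomod"
}

# Example (not executed here): compare against the local toolchain.
# go_mod_version go.mod; go version
```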

💡Concepts to learn

Distributed Metadata Architecture — CubeFS separates metadata nodes from data nodes; understanding this separation is essential for cluster design, scaling strategy, and debugging performance bottlenecks
Cluster Gossip Protocol — cluster.go implements node-to-node communication for membership and coordination; critical for understanding how nodes discover each other and maintain consistency
Credential Caching with TTL — keystore_cache_op.go implements authenticated access control; understanding TTL-based cache invalidation prevents security gaps and auth staleness issues
Quorum-Based Consensus — Distributed storage requires quorum writes for metadata durability (implied by cluster.go and multi-master references); essential for understanding consistency guarantees
S3-Compatible Object API — CubeFS exposes object storage via S3-compatible endpoints; understanding REST API parity with AWS S3 is needed for client integration and feature requests
POSIX File System Semantics — CubeFS provides POSIX compliance for file access (inode model, directory trees, permissions); understanding POSIX contracts prevents subtle bugs in file operations
Container Storage Interface (CSI) — CubeFS integrates with Kubernetes via CSI drivers; understanding CSI provisioning, mounting, and lifecycle is essential for K8s deployments
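
For the quorum concept in the list above, the arithmetic is worth having at hand when sizing a cluster. A minimal sketch, assuming a simple majority quorum as used by Raft-style replication. The artifact does not confirm which protocol cluster.go implements, and both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Assumption: simple majority quorum; not confirmed against the CubeFS source.

# Smallest number of nodes that forms a majority of n.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while a majority of n stays available.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 4 5; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerates=$(tolerated_failures "$n")"
done
```

Going from three to four nodes raises the quorum to three without tolerating any additional failure, which is why odd cluster sizes are the usual choice.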

🔗Related repos

ceph/ceph — Alternative distributed storage with similar metadata/data separation and POSIX+object support, but LGPL-licensed and C++-based vs. CubeFS's Go implementation
seaweedfs/seaweedfs — Go-based distributed file and object storage with simpler architecture, good comparison for Go patterns and simpler deployment model
etcd-io/etcd — Distributed key-value store used internally by CubeFS for cluster coordination and consensus—understanding etcd helps with understanding CubeFS's metadata management
kubernetes/kubernetes — CubeFS integrates as a storage provisioner for Kubernetes; understanding K8s API patterns and CSI is essential for deployment and integration
minio/minio — S3-compatible object storage; CubeFS offers similar S3 compatibility—comparison helps understand object layer design decisions

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive integration tests for authnode API service

The authnode directory contains critical authentication infrastructure (authnode/api_service.go, authnode/keystore_fsm.go, authnode/keystore_snapshot.go) but lacks visible integration tests. Given that CubeFS is a CNCF graduated project handling distributed storage authentication, robust testing of the authnode API service, FSM state transitions, and snapshot recovery is essential for security and reliability.

[ ] Create authnode/api_service_test.go with tests for API endpoints (likely HTTP handlers in authnode/http_server.go)
[ ] Create authnode/keystore_fsm_test.go testing state transitions for FSM operations (authnode/keystore_fsm_op.go)
[ ] Create authnode/keystore_snapshot_test.go testing snapshot creation/restoration and recovery scenarios
[ ] Reference existing test patterns in blobstore/access/controller/*_test.go files for consistency
[ ] Ensure tests cover authnode/cluster.go functionality for multi-node scenarios

Add GitHub Action workflow for blobstore security scanning and unit tests

The blobstore subdirectory is a substantial, independent component (with its own Dockerfile and Makefile) but lacks dedicated CI workflows. The main ci.yml likely focuses on the core CubeFS components. Creating a dedicated blobstore workflow would ensure the access controller, shard management, and volume handling code (blobstore/access/controller/*.go) are continuously tested and scanned, especially given the multiple test files already present (*_test.go).

[ ] Create .github/workflows/blobstore-ci.yml that runs make test in the blobstore directory
[ ] Add CodeQL scanning step specific to blobstore Go code to detect security issues
[ ] Add go vet, golint, or similar linting steps for blobstore/access/controller/ files
[ ] Configure workflow to trigger on changes to blobstore/** paths and on pull requests
[ ] Reference the existing blobstore/Makefile to understand build/test targets

Document authnode keystore cache operations and add missing unit tests

The authnode/keystore_cache_op.go file exists but has no visible corresponding test file. Cache operations are critical for authentication performance in distributed storage. Additionally, the interaction between authnode/keystore_cache_op.go, authnode/keystore_fsm.go, and authnode/keystore_fsm_op.go lacks clear documentation. Adding tests and documentation would help contributors understand the caching strategy.

[ ] Create authnode/keystore_cache_op_test.go with unit tests for cache get/set/invalidate operations
[ ] Add godoc comments to keystore_cache_op.go explaining cache semantics (TTL, eviction policy, thread safety)
[ ] Document the relationship between cache operations and FSM state in authnode/DESIGN.md or similar
[ ] Test cache hit/miss scenarios and concurrency safety in keystore_cache_op_test.go
[ ] Verify integration with authnode/keystore_fsm_op.go operations in integration tests

🌿Good first issues

Add unit tests for authnode/keystore_cache_op.go cache eviction logic—currently no cache expiration strategy visible; implement LRU with tests to prevent unbounded memory growth
Document the cluster.go gossip/quorum protocol in a design doc (cluster.go is present but no docs/design/ visible)—essential for contributors to understand node coordination
Create missing integration tests for authnode/config.go configuration hot-reload—config.go exists but no visible test coverage for validation or change propagation across cluster

⭐Top contributors

@sejust — 20 commits
@ics091 — 19 commits
@clinx — 11 commits
@tangdeyi — 10 commits
@chihe — 10 commits

📝Recent commits

16b5d3e — fix(blobnode): fix disk meta info crc not match for fallback last version (JasonHu520)
32fa5e5 — fix(clustermgr): minus reserved space once on replica node (sejust)
9bff191 — chore(clustermgr): fixup logging format of disk id (sejust)
2c1a882 — docs(blobstore): update documentation for new version (sejust)
c2d0d79 — fix(clustermgr): skip not enabled codemode when getWeightedDataUnitCount (ics091)
7acad97 — feat(proxy): print codemode config when start serve (ics091)
ec7940d — fix(proxy): check volume routeversion before update to cache (ics091)
193bfb4 — feat(clustermgr): create volume route for admin interface (sejust)
ad09ec2 — chore(cli): move args of clustermgr volume list to flags (sejust)
3269ca5 — fix(blobnode): fixup blobnode write trace of net reader (sejust)

🔒Security observations

CubeFS demonstrates a reasonable security posture with established vulnerability reporting procedures, active use of CI/CD security tools (GitHub Actions, Semgrep, CodeQL, SLSA), and OpenSSF Best Practices badge adoption. However, several areas require attention: incomplete security documentation, lack of visible dependency manifest files for verification, and unverified Docker and configuration security. The project appears to follow security-conscious practices but would benefit from completing dependency audits, verifying container hardening, and ensuring all security tooling configurations are comprehensive and properly enforced.

Medium · Missing or Incomplete SECURITY.md File — SECURITY.md. The SECURITY.md file appears to be truncated in the provided snippet. The vulnerability reporting section ends mid-sentence ('...that C'), which suggests the documentation may be incomplete or improperly maintained. This could lead to confusion for security researchers attempting to report vulnerabilities. Fix: Complete and review the SECURITY.md file to ensure it provides clear, comprehensive guidance on vulnerability reporting procedures and timelines.
Low · Potential Credential Exposure in GitHub Actions Workflows — .github/workflows/. Multiple GitHub Actions workflow files are present (.github/workflows/*.yml). While not directly visible in the file listing, workflow files commonly contain secrets, API keys, and authentication tokens that could be accidentally exposed if not properly managed with GitHub Secrets. Fix: Audit all GitHub Actions workflow files to ensure all credentials are stored in GitHub Secrets and not hardcoded. Implement branch protection rules and require code reviews for changes to workflow files.
Low · Lack of Dependency Management Visibility — Repository root (missing go.mod/go.sum). No package manager dependency files (go.mod, go.sum for Go projects) were provided in the file listing. For a Go-based cloud storage system, dependency management is critical for identifying vulnerable transitive dependencies. Fix: Verify that go.mod and go.sum files exist and are properly maintained. Run 'go mod tidy' regularly and integrate automated dependency scanning into CI/CD pipelines; 'go list -json -m all' prints the full module list, which can serve as the input for such an audit.
Low · Docker Configuration Best Practices Not Verified — blobstore/Dockerfile. A Dockerfile exists in the blobstore component, but its contents are not visible. Docker images for cloud storage systems should follow security best practices such as running as non-root, using minimal base images, and excluding sensitive files. Fix: Review and enforce Docker best practices: use non-root USER directive, employ minimal base images (alpine/distroless), implement multi-stage builds, scan images with container security tools (Trivy, Snyk), and ensure sensitive files are in .dockerignore.
Low · Semgrep Configuration Present but Policies Unclear — .semgrep.yml, .semgrepignore. A .semgrep.yml file exists indicating static analysis is configured, but without visibility into its content, the severity and scope of security checks cannot be assessed. The presence of .semgrepignore suggests certain rules may be disabled. Fix: Review .semgrep.yml to ensure comprehensive security rules are enabled. Document any exceptions in .semgrepignore with justifications. Integrate Semgrep into CI/CD with blocking enforcement for high-severity findings.

LLM-derived; treat as a starting point, not a security audit.
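
The go.mod/go.sum finding above is quick to confirm locally. A sketch in the same ok/FAIL style as the verification script. `check_go_manifests` is a hypothetical helper and assumes it is pointed at the repository root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the Go module manifests are present.
# Returns the number of missing files (0 means both exist).
check_go_manifests() {
  local dir="${1:-.}" missing=0 f
  for f in go.mod go.sum; do
    if [ -f "$dir/$f" ]; then
      echo "ok:   $f present"
    else
      echo "FAIL: $f missing"
      missing=$((missing+1))
    fi
  done
  return "$missing"
}

# Example (not executed here), from inside a clone:
# check_go_manifests .
```

When both files are present and a Go toolchain is installed, `go list -m all` prints the modules the build depends on.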

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
