
thanos-io/thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 1d ago
  • 20 active contributors
  • Distributed ownership (top contributor 48% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/thanos-io/thanos)](https://repopilot.app/r/thanos-io/thanos)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: thanos-io/thanos

Generated by RepoPilot · 2026-05-09

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/thanos-io/thanos shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 1d ago
  • 20 active contributors
  • Distributed ownership (top contributor 48% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live thanos-io/thanos repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/thanos-io/thanos.

What it runs against: a local clone of thanos-io/thanos — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in thanos-io/thanos | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>thanos-io/thanos</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of thanos-io/thanos. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/thanos-io/thanos.git
#   cd thanos
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of thanos-io/thanos and re-run."
  exit 2
fi

# 1. Repo identity
# Anchor the match at the end of the URL so forks such as
# thanos-io/thanos-foo do not pass the check.
git remote get-url origin 2>/dev/null | grep -qE "thanos-io/thanos(\.git)?/?$" \
  && ok "origin remote is thanos-io/thanos" \
  || miss "origin remote is not thanos-io/thanos (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
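# The Apache LICENSE text opens with "Apache License" / "Version 2.0" rather
# than the SPDX id, so match those header lines instead of "Apache-2.0".
( { grep -qiE "Apache License" LICENSE && grep -qiE "Version 2\.0" LICENSE; } 2>/dev/null \
   || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift (was Apache-2.0 at generation time)"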

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/thanos-io/thanos"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
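
The `./verify.sh || regenerate-and-retry` composition mentioned above can be sketched as a small gate. This is an illustration only, not part of RepoPilot: `count_failures` and `artifact_is_fresh` are hypothetical helper names, and the one thing taken from the script is that failed checks print lines starting with `FAIL:`.

```shell
#!/bin/sh
# Hypothetical helpers for consuming the verifier's output in an agent loop.

# Count lines that start with "FAIL:" in the text read from stdin.
count_failures() {
  # grep -c exits 1 when the count is 0, so mask the status.
  grep -c '^FAIL:' || true
}

# Return 0 only when the captured verifier output contains no FAIL lines.
artifact_is_fresh() {
  [ "$(printf '%s\n' "$1" | count_failures)" -eq 0 ]
}

# Usage sketch (assumes the script above was saved as ./verify.sh):
#   output=$(./verify.sh)
#   artifact_is_fresh "$output" || echo "regenerate the artifact before editing"
```

Parsing the output rather than relying only on the exit code lets a caller report how many claims went stale, which may help when deciding whether to regenerate or just warn.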

TL;DR

Thanos is a distributed monitoring system that extends Prometheus with long-term storage and a global query view. It stores historical metrics in object storage (S3, GCS, etc.) and exposes a unified query interface across multiple Prometheus instances, which enables highly available metric systems with multi-year retention at scale. The repository is a monorepo: pkg/ contains the core components (store, query, receive, compact, ruler), cmd/ holds the CLI entrypoints for each service (sidecar, querier, store gateway, compactor, ruler, receiver), and the rest is supporting infrastructure. Builds are driven by the Makefile, and docs/ includes Jsonnet configs for Kubernetes deployments. Inter-service communication is defined with Protocol Buffers; Cap'n Proto files are also present in the repo.

👥Who it's for

Platform engineers and SREs operating large-scale Prometheus deployments who need multi-year metric retention, cross-cluster querying, and high availability without managing massive local storage. Contributors are typically infrastructure team members familiar with Go, Prometheus architecture, and distributed systems.

🌱Maturity & risk

Production-ready and actively maintained. This is a CNCF Incubating project with multiple stable releases, CI/CD pipelines on both CircleCI and GitHub Actions, and an active community. The codebase is large and predominantly Go (the reported "5M+ lines" figure is more plausibly a byte count and should be verified), uses .bingo/ to pin tool versions for reproducible builds, and has structured test infrastructure visible in the CircleCI config.

Low risk overall but with some operational complexity. The project has deep integration points with Prometheus and object storage backends, meaning misconfiguration can cause silent data loss or query inconsistency. Dependency management is handled via .bingo/ for tool versioning which is good, but the system requires careful operational planning (block sizes, retention policies, compaction scheduling). Single component failures are mitigated by HA design, but queries spanning multiple stores can fail partially.

Active areas of work

Active development across query optimization, long-term storage features, and improved HA capabilities. The repository shows ongoing work on compaction strategies, query-layer improvements, and Kubernetes integration patterns. ThanosCon 2024 (March) indicates community momentum. Regular CircleCI and GitHub Actions runs suggest multiple PRs in flight for core functionality.

🚀Get running

git clone https://github.com/thanos-io/thanos.git
cd thanos
make build
./thanos --help

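For development, a recent Go toolchain is required (this artifact states Go 1.19+; the go directive in go.mod is authoritative), and make test runs the suite. Use make docs to regenerate the documentation (the website is built with Hugo).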
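
A minimal sketch for checking a local Go toolchain against a minimum version before running make build. The helper names `parse_go_version` and `version_ge` are hypothetical, and the 1.19 floor is this artifact's claim rather than a value read from go.mod.

```shell
#!/bin/sh
# Extract "1.22.3" from a line such as "go version go1.22.3 linux/amd64".
parse_go_version() {
  printf '%s\n' "$1" | sed -n 's/^go version go\([0-9][0-9.]*\).*/\1/p'
}

# Return 0 when dotted version $1 is greater than or equal to $2.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Usage sketch (requires Go on PATH):
#   have=$(parse_go_version "$(go version)")
#   version_ge "$have" 1.19 || echo "Go $have is older than the stated minimum"
```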

Daily commands:

make build
# Start a single Thanos querier (simplest setup).
# Newer releases use --endpoint in place of --store; check ./thanos query --help.
./thanos query --store=127.0.0.1:10901
# Or run via Docker with compose (if docker-compose.yml exists in examples/)
docker-compose -f examples/docker-compose.yml up

See .devcontainer/ for VS Code dev environment with all dependencies pre-configured.

🗺️Map of the codebase

  • pkg/store/store.go: Core abstraction for querying blocks from object storage; essential to understand Thanos' storage retrieval layer
  • pkg/query/querier.go: Implements distributed querying logic across multiple stores/sidecars; critical for understanding how global query view works
  • pkg/compact/compact.go: Handles block compaction and downsampling; necessary for long-term retention strategy
  • pkg/receive/: Remote write receiver implementation; entry point for Prometheus remote write protocol integration
  • Makefile: Build orchestration using .bingo/ for reproducible tool versions; required to understand development workflow
  • .bingo/Variables.mk: Defines all pinned tool versions (golangci-lint, protoc, etc.); critical for matching CI environment locally
  • docs/design.md: Architecture decision document explaining core design trade-offs and component interactions
  • .circleci/config.yml: Full CI/CD pipeline definition; shows test matrix, linting requirements, and release process
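
Because the file map above is generated, it may help to confirm the paths exist in a clone before relying on them. The following is a sketch under that assumption; `check_paths` is a hypothetical helper that reuses the `ok:` / `FAIL:` output convention of the verification script.

```shell
#!/bin/sh
# Hypothetical helper: report which expected paths exist under a clone root.
# Prints one "ok:" or "FAIL:" line per path and returns non-zero if any
# path is missing.
check_paths() {
  root="$1"; shift
  missing=0
  for p in "$@"; do
    if [ -e "$root/$p" ]; then
      echo "ok:   $p"
    else
      echo "FAIL: $p not found"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage sketch, run from the root of a clone:
#   check_paths . pkg/store/store.go pkg/query/querier.go pkg/compact/compact.go \
#     pkg/receive Makefile .bingo/Variables.mk docs/design.md .circleci/config.yml
```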

🛠️How to make changes

Start in pkg/store/ for object storage and block access logic, pkg/query/ for distributed querying, or pkg/compact/ for retention and compaction. CLI changes go in cmd/thanos/ (one entrypoint per service). Add tests in *_test.go files alongside the source they cover. Jsonnet configurations for K8s deployments live in docs/components/ as reference setups.

🪤Traps & gotchas

  1. The .bingo/ directory is auto-managed: commit it but don't edit it directly; run make .bingo/golangci-lint etc. to update tool versions.
  2. Protocol buffer definitions require protoc and protoc-gen-gogofast from .bingo/ to regenerate; missing these breaks builds silently.
  3. Object storage credentials (AWS_ACCESS_KEY_ID, etc.) must be set for integration tests; local tests work, but e2e tests need real or mocked S3/GCS.
  4. Block format is strictly versioned; incompatible versions cause silent query failures, not errors.
  5. Cap'n Proto files in the repo suggest serialization beyond protobuf; understand which format is used for what (storage blocks vs gRPC messages).
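
For the credentials gotcha, a preflight check can fail fast before a long test run starts. This is a sketch, not something the repo provides: `missing_vars` is a hypothetical helper, and AWS_SECRET_ACCESS_KEY is an assumed companion variable (only AWS_ACCESS_KEY_ID is named above).

```shell
#!/bin/sh
# Hypothetical preflight: print the names of required variables that are
# unset or empty, and return non-zero if any are missing.
missing_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage sketch before tests that talk to S3-compatible storage
# (the second variable name is an assumption, see the lead-in):
#   missing_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || echo "set credentials first"
```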

💡Concepts to learn

  • Prometheus block format — Thanos stores metrics in Prometheus TSDB blocks; understanding chunk encoding, index structure, and metadata format is critical for store layer modifications
  • Distributed query execution with storeAPI — Thanos's StoreAPI gRPC interface is the protocol for querying multiple stores; essential for understanding how querier merges results across components
  • Object storage abstraction layer — Thanos abstracts S3, GCS, Azure, MinIO behind a common interface; understanding bucket operations (upload, download, list) is needed for storage layer work
  • Time-series compaction and downsampling — The compactor merges blocks in object storage and downsamples older data so that long-range queries read fewer samples; critical for understanding retention policies and cost optimization
  • gRPC streaming for large result sets — Thanos streams series over gRPC instead of returning one large protobuf message, which avoids message size limits on massive time-series results; important for query layer optimization and debugging OOM issues
  • Write-Ahead Log (WAL) and sidecar pattern — The Thanos sidecar runs next to Prometheus and uploads completed TSDB blocks to object storage without modifying Prometheus; data still in the WAL and head block has not been uploaded yet, so understanding that boundary prevents data loss bugs
  • High availability merge and deduplication — Thanos can deduplicate metrics from Prometheus HA pairs; understanding external labels, replica handling, and merge semantics is needed for query consistency

Related repos

  • prometheus/prometheus — Thanos is built on top of Prometheus 2.0 storage format; understanding Prometheus internals (WAL, blocks, querying) is essential
  • cortexproject/cortex — Alternative CNCF solution for metric multi-tenancy and long-term storage; shares similar architecture patterns with Thanos but takes a different approach
  • grafana/mimir — Grafana's metric storage system evolved from Cortex; competitor to Thanos with emphasis on multi-tenancy and cloud-native deployment
  • prometheus-operator/prometheus-operator — Kubernetes operator that deploys and manages Prometheus; commonly paired with Thanos sidecar injectors for K8s native monitoring
  • thanos-io/thanos-jsonnet — Official Jsonnet library for generating Thanos deployment configs; referenced from docs/ in main repo
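
The downsampling idea listed under "Concepts to learn" can be sketched as windowed aggregation. This is a simplified illustration on plain text input, not Thanos's chunk format or its compactor code: `downsample` is a hypothetical function, and the input layout (one `timestamp value` pair per line, sorted by timestamp) is an assumption made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: aggregate "timestamp value" lines into fixed windows.
# $1 is the window size in seconds; output per window is
#   window_start count sum min max
downsample() {
  awk -v w="$1" '
    {
      v = $2 + 0                      # force numeric comparison
      b = int($1 / w) * w             # start of the window this sample falls in
      if (!(b in cnt)) { order[++n] = b; min[b] = v; max[b] = v }
      cnt[b]++; sum[b] += v
      if (v < min[b]) min[b] = v
      if (v > max[b]) max[b] = v
    }
    END {
      for (i = 1; i <= n; i++) {
        b = order[i]
        print b, cnt[b], sum[b], min[b], max[b]
      }
    }
  '
}

# Usage sketch: four raw samples collapse into two 5-minute windows.
#   printf '0 1\n60 3\n299 2\n300 10\n' | downsample 300
```

Keeping count, sum, min and max per window is what lets a reader answer average, minimum and maximum queries over long ranges without touching the raw samples.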

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for bingo tool dependency management

The .bingo directory contains 20+ .mod/.sum file pairs for managing tool versions (golangci-lint, prometheus, alertmanager, etc.), but there's no visible test coverage ensuring these dependencies stay synchronized and can be properly resolved. A CI workflow or test suite validating that all .bingo modules can be downloaded and executed would prevent contributor friction and broken tool chains.

  • [ ] Create .github/workflows/bingo-validation.yaml to validate all .bingo/*.mod files resolve correctly
  • [ ] Add a test script in scripts/ that executes bingo get for each tool and verifies checksum matches .sum files
  • [ ] Document in CONTRIBUTING.md how to update bingo dependencies when adding new tools
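
One piece of the checklist above, confirming that every tool's .mod file has a matching .sum file, can be sketched without invoking bingo at all. `mods_without_sums` is a hypothetical helper; skipping a placeholder go.mod in the directory is an assumption about how bingo lays out .bingo/ and should be checked against the real tree.

```shell
#!/bin/sh
# Hypothetical check: list .mod files in a directory that lack a sibling .sum.
# Returns non-zero if any are found.
mods_without_sums() {
  dir="$1"
  found=0
  for mod in "$dir"/*.mod; do
    [ -e "$mod" ] || continue               # glob matched nothing
    # Assumption: a bare go.mod is a placeholder without a go.sum.
    [ "${mod##*/}" = "go.mod" ] && continue
    if [ ! -e "${mod%.mod}.sum" ]; then
      echo "$mod"
      found=1
    fi
  done
  [ "$found" -eq 0 ]
}

# Usage sketch from the repository root:
#   mods_without_sums .bingo || echo "some tools are missing checksum files"
```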

Create devcontainer configuration validation and documentation

The repo has .devcontainer/Dockerfile and devcontainer.json but no tests ensuring the container builds successfully or that the welcome-message.txt is kept in sync with actual setup steps. Adding a CI check and comprehensive setup guide would improve the onboarding experience for contributors using VS Code/Codespaces.

  • [ ] Add .github/workflows/devcontainer.yaml to build and test the devcontainer image on PRs
  • [ ] Expand .devcontainer/welcome-message.txt with specific setup commands matching the Dockerfile (bingo installation, pre-commit hooks, etc.)
  • [ ] Add docs/development/devcontainer-setup.md with screenshots and troubleshooting steps

Add CI validation for CodeQL configuration consistency

The repo has .github/codeql/codeql-config.yml and .github/workflows/codeql-analysis.yml, but no tests ensuring the config references valid/available CodeQL query suites or that disabled rules are documented. Adding validation would prevent silent security analysis gaps.

  • [ ] Create scripts/validate-codeql-config.sh to parse .github/codeql/codeql-config.yml and verify all referenced queries exist in the CodeQL CLI version used
  • [ ] Add .github/workflows/codeql-config-lint.yaml to run validation on every PR touching security configs
  • [ ] Document in docs/security.md which CodeQL rules are disabled and why (with issue references)

🌿Good first issues

  • Add unit test coverage for error cases in pkg/store/bucket.go (block listing failures, corrupted metadata files)—currently sparse for edge cases
  • Create runnable example in examples/ showing Thanos querying a MinIO backend with retention policies; docs mention MinIO support but no end-to-end demo exists
  • Implement missing validation for pkg/compact/planner.go to catch invalid block retention configurations before compaction runs; currently fails at runtime

📝Recent commits

  • 1f7c524 — Merge pull request #8810 from prymitive/grpc-partial-response (GiedriusS)
  • 045a320 — Add changelog (prymitive)
  • ed689ed — ruler: correctly pass query partial response (prymitive)
  • cdca548 — ruler, sidecar: Add TSDB stats endpoint to gRPC server (#8808) (simonpasquier)
  • 4cb3b81 — Merge pull request #8801 from ogulcanaydogan/fix/receiver-ignore-lost-found (GiedriusS)
  • b4c9c80 — fix(receive): validate tenant label (#8806) (guidonguido)
  • 88ba209 — Merge pull request #8799 from jdgeisler/fix-grpc-keepalive-enforcement (GiedriusS)
  • 43e5fde — receive: Fix non-retryable quorum failures returning 5xx (#8803) (saswatamcode)
  • 5940c31 — Set a KeepaliveEnforcementPolicy of 10s on all gRPC servers to match the client keepalive interval (jdgeisler)
  • 5ce4195 — make grpcserver.WithKeepaliveEnforcement configurable (jdgeisler)

🔒Security observations

The Thanos project demonstrates a strong security posture with good practices including non-root user execution in Docker, use of static analysis tools (golangci-lint, CodeQL), and security-aware design principles. However, there are areas for improvement: the security policy documentation is incomplete, base image SHA management could be more automated, and supply chain transparency could be enhanced with SBOM generation. The project's commitment to not storing sensitive data in logs and using standard cryptography libraries is commendable. No critical vulnerabilities were identified in the provided codebase snippets, but continuous monitoring of dependencies and security practices is essential for a CNCF incubating project.

  • Medium · Hardcoded Docker Base Image SHA — Dockerfile (BASE_DOCKER_SHA variable). The Dockerfile pins a specific SHA256 hash for the base image (quay.io/prometheus/busybox), but the hash is hardcoded in the Dockerfile. While this provides reproducibility, there's no documented process for updating this hash when security patches are released for the base image. This could lead to running outdated base images with known vulnerabilities. Fix: Implement an automated process to regularly scan and update the base image SHA. Consider using image scanning tools in CI/CD pipeline and document the update procedure. Maintain a changelog of base image updates.
  • Low · Incomplete Security Policy Documentation — SECURITY.md. The SECURITY.md file appears truncated (ends mid-sentence at 'We use stable Go versions to b'). The security policy documentation is incomplete, which could lead to confusion about the project's security commitments and vulnerability disclosure process. Fix: Complete the SECURITY.md documentation including: full security statement, vulnerability reporting procedure, security contact information, and supported versions for security patches.
  • Low · Potential Secrets in Environment Files — .bingo/variables.env and similar configuration files. The presence of '.bingo/variables.env' and configuration files in the repository could potentially contain sensitive information. While the .bingo directory appears to be for tool versioning, environment files sometimes inadvertently contain secrets. Fix: Audit all .env and configuration files to ensure no credentials, API keys, or sensitive data are committed. Use environment variable management best practices and consider using tools like git-secrets or pre-commit hooks to prevent accidental commits.
  • Low · Missing SBOM and Supply Chain Transparency — Repository root / CI-CD configuration. While the project uses various security tools (golangci-lint, CodeQL), there's no visible Software Bill of Materials (SBOM) or comprehensive dependency vulnerability tracking mechanism evident in the provided file structure. Fix: Generate and maintain SBOM using tools like syft or cyclonedx. Integrate dependency scanning tools (e.g., dependabot, Snyk) into CI/CD pipeline and document the supply chain security approach.

LLM-derived; treat as a starting point, not a security audit.
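
The audit suggested for environment files can start with a simple pattern scan. This is a rough sketch, not a replacement for dedicated tools such as git-secrets: `find_secret_like_lines` is a hypothetical helper, and the keyword list is an assumption that will miss some secrets and flag some harmless lines.

```shell
#!/bin/sh
# Hypothetical scan: print "file:line:text" for assignments whose variable
# name looks secret-like and whose value is non-empty. Returns non-zero when
# nothing matches (grep's usual exit status).
find_secret_like_lines() {
  # /dev/null is added so grep always prefixes matches with the file name.
  grep -nEi '^[[:space:]]*(export[[:space:]]+)?[A-Za-z0-9_]*(SECRET|TOKEN|PASSWORD|ACCESS_KEY)[A-Za-z0-9_]*=.+' "$@" /dev/null
}

# Usage sketch from the repository root:
#   find_secret_like_lines .bingo/variables.env && echo "review the lines above"
```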


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
