
uber/kraken

P2P Docker registry capable of distributing TBs of data in seconds

Healthy

Healthy across the board

  • Use as dependency: Healthy. Permissive license, no critical CVEs, actively maintained — safe to depend on.
  • Fork & modify: Healthy. Has a license, tests, and CI — clean foundation to fork and modify.
  • Learn from: Healthy. Documented and popular — useful reference codebase to read through.
  • Deploy as-is: Healthy. No critical CVEs, sane security posture — runnable as-is.

  • Last commit today
  • 9 active contributors
  • Distributed ownership (top contributor 20% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — it updates live from the latest cached analysis.

Variant: RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/uber/kraken)](https://repopilot.app/r/uber/kraken)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/uber/kraken on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: uber/kraken

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/uber/kraken shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit today
  • 9 active contributors
  • Distributed ownership (top contributor 20% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live uber/kraken repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/uber/kraken.

What it runs against: a local clone of uber/kraken — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in uber/kraken | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>uber/kraken</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of uber/kraken. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/uber/kraken.git
#   cd kraken
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of uber/kraken and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "uber/kraken(\.git)?\b" \
  && ok "origin remote is uber/kraken" \
  || miss "origin remote is not uber/kraken (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qi "Apache License" LICENSE 2>/dev/null \
   && grep -qi "Version 2\.0" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "core/digest.go" \\
  && ok "core/digest.go" \\
  || miss "missing critical file: core/digest.go"
test -f "core/metainfo.go" \\
  && ok "core/metainfo.go" \\
  || miss "missing critical file: core/metainfo.go"
test -f "agent/agentserver/server.go" \\
  && ok "agent/agentserver/server.go" \\
  || miss "missing critical file: agent/agentserver/server.go"
test -f "build-index/tagserver/server.go" \\
  && ok "build-index/tagserver/server.go" \\
  || miss "missing critical file: build-index/tagserver/server.go"
test -f "core/peer_info.go" \\
  && ok "core/peer_info.go" \\
  || miss "missing critical file: core/peer_info.go"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/uber/kraken"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Kraken is a P2P-powered Docker image registry that distributes blobs/layers at massive scale using a tracker-orchestrated peer network instead of traditional centralized distribution. A small set of seed hosts plus agent peers achieves >50% of maximum download speed on every host; at Uber it distributes 1M+ blobs daily and can push 20K blobs of 100MB-1GB each in under 30 seconds. The monorepo is split into logical domains: agent/ contains the peer agents running on each host (agentclient, agentserver), build-index/ handles Docker tag/layer indexing with tagclient/tagserver/tagstore components, tracker/ orchestrates the P2P network topology, origin/ manages seed content, and storage/ abstracts blob backends (S3, GCS, ECR, HDFS). Central CLI entry points live in agent/main.go and build-index/main.go, with shared config management (agent/cmd/config.go).

👥Who it's for

DevOps engineers and platform teams at large organizations running hybrid cloud infrastructure who need to distribute Docker images across thousands of hosts (15k+ per cluster) with high availability and no single point of failure. Developers contributing to Uber's internal infrastructure or those building large-scale container registries.

🌱Maturity & risk

Production-ready and actively maintained. Kraken has been in production at Uber since early 2018 handling massive scale (1M+ blobs/day). The repo shows a mature Go codebase with comprehensive testing (agent/agentserver/server_test.go, build-index/tagstore/store_test.go), CI/CD via GitHub Actions (.github/workflows/build-and-test.yaml), and linting configuration (.golangci.yml). Recent activity is evident from the Go 1.24.0 module version and modern dependencies.

Relatively low risk for a P2P system at this scale. Dependency count is moderate and well-managed through go.mod (cloud.google.com/go, AWS SDK, containerd, Docker libraries). The main risks are: (1) P2P network complexity, which demands deep debugging skills for tracker/agent coordination issues; (2) heavyweight external dependencies on Docker distribution and containerd libraries that may introduce breaking changes; (3) a small visible external contributor community (this is primarily an Uber project), meaning fewer outside eyes on the code. No evidence of recent security audits is visible in the file list.

Active areas of work

Recent work appears focused on modernization: migration to Go 1.24.0, updates to containerd (v1.5.7) and OpenTelemetry instrumentation (otel/trace v1.41.0), and vulnerability scanning via GitHub Actions (.github/workflows/vulnerability-check.yaml). A labeler workflow suggests active issue triage. Limited visibility into specific open PRs from file list alone, but the active test infrastructure and CI suggest continuous development.

🚀Get running

Clone the repo and build using Make: git clone https://github.com/uber/kraken.git && cd kraken && make. Requires Go 1.24.0+ and Python 3 (see .python-version). For macOS setup: follow KRAKEN_MAC_SETUP_CODELAB.md. Dependencies install via go mod download. Check Makefile for specific build targets and test commands.

Daily commands: make build to compile binaries (outputs to build/ directory). Start tracker: ./build/tracker --config tracker.yaml. Start origin seed: ./build/origin --config origin.yaml. Start agents: ./build/agent --config agent.yaml. Build indices: ./build/build-index --config build-index.yaml. All require YAML config files; examples should exist in repo root or docs/. See agent/cmd/config.go and build-index/cmd/config.go for config schema.

🗺️Map of the codebase

  • core/digest.go — Defines content-addressable blob identification; central to how Kraken tracks and deduplicates Docker image layers across the P2P network.
  • core/metainfo.go — Models torrent-like metadata structures for blobs; essential for understanding how Kraken breaks large images into pieces for distributed transfer.
  • agent/agentserver/server.go — Entry point for the agent daemon that pulls blobs from peers; every contributor must understand the P2P pull orchestration logic.
  • build-index/tagserver/server.go — Maps Docker image tags to blob digests; the index that clients query to bootstrap P2P downloads.
  • core/peer_info.go — Represents peer identity and location data; fundamental to peer discovery and routing in the distributed system.
  • docs/ARCHITECTURE.md — Explains the overall P2P registry design, component responsibilities, and data flow across agent/tracker/origin/proxy.
  • Makefile — Build orchestration for all five Kraken components; reference for how to compile, test, and containerize the system.

🛠️How to make changes

Add a new tag resolver for a custom registry

  1. Create a new resolver implementing the TagResolver interface in build-index/tagtype (sketched after this list) (build-index/tagtype/docker_resolver.go)
  2. Register the resolver in the resolver map so build-index can instantiate it (build-index/tagtype/map.go)
  3. Add configuration for the resolver in the build-index config template (config/build-index/base.yaml)
  4. Write integration tests following the docker_resolver_test.go pattern (build-index/tagtype/docker_resolver_test.go)
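A minimal sketch of steps 1–2, under the assumption that resolvers follow a small interface plus a registration map as described above. The Resolver interface, Digest alias, and map shown here are illustrative stand-ins, not the repo's actual API; copy the real signatures from build-index/tagtype/docker_resolver.go and map.go before implementing.

```go
// Hypothetical resolver sketch — names and signatures are illustrative,
// not the real build-index/tagtype API.
package tagtype

import "fmt"

// Digest stands in for core.Digest; the real type lives in core/digest.go.
type Digest string

// Resolver stands in for the contract the existing resolvers implement.
type Resolver interface {
	// Resolve maps an image tag to the blob digests it depends on.
	Resolve(tag string, manifest Digest) ([]Digest, error)
}

// customResolver handles tags for a hypothetical custom registry.
type customResolver struct{}

func (customResolver) Resolve(tag string, manifest Digest) ([]Digest, error) {
	if tag == "" {
		return nil, fmt.Errorf("tag must not be empty")
	}
	// Minimal behaviour: a tag depends only on its own manifest blob.
	return []Digest{manifest}, nil
}

// resolvers mirrors the registration map in build-index/tagtype/map.go (step 2):
// build-index looks resolvers up by a config-driven key.
var resolvers = map[string]Resolver{
	"custom": customResolver{},
}
```

Step 3 then only needs the config key ("custom" in this sketch) to appear in the build-index config so the map lookup succeeds.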

Add a new storage backend for blobs

  1. Study how blobinfo and digest are used to address content (core/blobinfo.go)
  2. Implement a storage provider that maps digests to blob payloads (reference S3/GCS examples in aws-sdk-go and cloud.google.com/go/storage; a sketch follows this list) (agent/agentserver/server.go)
  3. Wire the provider into agent configuration (config/agent/base.yaml)
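A sketch of step 2, assuming backends satisfy a digest-keyed upload/download contract. The Client interface and the in-memory provider below are hypothetical illustrations; mirror an existing S3/GCS provider in the repo for the real method set before wiring anything into config/agent/base.yaml.

```go
// Hypothetical storage-backend sketch — the interface and names are
// illustrative, not Kraken's actual backend API.
package memorybackend

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// Client stands in for the digest-keyed contract the real backends satisfy:
// content is addressed by digest, never by a mutable name.
type Client interface {
	Upload(digest string, r io.Reader) error
	Download(digest string, w io.Writer) error
}

// memClient is a toy in-memory backend, useful only for local testing.
type memClient struct {
	mu    sync.RWMutex
	blobs map[string][]byte
}

var _ Client = (*memClient)(nil)

func New() *memClient { return &memClient{blobs: map[string][]byte{}} }

func (c *memClient) Upload(digest string, r io.Reader) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.blobs[digest] = data
	return nil
}

func (c *memClient) Download(digest string, w io.Writer) error {
	c.mu.RLock()
	data, ok := c.blobs[digest]
	c.mu.RUnlock()
	if !ok {
		return fmt.Errorf("blob %s not found", digest)
	}
	_, err := io.Copy(w, bytes.NewReader(data))
	return err
}
```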

Add a new metadata field to peer discovery

  1. Extend peer_info.go with the new field and JSON/protobuf marshaling (sketched after this list) (core/peer_info.go)
  2. Update peer_context.go if the field affects request routing logic (core/peer_context.go)
  3. Add unit tests for marshaling and filtering on the new field (core/peer_info_test.go)
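A sketch of steps 1 and 3, assuming PeerInfo is a JSON-marshaled struct as the core/peer_info.go description suggests. The struct fields, tags, and the new Zone field are all illustrative; match the real struct in core/peer_info.go and mirror its existing marshaling tests.

```go
// Hypothetical peer-metadata sketch — field and type names are illustrative;
// core/peer_info.go defines the real PeerInfo struct and its tags.
package core

import (
	"encoding/json"
	"testing"
)

// PeerInfo stands in for the real struct. Zone is the new metadata field
// being added (step 1); the JSON tag keeps wire compatibility explicit.
type PeerInfo struct {
	PeerID string `json:"peer_id"`
	IP     string `json:"ip"`
	Port   int    `json:"port"`
	Zone   string `json:"zone,omitempty"` // new field: availability-zone hint
}

// TestPeerInfoZoneRoundTrip covers step 3: the new field survives marshaling.
func TestPeerInfoZoneRoundTrip(t *testing.T) {
	in := PeerInfo{PeerID: "abc", IP: "10.0.0.1", Port: 8080, Zone: "us-east-1a"}
	b, err := json.Marshal(in)
	if err != nil {
		t.Fatalf("marshal: %v", err)
	}
	var out PeerInfo
	if err := json.Unmarshal(b, &out); err != nil {
		t.Fatalf("unmarshal: %v", err)
	}
	if out.Zone != in.Zone {
		t.Fatalf("zone lost in round trip: got %q, want %q", out.Zone, in.Zone)
	}
}
```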

Add a new HTTP endpoint to an existing component

  1. Open the target component server (e.g., agent/agentserver/server.go or build-index/tagserver/server.go) (agent/agentserver/server.go)
  2. Register a new chi route handler following existing HTTP method patterns (sketched after this list) (agent/agentserver/server.go)
  3. Write tests in the *_test.go file mirroring existing endpoint tests (agent/agentserver/server_test.go)
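A sketch of step 2 using chi, which this doc identifies as the HTTP router. The import path below is chi v5 and the /readiness route is made up; use whatever chi version kraken's go.mod pins, and register the handler on the existing router in the target server.go rather than on a fresh one.

```go
// Hypothetical endpoint sketch using chi v5 — match the chi import path and
// version pinned in kraken's go.mod; the route and payload are placeholders.
package main

import (
	"encoding/json"
	"net/http"

	"github.com/go-chi/chi/v5"
)

type healthResponse struct {
	Status string `json:"status"`
}

func main() {
	r := chi.NewRouter()

	// Step 2: register the new endpoint alongside the existing handlers.
	r.Get("/readiness", func(w http.ResponseWriter, req *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(healthResponse{Status: "ok"})
	})

	http.ListenAndServe(":8080", r)
}
```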

🔧Why these technologies

  • Go + chi HTTP router — Lightweight concurrency model suits P2P peer coordination and high-throughput blob serving; chi provides composable middleware for auth and logging.
  • Content-addressable storage (digest-based) — Enables deduplication across image versions and automatic blob identification without central metadata; critical for multi-TB-scale efficiency.
  • Torrent-like metainfo (pieces + piece hashes) — Allows parallel download of large blobs from multiple peers; piece-level integrity verification prevents corruption propagation.
  • Pluggable tag resolvers (Docker Registry API, custom) — Accommodates hybrid cloud with multiple image sources; no dependency on a single registry backend.
  • Peer-to-peer gossip / tracker coordination — Distributes load away from central origin; peers cache and seed blobs, reducing bandwidth costs and latency for frequently accessed layers.

⚖️Trade-offs already made

  • P2P pull requires metainfo (piece list) to be fetched before streaming

    • Why: Enables parallel piece download and integrity checking, but adds latency for small blobs.
    • Consequence: Small blobs may be slower via P2P than direct origin pull; mitigated by local cache and peer diversity.
  • Build index manages tag→digest mappings, separate from origin

    • Why: Decouples tag resolution from blob serving; allows caching of manifests without re-fetching from origin on every request.
    • Consequence: Tag resolution adds one round-trip; offset by blob distribution speedup and tag cache TTL.
  • Agent exposes HTTP API rather than docker:// protocol

    • Why: Simpler integration into existing HTTP pull chains (e.g., proxy layer); stateless daemon.
    • Consequence: Callers must adapt to HTTP semantics; no full Docker daemon compatibility.
  • Storage backend pluggable (S3, GCS, local disk)

    • Why: Kraken can integrate into existing cloud infrastructure without reimplementing storage.
    • Consequence: Network latency to remote storage increases agent pull time; mitigated by local replication and caching.

🚫Non-goals (don't propose these)

  • Does not implement full Docker daemon functionality (no container runtime, no image building).
  • Not a Docker Registry V2 clone; the HTTP API is custom and requires an adapter layer.
  • Does not handle image signing or signature verification (delegated to external tools).
  • Not designed for single-machine offline use; assumes networked peer topology.
  • Does not provide multi-tenancy or RBAC; assumes deployment in trusted cluster.

🪤Traps & gotchas

Config-heavy system: each component needs its own setup before the P2P network will form.

  • Agents, tracker, origin, and build-index all require separate YAML configs (see the config/<component>/ templates referenced throughout this doc).
  • P2P network formation requires the tracker and multiple agents running simultaneously.
  • Database setup is needed for tagstore (SQLite in dev, likely MySQL/PostgreSQL in production — see build-index/tagstore/store.go using GORM/sqlx).
  • The storage backend must be pre-configured (S3, GCS, etc.; credentials required).
  • TLS certificates are needed for secure uploader authentication.
  • DNS is optional but recommended for peer discovery.
  • Development requires Docker running locally for integration tests.
  • The pre-commit hook in .githooks/pre-commit likely enforces Go formatting/linting — run git config core.hooksPath .githooks after cloning.

🏗️Architecture

💡Concepts to learn

  • Pseudo-random regular graph topology — Kraken's core P2P coordination strategy — the tracker builds a graph where each agent connects to a fixed number of peers in pseudo-random order, ensuring scalability without hub-and-spoke bottlenecks. Understanding this is essential to grok tracker/ code.
  • Content-addressable blob storage — Blobs are identified by digest (hash) not mutable names; enables deduplication across images and safe replication. Critical for understanding how Kraken tracks and distributes layers without collision.
  • BitTorrent-inspired piece verification — Kraken likely uses subpieces with per-piece hashing (bencode serialization suggests this); allows peer verification without downloading entire blob. See jackpal/bencode-go dependency.
  • Pluggable storage backends via interface abstraction — Core design pattern allowing S3, GCS, HDFS, ECR, or other registries as blob backends without core code changes. Essential for extending Kraken to new storage systems.
  • Tracker-orchestrated distributed coordination — A single tracker (or a set of replicated trackers) maintains cluster state and assigns peers to each other; this avoids DHT complexity, while replication keeps the tracker from becoming a single point of failure. Critical to understand tracker/ for debugging network formation.
  • Seed/origin vs. peer agent roles — Kraken separates origin (trusted seed host) from agents (transient peers); origin provides initial content, agents form P2P mesh. Asymmetric role design is key to scalability.
  • Bencode serialization — Protocol format for P2P messages between agents and tracker (jackpal/bencode-go); more compact than JSON/Protobuf for peer coordination messages. Need to understand for message marshaling.
  • distribution/distribution — The canonical Docker Registry V2 implementation; Kraken wraps and extends this for P2P distribution, uses docker/distribution v2.7.1
  • containerd/containerd — Container runtime dependency used for blob metadata and image handling; directly required in Kraken (v1.5.7 in go.mod)
  • moby/moby — Docker daemon engine-api used for image pulling/pushing at Kraken's origin and agent layers
  • dragonflyoss/dragonfly — Similar P2P image distribution system for cloud-native environments; direct competitor/alternative approach to the same problem
  • opencontainers/image-spec — OCI image format specification; Kraken must comply with blob layer format and digest standards defined here
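A self-contained illustration of the content-addressable storage and piece-verification concepts above, using only the standard library. It is not Kraken's actual implementation; core/digest.go and core/metainfo.go differ in piece sizing, hashing details, and types.

```go
// Illustration of content addressing and piece-level verification with the
// standard library only; not Kraken's real core/digest.go or core/metainfo.go.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const pieceSize = 4 // tiny on purpose; real systems use megabyte-scale pieces

func main() {
	blob := []byte("example layer contents")

	// Content addressing: the blob's name is the hex digest of its bytes,
	// so identical layers dedupe automatically across images and peers.
	sum := sha256.Sum256(blob)
	fmt.Println("blob digest:", hex.EncodeToString(sum[:]))

	// Torrent-like metainfo: hash each fixed-size piece so a downloader can
	// verify pieces independently as they arrive from different peers.
	var pieceHashes []string
	for start := 0; start < len(blob); start += pieceSize {
		end := start + pieceSize
		if end > len(blob) {
			end = len(blob)
		}
		h := sha256.Sum256(blob[start:end])
		pieceHashes = append(pieceHashes, hex.EncodeToString(h[:]))
	}
	fmt.Println("piece hashes:", len(pieceHashes))
}
```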

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for tracker peer discovery and DHT operations

The repo has a tracker component (config/tracker/base.yaml exists) but there are no visible *_test.go files in the tracker directory structure. Given that Kraken is a P2P system relying on peer discovery, the tracker is critical. Adding comprehensive tests for peer registration, heartbeat handling, and DHT consistency would catch regressions in core P2P functionality.

  • [ ] Create tracker/server_test.go with tests for peer registration/deregistration (a starting skeleton follows this list)
  • [ ] Add tracker/client_test.go for client-side peer discovery operations
  • [ ] Test DHT replication and consistency under concurrent peer updates
  • [ ] Add benchmarks for tracker performance under high peer churn scenarios
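If an agent picks up the first checklist item, a starting skeleton might look like the sketch below. The /announce route, request payload, and inline handler are placeholders; replace them with the tracker's real server constructor and routes once identified in tracker/.

```go
// Placeholder skeleton for the first checklist item — the handler, route, and
// payload are stand-ins; wire in the real tracker server from tracker/.
package tracker_test

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

func TestPeerRegistration(t *testing.T) {
	// Stand-in handler; swap in the tracker's real announce/registration
	// handler once its constructor is identified.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	tests := []struct {
		name     string
		method   string
		wantCode int
	}{
		{"register peer", http.MethodPost, http.StatusOK},
		{"reject GET", http.MethodGet, http.StatusMethodNotAllowed},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			req := httptest.NewRequest(tc.method, "/announce", strings.NewReader(`{"peer_id":"p1"}`))
			rec := httptest.NewRecorder()
			handler.ServeHTTP(rec, req)
			if rec.Code != tc.wantCode {
				t.Fatalf("got status %d, want %d", rec.Code, tc.wantCode)
			}
		})
	}
}
```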

Add proxy component server and client implementation with tests

The config/proxy/base.yaml and config/proxy/test.template files exist, indicating a proxy component is configured, but there's no visible proxy/ directory with implementation files (server.go, client.go, etc.). This is a gap between configuration and code. Implementing the proxy server and client would complete the architecture and enable the registry proxy functionality shown in the build index.

  • [ ] Create proxy/server/server.go implementing HTTP proxy routes for Docker registry requests
  • [ ] Create proxy/client/client.go for upstream registry communication
  • [ ] Add proxy/server/server_test.go with tests for request forwarding and caching logic
  • [ ] Add proxy/config.go to handle proxy-specific configuration parsing

Add missing test coverage for origin component blob storage operations

The config/origin/base.yaml exists (indicating origin server functionality), but like tracker, there are no visible origin/server_test.go or origin/store_test.go files shown in the file structure. Origin handles blob storage and distribution—critical for registry reliability. Tests for blob upload/download, corruption detection, and storage backend failures would significantly improve reliability.

  • [ ] Create origin/server/server_test.go testing blob upload/download endpoints
  • [ ] Create origin/blobstore/store_test.go for storage backend operations (S3, GCS, local)
  • [ ] Add tests for blob hash validation and integrity checking
  • [ ] Add failure scenario tests (disk full, network timeout, permission denied)

🌿Good first issues

  • Add integration tests for agent-to-agent blob transfer scenarios — currently agent/agentclient/client_test.go and agent/agentserver/server_test.go exist but lack end-to-end P2P scenarios with multiple peers. Good way to learn the P2P flow.
  • Document the tracker topology algorithm and peer graph construction in code comments — tracker/ appears to implement a 'pseudo-random regular graph' mentioned in README but logic is undocumented. Clarify in comments for new contributors.
  • Add missing test coverage for storage backend implementations — storage/ directory exists but test files are not listed; add tests for S3, GCS, and HDFS backends to ensure plugin pattern works correctly.

Top contributors

See the live page for the full contributor breakdown.

📝Recent commits

  • 70bb577 — build(deps): bump requests from 2.32.4 to 2.33.0 (#582) (dependabot[bot])
  • d6d4ef0 — build(deps): bump pygments from 2.15.0 to 2.20.0 (#583) (dependabot[bot])
  • af5c2d9 — feat(nginx/config): allow overwrite of proxy_read_timeout through yaml (#615) (Anton-Kalpakchiev)
  • f862dd5 — build(deps): bump go.opentelemetry.io/otel from 1.39.0 to 1.41.0 (#608) (dependabot[bot])
  • d1b81b4 — fix(blobrefresh): deduplicate download requests for the same blob under different namespaces (#610) (Anton-Kalpakchiev)
  • 396c1a1 — feat(proxy): add blob download metrics (#606) (Anton-Kalpakchiev)
  • f79a4fa — fix(origin): improve metrics on remote blob downloads (#605) (Anton-Kalpakchiev)
  • db728c9 — feat(agent): Add leeching throughput metric (#604) (Anton-Kalpakchiev)
  • 3609864 — fix(agent): tune download throughput histogram buckets (#602) (Anton-Kalpakchiev)
  • 3a16c1e — feat(proxy): add metrics for data served and download failures (#601) (Anton-Kalpakchiev)

🔒Security observations

  • High · Vulnerable Go Module Dependencies with Known CVEs — go.mod (dependencies section). Multiple dependencies in go.mod contain known security vulnerabilities. Notable issues include: github.com/docker/docker-credential-helpers v0.6.3 (deprecated, multiple CVEs), gopkg.in/yaml.v2 v2.3.0 (YAML unmarshaling vulnerabilities), github.com/jinzhu/gorm v1.9.16 (SQL injection risks, deprecated in favor of v2), and golang.org/x/net v0.48.0 (outdated, missing recent security patches). Fix: Update all dependencies to latest secure versions: Use GORM v2, upgrade gopkg.in/yaml.v2 to v3 or switch to the maintained yaml parser, update golang.org/x/net to latest, and migrate from deprecated docker-credential-helpers if possible or apply security patches.
  • High · Deprecated and Unmaintained Dependencies — go.mod (multiple dependencies). The project uses several deprecated packages: github.com/docker/docker-credential-helpers v0.6.3 (no longer maintained), github.com/jinzhu/gorm v1.9.16 (users directed to migrate to GORM v2), and gopkg.in/validator.v2 (unmaintained). These packages no longer receive security updates. Fix: Migrate to actively maintained alternatives: Replace GORM v1 with GORM v2, use a maintained validation library, and evaluate alternatives to docker-credential-helpers or ensure security patches are manually applied.
  • High · SQL Injection Risk via GORM v1 — build-index/tagstore/store.go, database/migration files. The project uses github.com/jinzhu/gorm v1.9.16, which is vulnerable to SQL injection attacks if raw SQL queries or improperly sanitized user input is passed to query methods. The codebase includes SQL-related files (build-index/tagstore/store.go, migrations) that could be exploited. Fix: Migrate to GORM v2 which has improved security. Ensure all queries use parameterized statements. Audit build-index/tagstore/store.go and any migration scripts for raw SQL queries with user input.
  • Medium · Weak YAML Parser Security — go.mod, config/ directory (all .yaml files). gopkg.in/yaml.v2 v2.3.0 is an aging parser line; earlier v2 releases had resource-exhaustion issues when unmarshaling crafted documents. Configuration files are parsed from YAML (config/ directory), so a malicious or malformed config could affect the application. Fix: Upgrade to gopkg.in/yaml.v3 or another maintained YAML parser. Validate all YAML configuration inputs and parse with strict settings.
  • Medium · Outdated Cryptographic Dependencies — go.mod (golang.org/x/net dependency). golang.org/x/net v0.48.0 is outdated and may lack recent security patches for TLS/network-related vulnerabilities. Given that Kraken is a P2P Docker registry handling network communication, this is particularly concerning. Fix: Update golang.org/x/net to the latest stable version. Review and update all x/crypto and x/sys packages to latest versions. Implement regular dependency scanning with tools like 'go list -u -m all'.
  • Medium · Missing Security Headers Configuration — agent/agentserver/server.go, build-index/tagserver/server.go, proxy components. The codebase includes HTTP servers (agent/agentserver, build-index/tagserver, and proxy components) but no visible implementation of security headers (CSP, X-Frame-Options, X-Content-Type-Options, HSTS, etc.). Fix: Implement middleware to add standard security headers (a sketch follows this list). Use libraries like github.com/unrolled/secure or implement headers manually in the HTTP server initialization.
  • Medium · Potential Credential Exposure in Configuration — config/ directory (agent/base.yaml, build-index/base…). Configuration files use .yaml templates (config/*/test.template) and base configurations. If sensitive credentials are stored in these files, or environment variables are not properly validated, they could be exposed in logs or error messages.
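For the missing-security-headers item above, a middleware sketch in plain net/http could look like the following. The header values are a conservative starting point, not a vetted policy, and with chi it would be attached via r.Use instead of wrapping a mux.

```go
// Sketch of the security-headers middleware suggested above; values are a
// conservative starting point, not a vetted policy for Kraken's endpoints.
package main

import "net/http"

func securityHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("X-Content-Type-Options", "nosniff")
		h.Set("X-Frame-Options", "DENY")
		h.Set("Strict-Transport-Security", "max-age=63072000; includeSubDomains")
		h.Set("Content-Security-Policy", "default-src 'none'")
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	// Wrap the existing handler chain; with chi this would be r.Use(securityHeaders).
	http.ListenAndServe(":8080", securityHeaders(mux))
}
```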

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
