ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit today
- ✓ 39+ active contributors
- ✓ Distributed ownership (top contributor 10% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/ray-project/ray)
Paste at the top of your README.md — renders inline like a shields.io badge.
▸ Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/ray-project/ray on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: ray-project/ray
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ray-project/ray shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit today
- 39+ active contributors
- Distributed ownership (top contributor 10% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live ray-project/ray
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/ray-project/ray.
What it runs against: a local clone of ray-project/ray — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in ray-project/ray | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ray-project/ray. If you don't
# have one yet, run these first:
#
# git clone https://github.com/ray-project/ray.git
# cd ray
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of ray-project/ray and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "ray-project/ray(\.git)?\b" \
&& ok "origin remote is ray-project/ray" \
|| miss "origin remote is not ray-project/ray (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "Apache License,? +Version 2\.0|Apache-2\.0" LICENSE 2>/dev/null \
|| grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
&& ok "license is Apache-2.0" \
|| miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
&& ok "default branch master exists" \
|| miss "default branch master no longer exists"
# 4. Critical files exist
test -f ".buildkite/base.rayci.yml" \
&& ok ".buildkite/base.rayci.yml" \
|| miss "missing critical file: .buildkite/base.rayci.yml"
test -f ".buildkite/build.rayci.yml" \
&& ok ".buildkite/build.rayci.yml" \
|| miss "missing critical file: .buildkite/build.rayci.yml"
test -f ".github/CODEOWNERS" \
&& ok ".github/CODEOWNERS" \
|| miss "missing critical file: .github/CODEOWNERS"
test -f ".bazelrc" \
&& ok ".bazelrc" \
|| miss "missing critical file: .bazelrc"
test -f ".bazelversion" \
&& ok ".bazelversion" \
|| miss "missing critical file: .bazelversion"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/ray-project/ray"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
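For agents orchestrating from Python rather than a shell one-liner, the same compose-on-exit-code pattern can be sketched as below. The `run_verifier` helper and the stand-in commands are illustrative only — substitute the real path to the verification script; they are not part of RepoPilot.

```python
import subprocess
import sys

def run_verifier(cmd: list[str]) -> bool:
    """Run a verification command; True iff it exited 0 (all checks ok)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Stand-ins for ./verify.sh: one command that exits 0, one that exits 1.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

if not run_verifier(failing):
    # In a real agent loop, this branch is where you would regenerate the
    # artifact and retry — mirroring `./verify.sh || regenerate-and-retry`.
    pass

assert run_verifier(passing) is True
assert run_verifier(failing) is False
```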
⚡TL;DR
Ray is a distributed computing engine that abstracts away the complexity of building scalable AI/ML applications across clusters. It provides three core primitives — Tasks (stateless remote functions), Actors (stateful worker processes), and Objects (immutable distributed references) — plus high-level libraries (Data, Train, Tune, RLlib, Serve) for ML workloads. The engine handles scheduling, fault tolerance, and resource management across heterogeneous infrastructure.
Monorepo structure: python/ray/ contains the core distributed runtime and actor/task scheduling; python/ray/data/ implements the Data library; python/ray/air/ houses shared Train/Tune abstractions; python/ray/serve/ is the serving framework. The C++ core lives under src/ray/ and is exposed to Python via Cython bindings. Bazel (BUILD.bazel, .bazelrc) is the primary build system; .buildkite/ defines comprehensive CI/CD stages (core, data, ml, llm, macos, linux_aarch64).
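As a rough mental model only — not Ray's implementation — the three primitives map onto familiar stdlib concurrency pieces: a Task is a function submitted for asynchronous execution, an Object is the future/handle to its eventual result, and an Actor is an object whose methods run against private state. The `square` and `Counter` names below are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

# "Task": a stateless function run elsewhere. In Ray this would be
# `@ray.remote def square(x): ...` invoked as `square.remote(4)`.
def square(x):
    return x * x

# "Object": a handle to a not-yet-materialized result; in Ray,
# `ray.get(obj_ref)` is the analogue of `.result()`.
obj_ref = pool.submit(square, 4)

# "Actor": a stateful worker. In Ray, `@ray.remote class Counter` would
# hold `self.n` in a dedicated process across method calls.
class Counter:
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

counter = Counter()
first = pool.submit(counter.incr).result()
second = pool.submit(counter.incr).result()

assert obj_ref.result() == 16
assert (first, second) == (1, 2)
pool.shutdown()
```

The analogy breaks down exactly where Ray adds value: distribution across machines, fault tolerance, and resource-aware scheduling.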
👥Who it's for
ML engineers and data scientists building production distributed training pipelines, hyperparameter tuning at scale, and real-time serving systems; DevOps engineers deploying Ray on Kubernetes via KubeRay; research teams needing quick distributed experimentation without managing Spark/Dask infrastructure directly.
🌱Maturity & risk
Ray is production-mature with extensive CI/CD (40+ .buildkite/*.rayci.yml pipeline configs), comprehensive test coverage across Python/C++/Java, and an active release automation system (.buildkite/release-automation/). The codebase spans millions of lines of Python and C++, reflecting years of development. It is actively maintained with regular releases and a mature governance model.
Standard open source risks apply.
Active areas of work
Active work evident from: LLM-specific pipeline (.buildkite/llm.rayci.yml), ARM64 Linux support (.buildkite/linux_aarch64.rayci.yml), multi-platform wheel building (.buildkite/release-automation/wheels.rayci.yml), and KubeRay integration testing. Release automation is mature with pre-release validation scripts. The bisect infrastructure suggests ongoing debugging/regression testing.
🚀Get running
git clone https://github.com/ray-project/ray.git
cd ray
pip install -e python/
# Or for development with all dependencies (quote the extras so the
# shell doesn't glob the brackets):
pip install -e "python/[default]"
# Verify with: python -c 'import ray; ray.init()'
Daily commands:
# Start a local Ray cluster (single-machine):
python -c "import ray; ray.init()"
# Or via CLI (if installed):
ray start --head
# Run a simple task:
python -c "
import ray
ray.init()
@ray.remote
def hello():
return 'Hello, Ray!'
print(ray.get(hello.remote()))
ray.shutdown()
"
🗺️Map of the codebase
- .buildkite/base.rayci.yml — Core CI/CD pipeline configuration that defines the build, test, and release workflow for all Ray components.
- .buildkite/build.rayci.yml — Primary build pipeline orchestration that coordinates compilation of Ray core and all dependent modules.
- .github/CODEOWNERS — Defines code ownership and review requirements; essential for understanding approval workflows and responsibilities.
- .bazelrc — Bazel build configuration that controls compilation flags, toolchain selection, and platform-specific build behavior.
- .bazelversion — Pinned Bazel version constraint; mismatches here cause widespread build failures across the entire project.
- .pre-commit-config.yaml — Pre-commit hooks enforcing code quality standards, linting, and formatting before commits are accepted.
- .readthedocs.yaml — Documentation build configuration that generates the official Ray API documentation and guides.
🛠️How to make changes
Add a new CI/CD pipeline for a Ray component
- Create a new pipeline YAML file in .buildkite/ (e.g., my-component.rayci.yml) following the structure of existing pipelines like core.rayci.yml (.buildkite/base.rayci.yml)
- Define pipeline steps with test matrix, build targets, and artifact handling using the RayCI DSL (.buildkite/build.rayci.yml)
- Register the pipeline in the main cicd.rayci.yml file by adding a trigger_build step (.buildkite/cicd.rayci.yml)
- Update .buildkite/always.rules.txt or test.rules.txt to control when this pipeline triggers (.buildkite/always.rules.txt)
Add code ownership and enforce review requirements
- Edit .github/CODEOWNERS and add a pattern (e.g., python/ray/air/*) with the GitHub team or usernames to require (.github/CODEOWNERS)
- Configure branch protection rules in GitHub to require reviews from CODEOWNERS before merging (.github/CODEOWNERS)
- Optionally use the on_pull_request_synchronized workflow to notify reviewers when PR changes require reassessment (.github/workflows/on_pull_request_synchronized.yml)
Enforce code quality standards for a new language or module
- Add pre-commit hooks in .pre-commit-config.yaml for the appropriate linters (e.g., pylint, flake8 for Python) (.pre-commit-config.yaml)
- For C++: update .clang-tidy with new checks; for Python: configure ruff/pylint rules in your module's config (.clang-tidy)
- For documentation: add Vale rules in .vale.ini or add styles in .vale/styles/Google/ to enforce the project's voice (.vale.ini)
- Run lint.rayci.yml in CI to automatically block PRs that violate the new standards (.buildkite/lint.rayci.yml)
Update Bazel build configuration for new platform or toolchain
- Modify .bazelrc to add platform-specific build flags, compiler settings, or feature flags for the new target (.bazelrc)
- If a Bazel version upgrade is needed, update .bazelversion to the new pinned version (.bazelversion)
- Test the new configuration locally, then add a new build pipeline in .buildkite/ (e.g., linux_aarch64.rayci.yml) for CI verification (.buildkite/linux_aarch64.rayci.yml)
🔧Why these technologies
- Bazel build system — Handles monorepo complexity with multiple languages (C++, Python, Java), enabling hermetic builds, remote caching, and parallel compilation across 600+ files
- Buildkite CI/CD — Orchestrates distributed testing and multi-platform builds (Linux x86_64, ARM64, macOS, Windows) with per-branch customization and release automation
- Pre-commit hooks — Enforces code quality before commits reach CI, reducing feedback latency and catching formatting/linting issues locally
- ReadTheDocs + Sphinx + Vale — Automates documentation generation and style checking; Vale enforces consistent voice across Python API docs, tutorials, and guides
⚖️Trade-offs already made
- Bazel over Make/CMake for the monorepo
  - Why: Ray contains interdependent C++ core, Python bindings, Java API, and multiple libraries (Tune, RLlib, Serve, Data); Bazel's fine-grained caching and remote execution scales better
  - Consequence: Steeper learning curve for contributors; longer initial setup (Bazel cache warmup); requires .bazelrc discipline to avoid build configuration drift
- Buildkite over GitHub Actions for primary CI
  - Why: Supports complex, multi-stage pipelines with matrix testing, custom hooks, and self-hosted runners for GPU/ARM infrastructure
  - Consequence: Less native GitHub integration; logs and artifacts live in Buildkite; team must manage Buildkite organization and secrets separately
- Distributed pre-commit checks (lint.rayci.yml) instead of GitHub branch protection
  - Why: Allows flexible rule files (.buildkite/always.rules.txt) to enable/disable linting per component and track flakiness
  - Consequence: Lint failures appear in CI, not on the PR immediately; potential for inconsistent local vs. CI behavior if developers skip pre-commit
🚫Non-goals (don't propose these)
- Does not perform real-time distributed computing itself; Ray is the runtime engine—this repo contains configuration for building and testing it
- Does not manage cloud infrastructure deployment; release artifacts are wheels and Docker images, deployment is handled by downstream tools (KubeRay, Anyscale, etc.)
- Not a monolithic application; rather a modular SDK with optional libraries (Ray Tune, Ray RLlib, Ray Serve)
🪤Traps & gotchas
- Bazel build system: Not standard pip/setuptools; requires .bazelrc and BUILD.bazel files in every directory.
- Multiple languages: C++ object store (plasma) must be built before Python bindings work; mixing Cython extensions with pure Python can cause import-order issues.
- Distributed semantics: ray.put() and ray.get() have non-obvious memory-lifetime behavior; objects persist in the object store until explicitly deleted or all references are released.
- CI matrix explosion: 40+ .buildkite/*.rayci.yml files mean a single commit triggers dozens of parallel jobs; slow feedback loop.
- Bazel version lock: Check .bazelversion for the required Bazel version; mismatches cause silent failures.
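The ray.put()/ray.get() lifetime trap can be illustrated with a toy reference-counted store — pure Python, not Ray's plasma implementation: values stay pinned while any reference is alive, and eviction only becomes possible once the last reference is released.

```python
class ToyObjectStore:
    """Toy model of reference-counted object lifetime (not Ray's plasma)."""
    def __init__(self):
        self._data = {}
        self._refcount = {}
        self._next_id = 0

    def put(self, value):
        ref = self._next_id
        self._next_id += 1
        self._data[ref] = value
        self._refcount[ref] = 1  # the returned ref pins the object
        return ref

    def get(self, ref):
        return self._data[ref]  # KeyError if already evicted

    def release(self, ref):
        self._refcount[ref] -= 1
        if self._refcount[ref] == 0:
            # Only now is the object eligible for eviction.
            del self._data[ref]
            del self._refcount[ref]

store = ToyObjectStore()
ref = store.put("big array")
assert store.get(ref) == "big array"   # pinned while a ref is held
store.release(ref)
assert ref not in store._data          # evicted after the last release
```

The practical consequence is the same in real Ray: holding object refs longer than needed keeps large values resident in the object store.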
🏗️Architecture
💡Concepts to learn
- Actor Model — Ray's Actor primitive is the foundation for stateful, fault-tolerant distributed services; understanding actor semantics (at-most-once vs. at-least-once delivery) is critical for writing correct distributed code.
- Lineage-based Fault Tolerance — Ray recovers from failures by replaying the task graph; this means you must avoid non-deterministic side effects in Ray tasks—understanding lineage prevents subtle bugs.
- Plasma Object Store — Ray uses an in-memory object store (plasma) for fast inter-node data transfer; understanding object serialization, memory pinning, and eviction policy is essential for performance tuning.
- Lazy Task Graph Execution — Ray tasks are not executed eagerly; instead, they build a DAG of futures that is scheduled later. This deferred execution model enables optimization and dynamic scheduling.
- Resource-aware Scheduling — Ray's scheduler respects CPU/GPU/memory constraints and custom resource tags; misunderstanding resource specification leads to overcommitment and deadlock.
- Bazel Remote Execution — Ray uses Bazel with remote execution (evident from .bazelrc configs); understanding Bazel's build cache invalidation and sandbox isolation is key to fixing build issues.
- Cython C Extensions — Ray bridges Python and C++ via Cython; performance-critical paths (serialization, scheduling) are Cython code. Understanding the Python/C boundary prevents import and segfault issues.
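Lineage-based fault tolerance can be sketched as replaying a recorded task graph: if a result is lost, re-run the recipe that produced it — which is exactly why tasks must be deterministic and side-effect free. This is a conceptual toy, not Ray's scheduler; `submit` and `recover` are invented names.

```python
# Toy lineage replay: each object records the (function, input refs)
# that produced it, so a lost value can be recomputed bottom-up.
lineage = {}   # ref -> (fn, input_refs)
values = {}    # ref -> computed value (the "object store")

def submit(fn, *input_refs):
    ref = len(lineage)
    lineage[ref] = (fn, input_refs)
    values[ref] = fn(*(values[r] for r in input_refs))
    return ref

def recover(ref):
    """Recompute a value by replaying its lineage recursively."""
    if ref in values:
        return values[ref]
    fn, input_refs = lineage[ref]
    return fn(*(recover(r) for r in input_refs))

a = submit(lambda: 3)
b = submit(lambda: 4)
c = submit(lambda x, y: x * y, a, b)

del values[c]                 # simulate a node failure losing c
assert recover(c) == 12       # replaying the lineage recovers it
```

If `lambda x, y: x * y` had side effects (writing a file, mutating shared state), replay would repeat them — the subtle-bug class the concept list warns about.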
🔗Related repos
- dask/dask — Dask is an alternative distributed computing framework for lazy task graphs on Python; Ray users often compare scheduling/performance characteristics.
- apache/spark — Spark is the industry standard for distributed data processing; Ray Data and Ray Tune often position as Spark alternatives with tighter ML integration.
- ray-project/kuberay — Companion repo providing Kubernetes operators and Helm charts to deploy Ray clusters; essential for production on K8s.
- ray-project/anyscale — Anyscale is the commercial managed Ray platform; understanding their product helps contextualize Ray's market positioning and feature roadmap.
- pytorch/pytorch — Ray Train integrates deeply with PyTorch for distributed training; many Ray users start with PyTorch and scale via Ray.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive CI/CD documentation for .buildkite configuration system
The repo has an extensive .buildkite/ directory with 50+ config files across multiple test suites (core.rayci.yml, ml.rayci.yml, rllib.rayci.yml, serve.rayci.yml, etc.), but there's no centralized documentation explaining how these configs work together, how to add new tests, or how the always.rules.txt vs test.rules.txt files determine test selection. This causes friction for contributors trying to understand the CI/CD pipeline and add new tests correctly.
- [ ] Create .buildkite/CONTRIBUTING.md documenting the config hierarchy and inheritance patterns
- [ ] Document the purpose and format of always.rules.txt, test.rules.txt, and always.rules.test.txt files
- [ ] Add examples of adding a new test to core.rayci.yml, rllib.rayci.yml, and ml.rayci.yml
- [ ] Document how to test changes locally using buildkite-agent or equivalent
- [ ] Reference this new doc from the main CONTRIBUTING.md
Create unified testing guide for multi-platform support (macOS, Linux aarch64, Windows)
The repo has separate CI configs for macos.rayci.yml, linux_aarch64.rayci.yml, and windows.rayci.yml with distinct rules files in .buildkite/macos/ but inconsistent test selection logic across platforms. New contributors don't know which tests run on which platforms or how to ensure cross-platform compatibility. Adding a structured testing matrix document would reduce platform-specific test failures.
- [ ] Create .buildkite/PLATFORM_TESTING.md documenting which test suites run on each platform
- [ ] Map .buildkite/macos/always.rules.txt against .buildkite/always.rules.txt to identify platform-specific overrides
- [ ] Document why certain tests are excluded on specific platforms (e.g., aarch64 constraints)
- [ ] Add a troubleshooting section for platform-specific test failures
- [ ] Include instructions for developers to validate changes locally on macOS/Linux/Windows
Implement test coverage reporting for the .claude/ AI agent configuration system
The repo has a new .claude/ directory with agents/ and rules/ (python-guidelines.md, security.md) that appears to guide AI-assisted development, but there's no mechanism to validate that these rules are being followed in CI/CD, nor any lint checks for Python code style consistency. Adding automated checks would ensure contributors adhere to these guidelines and catch violations early.
- [ ] Create a new .buildkite/claude-lint.rayci.yml that runs before other tests
- [ ] Implement a Python linter in a new lint/ directory that validates .claude/rules/python-guidelines.md compliance
- [ ] Add ruff/pylint configuration (.ruff.toml or .pylintrc) that enforces the security rules from .claude/rules/security.md
- [ ] Add a pre-commit hook configuration in .buildkite/hooks/ to validate .claude rules locally
- [ ] Document in .claude/CONTRIBUTING.md how to add new linting rules and validate them
🌿Good first issues
- Add type hints to the python/ray/core/ public API (Tasks, Actors, Objects) and verify with mypy in CI — impacts all downstream libraries but is mechanical work.
- Expand .buildkite/always.rules.txt coverage: identify critical modules lacking tests (e.g., object store eviction logic) and propose new test rules — direct impact on reliability.
- Document the exact wheel-building process by improving comments in .buildkite/release-automation/wheels.rayci.yml and adding a CONTRIBUTING.md section for maintainers — low code, high value.
⭐Top contributors
- @jeffreywang-anyscale — 10 commits
- @goutamvenkat-anyscale — 9 commits
- @myandpr — 8 commits
- @eicherseiji — 7 commits
- @sai-miduthuri — 6 commits
📝Recent commits
- 76ce1d1 — [Serve] Gate ingress request router body forwarding behind escape hatch (#63183) (eicherseiji)
- 4549a87 — [Data] Jail unstructured_data_ingestion release test (#63236) (goutamvenkat-anyscale)
- 189b7a9 — [data] V1 _split_predicate_by_columns correctness fix (#63176) (goutamvenkat-anyscale)
- bcdf33e — [Release / Train] Fix Train tutorials (#63225) (pseudo-rnd-thoughts)
- 11d194a — [Data] Fix JSONL read retry with advanced file cursor (#63233) (goutamvenkat-anyscale)
- bde05a9 — [serve] add max_replicas_per_node to /api/serve/applications response (#63234) (akyang-anyscale)
- fa4a6f6 — [serve][3/N] Introduce experimental ConsistentHashRouter for session-sticky routing (#62906) (jeffreywang-anyscale)
- ab877c7 — [serve] Fix serve router benchmark client setup (#63147) (jeffreywang-anyscale)
- 22d45be — [ci] Gate release test image annotation on RAYCI_SELECT membership (#63235) (sai-miduthuri)
- aca711b — [train] Consolidate Train Run Metadata Sanitization and Improve Readability (#63182) (JasonLi1909)
🔒Security observations
The Ray project demonstrates basic security awareness with a published vulnerability reporting process and organized CI/CD infrastructure. However, the analysis is limited by incomplete visibility into actual implementation details. Key concerns include: (1) lack of visible security hardening in build pipelines, (2) no evidence of artifact signing/verification, (3) incomplete security documentation visibility, (4) missing details on secret management practices, and (5) Docker/container security verification needed. The codebase structure appears well-organized for a large distributed computing project, but security controls in CI/CD and release management require verification. Recommendations focus on implementing modern supply chain security practices (SLSA, SBOM, artifact signing) and formalizing the vulnerability disclosure program.
- Medium · Security policy redirects to external email — SECURITY.md. The SECURITY.md file directs security vulnerability reports to an external email address (security@anyscale.com) without a documented coordinated-disclosure process. While this is common practice, there is no evidence of a bug bounty program, an SLA for responses, or a PGP key for encrypted communications. Fix: implement a formal vulnerability disclosure policy with (1) a PGP key for encrypted submissions, (2) a documented response SLA, (3) a safe-harbor statement, (4) a clear scope definition, and (5) consideration of a bug bounty program for an open-source project of this scale.
- Medium · Build pipeline security configuration missing — .buildkite/. Multiple Buildkite configuration files (*.rayci.yml) are present but without visible security controls: no evidence of (1) secret-management practices, (2) artifact signing, (3) build provenance, or (4) SBOM generation for releases. Fix: implement (1) secret-rotation policies for CI/CD, (2) GPG signing for release artifacts, (3) SLSA-framework compliance for build provenance, and (4) artifact-integrity verification in release pipelines.
- Low · Incomplete security documentation — SECURITY.md and .claude/rules/security.md. The file structure references security documentation in doc/source/ray-security/index.md, but its content was not available for analysis, preventing verification of completeness. Fix: ensure the documentation covers (1) authentication/authorization, (2) data encryption, (3) network security, (4) dependency management, and (5) the vulnerability-reporting process.
- Low · Many configuration files without a security baseline — .buildkite/*.rayci.yml and .github/ workflows. A large number of CI/CD configuration files exist without visibility into (1) environment-variable handling, (2) secret-injection methods, (3) access-control policies, or (4) audit logging. Fix: implement (1) secret scanning in pre-commit hooks, (2) least-privilege access in CI/CD, (3) audit logging for sensitive operations, and (4) a signed-commits requirement for main branches.
- Low · Docker configuration security not visible — .buildkite/release-automation/forge_*.Dockerfile. Release automation references Dockerfiles (forge_arm64.Dockerfile, forge_x86_64.Dockerfile) whose content was not available; cannot verify (1) base-image security, (2) privilege-escalation prevention, (3) non-root execution, or (4) layer-caching best practices. Fix: ensure Dockerfiles (1) use minimal base images from trusted registries, (2) run as a non-root user, (3) use multi-stage builds, (4) are vulnerability-scanned before release, and (5) pin specific base-image versions.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.