Angel-ML/angel
A Flexible and Powerful Parameter Server for large-scale machine learning
Slowing — last commit 7mo ago
Weakest axis: non-standard license (Other); no tests detected
Has a license and CI, though no test directory was detected; a reasonable foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 7mo ago
- ✓6 active contributors
- ✓Distributed ownership (top contributor 37% of recent commits)
- ✓Other licensed
- ✓CI configured
- ⚠Slowing — last commit 7mo ago
- ⚠Non-standard license (Other) — review terms
- ⚠No test directory detected
What would change the summary?
- Use as dependency: Concerns → Mixed if the license terms are clarified.
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: Angel-ML/angel
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/Angel-ML/angel shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Slowing — last commit 7mo ago
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live Angel-ML/angel repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale; regenerate it at repopilot.app/r/Angel-ML/angel.
What it runs against: a local clone of Angel-ML/angel. The script inspects the git remote, the LICENSE file, the master branch, and git log. It is read-only and makes no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in Angel-ML/angel | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 237 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of Angel-ML/angel. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/Angel-ML/angel.git
#   cd angel
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of Angel-ML/angel and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "Angel-ML/angel(\.git)?\b" \
  && ok "origin remote is Angel-ML/angel" \
  || miss "origin remote is not Angel-ML/angel (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 237 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~207d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/Angel-ML/angel"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Angel is a high-performance distributed parameter server platform for large-scale machine learning, developed by Tencent and Peking University. It partitions model parameters across multiple server nodes and provides flexible synchronization models, efficient model-updating interfaces (PSFunc), and support for both standalone and Spark-on-Angel execution. Built in Java/Scala, it excels at handling high-dimensional models and supports deployment on YARN and Kubernetes.
Monorepo structure: angel-ps/ is the core, with angel-ps/core holding the primary Java/Scala logic under src/main/java/com/tencent/angel/. Key subsystems: client/ (AngelClient, AngelYarnClient, AngelKubernetesClient for the deployment modes), common/ (utilities, serialization), and api/python/ (Python interop layer). Configuration lives in angel-default.xml and angel-site.xml. The deployment abstraction supports LOCAL, YARN, and Kubernetes modes via AngelDeployMode.
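To make the PSFunc idea concrete, here is a minimal Java sketch of the general technique: ship an update function to the server that owns a parameter row, instead of pulling the row to the worker, modifying it, and pushing it back. ServerSideFunc, ParamServerShard, and SparseSgdStep are hypothetical names for illustration, not Angel's PSFunc API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Angel's PSFunc API: a function object that the
// server applies to the row it stores, so only the function's arguments
// travel over the network.
interface ServerSideFunc {
  void apply(double[] row);
}

class ParamServerShard {
  private final Map<Integer, double[]> rows = new HashMap<>();

  ParamServerShard(int numRows, int dim) {
    for (int r = 0; r < numRows; r++) {
      rows.put(r, new double[dim]);
    }
  }

  // Pull style: the whole row crosses the network (simulated by a copy).
  double[] pull(int rowId) {
    double[] row = rows.get(rowId);
    return Arrays.copyOf(row, row.length);
  }

  // Push style: the whole row crosses the network again.
  void push(int rowId, double[] row) {
    rows.put(rowId, Arrays.copyOf(row, row.length));
  }

  // Function style: the server mutates its own copy in place.
  void update(int rowId, ServerSideFunc func) {
    func.apply(rows.get(rowId));
  }
}

// Example function: a sparse SGD step that touches only a few indices.
class SparseSgdStep implements ServerSideFunc {
  private final int[] indices;
  private final double[] grads;
  private final double lr;

  SparseSgdStep(int[] indices, double[] grads, double lr) {
    this.indices = indices;
    this.grads = grads;
    this.lr = lr;
  }

  @Override
  public void apply(double[] row) {
    for (int i = 0; i < indices.length; i++) {
      row[indices[i]] -= lr * grads[i];
    }
  }
}
```

For example, shard.update(0, new SparseSgdStep(new int[]{3, 7}, new double[]{1.0, -2.0}, 0.5)) sends two indices and two gradients instead of a full row, which is where the savings come from for high-dimensional sparse models.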
👥Who it's for
Machine learning engineers and researchers building distributed ML systems at scale, particularly those working on high-dimensional models (sparse embeddings, deep learning) who need a production-grade parameter server alternative to implementing distributed training from scratch. Also relevant for graph computing researchers using the SONA framework.
🌱Maturity & risk
Mature and production-oriented, though activity has slowed (last commit about 7 months ago). Version 3.2.0+ has been released, and the project is a large dual-language (Java + Scala) codebase with CI via .travis.yml, official deployment guides for YARN/Kubernetes, and a governance structure (COMMITTERS.md, CODEOWNERS.md). Joint backing by Tencent and Peking University suggests stability.
The large monolithic codebase increases the maintenance surface, and the dependency on Hadoop YARN and a custom PS protocol may create lock-in. Python bindings exist (PythonGatewayServer.java), which suggests the Python API is still evolving. No evidence of a formal security audit is visible. Migrating away from the PS architecture would be costly.
Active areas of work
Framework supports Spark-on-Angel integration (docs/overview/spark_on_angel_en.md referenced), graph computing via SONA under development, and Kubernetes deployment (AngelKubernetesClient present). Python API expansion evident (PythonGatewayServer, PythonRunner, PythonUtils classes). Version bump to 3.3.0 in pom.xml suggests active release cycle.
🚀Get running
git clone https://github.com/Angel-ML/angel.git && cd angel && mvn clean compile -DskipTests (Maven-based build, per the pom.xml files). For local testing, run mvn test or follow docs/deploy/local_run_en.md. A Dockerfile is present for containerized builds.
Daily commands: in local mode, run mvn clean compile and then launch the entry point for your task (for example, java -cp target/classes com.tencent.angel.client.AngelClient; this invocation is unverified and the exact class and classpath vary by task and module). In YARN mode, configure angel-site.xml with the YARN settings, then submit via AngelYarnClient. For Docker, build with the provided Dockerfile and run the container with the Angel config mounted. See docs/deploy/ for detailed environment setup (Hadoop/YARN/Spark prerequisites are required).
🗺️Map of the codebase
- angel-ps/core/src/main/java/com/tencent/angel/client/AngelClientFactory.java: Factory pattern entry point for deployment abstraction—determines whether Angel runs in LOCAL, YARN, or Kubernetes mode based on configuration
- angel-ps/core/src/main/java/com/tencent/angel/PartitionKey.java: Core abstraction for model parameter partitioning across PS nodes—fundamental to distributed parameter management
- angel-ps/core/src/main/java/com/tencent/angel/client/PSStartUpConfig.java: Configuration schema for parameter server initialization—controls memory, concurrency, and distributed behavior
- angel-ps/conf/angel-default.xml: System defaults for all configuration parameters—reference for understanding default behavior and tuneable parameters
- angel-ps/core/src/main/java/com/tencent/angel/api/python/PythonGatewayServer.java: Py4J gateway bridging Java PS to Python clients—critical for non-Java application integration
- angel-ps/core/src/main/java/com/tencent/angel/common/ByteBufSerdeUtils.java: Network serialization layer using Netty ByteBuf—handles efficient parameter transfer across PS nodes
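Range partitioning, the idea behind a partition key, can be sketched in a few lines of Java. This assumes each partition owns a contiguous, half-open column range; ColumnRange and RangePartitioner are hypothetical names, and the real PartitionKey fields and partitioner logic should be checked in the source and in docs/design/model_partitioner_en.md.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range partitioning. The bounds resemble what a
// partition key might carry, but this is NOT Angel's PartitionKey class.
final class ColumnRange {
  final int partitionId;
  final long startCol; // inclusive
  final long endCol;   // exclusive

  ColumnRange(int partitionId, long startCol, long endCol) {
    this.partitionId = partitionId;
    this.startCol = startCol;
    this.endCol = endCol;
  }
}

final class RangePartitioner {
  private final List<ColumnRange> ranges = new ArrayList<>();

  // Split [0, dim) into numParts contiguous ranges of nearly equal width.
  RangePartitioner(long dim, int numParts) {
    long base = dim / numParts;
    long extra = dim % numParts; // the first `extra` ranges get one more column
    long start = 0;
    for (int p = 0; p < numParts; p++) {
      long width = base + (p < extra ? 1 : 0);
      ranges.add(new ColumnRange(p, start, start + width));
      start += width;
    }
  }

  // Binary search for the range containing col.
  int partitionOf(long col) {
    int lo = 0;
    int hi = ranges.size() - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      ColumnRange r = ranges.get(mid);
      if (col < r.startCol) {
        hi = mid - 1;
      } else if (col >= r.endCol) {
        lo = mid + 1;
      } else {
        return r.partitionId;
      }
    }
    throw new IllegalArgumentException("column out of range: " + col);
  }

  List<ColumnRange> ranges() {
    return ranges;
  }
}
```

Equal-width ranges keep lookups to a binary search, but they ignore feature frequency: if hot feature IDs cluster in one range, one server receives most of the traffic, which is the load-imbalance risk of a poor partitioning choice.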
🛠️How to make changes
- Adding new algorithms: implement PSFunc in angel-ps/core/src/main/java/com/tencent/angel/psagent/matrix/.
- New deployment mode: extend AngelClientInterface and add a case to AngelClientFactory.createAngelClient().
- Configuration changes: edit angel-ps/conf/angel-default.xml and AngelContext.java.
- Client-side logic: modify angel-ps/core/src/main/java/com/tencent/angel/client/.
- Matrix operations: add to com.tencent.angel.ml.math (referenced but not fully shown in the file list).
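The "new deployment mode" recipe amounts to a factory that switches over an enum. The Java sketch below shows the shape of that change; DeployMode, Client, ClientFactory, the config key angel.deploy.mode, and the YARN default are all assumptions for illustration, not Angel's actual AngelClientFactory or AngelClientInterface signatures.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a deploy-mode factory; these stand-in types are
// NOT Angel's AngelClientFactory / AngelClientInterface.
enum DeployMode { LOCAL, YARN, KUBERNETES }

interface Client {
  String describe();
}

final class LocalClient implements Client {
  public String describe() { return "local"; }
}

final class YarnClient implements Client {
  public String describe() { return "yarn"; }
}

final class KubernetesClient implements Client {
  public String describe() { return "kubernetes"; }
}

final class ClientFactory {
  // Assumed key name; check angel-default.xml for the real one.
  static final String DEPLOY_MODE_KEY = "angel.deploy.mode";

  static Client create(Map<String, String> conf) {
    // Defaulting to YARN when the key is absent is an assumption of this sketch.
    String raw = conf.getOrDefault(DEPLOY_MODE_KEY, "YARN");
    DeployMode mode = DeployMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    // Adding a mode means: a new enum constant, a new Client class, a new case.
    switch (mode) {
      case LOCAL: return new LocalClient();
      case YARN: return new YarnClient();
      case KUBERNETES: return new KubernetesClient();
      default: throw new IllegalStateException("unhandled mode: " + mode);
    }
  }
}
```

Parsing into an enum first means an unknown mode string fails fast with an IllegalArgumentException from valueOf, rather than silently falling through to some default client.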
🪤Traps & gotchas
- YARN dependency: requires a Hadoop cluster; local mode may not exercise the full distributed code paths.
- JVM tuning is critical: the default heap may be insufficient for large models; see PSStartUpConfig for memory thresholds.
- Python API is evolving: PythonGatewayServer implies a Py4J dependency that is not listed in the pom snippet, which may require separate Python/Java version coordination.
- PartitionKey complexity: the partitioning strategy is static per model; repartitioning requires a model reload.
- Distributed deadlocks are possible: the SyncController synchronization model can deadlock if a worker crashes mid-barrier (see design docs).
- Configuration overrides: angel-site.xml silently overrides angel-default.xml, so settings are easy to lose during cluster upgrades.
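The silent-override trap is easiest to see with a small model of layered configuration. This Java sketch assumes a simple "defaults first, site file second" merge over plain maps (Angel's real loading code may differ) and records every key whose site value differs from its default, so an upgrade can be diffed instead of losing settings silently. LayeredConfig and the example.* keys in the test are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of layered configuration: site values are applied on
// top of defaults, and every changed key is recorded as {default, site}.
final class LayeredConfig {
  private final Map<String, String> effective = new LinkedHashMap<>();
  private final Map<String, String[]> overridden = new TreeMap<>();

  LayeredConfig(Map<String, String> defaults, Map<String, String> site) {
    effective.putAll(defaults);
    for (Map.Entry<String, String> e : site.entrySet()) {
      String previous = effective.put(e.getKey(), e.getValue());
      if (previous != null && !previous.equals(e.getValue())) {
        overridden.put(e.getKey(), new String[] {previous, e.getValue()});
      }
    }
  }

  String get(String key) {
    return effective.get(key);
  }

  // Sorted by key, so the output is stable enough to diff across upgrades.
  Map<String, String[]> overriddenKeys() {
    return overridden;
  }
}
```

Dumping overriddenKeys() before and after a cluster upgrade gives a quick audit of which defaults the site file is masking.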
💡Concepts to learn
- Parameter Server Architecture — Angel's entire design centers on PS model where parameters live on dedicated servers and workers pull updates—understanding the push/pull semantics and consistency tradeoffs is mandatory
- Model Partitioning (horizontal/vertical sharding) — PartitionKey.java and the partitioner design (docs/design/model_partitioner_en.md) control how models are split across PS nodes—incorrect partitioning causes load imbalance and network bottlenecks
- Synchronization Barriers and Consistency Models — SyncController (referenced in design/) implements BSP/SSP/ASP consistency—choice affects convergence guarantees and throughput; critical for reproducibility
- PSFunc (Parameter Server Functions) — Distributed matrix operations executed on the server side to reduce network traffic; a key optimization for high-dimensional models, comparable in spirit to server-side update logic in other parameter servers such as MXNet's KVStore
- Sparse Vector/Matrix Formats — Angel excels at sparse high-dimensional data, and ByteBufSerdeUtils handles efficient encoding of sparse structures; the exact internal sparse formats are not confirmed here and are worth checking in the math library
- YARN Resource Negotiation (Hadoop ecosystem) — AngelYarnClient abstracts YARN ApplicationMaster protocol; deploying on production clusters requires understanding YARN container lifecycle and resource requests
- Py4J Socket Bridge (Java-Python Interop) — PythonGatewayServer enables Python client code to invoke Java PS operations; Py4J communicates over a socket connection rather than JNI, so serialization overhead and GIL interactions can become a bottleneck for high-throughput workloads
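The BSP/SSP/ASP choice can be viewed as a single staleness bound s. The Java sketch below is a hypothetical StalenessGate, not Angel's SyncController: a worker that has finished c iterations may start the next one only if c minus the slowest worker's count is at most s, so s = 0 gives BSP, a small positive s gives SSP, and an unbounded s gives ASP.

```java
import java.util.Arrays;

// Hypothetical sketch of a staleness gate, NOT Angel's SyncController.
//   s = 0                 -> BSP (wait for everyone every iteration)
//   0 < s < MAX_VALUE     -> SSP (bounded staleness)
//   s = Integer.MAX_VALUE -> ASP (never wait)
final class StalenessGate {
  private final int[] clocks; // completed iterations per worker
  private final int staleness;

  StalenessGate(int numWorkers, int staleness) {
    this.clocks = new int[numWorkers];
    this.staleness = staleness;
  }

  // Called when a worker finishes an iteration.
  synchronized void tick(int worker) {
    clocks[worker]++;
    notifyAll(); // a slow worker catching up may unblock faster ones
  }

  synchronized boolean canProceed(int worker) {
    int slowest = Arrays.stream(clocks).min().getAsInt();
    return (long) clocks[worker] - slowest <= staleness;
  }

  // Blocks until the staleness bound allows this worker to continue.
  synchronized void awaitTurn(int worker) throws InterruptedException {
    while (!canProceed(worker)) {
      wait();
    }
  }
}
```

The awaitTurn loop also shows why a crashed worker is dangerous: its clock never advances, so every faster worker eventually blocks on it.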
🔗Related repos
- apache/spark — Spark-on-Angel (docs/overview/spark_on_angel_en.md) is a first-class integration; understanding Spark's RDD/DataFrame APIs is a prerequisite for using Angel as an ML backend
- tensorflow/tensorflow — Alternative distributed training framework using a parameter server; Angel differs in its model-centric design and support for sparse high-dimensional models
- uber/petastorm — Complementary tool for data loading in Spark; users often combine it with Spark-on-Angel for end-to-end ML pipelines
- pytorch/pytorch — Deep learning framework with distributed training; Angel's graph computing extension (under development) targets PyTorch integration
- openppl-public/ppl.nn — Chinese open-source neural network inference engine; a potential deployment target for Angel-trained models in production
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add unit tests for Python API gateway and integration
The codebase has Python API support (PythonGatewayServer.java, PythonRunner.java, PythonUtils.java) but there are no visible test files in angel-ps/core/src/test for these critical Python integration components. Given that Angel supports Python usage, comprehensive tests for the Python gateway server, RPC communication, and data serialization would prevent regressions and ensure reliability for Python users.
- [ ] Create angel-ps/core/src/test/java/com/tencent/angel/api/python/ directory
- [ ] Add PythonGatewayServerTest.java with tests for server startup, shutdown, and request handling
- [ ] Add PythonRunnerTest.java with tests for Python script execution and environment setup
- [ ] Add PythonUtilsTest.java with tests for data type conversion and serialization between Java and Python
- [ ] Update .github/PULL_REQUEST_TEMPLATE.md to highlight Python API test requirements
Add GitHub Actions workflow for multi-platform build verification
The repo uses .travis.yml for CI, but GitHub Actions is now the standard for GitHub repos. Currently there's no .github/workflows/ directory visible. Adding GitHub Actions workflows would provide faster feedback, better integration with PR checks, and support for matrix testing across different JDK versions and Scala versions (given the breeze and Scala dependencies in pom.xml).
- [ ] Create .github/workflows/build.yml with Maven build, test, and code coverage steps
- [ ] Add matrix strategy testing against Java 8, 11, 17 and Scala 2.11, 2.12, 2.13 versions
- [ ] Add .github/workflows/test-python-api.yml specifically for Python API integration tests
- [ ] Configure the workflow to run on PR creation/update and on merges to master
- [ ] Add workflow status badges to README.md
Add integration tests for Kubernetes client and deployment modes
The codebase has AngelKubernetesClient.java, AngelYarnClient.java, and AngelLocalClient.java, but no visible test files for these critical deployment mode implementations. These clients are essential for the different deployment scenarios (local, YARN, Kubernetes), and lack of integration tests makes it difficult to verify correct behavior across modes, especially for the newer Kubernetes support.
- [ ] Create angel-ps/core/src/test/java/com/tencent/angel/client/kubernetes/ with KubernetesClientTest.java
- [ ] Create angel-ps/core/src/test/java/com/tencent/angel/client/yarn/ with YarnClientTest.java
- [ ] Create angel-ps/core/src/test/java/com/tencent/angel/client/local/ with LocalClientTest.java
- [ ] Add tests for client initialization, configuration validation, and job submission flows
- [ ] Add docker-compose test scenario in angel-ps/core/src/test/resources/ for local Kubernetes-like testing
- [ ] Document deployment testing procedures in CONTRIBUTING.md
🌿Good first issues
- Add unit tests for ByteBufSerdeUtils.java covering edge cases (empty arrays, large payloads >1GB, null values)—serialization is critical path for all PS operations but test coverage may be thin.
- Document PartitionKey partitioning strategies and add examples in docs/ showing how to optimize for skewed feature distributions—this is non-obvious and causes performance issues.
- Implement missing deployment mode: add AngelDockerClient for native Docker Compose orchestration (beyond Kubernetes)—infrastructure is ready but Docker Compose variant would lower barrier for local testing.
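For the ByteBufSerdeUtils test idea, a round-trip check looks roughly like the Java sketch below. It assumes a made-up sparse layout ([dim][nnz][indices][values]) and uses java.nio.ByteBuffer instead of Netty's ByteBuf, so SparseVectorCodec is hypothetical and its format is not Angel's wire format; the point is the shape of the edge-case checks (empty vectors, null arguments, sizes that overflow int).

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical codec, NOT ByteBufSerdeUtils. Layout:
// [dim:int][nnz:int][indices:int * nnz][values:double * nnz]
final class SparseVectorCodec {
  // 8 header bytes + 12 bytes per non-zero; throws ArithmeticException
  // instead of silently overflowing int for very large vectors.
  static int serializedSize(int nnz) {
    return Math.toIntExact(8L + 12L * nnz);
  }

  static ByteBuffer encode(int dim, int[] indices, double[] values) {
    Objects.requireNonNull(indices, "indices");
    Objects.requireNonNull(values, "values");
    if (indices.length != values.length) {
      throw new IllegalArgumentException("indices/values length mismatch");
    }
    ByteBuffer buf = ByteBuffer.allocate(serializedSize(indices.length));
    buf.putInt(dim).putInt(indices.length);
    for (int idx : indices) buf.putInt(idx);
    for (double v : values) buf.putDouble(v);
    buf.flip(); // switch the buffer from writing to reading
    return buf;
  }

  // Returns {dim, indices, values} as an Object[] to keep the sketch short.
  static Object[] decode(ByteBuffer buf) {
    int dim = buf.getInt();
    int nnz = buf.getInt();
    int[] indices = new int[nnz];
    double[] values = new double[nnz];
    for (int i = 0; i < nnz; i++) indices[i] = buf.getInt();
    for (int i = 0; i < nnz; i++) values[i] = buf.getDouble();
    return new Object[] {dim, indices, values};
  }
}
```

A real test against ByteBufSerdeUtils would swap in its actual serialize/deserialize calls and Netty buffers; the assertions (payload equality, fully consumed buffer, explicit failures on bad input) carry over unchanged.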
⭐Top contributors
- @rachelsunrh — 37 commits
- @ouyangwen-it — 33 commits
- @lengfeng343 — 13 commits
- @jyswpp — 10 commits
- @xiaolongwen — 6 commits
📝Recent commits
- ca2525e — Merge pull request #1344 from Angel-ML/branch-3.3.0 (rachelsunrh)
- 2a4f48a — Merge pull request #1343 from longkezhe/docs#1342 (paynie)
- 7950322 — [#1342]Docs: update sona algo document. (xiaolongwen)
- 9cb63e6 — Merge pull request #1341 from longkezhe/algo#1339 (rachelsunrh)
- 2e3b3dd — [#1339]Feat: new features of Bruce Force and HNSW implemented by AnnModel (xiaolongwen)
- 9fba6a0 — Merge pull request #1338 from longkezhe/algo#1336 (rachelsunrh)
- ed6689c — Fix:correct 'pox.xml' license text and angel graph dependency version (xiaolongwen)
- 5a3501f — Feat:new Approximate Nearest Neighbor algorithm of Hierarchical Navigable Small World (xiaolongwen)
- 9e4a81b — Merge pull request #1335 from longkezhe/algo#1334 (rachelsunrh)
- df7a5a9 — Feat/new graph algoritm PageRanKPro (#1334) (xiaolongwen)
🔒Security observations
- High · Outdated Apache Velocity dependency with known vulnerabilities (angel-ps/core/pom.xml, org.apache.velocity:velocity:1.7). Velocity 1.7 was released in 2010, is no longer maintained, and is affected by known vulnerabilities, including remote code execution through template injection (CVE-2020-13936 and others). Fix: upgrade to Apache Velocity 2.3 or later (published as org.apache.velocity:velocity-engine-core). If the older version must stay, process only trusted templates and validate inputs strictly.
- High · Insecure protobuf download in the Dockerfile (RUN curl -fsSL --insecure -o /tmp/protobuf-2.5.0.tar.gz). The --insecure flag disables TLS certificate verification, which allows man-in-the-middle (MITM) attacks during the build. Protobuf 2.5.0 is also extremely outdated. Fix: remove the --insecure flag to enforce certificate verification, verify downloaded artifacts with checksums or signatures, and upgrade to the protobuf 3.x or 4.x series.
- High · Outdated base image (Dockerfile, FROM maven:3.6.1-jdk-8 as dev). The image pairs an old Maven release with Java 8, and this pinned tag no longer receives updates, so it misses years of security fixes. Fix: upgrade to a recent Maven image on Java 11 or later (e.g., maven:3.9-eclipse-temurin-17) and refresh base images regularly.
- High · Outdated system packages (Dockerfile, apt-get install curl=7.52.1-5+deb9u9, g++=4:6.3.0-4, make=4.1-9.1, unzip=6.0-21+deb9u1). These are pinned Debian 9 (stretch) package versions with known CVEs. Fix: remove the version pins or update them, run apt-get upgrade after apt-get update to pick up security patches, and use a base image on a recent Debian release.
- Medium · Potential Python execution risk (angel-ps/core/src/main/java/com/tencent/angel/api/python/PythonRunner.java, PythonGatewayServer.java). These classes suggest the system executes Python code, which risks arbitrary code execution if Python input is not properly sanitized. Fix: validate inputs strictly, run Python in a restricted environment or an isolated process, and avoid executing arbitrary user-provided Python code.
- Medium · Missing security headers and TLS configuration (angel-ps/conf/). The configuration files (angel-default.xml, angel-site.xml) show no evidence of security headers, HTTPS enforcement, or TLS settings for client-server communication in this distributed system. Fix: enable TLS for all network communication, add security headers to any HTTP endpoints, configure authentication and authorization, and review and harden all configuration files.
- Medium · Breeze dependency's Scala binary version is interpolated (angel-ps/core/pom.xml, org.scalanlp:breeze_${scala.binary.version}:1.2). The breeze version is pinned to 1.2, but the artifact's Scala binary version comes from the scala.binary.version property, which can lead to inconsistent builds and compatibility issues when applying security updates. Fix: specify the Scala binary version explicitly, pin versions consistently through dependency management, and verify compatibility with security updates.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.