RepoPilot

xai-org/x-algorithm

Algorithm powering the For You feed on X

Mixed

Slowing — last commit 4mo ago

Weakest axis: Use as dependency — Mixed

single-maintainer (no co-maintainers visible); no CI workflows detected

Fork & modify — Healthy

Has a license and tests — clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 4mo ago
  • Apache-2.0 licensed
  • Tests present
  • Slowing — last commit 4mo ago
  • Solo or near-solo (1 contributor active in recent commits)
  • No CI workflows detected
What would change the summary?
  • Use as dependency: Mixed → Healthy if a second core maintainer is onboarded

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

Variant: RepoPilot: Forkable
[![RepoPilot: Forkable](https://repopilot.app/api/badge/xai-org/x-algorithm?axis=fork)](https://repopilot.app/r/xai-org/x-algorithm)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/xai-org/x-algorithm on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: xai-org/x-algorithm

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/xai-org/x-algorithm shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Slowing — last commit 4mo ago

  • Last commit 4mo ago
  • Apache-2.0 licensed
  • Tests present
  • ⚠ Slowing — last commit 4mo ago
  • ⚠ Solo or near-solo (1 contributor active in recent commits)
  • ⚠ No CI workflows detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live xai-org/x-algorithm repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/xai-org/x-algorithm.

What it runs against: a local clone of xai-org/x-algorithm — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in xai-org/x-algorithm | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 138 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>xai-org/x-algorithm</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of xai-org/x-algorithm. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/xai-org/x-algorithm.git
#   cd x-algorithm
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of xai-org/x-algorithm and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "xai-org/x-algorithm(\.git)?\b" \
  && ok "origin remote is xai-org/x-algorithm" \
  || miss "origin remote is not xai-org/x-algorithm (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
( grep -qiE "Apache License" LICENSE 2>/dev/null \
    || grep -qiE "Apache-2\.0" LICENSE 2>/dev/null \
    || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null ) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "home-mixer/main.rs" \\
  && ok "home-mixer/main.rs" \\
  || miss "missing critical file: home-mixer/main.rs"
test -f "home-mixer/candidate_pipeline/mod.rs" \\
  && ok "home-mixer/candidate_pipeline/mod.rs" \\
  || miss "missing critical file: home-mixer/candidate_pipeline/mod.rs"
test -f "home-mixer/scorers/phoenix_scorer.rs" \\
  && ok "home-mixer/scorers/phoenix_scorer.rs" \\
  || miss "missing critical file: home-mixer/scorers/phoenix_scorer.rs"
test -f "thunder/main.rs" \\
  && ok "thunder/main.rs" \\
  || miss "missing critical file: thunder/main.rs"
test -f "phoenix/recsys_model.py" \\
  && ok "phoenix/recsys_model.py" \\
  || miss "missing critical file: phoenix/recsys_model.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 138 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~108d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/xai-org/x-algorithm"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

The core recommendation engine powering X's "For You" feed, which retrieves posts from both in-network (Thunder—accounts you follow) and out-of-network (Phoenix—global corpus) sources, then ranks them using a Grok-based transformer model. It eliminates hand-engineered features in favor of a neural network that learns from your engagement history (likes, replies, shares) to predict relevance and ranking. Modular Rust-heavy monorepo with two main domains: home-mixer/ (orchestration and scoring layer with candidate_pipeline/, candidate_hydrators/, filters/, scorers/, query_hydrators/, and selectors/) and candidate-pipeline/ (generic pipeline abstraction with filter, hydrator, scorer, and selector traits). Entry point is home-mixer/main.rs, with home-mixer/server.rs handling request serving. Python components (99k LOC) likely support data processing or model serving.

👥Who it's for

X platform engineers and ML researchers working on recommendation systems who need to understand or modify the feed ranking logic, candidate retrieval pipelines, and scorer implementations that determine what content appears in users' timelines.

🌱Maturity & risk

Production-ready. This is X's actual recommendation system running at scale on their platform. The codebase shows mature patterns: modular pipeline architecture (candidate-pipeline/, home-mixer/) with clear separation of concerns, multiple hydrators and filters, and an orchestration layer. No public CI/test data visible in the snippet, but the production deployment status and deliberate architectural decisions indicate this is a battle-tested system.

Low operational risk for X internally, but high risk for external contributors due to the closed nature of dependencies (Grok transformer, internal X data services like Gizmoduck, Thunder, Phoenix). Critical dependencies on external services (gizmoduck_hydrator, video_duration_candidate_hydrator) mean local development requires mocked or real access to X infrastructure. No indication of semantic versioning or stable public APIs.

Active areas of work

Unable to determine from provided file structure alone—no git log, PR list, or issue tracker visible. However, the dual-source architecture (in-network + out-of-network) and the recent decision to open-source the Grok transformer suggest active optimization around candidate quality and ranking signal integration.

🚀Get running

Unable to provide exact clone and install commands—no package.json, Cargo.toml snippet, Makefile, or requirements.txt provided in the data. Likely: git clone https://github.com/xai-org/x-algorithm.git && cd x-algorithm, then cargo build (for Rust) and/or pip install -r requirements.txt (for Python). Requires Rust toolchain and likely Docker/Nix for X service mocking.

Daily commands: Unknown from provided data. Likely cargo run --release from home-mixer/ directory, but requires mocked or real connections to Thunder, Phoenix, Gizmoduck, and other X internal services. Local development probably requires Docker Compose setup or environment variables pointing to test/staging infrastructure.

🗺️Map of the codebase

  • home-mixer/main.rs — Entry point for the Home Mixer service that orchestrates the entire feed generation pipeline.
  • home-mixer/candidate_pipeline/mod.rs — Core abstraction defining the candidate pipeline stages that every feed request flows through.
  • home-mixer/scorers/phoenix_scorer.rs — Implements the Grok-based transformer ranking model that scores all candidates for the For You feed.
  • thunder/main.rs — Thunder service entry point handling in-network content retrieval from followed accounts.
  • phoenix/recsys_model.py — Phoenix ML model implementing the transformer-based ranking logic for out-of-network content discovery.
  • home-mixer/sources/phoenix_source.rs — Bridge to Phoenix retrieval service that fetches out-of-network candidate posts for ranking.
  • home-mixer/filters/mod.rs — Orchestrates all filtering stages that remove ineligible, duplicate, and user-blocked content from the feed.

🛠️How to make changes

Add a new post filter

  1. Create new filter file in home-mixer/filters/ following naming convention (e.g., custom_filter.rs) (home-mixer/filters/age_filter.rs)
  2. Implement Filter trait with filter() method returning bool on Candidate (home-mixer/candidate_pipeline/candidate.rs)
  3. Register filter in home-mixer/filters/mod.rs and add to filter chain in pipeline (home-mixer/filters/mod.rs)
  4. Add filter to candidate_pipeline/filter.rs pipeline orchestration (candidate-pipeline/filter.rs)
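
A minimal sketch of steps 1–2, assuming a simplified, synchronous trait. The real Filter trait lives in candidate-pipeline/ and (per the concepts section) is async under Tokio, so the `Candidate` fields, the `name()` helper, and `MaxAgeFilter` below are illustrative stand-ins — mirror home-mixer/filters/age_filter.rs rather than this sketch.

```rust
// Hypothetical shapes — check home-mixer/filters/age_filter.rs for the real API.
pub struct Candidate {
    pub tweet_id: u64,
    pub age_seconds: u64,
}

/// Simplified version of the Filter trait described above:
/// return true to keep the candidate, false to drop it.
pub trait Filter {
    fn name(&self) -> &'static str;
    fn filter(&self, candidate: &Candidate) -> bool;
}

/// Example filter: drop posts older than a configurable threshold.
pub struct MaxAgeFilter {
    pub max_age_seconds: u64,
}

impl Filter for MaxAgeFilter {
    fn name(&self) -> &'static str {
        "max_age_filter"
    }

    fn filter(&self, candidate: &Candidate) -> bool {
        candidate.age_seconds <= self.max_age_seconds
    }
}

fn main() {
    let filter = MaxAgeFilter { max_age_seconds: 48 * 3600 };
    let fresh = Candidate { tweet_id: 1, age_seconds: 3_600 };
    let stale = Candidate { tweet_id: 2, age_seconds: 400_000 };
    assert!(filter.filter(&fresh));
    assert!(!filter.filter(&stale));
    println!("{} keeps fresh posts and drops stale ones", filter.name());
}
```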

Add a new scoring signal

  1. Create new scorer file in home-mixer/scorers/ (e.g., engagement_scorer.rs) (home-mixer/scorers/phoenix_scorer.rs)
  2. Implement Scorer trait with score() method returning f32 on Candidate (home-mixer/candidate_pipeline/candidate.rs)
  3. Register scorer in home-mixer/scorers/mod.rs and configure weight in weighted_scorer.rs (home-mixer/scorers/weighted_scorer.rs)
  4. Add scorer execution to candidate_pipeline/scorer.rs pipeline stage (candidate-pipeline/scorer.rs)
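
To make step 2 concrete, here is a hedged sketch of a scorer plus a weighted combiner in the spirit of scorers/weighted_scorer.rs. `EngagementScorer`, `WeightedScorer`, and the `Candidate` fields are invented for illustration; the real trait is async and its inputs are richer.

```rust
// Illustrative only — the real Scorer trait and Candidate live in candidate-pipeline/.
pub struct Candidate {
    pub tweet_id: u64,
    pub like_count: u32,
    pub reply_count: u32,
}

/// Simplified Scorer: higher score means ranked earlier in the feed.
pub trait Scorer {
    fn score(&self, candidate: &Candidate) -> f32;
}

/// Example signal: log-scaled engagement counts.
pub struct EngagementScorer;

impl Scorer for EngagementScorer {
    fn score(&self, c: &Candidate) -> f32 {
        (1.0 + c.like_count as f32).ln() + 2.0 * (1.0 + c.reply_count as f32).ln()
    }
}

/// Sketch of a weighted combiner: each scorer contributes score * weight.
pub struct WeightedScorer {
    pub scorers: Vec<(f32, Box<dyn Scorer>)>,
}

impl Scorer for WeightedScorer {
    fn score(&self, c: &Candidate) -> f32 {
        self.scorers.iter().map(|(w, s)| w * s.score(c)).sum()
    }
}

fn main() {
    let combined = WeightedScorer {
        scorers: vec![(0.3, Box::new(EngagementScorer) as Box<dyn Scorer>)],
    };
    let c = Candidate { tweet_id: 1, like_count: 120, reply_count: 14 };
    println!("combined score: {:.3}", combined.score(&c));
}
```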

Add a new candidate hydrator

  1. Create hydrator file in home-mixer/candidate_hydrators/ (e.g., custom_hydrator.rs) (home-mixer/candidate_hydrators/core_data_candidate_hydrator.rs)
  2. Implement CandidateHydrator trait with hydrate() method enriching Candidate fields (home-mixer/candidate_pipeline/candidate.rs)
  3. Register hydrator in home-mixer/candidate_hydrators/mod.rs (home-mixer/candidate_hydrators/mod.rs)
  4. Add hydrator call to candidate_pipeline/mod.rs hydration stage before filtering (home-mixer/candidate_pipeline/mod.rs)
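
A minimal sketch of step 2, assuming a synchronous, fallible `hydrate()`; the real hydrators issue async RPCs to X services, so `VideoDurationHydrator` and the error type here are placeholders — follow candidate_hydrators/core_data_candidate_hydrator.rs for the actual shape.

```rust
// Illustrative stand-ins — not the repo's CandidateHydrator trait.
#[derive(Default)]
pub struct Candidate {
    pub tweet_id: u64,
    pub video_duration_ms: Option<u32>,
}

/// Simplified hydrator: fill in fields that downstream filters and scorers need.
pub trait CandidateHydrator {
    fn hydrate(&self, candidate: &mut Candidate) -> Result<(), String>;
}

/// Example: pretend lookup of video metadata (the real one calls an internal X service).
pub struct VideoDurationHydrator;

impl CandidateHydrator for VideoDurationHydrator {
    fn hydrate(&self, candidate: &mut Candidate) -> Result<(), String> {
        // Stub value: a real hydrator would make an async request here.
        candidate.video_duration_ms = Some(30_000);
        Ok(())
    }
}

fn main() {
    let mut c = Candidate { tweet_id: 42, ..Default::default() };
    VideoDurationHydrator.hydrate(&mut c).expect("hydration failed");
    assert_eq!(c.video_duration_ms, Some(30_000));
    println!("candidate {} hydrated", c.tweet_id);
}
```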

Integrate a new candidate source

  1. Create source file in home-mixer/sources/ (e.g., custom_source.rs) implementing CandidateSource trait (home-mixer/sources/phoenix_source.rs)
  2. Implement fetch() method returning Vec<Candidate> based on Query context (home-mixer/candidate_pipeline/query.rs)
  3. Register source in home-mixer/sources/mod.rs and configure in phoenix/run_retrieval.py if ML-based (home-mixer/sources/mod.rs)
  4. Add source to candidate gathering in home-mixer/candidate_pipeline/mod.rs source merge step (home-mixer/candidate_pipeline/mod.rs)
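
A hedged sketch of steps 1–2: a source that turns a `Query` into candidates. `StaticSource`, the `Query` fields, and the synchronous `fetch()` are assumptions for illustration — the real sources (e.g. sources/phoenix_source.rs) call Thunder or Phoenix over the network and are async.

```rust
// Illustrative stand-ins — see sources/phoenix_source.rs for the real CandidateSource.
pub struct Query {
    pub user_id: u64,
    pub max_results: usize,
}

pub struct Candidate {
    pub tweet_id: u64,
}

/// Simplified CandidateSource: given the request Query, return raw candidates
/// to be hydrated, filtered, and scored downstream.
pub trait CandidateSource {
    fn name(&self) -> &'static str;
    fn fetch(&self, query: &Query) -> Vec<Candidate>;
}

/// Example source returning canned IDs; real sources query retrieval services.
pub struct StaticSource {
    pub tweet_ids: Vec<u64>,
}

impl CandidateSource for StaticSource {
    fn name(&self) -> &'static str {
        "static_source"
    }

    fn fetch(&self, query: &Query) -> Vec<Candidate> {
        self.tweet_ids
            .iter()
            .take(query.max_results)
            .map(|&tweet_id| Candidate { tweet_id })
            .collect()
    }
}

fn main() {
    let source = StaticSource { tweet_ids: vec![1, 2, 3, 4] };
    let query = Query { user_id: 7, max_results: 2 };
    let candidates = source.fetch(&query);
    println!("{} returned {} candidates for user {}", source.name(), candidates.len(), query.user_id);
}
```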

🔧Why these technologies

  • Rust (home-mixer, thunder, candidate-pipeline) — Provides memory safety and performance needed for real-time feed serving at X's scale; zero-copy data passing between pipeline stages.
  • Python + PyTorch (Phoenix ML models) — Enables rapid experimentation and iteration on ML retrieval and ranking models; transformer architectures are better expressed in Python frameworks.
  • Grok-1 Transformer — State-of-the-art ranking model ported from xAI's open-source release; provides superior ranking quality over traditional heuristic scorers.
  • Kafka (Thunder event streaming) — Decouples tweet event producers from Thunder indexing service; enables horizontal scaling of post retrieval infrastructure.

🪤Traps & gotchas

  • Critical: Requires real or mocked connections to X internal services (Thunder, Phoenix, Gizmoduck, Subscription Service). No indication these are public APIs.
  • Environment assumptions: Grok transformer model weights are likely fetched from a model-serving endpoint (not in the repo).
  • Concurrency gotcha: Rust async patterns with Tokio; hydrators and filters run in parallel (see the candidate_pipeline trait design), so order of operations matters.
  • Feature engineering: Despite the README's claim of "eliminated hand-engineered features," the multiple specialized hydrators (subscription_hydrator, video_duration_candidate_hydrator, in_network_candidate_hydrator) suggest domain-specific logic is still present.
  • Version fragility: Query and Candidate feature schemas (query_features.rs, candidate_features.rs) are tightly coupled to the Grok transformer checkpoint; model version mismatches will cause silent ranking degradation.

🏗️Architecture

💡Concepts to learn

  • Two-stage retrieval (Thunder + Phoenix) — Fundamental to this system's architecture—understanding why in-network and out-of-network sources are separate, how they're weighted, and when each is triggered is critical to modifying feed behavior
  • Feature hydration / enrichment pipeline — The candidate_hydrators pattern (core_data, gizmoduck, subscription, video_duration, vf) is central to the architecture; understanding when and why each hydrator runs determines query latency and ranking quality
  • Transformer-based ranking (Grok-1 adapted) — The phoenix_scorer uses a Grok transformer to predict engagement probabilities; understanding attention mechanisms and how user history is encoded as input is essential for debugging ranking issues
  • Multi-stage filtering with early exit — The filter chain (age_filter, author_socialgraph_filter, dedup, mute, etc.) is applied before scoring; understanding filter ordering and drop rates is crucial for feed quality and latency optimization
  • Async trait-based architecture (Rust) — The entire pipeline uses async traits (Filter, Hydrator, Scorer) with Tokio runtime; understanding how parallel hydration and filtering works is essential for extending the system
  • Deduplication strategies (conversation, retweet, previously-served) — Three separate dedup filters suggest domain-specific logic for avoiding feed fatigue; understanding when and why each fires informs feed freshness and diversity
  • Score-based selection with top-K truncation — The top_k_score_selector is the final stage that determines feed size; understanding how ties are broken and how K is chosen is critical for user experience and latency SLAs
  • xai-org/grok-1 — The original open-source Grok transformer model that powers the phoenix_scorer; this repo adapts that model for recommendation ranking
  • twitter/the-algorithm — X's older public recommendation algorithm; provides historical context on feed ranking approaches before the Grok-based transformer era
  • facebookresearch/faiss — Dense vector search library likely used in Phoenix retrieval and candidate ranking for efficient similarity computation over large corpora
  • pytorch/pytorch — Foundational ML framework; Grok transformer and any inference serving in this pipeline depends on PyTorch
  • huggingface/transformers — If the Grok implementation uses Hugging Face abstractions or model formats, this is the reference for transformer architecture and evaluation
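
Several of the concepts above (score-based selection, top-K truncation, tie-breaking) fit in a few lines. The sketch below is not the repo's top_k_score_selector — the tie-break policy and struct are invented — it only illustrates the idea of sorting scored candidates and truncating to K deterministically.

```rust
// Illustrative only — the real logic lives in home-mixer/selectors/.
#[derive(Debug)]
struct ScoredCandidate {
    tweet_id: u64,
    score: f32,
}

/// Keep the K highest-scoring candidates; break score ties by tweet_id so the
/// output is deterministic for a given input (one plausible tie-break policy).
fn select_top_k(mut candidates: Vec<ScoredCandidate>, k: usize) -> Vec<ScoredCandidate> {
    candidates.sort_by(|a, b| {
        b.score
            .partial_cmp(&a.score)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then_with(|| a.tweet_id.cmp(&b.tweet_id))
    });
    candidates.truncate(k);
    candidates
}

fn main() {
    let scored = vec![
        ScoredCandidate { tweet_id: 3, score: 0.91 },
        ScoredCandidate { tweet_id: 1, score: 0.91 }, // tie with tweet 3
        ScoredCandidate { tweet_id: 2, score: 0.42 },
    ];
    for c in select_top_k(scored, 2) {
        println!("{} -> {:.2}", c.tweet_id, c.score);
    }
}
```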

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for candidate pipeline filters

The home-mixer/filters directory contains 9 filter implementations (age_filter.rs, author_socialgraph_filter.rs, core_data_hydration_filter.rs, etc.) but there is no visible test module. Given the critical role filters play in content moderation and feed quality, each filter needs unit tests covering happy paths, edge cases, and filter interaction scenarios. This is high-impact because filter bugs directly affect user experience and content integrity.

  • [ ] Create home-mixer/filters/tests/ directory with test modules
  • [ ] Add unit tests for age_filter.rs covering boundary conditions (age thresholds, missing age data)
  • [ ] Add unit tests for author_socialgraph_filter.rs covering graph relationship edge cases
  • [ ] Add integration tests in home-mixer/candidate_pipeline/tests/ verifying filter chain behavior
  • [ ] Document test coverage expectations in home-mixer/filters/mod.rs
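
As a starting point for the checklist above, here is a hedged sketch of a parametric boundary test. The `AgeFilter` struct and `filter()` signature are assumptions — adapt the names to whatever home-mixer/filters/age_filter.rs actually exposes — and the module is written as a library target, so it runs with `cargo test`.

```rust
// Hypothetical filter and test module — not the repo's real age_filter.rs.
pub struct Candidate {
    pub age_seconds: u64,
}

pub struct AgeFilter {
    pub max_age_seconds: u64,
}

impl AgeFilter {
    /// true = keep the candidate, false = drop it.
    pub fn filter(&self, candidate: &Candidate) -> bool {
        candidate.age_seconds <= self.max_age_seconds
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Parametric boundary tests: below, at, and just past the threshold.
    #[test]
    fn age_filter_boundary_conditions() {
        let filter = AgeFilter { max_age_seconds: 100 };
        let cases = [
            (0, true),    // brand-new post
            (99, true),   // just inside the window
            (100, true),  // exactly at the threshold (inclusive by assumption)
            (101, false), // just past the threshold
        ];
        for (age_seconds, expected) in cases {
            let candidate = Candidate { age_seconds };
            assert_eq!(filter.filter(&candidate), expected, "age={age_seconds}");
        }
    }
}
```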

Add scorer implementations documentation and examples for phoenix_scorer.rs and oon_scorer.rs

The scorers directory contains critical ranking components (phoenix_scorer.rs, oon_scorer.rs, author_diversity_scorer.rs, weighted_scorer.rs) but the README only mentions 'ranks everything using a Grok-based transformer model' without explaining how individual scorers work, their inputs, or configuration. New contributors cannot understand scoring logic without detailed documentation. Add a SCORING.md explaining each scorer's purpose, algorithm, and tuning parameters.

  • [ ] Create home-mixer/SCORING.md documenting scorer architecture and signal flow
  • [ ] Add inline documentation comments to home-mixer/scorers/phoenix_scorer.rs explaining model integration
  • [ ] Add inline documentation comments to home-mixer/scorers/oon_scorer.rs explaining out-of-network scoring logic
  • [ ] Document weighted_scorer.rs configuration format and weight tuning guidance
  • [ ] Add example usage patterns for each scorer in phoenix/run_ranker.py

Create CI/CD workflow for Rust linting, formatting, and unit tests

The repo contains substantial Rust code across candidate-pipeline/, home-mixer/, and phoenix/ directories but has no visible GitHub Actions workflow (.github/workflows/) for continuous integration. This means contributors cannot validate Rust code quality (clippy, rustfmt) or run tests automatically on PRs. Add a workflow to enforce code standards and prevent regressions.

  • [ ] Create .github/workflows/rust-ci.yml with clippy linting on all Rust code
  • [ ] Add rustfmt format checking step to enforce consistent style
  • [ ] Add cargo test step for candidate-pipeline/ and home-mixer/ test suites
  • [ ] Configure workflow to run on push to main and all PRs
  • [ ] Add passing CI badge to README.md

🌿Good first issues

  • Add unit tests for home-mixer/filters/ subdirectory. Currently no test files visible in the structure. Start with age_filter.rs and self_tweet_filter.rs—write parametric tests for boundary conditions (e.g., edge cases on tweet age thresholds, author mute lists).
  • Document the feature schema contract between home-mixer/candidate_pipeline/candidate_features.rs and the Grok transformer model. Create a markdown file explaining which features are required, their types, and normalization ranges. This unblocks external model experimentation.
  • Implement a metrics/observability module (e.g., home-mixer/metrics.rs) that tracks: candidate retrieval latency per source (Thunder vs. Phoenix), filter drop rates, scorer inference time. Add prometheus-style counters and histograms. Start with the pipeline orchestration layer (phoenix_candidate_pipeline.rs).
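
For the metrics/observability item above, a minimal sketch of what home-mixer/metrics.rs could look like, assuming the `prometheus` and `once_cell` crates — that crate choice is an assumption, not something the repo prescribes, and the metric names are illustrative.

```rust
// Hypothetical metrics module — crate choice and metric names are assumptions.
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, register_int_counter_vec, Encoder, HistogramVec, IntCounterVec};

/// Candidates dropped per filter, labelled by filter name.
pub static FILTER_DROPS: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!("filter_drops_total", "Candidates dropped per filter", &["filter"])
        .expect("metric registration")
});

/// Retrieval latency per candidate source (Thunder vs Phoenix), in seconds.
pub static SOURCE_LATENCY: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!("source_latency_seconds", "Candidate retrieval latency", &["source"])
        .expect("metric registration")
});

fn main() {
    // Simulate one request: record a Thunder fetch and a couple of filter drops.
    SOURCE_LATENCY.with_label_values(&["thunder"]).observe(0.042);
    FILTER_DROPS.with_label_values(&["age_filter"]).inc();
    FILTER_DROPS.with_label_values(&["dedup_filter"]).inc();

    // Render the Prometheus exposition format (normally served on /metrics).
    let mut buf = Vec::new();
    prometheus::TextEncoder::new()
        .encode(&prometheus::gather(), &mut buf)
        .expect("encode metrics");
    println!("{}", String::from_utf8(buf).expect("utf8"));
}
```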

Top contributors

  • @CI agent — 1 commit

📝Recent commits

  • aaa167b — Open-source X Recommendation Algorithm (CI agent)

🔒Security observations

  • High · Missing Dependency Lock File Verification — Repository root - missing dependency manifest analysis. The Dependencies/Package file content is empty or not provided. Without visibility into actual dependencies (Cargo.toml, Cargo.lock for Rust; pyproject.toml/uv.lock for Python), it's impossible to verify for known security vulnerabilities in transitive dependencies. This is a critical gap in supply chain security assessment. Fix: Provide complete Cargo.toml, Cargo.lock, pyproject.toml, and uv.lock files for review. Implement automated dependency scanning using tools like 'cargo audit', 'safety check', or Dependabot in CI/CD pipeline.
  • High · Potential Unsafe Rust Code in ML Pipeline — home-mixer/, candidate-pipeline/, phoenix/. The codebase includes Rust components handling complex data structures (candidate hydrators, scorers, filters) and Python ML models (phoenix/recsys_model.py). Unsafe Rust blocks and FFI boundaries between Rust and Python components could introduce memory safety vulnerabilities if not properly validated. Fix: Audit all unsafe Rust blocks for correctness. Use #![forbid(unsafe_code)] where possible. Implement proper error handling at Rust-Python FFI boundaries. Use tools like 'clippy' and 'miri' for static analysis.
  • High · Network Service Exposure Without Documented Security Controls — home-mixer/server.rs, home-mixer/main.rs, thunder/main.rs. The codebase includes server components (home-mixer/server.rs, thunder/kafka/tweet_events_listener.rs) that listen on network ports and process external data. No security headers, TLS configuration, or authentication mechanism documentation is visible. Fix: Implement TLS/SSL encryption for all network communications. Add authentication and authorization checks. Document API endpoints with security requirements. Implement rate limiting and input validation on all external-facing endpoints.
  • High · Unvalidated Data Deserialization — thunder/deserializer.rs, thunder/kafka/. The thunder/deserializer.rs file suggests handling of untrusted serialized data from Kafka streams. Deserialization vulnerabilities can lead to remote code execution if the deserializer accepts arbitrary types without validation. Fix: Use whitelist-based deserialization. Validate all incoming data types against a strict schema. Avoid using generic deserialization (e.g., pickle in Python, unsafe_deserialize patterns). Consider using schema validation (Avro, Protobuf).
  • Medium · Missing Input Validation in Filtering Components — home-mixer/filters/. Multiple filter components (age_filter.rs, author_socialgraph_filter.rs, vf_filter.rs, etc.) process user-provided and system data. Without explicit input validation logic visible, there's risk of invalid data propagating through the pipeline. Fix: Implement explicit input validation for all filter inputs. Define boundaries for acceptable values (age ranges, user IDs, etc.). Add logging for rejected/suspicious inputs. Use strongly-typed data structures to enforce constraints at compile time.
  • Medium · Potential Information Disclosure via Logging — home-mixer/side_effects/cache_request_info_side_effect.rs, home-mixer/scorers/. The codebase includes side effects (cache_request_info_side_effect.rs) and multiple scorers that could log sensitive information (user actions, ranking scores, internal state). If logs are not properly secured, they could leak user privacy data. Fix: Implement log redaction for sensitive data (user IDs, scores, interactions). Use structured logging with appropriate log levels. Ensure logs are transmitted and stored securely (encrypted, access-controlled). Implement log retention policies.
  • Medium · Hardcoded Configuration Risk — Repository root and service configuration files. No .env or config management documentation visible. Configuration for database connections, API endpoints, and service credentials may be hardcoded in source files or configuration files not shown. Fix: Use environment variables or secure secrets management (AWS Secrets Manager, HashiCorp Vault, Kubernetes Secrets) for all sensitive configuration. Never commit credentials or API keys to version control.
  • Medium · Weak Cache Security in Request Info Side Effect — cache_request_info_side_effect.rs (location and fix details were not generated for this finding).
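
To make the "Unvalidated Data Deserialization" item concrete, here is a generic sketch of strict, schema-first deserialization using serde/serde_json. The real thunder/deserializer.rs may well use Avro or Protobuf instead, and the `TweetEvent` fields below are invented; the point is only to reject payloads that do not match the expected schema.

```rust
// Generic illustration of whitelist-style deserialization — not the repo's code.
use serde::Deserialize;

/// Only these fields, with these types, are accepted; unknown fields are rejected.
#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct TweetEvent {
    tweet_id: u64,
    author_id: u64,
    created_at_ms: u64,
}

fn parse_event(raw: &[u8]) -> Result<TweetEvent, serde_json::Error> {
    serde_json::from_slice(raw)
}

fn main() {
    // Well-formed payload: accepted.
    let ok = br#"{"tweet_id": 1, "author_id": 2, "created_at_ms": 3}"#;
    println!("{:?}", parse_event(ok));

    // Unexpected extra field: rejected instead of silently ignored.
    let bad = br#"{"tweet_id": 1, "author_id": 2, "created_at_ms": 3, "cmd": "x"}"#;
    println!("{:?}", parse_event(bad));
}
```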

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
