
kyegomez/OpenMythos

A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.

Overall: Mixed — Solo project; review before adopting.

  • Use as dependency: Mixed (weakest axis) — single maintainer (no co-maintainers visible); no CI workflows detected
  • Fork & modify: Healthy — has a license and tests; a clean foundation to fork and modify
  • Learn from: Healthy — documented and popular; a useful reference codebase to read through
  • Deploy as-is: Healthy — no critical CVEs and a sane security posture; runnable as-is

  • Last commit 1w ago
  • MIT licensed
  • Tests present
  • Solo or near-solo (1 contributor active in recent commits)
  • No CI workflows detected

What would change the summary?

  • Use as dependency: Mixed → Healthy if the project onboards a second core maintainer

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Forkable](https://repopilot.app/api/badge/kyegomez/openmythos?axis=fork)](https://repopilot.app/r/kyegomez/openmythos)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/kyegomez/openmythos on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: kyegomez/OpenMythos

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/kyegomez/OpenMythos shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Solo project — review before adopting

  • Last commit 1w ago
  • MIT licensed
  • Tests present
  • ⚠ Solo or near-solo (1 contributor active in recent commits)
  • ⚠ No CI workflows detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live kyegomez/OpenMythos repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/kyegomez/OpenMythos.

What it runs against: a local clone of kyegomez/OpenMythos — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in kyegomez/OpenMythos | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches a relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 40 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>kyegomez/OpenMythos</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of kyegomez/OpenMythos. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/kyegomez/OpenMythos.git
#   cd OpenMythos
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of kyegomez/OpenMythos and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "kyegomez/OpenMythos(\.git)?\b" \
  && ok "origin remote is kyegomez/OpenMythos" \
  || miss "origin remote is not kyegomez/OpenMythos (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "^license\s*=.*MIT" pyproject.toml 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in open_mythos/main.py open_mythos/moda.py open_mythos/tokenizer.py \
         open_mythos/__init__.py pyproject.toml; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 40 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~10d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/kyegomez/OpenMythos"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

OpenMythos is an open-source PyTorch implementation of a Recurrent-Depth Transformer (RDT) architecture, theoretically reconstructed from publicly available research about Claude's internal architecture. It implements a three-stage model (Prelude transformer blocks → looped Recurrent block → Coda) with switchable attention mechanisms (MLA or GQA) and sparse mixture-of-experts routing, designed for exploring compute-adaptive reasoning with variable inference depth. Flat structure: core implementation in open_mythos/ (main.py contains the primary OpenMythos class, moda.py is the MoE router, variants.py holds architecture variants, tokenizer.py wraps tokenization); examples/ provides executable demos (moda_example.py, variants_example.py); training/ has fine-tuning scripts; tests/ has unit tests and benchmarks; docs/ has architectural docs.

👥Who it's for

ML researchers and engineers experimenting with transformer architectures, recurrent reasoning patterns, and mixture-of-experts routing; developers interested in Claude's potential design principles who want a reproducible, hackable codebase to probe attention mechanisms and expert specialization without proprietary constraints.

🌱Maturity & risk

Early-stage experimental project: the codebase is ~186KB of Python with basic test coverage (tests/test_main.py, tests/test_tokenizer.py, benchmark scripts) but no CI/CD setup. It is actively maintained (recent commits, a published pip package) but explicitly marked as theoretical/speculative, not production-grade.

High risk for production use: it's a reverse-engineered theoretical model with no official validation, depends on cutting-edge libraries (torch>=2.1.0, flash-attn>=2.8.3 for optional acceleration), and the single-maintainer model (kyegomez) presents maintenance uncertainty. The codebase appears to prioritize exploration over stability, with recurrent loop mechanisms that could diverge under different configs.

Active areas of work

Active development toward a functional RDT reference: the project publishes to PyPI and includes Flash Attention 2 integration as optional; training scripts in training/3b_fine_web_edu.py suggest ongoing fine-tuning work on web/education datasets. The examples and docs suggest iterative refinement of the MoE and attention variants.

🚀Get running

git clone https://github.com/kyegomez/OpenMythos && cd OpenMythos && pip install -e . && python example.py

Daily commands:

  • Minimal run: python example.py — loads MythosConfig, instantiates the OpenMythos model, and runs a forward pass on random tokens
  • Training: python training/3b_fine_web_edu.py — requires datasets; training hyperparameters are configured in the script
  • Benchmarking: python tests/small_benchmark.py or python tests/bench_vs_transformer.py

🗺️Map of the codebase

  • open_mythos/main.py — Core model architecture implementation; defines the main Transformer-based Claude Mythos model with attention mechanisms and token processing
  • open_mythos/moda.py — Implements Mixture of Depth Architecture (MoDA), a load-bearing optimization for conditional computation across layers
  • open_mythos/tokenizer.py — Tokenization pipeline; essential for input preprocessing and vocabulary management across all inference/training paths
  • open_mythos/__init__.py — Public API surface; defines what contributors and users import from the package
  • pyproject.toml — Package metadata and dependency declarations; controls torch/transformers/flash-attn integration
  • docs/open_mythos.md — Architecture documentation explaining the theoretical reconstruction of Claude Mythos from research literature
  • open_mythos/variants.py — Model variants and configuration presets; allows flexible model instantiation with different hyperparameters

🧩Components & responsibilities

  • Model (open_mythos/main.py) — torch.nn.Module using multi-head attention, RoPE positional encoding, and optional grouped-query attention; a Transformer-based language model with attention, MLP, and MoDA routing whose forward pass computes token embeddings and logits
    • Failure mode: OOM on long contexts (no batching optimizations)

🛠️How to make changes

Add a new model variant

  1. Define a new hyperparameter configuration in open_mythos/variants.py as a function or class (e.g., MYTHOS_13B with dim, heads, depth) (open_mythos/variants.py)
  2. Update the variants mapping dictionary to register the new configuration (open_mythos/variants.py)
  3. Add an example instantiation in examples/variants_example.py to document usage (examples/variants_example.py)
  4. Run tests/test_main.py to verify the new variant initializes and runs forward pass without errors (tests/test_main.py)
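Steps 1–2 amount to a config-plus-registry pattern, sketched below in plain Python. The field names and sizes here are hypothetical — check open_mythos/variants.py for the real MythosConfig fields before copying anything:

```python
from dataclasses import dataclass


@dataclass
class MythosConfig:
    # Hypothetical fields for illustration; mirror the real config in
    # open_mythos/variants.py when adding an actual variant.
    dim: int
    heads: int
    depth: int
    n_experts: int = 8
    n_experts_per_tok: int = 2


# Registry mapping variant names to preset configurations.
VARIANTS = {
    "mythos-3b": MythosConfig(dim=2560, heads=20, depth=32),
    "mythos-13b": MythosConfig(dim=5120, heads=40, depth=40),
}


def get_variant(name: str) -> MythosConfig:
    """Look up a preset, failing loudly with the list of known names."""
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(f"unknown variant {name!r}; known: {sorted(VARIANTS)}") from None
```

A loud KeyError-to-ValueError translation like this keeps step 4's test failures readable when a variant name is mistyped.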

Add a custom attention mechanism

  1. Implement a new attention class in open_mythos/main.py inheriting from torch.nn.Module (e.g., RotaryPositionalAttention) (open_mythos/main.py)
  2. Integrate the attention layer into the TransformerBlock or similar component by adding a parameter to select attention type (open_mythos/main.py)
  3. Create a unit test in tests/test_main.py to verify forward/backward compatibility (tests/test_main.py)
  4. Document the attention mechanism in docs/open_mythos.md with mathematical notation and performance notes (docs/open_mythos.md)
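Step 2's attention-type switch hinges on how GQA groups query heads onto shared KV heads. A dependency-free sketch of that index mapping follows — this is the generic GQA recipe, not the actual code in open_mythos/main.py:

```python
def kv_head_for_query(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the KV head it shares under GQA.

    With grouped-query attention, consecutive query heads form groups and
    each group attends against a single shared K/V head, shrinking the KV
    cache by a factor of n_heads / n_kv_heads.
    """
    assert n_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group_size = n_heads // n_kv_heads
    return q_head // group_size


# With 8 query heads and 2 KV heads: heads 0-3 share KV head 0, heads 4-7 share KV head 1.
mapping = [kv_head_for_query(h, n_heads=8, n_kv_heads=2) for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Multi-query attention is the n_kv_heads=1 extreme of the same mapping; standard multi-head attention is n_kv_heads=n_heads.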

Create a new training script

  1. Copy training/3b_fine_web_edu.py as a template and modify dataset loading and model configuration (training/3b_fine_web_edu.py)
  2. Update model instantiation to use a variant from open_mythos/variants.py (open_mythos/variants.py)
  3. Import and use the tokenizer from open_mythos/tokenizer.py for preprocessing (open_mythos/tokenizer.py)
  4. Add the training script to training/ folder with a clear naming convention (e.g., training/7b_pretrain_wiki.py) (training/3b_fine_web_edu.py)

Benchmark a performance optimization

  1. Modify tests/small_benchmark.py or create a new test file to measure baseline inference speed for batch sizes and sequence lengths (tests/small_benchmark.py)
  2. Apply optimization to open_mythos/main.py (e.g., enable flash-attn via conditional import or kernel fusion) (open_mythos/main.py)
  3. Run tests/bench_vs_transformer.py to compare against baseline Transformers implementation (tests/bench_vs_transformer.py)
  4. Document performance gains in docs/open_mythos.md with benchmark numbers and hardware specifications (docs/open_mythos.md)
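For step 1, a minimal timing harness in plain Python looks like the sketch below. The workload is a stand-in; swap it for a real model forward pass, and treat the warmup/iteration counts as arbitrary starting points:

```python
import statistics
import time


def bench(fn, *, warmup=3, iters=20):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches (and compiled kernels) before measuring
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)  # median is robust to scheduler noise


# Stand-in workload; replace with e.g. `lambda: model(tokens)` once a model
# and token batch are in scope (and synchronize CUDA if benchmarking on GPU).
baseline_ms = bench(lambda: sum(range(10_000)))
print(f"median latency: {baseline_ms:.3f} ms")
```

When benchmarking on GPU, remember that CUDA launches are asynchronous, so the timed callable must include a device synchronization or the numbers will be meaningless.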

🔧Why these technologies

  • PyTorch 2.1+ — Core deep learning framework; enables torch.compile optimizations and modern autograd; supports distributed training via DDP
  • HuggingFace Transformers 4.40+ — Provides standard model interfaces, pre-trained weights, and integration with generative_utils for sampling
  • Flash Attention 2 (optional) — Reduces attention complexity from O(n²) memory/compute to O(n) via IO-aware kernel; 2-4x speedup on long contexts
  • HuggingFace Datasets — Efficient streaming dataset loading for web-scale training; supports distributed sampling and caching

⚖️Trade-offs already made

  • Theoretical reconstruction from research papers vs. official closed-source model

    • Why: Claude Mythos architecture is not publicly released; this is a best-effort open-source approximation
    • Consequence: May have lower accuracy/capabilities than actual Claude; useful for research/experimentation but not production parity
  • Mixture of Depth Architecture (MoDA) for conditional compute

    • Why: Reduces inference latency and memory by routing some tokens through fewer layers
    • Consequence: Adds routing complexity and training overhead; requires careful tuning of router loss to avoid collapse
  • Optional Flash Attention 2 (requires CUDA build tools)

    • Why: Significant speedup for long-context inference; not available on all platforms
    • Consequence: CPU/non-CUDA users cannot use the optimization; graceful fallback to standard attention required
  • Tokenizer implemented from scratch vs. using pretrained (e.g., tiktoken)

    • Why: Full control over vocabulary and BPE merges; reproducibility
    • Consequence: May not tokenize identically to proprietary Claude; requires careful validation against benchmarks
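The "graceful fallback to standard attention" noted for Flash Attention is commonly handled with a guarded import. A sketch follows — flash_attn_func is the upstream flash-attn package's entry point, but OpenMythos's actual fallback logic may be structured differently:

```python
# Prefer Flash Attention 2 kernels when the optional dependency is
# installed and built; otherwise fall back to standard attention.
try:
    from flash_attn import flash_attn_func  # optional; needs CUDA + build tools
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False


def attention_backend() -> str:
    """Report which attention path this environment will take."""
    return "flash-attn-2" if HAS_FLASH_ATTN else "standard"


print(attention_backend())
```

Logging the chosen backend once at startup (rather than falling back silently) is a cheap way to defuse the "silent fallback" trap flagged later in this document.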

🚫Non-goals (don't propose these)

  • Production deployment at Claude scale (3B/7B models are research scale only)
  • Perfect accuracy parity with proprietary Claude Mythos (theoretical reconstruction may differ)
  • Real-time multi-user serving infrastructure (single-GPU inference only, no distributed API)

🪤Traps & gotchas

  • Flash Attention 2 (optional) requires CUDA plus build tools and silently falls back to standard attention when unavailable; no environment variables are required.
  • The max_loop_iters parameter of the RDT can cause unbounded compute if not carefully tuned; no built-in iteration budget or early-exit logic is documented.
  • tokenizer.py wraps an implicit default tokenizer (likely tiktoken for Claude compatibility), but no explicit tokenizer model path is exposed in the examples, so custom tokenizers may require code changes.
  • LoRA rank configuration affects parameter count non-intuitively (lora_rank=8 with dim=256 is a small adapter); no validation warns on mismatches.
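The LoRA sizing gotcha is easy to sanity-check: a LoRA adapter pair adds rank * (d_in + d_out) parameters (matrix A is d_in×r, matrix B is r×d_out). A quick calculator, pure arithmetic with no project code:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter pair: A (d_in x r) + B (r x d_out)."""
    return rank * (d_in + d_out)


# lora_rank=8 on a 256x256 projection adds 8 * (256 + 256) = 4096 trainable
# params, versus 65,536 in the frozen base weight — a ~6% adapter.
print(lora_param_count(256, 256, 8))
```

Note the count grows linearly in rank but also linearly in layer width, which is why the same rank can be a tiny adapter on one projection and a substantial one on another.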

🏗️Architecture

💡Concepts to learn

  • Recurrent-Depth Transformer (RDT) — Core innovation in OpenMythos; allows dynamic reasoning depth per token by looping recurrent blocks, enabling adaptive compute allocation
  • Mixture of Experts (MoE) with Sparse Gating — OpenMythos uses sparse MoE routing (n_experts_per_tok << n_experts) to scale model capacity without proportional compute; critical for understanding the MODA router
  • Multi-Head Latent Attention (MLA) — One of two pluggable attention modes; a theoretical variant combining benefits of multi-head and latent-space compression; core to OpenMythos's experimental design space
  • Grouped-Query Attention (GQA) — Alternative attention mode in OpenMythos; reduces KV cache size and compute by sharing K,V heads across query groups; widely adopted in modern LLMs
  • Rotary Positional Embeddings (RoPE) — Encoding method for absolute position in transformer; test_rope_debug.py suggests custom RoPE tuning in OpenMythos; important for long-context stability
  • Low-Rank Adaptation (LoRA) — Parameter-efficient fine-tuning via lora_rank config; enables cheap adaptation of pretrained OpenMythos models without full weight updates
  • Token Router / Load Balancing Auxiliary Loss — MoE routers use auxiliary losses (implied in moda.py) to prevent expert collapse and balance load; critical for stable training of sparse expert models
  • huggingface/transformers — Reference implementation of standard Transformers; OpenMythos extends these patterns with recurrence and MoE, so users often cross-reference HF for baseline layer implementations
  • google-deepmind/gemini — Gemini also uses mixture-of-experts and multi-stage architectures; inspires and contextualizes OpenMythos's sparse routing design
  • lm-sys/FastChat — Training and serving framework for LLMs; relevant for practitioners who want to deploy and benchmark OpenMythos models at scale
  • EleutherAI/gpt-neox — Large-scale open LLM training codebase with distributed training primitives; provides battle-tested patterns for training OpenMythos variants
  • meta-llama/llama — Meta's open LLM with grouped-query attention (GQA); OpenMythos's GQA mode is partially inspired by and compatible with Llama's design
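The sparse-gating and load-balancing concepts above can be sketched in a few lines of dependency-free Python. This is the generic Switch-Transformer-style recipe, not the actual code in moda.py:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a plain list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}


def load_balance_loss(assignment_counts, gate_means):
    """Switch-style auxiliary loss: n_experts * sum(fraction_routed * mean_gate).

    Minimized when routing is uniform across experts; spikes when one
    expert absorbs most tokens (the 'router collapse' failure mode).
    """
    n = len(assignment_counts)
    total = sum(assignment_counts) or 1
    return n * sum((c / total) * g for c, g in zip(assignment_counts, gate_means))
```

With n_experts_per_tok << n_experts, top_k_route is what keeps per-token compute roughly constant while total parameter count scales with the expert pool.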

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for MoDA and RoPE implementations in open_mythos/moda.py

The repo has test_rope_debug.py and test_main.py, but tests/bench_vs_transformer.py and tests/small_benchmark.py appear to be benchmarks rather than unit tests. There are no dedicated integration tests validating that MoDA (Mixture of Depth Architecture) and the custom RoPE (Rotary Positional Embeddings) implementations work correctly end-to-end across input shapes and configurations. For a theoretical architecture reconstruction, where correctness is paramount, this is a critical gap.

  • [ ] Create tests/test_moda.py with tests for MoDA forward pass with different batch sizes, sequence lengths, and head configurations
  • [ ] Add tests/test_rope.py validating RoPE frequency computation, angle application, and numerical stability across different dimensions
  • [ ] Test interaction between MoDA and RoPE together in tests/test_integration_moda_rope.py
  • [ ] Verify tests pass with pytest and add to CI pipeline

Document the theoretical basis and architectural decisions in docs/architecture.md

The repo has docs/datasets.md and docs/open_mythos.md, but lacks detailed documentation of the architectural components. Files like open_mythos/moda.py and open_mythos/variants.py contain implementations without corresponding design docs. Contributors need to understand the research literature backing each component (MoDA, variant models, tokenizer choices) to make informed contributions.

  • [ ] Create docs/architecture.md with sections for: MoDA design, RoPE implementation rationale, GQAttention mechanism, and how variants differ
  • [ ] Add references to research papers in each section (link papers justifying design choices)
  • [ ] Include diagrams or pseudocode for the main forward pass flow
  • [ ] Link this doc from README.md in the Architecture section

Add GitHub Actions CI workflow to run tests and benchmarks on push/PR

The repo has pytest dependencies and test files but no .github/workflows/ directory visible. Without CI, contributors can't validate that their changes don't break the theoretical architecture or degrade performance. Given this is a research implementation, automated testing on multiple Python/PyTorch versions is essential.

  • [ ] Create .github/workflows/tests.yml to run pytest tests/test_*.py on Python 3.9, 3.10, 3.11 with torch>=2.1.0
  • [ ] Create .github/workflows/benchmarks.yml to run tests/small_benchmark.py and tests/bench_vs_transformer.py on PR, logging results as artifacts
  • [ ] Add status badges to README.md linking to workflow runs
  • [ ] Document in CONTRIBUTING.md (new file) that all PRs must pass CI before merge
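A starting-point workflow for the first checklist item might look like the sketch below; the matrix values and action versions are suggestions, not project decisions:

```yaml
# .github/workflows/tests.yml — sketch; tune the Python matrix and torch
# install strategy to the project's actual support policy.
name: tests
on: [push, pull_request]
jobs:
  pytest:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e . pytest
      - run: pytest tests/
```

CPU-only torch wheels keep this workflow fast; GPU-dependent benchmarks belong in a separate, manually triggered workflow.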

🌿Good first issues

  • Add integration tests for all attention_type configurations (mla, gqa) in tests/test_main.py; currently test_main.py likely covers only one variant; would validate both code paths
  • Write docstrings and type hints for open_mythos/moda.py MODARouter class; the MoE routing logic is complex and underdocumented, making it hard to understand expert load-balancing and auxiliary losses
  • Extend docs/open_mythos.md with a 'Configuration Guide' section listing all MythosConfig parameters, their ranges, and trade-offs (e.g., n_experts vs n_experts_per_tok, prelude_layers vs coda_layers); currently lacks applied guidance for practitioners

Top contributors

Click to expand

📝Recent commits

Click to expand
  • 8c68c1f — Update README.md (kyegomez)
  • f261645 — Update README.md (kyegomez)
  • 227dbb1 — new examples folder (kyegomez)
  • 963e112 — tiny tests (kyegomez)
  • 7d78ebe — flash attn (kyegomez)
  • eae0f04 — fix training (kyegomez)
  • 289981b — [bugf][act-halting][gate halted positions from weight accumulation][bugf][moe-router-bias][stop (kyegomez)
  • 7ba6907 — [improvement][loguru-logging][replace print with loguru in training script][feat][ckpt-logging][add checkpoint start and (kyegomez)
  • 18cca89 — [fix][rope Every decode token was stuck at position 0, so <q_decoded, k_cached> lost the (n - m) term entirely] (kyegomez)
  • 537b116 — just use adam for now in training maybe add muon later (kyegomez)

🔒Security observations

OpenMythos presents a moderate security posture typical of research/educational ML projects. Primary concerns are dependency management practices (unspecified upper bounds) and lack of formal security processes. The codebase appears to be a research implementation without production deployment patterns that would require stricter security controls. Training scripts and data handling capabilities should be audited for injection vulnerabilities. Recommendations focus on dependency pinning, documentation, and establishing a vulnerability disclosure process. No critical hardcoded secrets, SQL injection, or infrastructure misconfigurations detected in available files.

  • Medium · Unspecified Dependency Versions — requirements.txt, pyproject.toml. Dependencies in requirements.txt use minimum version pinning (>=) without upper bounds. This allows installation of major versions with potentially breaking changes or security vulnerabilities introduced in future releases. Example: torch>=2.1.0 could install torch 3.x or later with unknown compatibility implications. Fix: Use version pinning with upper bounds (e.g., torch>=2.1.0,<3.0.0) or lock dependencies using pip-compile or poetry.lock to ensure reproducible and secure builds.
  • Medium · Optional Dependency Not Documented — requirements.txt (commented line). Flash Attention 2 (flash-attn>=2.8.3) is listed as optional but commented out. Users may unknowingly install insecure or outdated versions if they manually add this dependency without version constraints. The comment suggests CUDA + build tools requirement, which could lead to compilation vulnerabilities. Fix: Create an optional dependencies group in pyproject.toml with proper version constraints and document the security implications of building from source (CUDA compatibility, compiler versions).
  • Low · Training Data Scripts Not Isolated — training/ directory, training/requirements.txt. Training scripts in the 'training/' directory with separate requirements.txt suggest data processing capabilities. If these scripts accept external input (datasets, configuration), they could be vulnerable to injection attacks or unsafe deserialization. Fix: Review training scripts (especially 3b_fine_web_edu.py) for: unsafe pickle/pickle-like deserialization, unvalidated dataset paths, command injection vectors, and implement input validation. Add documentation on secure usage.
  • Low · No SECURITY.md or Security Policy — Repository root. Repository lacks a SECURITY.md file or documented security vulnerability reporting process. This makes it difficult for security researchers to responsibly disclose vulnerabilities. Fix: Add a SECURITY.md file with: vulnerability reporting contact, expected response timeline, security update policy, and acknowledgment process. Consider adding to GitHub security policies.
  • Low · Incomplete README Security Context — README.md. The provided README snippet is truncated, limiting visibility into security disclaimers, usage warnings, or trust considerations that should accompany a theoretical ML architecture implementation. Fix: Ensure README includes: security assumptions, limitations of the theoretical implementation, warnings about production use, and guidance on responsible deployment.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
