tekaratzas/RustGPT
A transformer-based LLM, written entirely in Rust.
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 7mo ago
- ✓ 9 active contributors
- ✓ MIT licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Slowing — last commit 7mo ago
- ⚠ Concentrated ownership — top contributor handles 51% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/tekaratzas/rustgpt)

Paste at the top of your README.md — renders inline like a shields.io badge.
Preview social card (1200×630): this card auto-renders when someone shares https://repopilot.app/r/tekaratzas/rustgpt on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: tekaratzas/RustGPT
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/tekaratzas/RustGPT shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- Last commit 7mo ago
- 9 active contributors
- MIT licensed
- CI configured
- Tests present
- ⚠ Slowing — last commit 7mo ago
- ⚠ Concentrated ownership — top contributor handles 51% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live tekaratzas/RustGPT
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/tekaratzas/RustGPT.
What it runs against: a local clone of tekaratzas/RustGPT — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in tekaratzas/RustGPT | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 239 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of tekaratzas/RustGPT. If you don't
# have one yet, run these first:
#
# git clone https://github.com/tekaratzas/RustGPT.git
# cd RustGPT
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of tekaratzas/RustGPT and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "tekaratzas/RustGPT(\.git)?\b" \
  && ok "origin remote is tekaratzas/RustGPT" \
  || miss "origin remote is not tekaratzas/RustGPT (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
  || grep -qiE "^license[[:space:]]*=[[:space:]]*\"MIT\"" Cargo.toml 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
for f in src/lib.rs src/transformer.rs src/self_attention.rs src/llm.rs src/main.rs; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 239 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~209d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/tekaratzas/RustGPT"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
RustGPT is a from-scratch transformer-based large language model written entirely in Rust using only ndarray for matrix operations—no external ML frameworks. It implements pre-training on factual text (data/pretraining_data.json) and instruction tuning on chat data (data/chat_training_data.json), with full backpropagation, gradient clipping, and an interactive chat mode. Single-package architecture: src/ contains modular layers (embeddings.rs, self_attention.rs, feed_forward.rs, layer_norm.rs, transformer.rs) composed into a unified LLM in llm.rs; main.rs orchestrates training pipeline and CLI; data/ holds JSON training corpora; tests/ mirrors src structure with one test file per module.
👥Who it's for
Rust systems programmers and ML enthusiasts who want to understand transformer internals without PyTorch/TensorFlow abstractions; educators teaching how LLMs work at the mathematical level; contributors interested in pure Rust ML implementations without external neural network libraries.
🌱Maturity & risk
Experimental / educational stage: one dominant maintainer (tekaratzas, ~51% of recent commits), a solid test suite (8 test files covering all major modules), and CI configured (GitHub Actions workflows check.yml and test.yml), but the README explicitly labels it a toy project 'so far away from the larger models.' Not production-ready; suitable for learning and demonstration only.
Moderate risk: only 5 direct dependencies (ndarray, rand, the serde stack, bincode, csv), all widely used; ownership is concentrated in one maintainer and recent activity is slowing; no native performance optimizations (pure Rust without SIMD/GPU acceleration means training is slow); no distributed-training infrastructure. Good for learning; risky for any real inference workload.
Active areas of work
No recent activity visible in the file list; the repo appears stable rather than actively developed. The core architecture appears complete: full transformer stack, Adam optimizer, dataset loading, and two-stage training (pre-training plus instruction tuning) are all implemented. No open PRs or milestones indicated.
🚀Get running
git clone https://github.com/tekaratzas/RustGPT.git
cd RustGPT
cargo build
cargo run --release
Daily commands:
cargo run --release
Enters interactive chat mode; for training on data/pretraining_data.json and data/chat_training_data.json, edit main.rs training loop parameters and rebuild. No command-line training flags visible—configuration is code-driven.
🗺️Map of the codebase
- src/lib.rs — Library root that exports all core transformer components; every contributor must understand the public API surface.
- src/transformer.rs — Core transformer block implementation with self-attention and feed-forward layers; the heart of the model architecture.
- src/self_attention.rs — Multi-head self-attention mechanism implementation; critical for understanding how the model attends to different positions.
- src/llm.rs — Main LLM struct orchestrating the full model pipeline including embeddings, transformer blocks, and output projection.
- src/main.rs — Entry point demonstrating training loop, inference, and interactive chat mode; shows how all components integrate.
- src/adam.rs — Adam optimizer implementation for backpropagation; essential for understanding training mechanics.
- src/dataset_loader.rs — Data loading and preprocessing pipeline; required for understanding how training data flows into the model.
🧩Components & responsibilities
- Self-Attention — Computes scaled dot-product attention
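The bullet above compresses the whole mechanism into one line. As a dependency-free illustration, here is scaled dot-product attention over plain `Vec<Vec<f64>>` matrices; the real `src/self_attention.rs` uses ndarray and multiple heads, so treat this as a sketch of the math, not the repo's code:

```rust
// Scaled dot-product attention sketch (std only). Shapes and helper names
// are illustrative; RustGPT's real implementation operates on ndarray Arrays.

fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n { for j in 0..m { for p in 0..k { out[i][j] += a[i][p] * b[p][j]; } } }
    out
}

fn transpose(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, m) = (a.len(), a[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..n { for j in 0..m { out[j][i] = a[i][j]; } }
    out
}

// Numerically stable row-wise softmax (subtract the row max before exp).
fn softmax_rows(scores: &mut [Vec<f64>]) {
    for row in scores.iter_mut() {
        let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let mut sum = 0.0;
        for v in row.iter_mut() { *v = (*v - max).exp(); sum += *v; }
        for v in row.iter_mut() { *v /= sum; }
    }
}

/// attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
fn attention(q: &[Vec<f64>], k: &[Vec<f64>], v: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let d_k = k[0].len() as f64;
    let mut scores = matmul(q, &transpose(k));
    for row in scores.iter_mut() { for s in row.iter_mut() { *s /= d_k.sqrt(); } }
    softmax_rows(&mut scores);
    matmul(&scores, v) // each output row is a convex combination of V's rows
}

fn main() {
    let q = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let k = q.clone();
    let v = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    println!("{:?}", attention(&q, &k, &v));
}
```

Because the softmax rows sum to 1, every output row stays inside the range spanned by the rows of V, which is a quick sanity check when debugging an attention implementation.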
🛠️How to make changes
Add a new Transformer component (e.g., new attention variant)
1. Create a new module file in src/ with the component implementation (src/new_component.rs).
2. Add a pub mod declaration and pub use in lib.rs to export the component (src/lib.rs).
3. Create an integration test importing the component and verifying the forward/backward pass (tests/new_component_test.rs).
4. Update the Transformer or LLM struct to use the new component (src/transformer.rs, src/llm.rs).
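The checklist above can be made concrete with a toy component. The ScaleLayer below is hypothetical (not in the repo) and uses plain Vecs instead of ndarray, but it sketches the forward/backward contract a new module would need before being wired into transformer.rs:

```rust
// Hypothetical sketch of a new component's forward/backward contract.
// Names (ScaleLayer, forward, backward) are illustrative, not RustGPT's API.

/// A trivial component: y = alpha * x, with one learnable scalar alpha.
struct ScaleLayer {
    alpha: f64,
    // Cache the last input during forward; backward needs it for dL/dalpha.
    last_input: Vec<f64>,
}

impl ScaleLayer {
    fn new(alpha: f64) -> Self {
        ScaleLayer { alpha, last_input: Vec::new() }
    }

    /// Forward pass: cache the input, return alpha * x.
    fn forward(&mut self, x: &[f64]) -> Vec<f64> {
        self.last_input = x.to_vec();
        x.iter().map(|v| self.alpha * v).collect()
    }

    /// Backward pass: given dL/dy, return (dL/dx, dL/dalpha).
    fn backward(&self, grad_out: &[f64]) -> (Vec<f64>, f64) {
        let grad_in: Vec<f64> = grad_out.iter().map(|g| self.alpha * g).collect();
        let grad_alpha: f64 = grad_out.iter().zip(&self.last_input).map(|(g, x)| g * x).sum();
        (grad_in, grad_alpha)
    }
}

fn main() {
    let mut layer = ScaleLayer::new(2.0);
    let y = layer.forward(&[1.0, 3.0]);
    let (dx, dalpha) = layer.backward(&[1.0, 1.0]);
    println!("y={:?} dx={:?} dalpha={}", y, dx, dalpha);
}
```

Whatever the component, the key discipline is the same: forward caches whatever backward will need, and backward returns both an input gradient (to propagate) and parameter gradients (to hand to the optimizer).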
Add a new training dataset
1. Create a new JSON or CSV file in data/ with samples in the expected format (data/new_dataset.json).
2. Update dataset_loader.rs with a loader function for the new dataset format, if needed (src/dataset_loader.rs).
3. Extend vocab.rs if the dataset introduces tokens not in the existing vocabulary (src/vocab.rs).
4. Add a training branch in main.rs with a DatasetLoader call for the new dataset (src/main.rs).
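As a rough illustration of step 3 (extending the vocabulary for new tokens), here is a dependency-free sketch. The real vocab.rs and dataset_loader.rs APIs may differ, and the whitespace tokenizer here is an assumption; the repo's loader parses JSON via serde:

```rust
// Illustrative vocabulary extension (std only). The "dataset" is a plain
// Vec<String> to keep the sketch dependency-free; function name is hypothetical.
use std::collections::HashMap;

/// Assign a fresh id to every whitespace-separated token not already in the vocab.
fn extend_vocab(vocab: &mut HashMap<String, usize>, samples: &[String]) {
    for sample in samples {
        for tok in sample.split_whitespace() {
            let next_id = vocab.len(); // ids are dense: 0, 1, 2, ...
            vocab.entry(tok.to_string()).or_insert(next_id);
        }
    }
}

fn main() {
    let mut vocab: HashMap<String, usize> = HashMap::new();
    extend_vocab(&mut vocab, &["the cat sat".to_string(), "the mat".to_string()]);
    println!("{} unique tokens", vocab.len()); // the, cat, sat, mat -> 4
}
```

Note that vocabulary size feeds directly into the embedding matrix and output-projection dimensions, which is why step 3 must happen before retraining.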
Modify training hyperparameters or optimization
1. Adjust learning rate, batch size, or gradient-clip value in the main.rs training loops (src/main.rs).
2. Modify Adam optimizer state or momentum factors if changing the optimization strategy (src/adam.rs).
3. Update LLM initialization if changing model depth or hidden dimensions (src/llm.rs).
4. Add test cases validating convergence with the new hyperparameters (tests/llm_test.rs).
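For orientation before touching step 2, the per-parameter update that adam.rs is described as implementing looks like this. The struct layout and names are illustrative, not the repo's actual API; the hyperparameter defaults follow the Adam paper:

```rust
// One Adam update step for a single scalar parameter (std only, illustrative).

struct Adam {
    lr: f64, beta1: f64, beta2: f64, eps: f64,
    m: f64, // first-moment (mean) estimate
    v: f64, // second-moment (uncentered variance) estimate
    t: u32, // step counter, used for bias correction
}

impl Adam {
    fn new(lr: f64) -> Self {
        Adam { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, m: 0.0, v: 0.0, t: 0 }
    }

    /// Apply one update to `param` given its gradient.
    fn step(&mut self, param: &mut f64, grad: f64) {
        self.t += 1;
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad;
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad * grad;
        // Bias-corrected moment estimates (moments start at zero).
        let m_hat = self.m / (1.0 - self.beta1.powi(self.t as i32));
        let v_hat = self.v / (1.0 - self.beta2.powi(self.t as i32));
        *param -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
    }
}

fn main() {
    let mut w = 1.0;
    let mut opt = Adam::new(0.1);
    opt.step(&mut w, 2.0); // positive gradient, so w decreases by roughly lr
    println!("w after one step: {w}");
}
```

Tweaking beta1/beta2 changes how much history the moment estimates retain, which is why the checklist pairs optimizer edits with convergence tests.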
Add inference optimization (e.g., caching, quantization)
1. Create a new module with the optimization logic (src/inference_utils.rs).
2. Export it from lib.rs and integrate into LLM, or create a wrapper struct (src/lib.rs).
3. Update the main.rs chat loop to use the optimized inference path (src/main.rs).
4. Add performance benchmarks (tests/inference_bench_test.rs).
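A minimal sketch of the inference hot path such an optimization would target: softmax over the final logits followed by greedy argmax selection. Function names are hypothetical, not RustGPT's API:

```rust
// Greedy decoding step (std only, illustrative).

// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Greedy decoding: the next token is the index of the largest probability.
fn greedy_next_token(logits: &[f64]) -> usize {
    softmax(logits)
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let logits = [0.1, 2.5, -1.0, 0.7];
    println!("next token id: {}", greedy_next_token(&logits)); // index 1 wins
}
```

Since argmax over softmax equals argmax over the raw logits, one easy "optimization" in this spirit is skipping the softmax entirely during greedy decoding; caching and quantization target the much larger matrix multiplies upstream of this step.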
🔧Why these technologies
- Rust + ndarray — Provides memory safety, type safety, and reasonable performance for dense matrix operations without external ML framework dependencies. ndarray enables efficient N-dimensional tensor operations.
- serde + serde_json + bincode — Enables serialization of model weights to disk (bincode for efficiency), and JSON for human-readable training data formats.
- rand + rand_distr — Critical for weight initialization (normal distribution), dropout, and stochastic sampling during inference.
⚖️Trade-offs already made
- Implement the transformer from scratch in Rust rather than wrapping TensorFlow/PyTorch
  - Why: educational clarity (shows exactly how backpropagation and attention work); stays in the pure Rust ecosystem
  - Consequence: no GPU acceleration, significantly slower training than optimized frameworks; limited to toy-scale models
- Manual gradient computation via backpropagation rather than automatic differentiation
  - Why: demonstrates gradient-flow mechanics explicitly; easier to understand and modify
  - Consequence: risk of gradient bugs; each component must manually implement its backward pass; harder to add new layer types
- Single-threaded, CPU-only computation
  - Why: simplifies memory safety and gradient synchronization; keeps the focus on architectural clarity
  - Consequence: slow training; unsuitable for real-world datasets; no distributed training
- Character/token-level vocabulary with simple tokenization
  - Why: minimal dependencies; demonstrates core model mechanics without subword-tokenization complexity
  - Consequence: less efficient sequence compression; poor handling of out-of-vocabulary words; requires longer sequences
🚫Non-goals (don't propose these)
- Production-grade LLM: explicitly stated as 'toy project' in README
- GPU acceleration or distributed training
- Inference optimization (quantization, pruning, distillation)
- Handling of very large models (billions of parameters)
- Real-time streaming inference
- Support for multi-modal inputs (images, audio)
- Context windows larger than model architecture allows
🪤Traps & gotchas
- No environment variables required; no external services needed.
- Rust edition 2024 is specified in Cargo.toml. The 2024 edition is stable only since Rust 1.85, so older toolchains fail to build; run rustup update first.
- Training is CPU-only and very slow on large datasets; expect hours for meaningful convergence.
- No checkpoint auto-loading in main.rs; model weights are not persisted between runs by default (serialize manually, or edit main.rs to enable saving/loading via bincode).
- The dataset format is strictly JSON with specific key names; check the examples in data/ to match the structure.
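Since weights are not persisted by default, a manual save/load routine is the usual workaround. The std-only sketch below writes raw little-endian f64s; the repo ships bincode for this purpose, so the flat-file format here is purely illustrative and not compatible with anything in the codebase:

```rust
// Illustrative weight persistence (std only): flat little-endian f64 file.
// Function names and format are hypothetical; RustGPT would use bincode.
use std::convert::TryInto;
use std::fs::File;
use std::io::{Read, Write};

fn save_weights(path: &str, weights: &[f64]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    for w in weights {
        f.write_all(&w.to_le_bytes())?; // 8 bytes per parameter
    }
    Ok(())
}

fn load_weights(path: &str) -> std::io::Result<Vec<f64>> {
    let mut buf = Vec::new();
    File::open(path)?.read_to_end(&mut buf)?;
    // Reassemble each 8-byte chunk back into an f64.
    Ok(buf.chunks_exact(8)
        .map(|c| f64::from_le_bytes(c.try_into().unwrap()))
        .collect())
}

fn main() -> std::io::Result<()> {
    let weights = vec![0.5, -1.25, 3.0];
    save_weights("/tmp/rustgpt_sketch_weights.bin", &weights)?;
    assert_eq!(load_weights("/tmp/rustgpt_sketch_weights.bin")?, weights);
    println!("roundtrip ok");
    Ok(())
}
```

A real checkpoint would also need to record shapes and layer order; serde plus bincode handles that automatically, which is why the repo's dependency set already includes both.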
🏗️Architecture
💡Concepts to learn
- Multi-Head Self-Attention — Core mechanism in self_attention.rs that lets the model attend to different parts of input in parallel; understanding query-key-value projections and attention weights is essential to modifying model behavior.
- Transformer Architecture — The overall design pattern in transformer.rs combining attention, feed-forward, residual connections, and layer norm; foundational to understanding how layers compose.
- Backpropagation & Gradient Descent — llm.rs implements backward pass manually; understanding chain rule and how gradients flow through layers is essential to debugging training issues.
- Adam Optimizer (Adaptive Moment Estimation) — adam.rs implements momentum and adaptive learning rates per-parameter; tweaking beta1, beta2, learning rate in this file directly impacts training stability and convergence speed.
- Tokenization & Vocabulary — vocab.rs handles converting text to token IDs; understanding the vocabulary size and encoding scheme affects embeddings.rs dimensions and output_projection.rs logit output shape.
- Layer Normalization — layer_norm.rs stabilizes training by normalizing hidden states; without it, gradients explode or vanish. Critical to understanding why training is stable or crashes.
- Gradient Clipping — llm.rs applies gradient clipping to prevent exploding gradients during backprop; understanding this prevents NaN losses in early training iterations.
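Gradient clipping by global norm, as described for llm.rs above, can be sketched in a few lines. The threshold and function name here are illustrative, not the repo's:

```rust
// Global-norm gradient clipping (std only, illustrative).

/// Scale all gradients down uniformly so their global L2 norm is at most `max_norm`.
/// Gradients whose norm is already within the bound are left untouched.
fn clip_by_global_norm(grads: &mut [f64], max_norm: f64) {
    let norm: f64 = grads.iter().map(|g| g * g).sum::<f64>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
}

fn main() {
    let mut grads = vec![3.0, 4.0]; // L2 norm = 5
    clip_by_global_norm(&mut grads, 1.0);
    println!("{:?}", grads); // rescaled to norm 1: [0.6, 0.8]
}
```

Uniform rescaling preserves the gradient's direction, which is why global-norm clipping is preferred over clipping each component independently.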
🔗Related repos
- huggingface/transformers — Industry-standard transformer implementations in Python/PyTorch; if you understand this RustGPT codebase, reading transformers' attention code will show how the same concepts scale to production.
- karpathy/nanoGPT — Minimal GPT in PyTorch with similar educational intent; RustGPT is the Rust equivalent, so comparing the two side-by-side separates language idioms from algorithmic differences.
- burn-rs/burn — Rust deep learning framework; if you outgrow RustGPT's educational scope, Burn provides higher-level abstractions while staying in the Rust ecosystem.
- rust-ndarray/ndarray — The linear algebra backbone of RustGPT; understanding ndarray's Array API deeply improves matrix-operation optimization and debugging in this project.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration test for end-to-end model training pipeline
The repo has unit tests for individual components (adam.rs, embeddings.rs, self_attention.rs, etc.) but lacks an integration test that validates the full training loop from data loading through model saving/loading. This would catch breaking changes across module boundaries and demonstrate the complete workflow.
- [ ] Create tests/integration_test.rs to test the full pipeline: dataset_loader → model initialization → forward pass → backward pass → weight updates
- [ ] Use a small subset of data/pretraining_data.json or data/chat_training_data.json to keep test runtime reasonable
- [ ] Verify model weights change after training steps and loss decreases over iterations
- [ ] Test model serialization/deserialization using bincode to catch breaking changes in serde implementations
Add benchmarking suite for performance-critical modules
A Rust LLM implemented without external ML frameworks relies heavily on efficient ndarray operations. There's no benchmarking infrastructure (benches/ directory missing) to track performance regressions in hot paths like self_attention.rs, feed_forward.rs, and embeddings.rs. This would help contributors avoid performance degradations.
- [ ] Create benches/attention_bench.rs to benchmark self_attention forward/backward passes with varying sequence lengths and hidden dimensions
- [ ] Create benches/feedforward_bench.rs to benchmark feed_forward operations
- [ ] Add [[bench]] entries to Cargo.toml pointing to these files
- [ ] Document baseline performance metrics in a BENCHMARKS.md section of README.md
Add property-based tests for gradient computation correctness
The repo implements custom backpropagation (adam.rs, self_attention.rs, feed_forward.rs, layer_norm.rs). Without gradient checking tests, subtle numerical errors could accumulate. This would use proptest to validate gradients numerically against finite differences.
- [ ] Add proptest to [dev-dependencies] in Cargo.toml
- [ ] Create tests/gradient_check_test.rs with numerical gradient verification for: self_attention.rs, feed_forward.rs, and layer_norm.rs modules
- [ ] Implement a generic gradient_check() function that computes finite differences and compares against backpropagated gradients
- [ ] Test with random input dimensions and values to catch edge cases (zero values, extreme scales, etc.)
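The core of the proposed gradient_check() helper is a central finite difference compared against the analytic derivative. Here is the idea with a scalar stand-in function; a real check would perturb each layer parameter in turn and compare against the backpropagated gradient:

```rust
// Numerical gradient check (std only). f(x) = x^3 stands in for a real layer.

fn f(x: f64) -> f64 { x * x * x }
fn df(x: f64) -> f64 { 3.0 * x * x } // analytic gradient (stand-in for backprop)

/// Central difference approximation of f'(x); error shrinks as O(h^2).
fn numerical_grad(f: impl Fn(f64) -> f64, x: f64, h: f64) -> f64 {
    (f(x + h) - f(x - h)) / (2.0 * h)
}

fn main() {
    let x = 1.5;
    let (analytic, numeric) = (df(x), numerical_grad(f, x, 1e-5));
    println!("analytic={analytic:.6} numeric={numeric:.6}");
    assert!((analytic - numeric).abs() < 1e-6);
}
```

In practice the tolerance must account for floating-point noise: too small an h amplifies cancellation error, too large an h amplifies truncation error, so values around 1e-4 to 1e-6 are typical.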
🌿Good first issues
- Add BLAS acceleration via ndarray-linalg: currently all matrix ops are pure Rust, making training dramatically slower than necessary. Create a feature flag in Cargo.toml for an optional BLAS backend for the matrix multiplications in self_attention.rs and feed_forward.rs.
- Implement model checkpoint saving/loading: add save_model() and load_model() functions in src/llm.rs using bincode serialization, and integrate into src/main.rs so training can resume. Currently weights are lost after each run.
- Write an integration test that trains a minimal model end-to-end: tests/ has unit tests for each module, but no test in tests/llm_test.rs that actually trains on a tiny dataset and validates that loss decreases. Add a ~100-token test corpus and verify training convergence.
⭐Top contributors
- @tekaratzas — 51 commits
- @anshumanpatil — 16 commits
- @ben1009 — 15 commits
- Ari Kaufman — 6 commits
- @Theo- — 6 commits
📝Recent commits
- 772c469 — Merge pull request #19 from ben1009/readme (tekaratzas)
- e977fe5 — chore: fix readme workflow badge (ben1009)
- cf1f7d3 — Merge pull request #11 from ben1009/housekeeping (tekaratzas)
- 51e79eb — Merge branch 'main' into housekeeping (ben1009)
- 0861377 — update (ben1009)
- 4f407f4 — comment out the cov gha (ben1009)
- 1159247 — Merge pull request #17 from tomschelsen/dry (tekaratzas)
- cc45a4e — fix formatting (Thomas Roques)
- d23da20 — Refactor: Eliminate repeated vocabulary processing code by moving to vocab module (Thomas Roques)
- 2ec33c7 — chore: fix lints (ben1009)
🔒Security observations
The RustGPT codebase demonstrates reasonable security practices for a research project written in memory-safe Rust. The primary concerns are: (1) a bleeding-edge Rust edition (2024) that requires an up-to-date toolchain, (2) reliance on pre-1.0 dependency versions with no semver stability guarantees, and (3) lack of explicit security-hardening configuration. The project does not appear to handle sensitive data or expose network services based on the visible structure. For a non-production educational project this posture is acceptable, but moving toward production would require addressing dependency stability, adding security lints, and implementing comprehensive input validation.
- Medium · Bleeding-edge Rust edition — Cargo.toml. The manifest specifies edition = "2024", which is valid but only stable since Rust 1.85 (February 2025). Older toolchains fail to build, which can confuse contributors and mask issues during compilation. Fix: declare the minimum supported Rust version via the rust-version field in Cargo.toml, or document that contributors must run rustup update.
- Medium · Pre-1.0 dependency versions — Cargo.toml [dependencies]. Several dependencies are pre-1.0 (rand = "0.9.2", ndarray = "0.16.1"); 0.x crates carry no semver stability guarantees across minor releases and may change behavior. Fix: run cargo audit regularly to surface known vulnerabilities, and track upstream stabilization before depending on these in production.
- Medium · No dependency pinning beyond the lock file — Cargo.toml. While Cargo.lock is present, Cargo.toml uses flexible version constraints, so downstream consumers who ignore the lock file can get inconsistent builds. Fix: use tighter constraints for critical dependencies (for example "=1.0" or "~1.0" instead of "1.0") and ensure Cargo.lock stays committed to version control.
- Low · Missing security-related configuration — Cargo.toml. No explicit hardening settings such as overflow checks or lint configuration. Fix: add [profile.release] with overflow-checks = true and lto = true, and consider a [lints] section enabling security-related clippy warnings.
- Low · No input validation framework — src/dataset_loader.rs. The dataset loader parses JSON from files without visible validation; malformed JSON from untrusted sources could cause panics or unexpected behavior. Fix: handle deserialization errors robustly, use serde's #[serde(deny_unknown_fields)] to reject unexpected data, and validate array/tensor dimensions before processing.
- Low · Limited test-coverage visibility — tests/ directory. Tests exist, but there are no visible security-focused cases for numerical bounds. Fix: add unit tests for buffer bounds, numerical overflow scenarios, and malformed input handling, and wire up the coverage reporting already configured in .github/codecov.yml.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.