rustformers/llm
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
Healthy across all four use cases:
- Permissive license, no critical CVEs — safe to depend on (though note the staleness flag below).
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: rustformers/llm
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rustformers/llm shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only two sections before pointing your agent at this repo, make them the Verify block and "Where to read next".
🎯Verdict
GO — Healthy across all four use cases
- 16 active contributors
- Distributed ownership (top contributor 35% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Stale — last commit 2y ago
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live rustformers/llm
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/rustformers/llm.
What it runs against: a local clone of rustformers/llm — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rustformers/llm | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 712 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rustformers/llm. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rustformers/llm.git
#   cd llm
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rustformers/llm and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rustformers/llm(\.git)?\b" \
  && ok "origin remote is rustformers/llm" \
  || miss "origin remote is not rustformers/llm (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\.0)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in \
  "crates/llm-base/src/lib.rs" \
  "crates/ggml/src/lib.rs" \
  "crates/llm/src/lib.rs" \
  "binaries/llm-cli/src/main.rs" \
  "crates/ggml/sys/build.rs"; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 712 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~682d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rustformers/llm"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
llm is a Rust ecosystem for running quantized large language models locally via GGML bindings, enabling CPU/GPU inference of models like Llama, MPT, and Bloom without external APIs. It provides low-level GGML FFI bindings (crates/ggml/sys) plus a high-level inference API (crates/llm) with model-specific implementations and a CLI tool for interactive prompting.

The monorepo is a Cargo workspace: crates/ggml contains the low-level C FFI bindings (crates/ggml/sys) and a safe Rust wrapper (context.rs, accelerator/); crates/llm provides the high-level inference API; crates/models/* contain architecture-specific implementations (GPT-J, Llama, MPT, Bloom, GPT-NeoX); and binaries/ contains the CLI (llm-cli), the test suite (llm-test, with JSON configs per model), and utility binaries.
👥Who it's for
Rust developers building local LLM inference applications who want direct control over model loading, tokenization, and sampling without relying on cloud APIs. Also targets ML engineers experimenting with quantized model architectures (GGML format) in pure Rust.
🌱Maturity & risk
ARCHIVED AND UNMAINTAINED (as stated in the README). The repo has substantial code (541 KB of Rust) and CI/CD workflows, but is no longer actively developed. The README explicitly directs users to alternatives such as mistral.rs, Candle, and llama.cpp wrappers. A historically production-quality codebase, but no longer safe for new projects.
Critical risk: the project is archived with no maintenance; the last commit landed roughly two years before this analysis (~682 days at generation time). There is zero ongoing support; the maintainer (@philpax) has moved on. Dependency risk: pinned to Rust 1.67.1 and older ecosystem versions. Not recommended for production; use the alternatives listed in the README instead.
Active areas of work
Nothing. The repository is archived: no new commits, PRs, or development. The last commit landed roughly two years before this snapshot (~682 days at generation time). Development halted after the maintainer decided to sunset the project in favor of newer alternatives.
🚀Get running
Clone the repo: git clone https://github.com/rustformers/llm.git && cd llm. Install the pinned toolchain: rustup toolchain install 1.67.1 (the build expects Rust 1.67.1 exactly; see Traps & gotchas below). Build the default binary: cargo build --release. Note: this project is archived; consider using mistral.rs, Candle, or llama.cpp wrappers instead, per the README.
Daily commands:
CLI inference: cargo run --release --bin llm-cli -- prompt "Hello world" (see binaries/llm-cli/src/cli_args.rs for full options). Interactive mode: cargo run --release --bin llm-cli -- interactive <model-path>. Test suite: cargo test --release --bin llm-test with model configs from binaries/llm-test/configs/*.json.
🗺️Map of the codebase
- crates/llm-base/src/lib.rs — Core entry point defining the inference-session API, model traits, and loader abstractions that all model implementations depend on.
- crates/ggml/src/lib.rs — Low-level GGML tensor-computation bindings and context management; foundational for all inference operations.
- crates/llm/src/lib.rs — High-level public API re-exporting models and base functionality; the main entry point for end users.
- binaries/llm-cli/src/main.rs — Reference implementation showing how to integrate the llm crate for practical inference tasks (quantization, generation, interactive chat).
- crates/ggml/sys/build.rs — C/C++ build configuration for the GGML system bindings; must be updated when upgrading GGML or adding accelerator support (CUDA, Metal, OpenCL).
- crates/llm-base/src/model/mod.rs — Model trait definitions and architecture-agnostic utilities; every new model architecture must implement the Model trait here.
- Cargo.toml — Workspace root defining all member crates, shared dependencies, and feature flags (cuda, metal, opencl) that control backend selection.
🛠️How to make changes
Add Support for a New Model Architecture
- Create a new crate under crates/models/<architecture>/ with a Cargo.toml and src/lib.rs, following the pattern of existing models (e.g., gptj). (crates/models/gptj/Cargo.toml)
- Implement the Model trait from llm-base, defining forward-pass logic, tensor operations, and state management; a simplified sketch follows this list. (crates/llm-base/src/model/mod.rs)
- Register the new model in the loader's format detection by extending the match statement in loader.rs to recognize the model format. (crates/llm-base/src/loader.rs)
- Re-export the new model from crates/llm/src/lib.rs so it is accessible to end users. (crates/llm/src/lib.rs)
- Add a model-specific test configuration in binaries/llm-test/configs/<architecture>.json and extend binaries/llm-test/src/inference.rs to test the new architecture. (binaries/llm-test/configs/llama.json)
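To make the second step concrete, here is a deliberately simplified sketch. The Model trait below is a hypothetical stand-in (the real trait in crates/llm-base/src/model/mod.rs has more methods and GGML-specific types), so treat it as the shape of the work, not the actual API:

```rust
// Hypothetical, simplified stand-in for the Model trait; the real definition
// in crates/llm-base/src/model/mod.rs has more methods and GGML-specific types.
trait Model {
    /// Run one forward pass over `tokens`, returning logits over the vocabulary.
    fn evaluate(&self, tokens: &[u32]) -> Vec<f32>;
    /// Context length the architecture supports.
    fn context_size(&self) -> usize;
}

/// Skeleton for a new architecture crate under crates/models/<architecture>/.
struct MyArch {
    n_ctx: usize,
    vocab_size: usize,
    // Tensors loaded from the GGML file would live here.
}

impl Model for MyArch {
    fn evaluate(&self, _tokens: &[u32]) -> Vec<f32> {
        // A real implementation builds and executes a GGML computation graph
        // here; this stub just returns uniform logits.
        vec![0.0; self.vocab_size]
    }

    fn context_size(&self) -> usize {
        self.n_ctx
    }
}
```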
Add a New Hardware Accelerator Backend
- Create a new backend module in crates/ggml/src/accelerator/ (e.g., accelerator/opencl.rs) implementing initialization and memory management. (crates/ggml/src/accelerator/metal.rs)
- Add FFI bindings in crates/ggml/sys/src/opencl.rs with raw C function declarations. (crates/ggml/sys/src/metal.rs)
- Update crates/ggml/sys/build.rs to compile the C/C++ backend library conditionally and link it when the feature is enabled; a build-script sketch follows this list. (crates/ggml/sys/build.rs)
- Add a feature flag in the root workspace Cargo.toml and conditionally export the accelerator module from crates/ggml/src/lib.rs. (Cargo.toml)
- Test the accelerator integration by running binaries/llm-cli with the new backend (e.g., --accelerator opencl). (binaries/llm-cli/src/cli_args.rs)
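The build-script step usually amounts to feature-gated compilation. A minimal sketch, assuming the cc crate as a build dependency and an opencl Cargo feature; the source file, define, and link line are illustrative, not the contents of the real crates/ggml/sys/build.rs:

```rust
// build.rs: illustrative sketch; the real crates/ggml/sys/build.rs is far
// more involved (platform probing, multiple accelerators, C++ sources).
fn main() {
    let mut build = cc::Build::new();
    build.file("ggml/ggml.c");

    // Cargo exposes each enabled feature as a CARGO_FEATURE_<NAME> env var.
    if std::env::var("CARGO_FEATURE_OPENCL").is_ok() {
        build
            .file("ggml/ggml-opencl.c") // hypothetical backend source
            .define("GGML_USE_CLBLAST", None);
        println!("cargo:rustc-link-lib=clblast");
    }

    build.compile("ggml");
}
```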
Implement a Custom Token Sampler
- Define a new struct and implement the Sampler trait from crates/llm-base/src/samplers.rs with a custom sample_token method; a simplified sketch follows this list. (crates/llm-base/src/samplers.rs)
- Expose the sampler in the public API by re-exporting it from crates/llm-base/src/lib.rs. (crates/llm-base/src/lib.rs)
- Update InferenceSession in crates/llm-base/src/inference_session.rs to accept the new sampler type in its configuration. (crates/llm-base/src/inference_session.rs)
- Add example usage in crates/llm/examples/inference.rs demonstrating the custom sampler. (crates/llm/examples/inference.rs)
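For the first step, the shape is roughly as follows. The trait here is a hypothetical simplification; the real sampler abstraction (from the external llm-samplers crate) threads token history, RNG state, and error handling through the call:

```rust
// Hypothetical, simplified sampler trait; the real abstraction comes from the
// llm-samplers crate and has a richer signature.
trait Sampler {
    /// Pick the next token id given the model's output logits.
    fn sample_token(&mut self, logits: &[f32]) -> usize;
}

/// Greedy sampling: always take the highest-logit token. Deterministic, and
/// useful as a baseline before layering temperature/top-k/top-p on top.
struct Greedy;

impl Sampler for Greedy {
    fn sample_token(&mut self, logits: &[f32]) -> usize {
        logits
            .iter()
            .copied()
            .enumerate()
            .max_by(|(_, a), (_, b)| a.total_cmp(b))
            .map(|(i, _)| i)
            .unwrap_or(0)
    }
}
```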
Add Model Quantization Support
- Extend the quantization logic in crates/llm-base/src/quantize.rs to define new quantization schemes and conversion methods; a worked block-quantization example follows this list. (crates/llm-base/src/quantize.rs)
- Update the GGML format loader/saver in crates/ggml/src/format/loader.rs to recognize and deserialize the new quantization format. (crates/ggml/src/format/loader.rs)
- Add CLI support in binaries/llm-cli/src/main.rs for the quantize command with the new scheme. (binaries/llm-cli/src/main.rs)
- Test quantization in binaries/llm-test/src/inference.rs by loading and running inference on quantized models. (binaries/llm-test/src)
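To ground the first step: GGML's block-quantization schemes store a per-block scale plus narrow integers. Below is a self-contained sketch of Q8_0-style quantization (blocks of 32 f32 weights become one f32 scale plus 32 i8 values); the math illustrates the scheme but is not the crate's actual code path:

```rust
/// Q8_0-style block quantization sketch (illustrative; the real logic lives
/// in crates/llm-base/src/quantize.rs and the underlying GGML C code).
const BLOCK: usize = 32;

fn quantize_q8_0(weights: &[f32]) -> Vec<(f32, [i8; BLOCK])> {
    weights
        .chunks(BLOCK)
        .map(|chunk| {
            // Scale so the largest magnitude in the block maps to the i8 range.
            let amax = chunk.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
            let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
            let mut q = [0i8; BLOCK];
            for (dst, &x) in q.iter_mut().zip(chunk) {
                *dst = (x / scale).round() as i8;
            }
            (scale, q)
        })
        .collect()
}

/// Dequantization is the inverse multiply; the coarser the bit-width, the
/// larger the rounding error — this is the speed/accuracy tradeoff.
fn dequantize(scale: f32, q: &[i8; BLOCK]) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```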
🪤Traps & gotchas
- Build system: requires Rust 1.67.1 exactly (pinned in the cargo-dist config); a mismatched toolchain may fail to build.
- GGML bindings: crates/ggml/sys builds ggml from C source and requires a C compiler and CMake; make sure both are installed.
- Model format: expects GGML-format quantized models; other formats (including the newer GGUF) fail silently or with cryptic errors.
- No async by default: inference is synchronous; the CLI and integration tests do not use tokio.
- Metal acceleration: Apple Silicon only; Linux and Windows default to CPU, despite workspace config suggesting CUDA/SYCL support (not visible in the file list).
💡Concepts to learn
- GGML — llm's entire runtime is built on Georgi Gerganov's GGML C library for fast quantized tensor inference; understanding tensor allocation, computation graphs, and quantization formats (int4, int8) is essential
- Quantization (int4/int8 weight compression) — llm loads GGML-quantized models to run on CPU; the bit-width choice (Q4_0, Q8_0, etc.) directly impacts the speed vs. accuracy tradeoff in inference
- Token streaming / Sampler plugins — llm uses llm-samplers crate for temperature/top-p/top-k token selection during generation; understanding sampling strategy affects output quality and reproducibility
- Memory-mapped file I/O (memmap2) — llm loads multi-GB model files via memmap2 to avoid full in-memory copies; critical for inference on memory-constrained systems (see the sketch after this list)
- FFI (Foreign Function Interface) and unsafe Rust bindings — crates/ggml/sys wraps C library GGML in unsafe Rust; understanding how unsafe blocks, CString marshaling, and lifetime management work is required to modify core inference
- BPE / Byte-Pair Encoding tokenization — Models in llm use BPE tokenizers (embedded in model files) to convert text to token IDs; mismatch between tokenizer and model causes silent inference failures
- Hardware acceleration (Metal API, CUDA, SYCL) — crates/ggml/src/accelerator/ conditionally enables GPU compute; Metal on Apple Silicon, CUDA on NVIDIA—fallback is CPU-only, orders of magnitude slower
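A minimal sketch of the memory-mapping pattern from the memmap2 bullet above, assuming the memmap2 crate and a local model.bin; error handling is kept minimal:

```rust
use std::fs::File;

fn main() -> std::io::Result<()> {
    let file = File::open("model.bin")?; // multi-GB model file
    // Safety: the mapping is only valid while the file is not truncated
    // or modified underneath us.
    let mmap = unsafe { memmap2::Mmap::map(&file)? };
    // The weights are now addressable as a byte slice without an up-front
    // copy into RAM; the OS pages data in on demand.
    println!("mapped {} bytes", mmap.len());
    Ok(())
}
```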
🔗Related repos
- EricLBuehler/mistral.rs — Modern, maintained alternative using the Candle backend; the replacement recommended by llm's own README for quantized LLM inference
- huggingface/candle — Pure-Rust ML framework underlying mistral.rs and candle-transformers; llm's successor ecosystem
- ggerganov/llama.cpp — Original C++ GGML-based inference engine that llm wraps via FFI; the gold standard for quantized LLM inference (llm's README recommends its wrappers as replacements)
- rustformers/models — Likely a companion repo containing pre-quantized model weights and format specs for llm's model architectures (not in the workspace; check whether it exists)
- huggingface/ratchet — wgpu-based ML inference library with web support; an alternative GPU-acceleration path mentioned in llm's README for different use cases
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for model format loader/saver in crates/ggml/src/format/
The crates/ggml/src/format/ directory has loader.rs and saver.rs but crates/ggml/src/tests.rs appears minimal. Given this is a core serialization concern for an LLM inference library, comprehensive tests for loading/saving GGML format files are critical to prevent regressions. This is especially important since the repo is unmaintained and needs stability.
- [ ] Review crates/ggml/src/format/loader.rs and saver.rs to understand the format specification
- [ ] Create test fixtures (small .ggml files) in crates/ggml/tests/ directory
- [ ] Add roundtrip tests in crates/ggml/src/tests.rs or a new crates/ggml/tests/format_tests.rs that load and save models, verifying byte-for-byte consistency (pattern sketched after this list)
- [ ] Add tests for edge cases (corrupt files, unsupported versions, empty tensors)
- [ ] Verify tests run in CI via .github/workflows/rust.yml
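The roundtrip test from the checklist can follow the pattern below. Header is a toy stand-in for the real GGML container structures in crates/ggml/src/format/, and the field layout is illustrative:

```rust
// Toy roundtrip-test pattern; `Header` stands in for the real container types.
#[derive(Debug, PartialEq)]
struct Header {
    magic: [u8; 4],
    version: u32,
}

fn save(h: &Header) -> Vec<u8> {
    let mut out = h.magic.to_vec();
    out.extend_from_slice(&h.version.to_le_bytes());
    out
}

fn load(bytes: &[u8]) -> Option<Header> {
    Some(Header {
        magic: bytes.get(..4)?.try_into().ok()?,
        version: u32::from_le_bytes(bytes.get(4..8)?.try_into().ok()?),
    })
}

#[test]
fn roundtrip_is_byte_identical() {
    let original = save(&Header { magic: *b"ggml", version: 1 });
    let reloaded = load(&original).expect("load failed");
    assert_eq!(save(&reloaded), original, "save(load(x)) must equal x");
}
```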
Add missing model configuration validation tests in binaries/llm-test/
The binaries/llm-test/configs/ directory contains JSON config files for bloom, gptj, gptneox, llama, and mpt models, but there's no validation test suite. Since these configs are critical for model inference, a dedicated test binary should validate schema correctness, required fields, and compatibility with the model loaders to catch configuration issues early.
- [ ] Review crates/llm/src/ to identify model configuration structs and their required fields
- [ ] Create binaries/llm-test/src/config_validation.rs with tests for each config JSON in binaries/llm-test/configs/
- [ ] Add schema validation tests (verify required keys like model_type, dimensions, vocab_size exist); see the sketch after this list
- [ ] Add tests in binaries/llm-test/src/main.rs to run config validation as a CI step
- [ ] Document expected schema in binaries/llm-test/README.md (if it exists) or create one
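A sketch of that validation harness, assuming serde (with the derive feature) and serde_json as dev-dependencies; the field names below (model_type, vocab_size) are guesses, so mirror the real config structs from crates/llm/src/ instead:

```rust
use serde::Deserialize;

// Field names are assumptions; serde fails deserialization when a required
// field is missing, which is exactly the validation we want here.
#[derive(Deserialize)]
struct ModelConfig {
    model_type: String,
    vocab_size: usize,
}

#[test]
fn all_configs_parse_and_have_required_fields() {
    for entry in std::fs::read_dir("binaries/llm-test/configs").unwrap() {
        let path = entry.unwrap().path();
        let text = std::fs::read_to_string(&path).unwrap();
        let cfg: ModelConfig = serde_json::from_str(&text)
            .unwrap_or_else(|e| panic!("{}: {e}", path.display()));
        assert!(!cfg.model_type.is_empty(), "{}: empty model_type", path.display());
        assert!(cfg.vocab_size > 0, "{}: zero vocab_size", path.display());
    }
}
```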
Add accelerator feature gate tests in crates/ggml/src/accelerator/
The crates/ggml/src/accelerator/ module supports metal.rs and references cuda/opencl in crates/ggml/sys/. However, there's no visible test coverage for accelerator initialization and fallback behavior. Adding tests ensures that feature gates (metal, cuda, opencl) compile and initialize correctly without breaking the base CPU implementation.
- [ ] Review crates/ggml/Cargo.toml to identify accelerator feature flags (likely metal, cuda, opencl)
- [ ] Create crates/ggml/tests/accelerator_tests.rs with tests for each feature combination (fallback pattern sketched after this list)
- [ ] Add tests that verify correct accelerator is selected based on features and platform
- [ ] Add tests for graceful fallback when accelerator is unavailable
- [ ] Update .github/workflows/rust.yml to run tests with different feature combinations (--features metal, --no-default-features, etc.)
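The fallback behavior in particular can be tested without touching real accelerator code. A compile-time selection sketch; the metal feature name is assumed, so verify it against crates/ggml/Cargo.toml:

```rust
// Feature-gate fallback sketch: whichever features are enabled, some backend
// must be selected at compile time.
#[cfg(feature = "metal")]
fn backend_name() -> &'static str {
    "metal"
}

#[cfg(not(feature = "metal"))]
fn backend_name() -> &'static str {
    "cpu" // graceful fallback when no accelerator feature is enabled
}

#[test]
fn a_backend_is_always_selected() {
    // Run under `cargo test`, `cargo test --features metal`, and
    // `cargo test --no-default-features` in CI to cover each combination.
    assert!(!backend_name().is_empty());
}
```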
🌿Good first issues
- Add missing integration tests for the GPT-NeoX model (binaries/llm-test/src/inference.rs covers GPT-J, Llama, MPT, and Bloom; a GPT-NeoX config exists but may lack test coverage; verify and fill the gaps)
- Document the GGML quantization format and supported bit-widths in crates/ggml/README.md; it is currently vague about which quantization variants are loadable
- Optimize the Dockerfile: the current setup lacks a multi-stage build; reduce image size by separating the build stage (Rust 1.67.1 + build-essential) from the runtime stage (minimal binary + libc)
⭐Top contributors
- @LLukas22 — 35 commits
- @philpax — 30 commits
- @KerfuffleV2 — 8 commits
- @AmineDiro — 8 commits
- @chris-ha458 — 3 commits
📝Recent commits
- b11ffb1 — Archival notice (philpax)
- 9376078 — docs(readme): further detail current branches (philpax)
- 581b2a0 — Merge pull request #443 from JRazek/elided_lifetimes_in_associated_constant-fixup (philpax)
- 039cd41 — fix &'static lifetime warning (JRazek)
- 5bbab04 — docs(readme): mention current state (philpax)
- 8d4a696 — Merge pull request #441 from rustformers/update-vuln-deps (philpax)
- 8a3aeec — chore: update rustix 0.38 (philpax)
- b4ca924 — chore: update vulnerable deps (philpax)
- 23c3047 — fix(ggml): don't use Neon on macOS aarch64 (philpax)
- 23e4b46 — Merge pull request #440 from KerfuffleV2/feat-llm-samplers-0.0.7 (philpax)
🔒Security observations
This codebase presents moderate security concerns, primarily due to its archived/unmaintained status: vulnerabilities will not be patched. The main risks are (1) unvalidated model-file loading, (2) potential input-validation gaps in the CLI interface, (3) an outdated Rust toolchain and dependencies, and (4) a lack of security infrastructure. Rust's memory-safety guarantees mitigate some attack vectors, but the project should NOT be used in production environments. Users should migrate to the actively maintained alternatives recommended in the repository's README.
- Medium · Archived/Unmaintained Codebase — README.md, repository status. The repository is explicitly marked as archived and unmaintained: vulnerabilities discovered in dependencies will not be patched, and the codebase will not receive updates for new security issues. Fix: do not use this codebase in production; migrate to the actively maintained alternatives recommended in the README (Ratchet, Candle, mistral.rs, etc.).
- Medium · Potential Unvalidated Model Loading — crates/llm-base/src/loader.rs, crates/ggml/src/format/loader.rs. The model-loading path has no visible input validation; loading untrusted GGML files could lead to denial of service or, if the format parser has vulnerabilities, arbitrary code execution. Fix: strictly validate model file structure (signatures, checksums) before loading, sandbox the loading operation where possible, and cap resource consumption (memory, compute time). A preflight sketch follows this list.
- Medium · No Visible Input Sanitization in CLI — binaries/llm-cli/src/main.rs, binaries/llm-cli/src/cli_args.rs. The CLI accepts user input for prompts and model paths without visible sanitization. Rust's memory safety prevents some attack classes, but insufficient validation could still lead to information disclosure or unintended behavior. Fix: validate all user-supplied data (file paths, prompt lengths, other parameters) against strict whitelists.
- Low · Outdated Rust Toolchain Version — Cargo.toml, workspace.metadata.dist. The workspace metadata pins Rust 1.67.1 (January 2023), which misses later security patches and compiler improvements. Fix: update rust-toolchain-version to a current stable release (1.75 or later) and regularly update Rust and all dependencies to receive security patches.
- Low · Dependency Version Pinning Strategy — Cargo.toml, workspace.dependencies. Several workspace dependencies use broad ranges without upper bounds (e.g., anyhow = "1.0", serde = "1.0"), so any compatible minor or patch release can be pulled in, potentially introducing breakage or vulnerabilities. Fix: consider stricter constraints; for critical dependencies, pin exact versions (=1.0.0 syntax). At minimum, review dependency updates regularly.
- Low · No Visible Security Audit Trail — repository root. There is no evidence of security audits, vulnerability scanning, or a security policy, and the archival status compounds the concern. Fix: if this codebase were to be maintained, add a SECURITY.md, enable dependabot/renovate for automated dependency scanning, and conduct regular audits.
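As a concrete starting point for the model-loading mitigation above, a preflight check can reject obviously malformed files before any parsing happens. The magic bytes and size cap below are assumptions, not the loader's actual constants; take the real values from crates/ggml/src/format/:

```rust
use std::fs::File;
use std::io::Read;

// Assumed cap; tune to the largest model you expect to serve.
const MAX_MODEL_BYTES: u64 = 64 * 1024 * 1024 * 1024;

fn preflight(path: &str) -> std::io::Result<bool> {
    let mut file = File::open(path)?;
    if file.metadata()?.len() > MAX_MODEL_BYTES {
        return Ok(false); // resource-exhaustion guard
    }
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;
    // Assumed container magics; verify against crates/ggml/src/format/.
    Ok(matches!(&magic, b"ggml" | b"ggjt"))
}

fn main() -> std::io::Result<()> {
    // Only hand the file to the real loader if preflight passes.
    println!("preflight ok: {}", preflight("model.bin")?);
    Ok(())
}
```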
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.