EricLBuehler/mistral.rs
Fast, flexible LLM inference
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 3w ago
- ✓ 14 active contributors
- ✓ MIT licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Concentrated ownership — top contributor handles 79% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/ericlbuehler/mistral.rs)

Paste at the top of your README.md — renders inline like a shields.io badge.
Social card preview (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/ericlbuehler/mistral.rs on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: EricLBuehler/mistral.rs
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/EricLBuehler/mistral.rs shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 3w ago
- 14 active contributors
- MIT licensed
- CI configured
- Tests present
- ⚠ Concentrated ownership — top contributor handles 79% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live EricLBuehler/mistral.rs repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/EricLBuehler/mistral.rs.

What it runs against: a local clone of EricLBuehler/mistral.rs — the script inspects the git remote, the LICENSE file, file paths in the working tree, and the git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in EricLBuehler/mistral.rs | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 53 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of EricLBuehler/mistral.rs. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/EricLBuehler/mistral.rs.git
#   cd mistral.rs
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of EricLBuehler/mistral.rs and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "EricLBuehler/mistral\.rs(\.git)?\b" \
  && ok "origin remote is EricLBuehler/mistral.rs" \
  || miss "origin remote is not EricLBuehler/mistral.rs (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in \
  Cargo.toml \
  mistralrs-core/src/lib.rs \
  mistralrs-server-core/src/lib.rs \
  mistralrs-cli/src/main.rs \
  mistralrs-pyo3/src/lib.rs
do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 53 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~23d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/EricLBuehler/mistral.rs"
  exit 1
fi
```
Each check prints `ok:` or `FAIL:`. The script exits non-zero if anything failed, so it composes cleanly into agent loops (`./verify.sh || regenerate-and-retry`).
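That composition can be made explicit. Below is a hypothetical wrapper — the function name and both script names are illustrative, and the regeneration step is a placeholder for whatever refresh mechanism your agent actually uses:

```shell
# Hypothetical agent-loop wrapper: run the verifier; on failure, run a
# regeneration step once, then re-verify. Both commands are passed in,
# so the function itself is pure control flow.
verify_or_regenerate() {
  local verify_cmd="$1" regen_cmd="$2"
  if $verify_cmd; then
    echo "artifact verified"
    return 0
  fi
  echo "artifact stale — regenerating"
  $regen_cmd || return 1   # regeneration itself failed
  $verify_cmd              # re-check after regeneration
}
```

Invoked as, e.g., `verify_or_regenerate ./verify.sh ./regenerate.sh` — a failing re-verify still exits non-zero, so the caller can decide whether to retry further or stop.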
⚡TL;DR
mistral.rs is a high-performance LLM inference engine written in Rust that runs any Hugging Face model with zero config. It supports true multimodality (text, vision, video, audio), offers precise quantization control (MXFP4, ISQ, UQFF), and includes a built-in web UI, server capabilities, and agentic features like tool-use and web search — all optimized for CPU, CUDA, and Metal backends.

It is a monorepo with 13 workspace members: mistralrs-core (tensor inference engine), mistralrs-server-core (HTTP/gRPC server logic), mistralrs-cli (command-line interface), mistralrs-pyo3 (Python bindings), mistralrs-quant (quantization tools), mistralrs-server (main binary), plus specialized crates for vision, audio, MCP, and web chat. Chat templates live in chat_templates/ as Jinja2 and JSON files for prompt formatting.
👥Who it's for
ML engineers and AI practitioners who need fast, flexible local LLM inference without vendor lock-in; developers building agentic AI systems; teams running multimodal inference on heterogeneous hardware (laptops, servers, GPUs); anyone using Hugging Face models who wants Python SDK or CLI tooling instead of learning multiple inference frameworks.
🌱Maturity & risk
Production-ready and actively maintained. The project shows 8.6M LOC (82% Rust), organized CI/CD across CPU/CUDA/Metal platforms (.github/workflows/), comprehensive documentation, Python/Rust SDKs, and recent feature releases (Gemma 4, MXFP4 quantization, Qwen 3.5). Version 0.8.1 suggests stable API; no major red flags in structure.
Single-maintainer risk (EricLBuehler) with broad responsibility across Rust core, Python bindings, CUDA/Metal kernels, and web UI. Heavy dependency on Candle (HuggingFace) for tensor ops—breaking updates there cascade here. Complexity of supporting 5 compute backends (CPU, CUDA, Metal, flash-attn, flash-attn-v3) increases maintenance burden. Active but not massive community (use Discord for support).
Active areas of work
Recent focus on multimodal expansion: Gemma 4 (text/image/video/audio), MXFP4 ISQ quantization with optimized kernels, Qwen 3.5 vision support. GitHub workflows show active CI for CPU and CUDA builds. Docs expanding (GEMMA4.html, VIDEO.html, QUANTS.html added recently). MCP integration added for tool dispatch. Tool-use and agentic features actively developed.
🚀Get running
```bash
# macOS/Linux
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh

# Windows PowerShell
irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex

# Or build from source
git clone https://github.com/EricLBuehler/mistral.rs
cd mistral.rs
cargo build --release

# Run a model
./target/release/mistralrs run -m Qwen/Qwen3-4B
```
Daily commands:
```bash
# Interactive chat
mistralrs run -m Qwen/Qwen3-4B

# One-shot inference
mistralrs run -m Qwen/Qwen3-4B -i "What is the capital of France?"

# With image input
mistralrs run -m google/gemma-4-E4B-it --image photo.jpg -i "Describe this image"

# Web server with UI
mistralrs serve --ui -m google/gemma-4-E4B-it
# Visit http://localhost:1234/ui

# Python SDK
pip install mistralrs
# See Python docs at https://ericlbuehler.github.io/mistral.rs/PYTHON_SDK.html
```
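Once `mistralrs serve` is running, the server exposes an OpenAI-compatible HTTP API. A minimal sketch of calling it from the shell — this assumes the default port 1234 from the UI URL above and the standard `/v1/chat/completions` route; the helper function names are my own:

```shell
# Build the JSON body for a chat-completion request. Pure string
# construction, so the payload can be inspected before anything is sent.
build_chat_body() {
  local model="$1" prompt="$2"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$model" "$prompt"
}

# POST it to the (assumed) OpenAI-compatible endpoint.
chat_once() {
  curl -s "${BASE_URL:-http://localhost:1234}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_chat_body "$1" "$2")"
}

# Example (requires a running server):
#   chat_once "Qwen/Qwen3-4B" "What is the capital of France?"
```

Note the naive `printf` escaping: prompts containing quotes or backslashes would break the JSON, so a real client should build the body with a proper JSON tool such as `jq`.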
🗺️Map of the codebase
- Cargo.toml — Workspace root definition with all member crates (core, server, CLI, PyO3) and shared dependencies; essential for understanding project structure and build configuration.
- mistralrs-core/src/lib.rs — Core inference engine entry point; primary abstraction for model loading, tokenization, and generation logic that all frontends depend on.
- mistralrs-server-core/src/lib.rs — HTTP server abstraction layer; bridges between the REST/streaming API and the core inference engine, handles request routing and response serialization.
- mistralrs-cli/src/main.rs — CLI entry point; demonstrates how to configure and invoke the inference engine programmatically, provides the quickest onboarding path.
- mistralrs-pyo3/src/lib.rs — Python bindings via PyO3; exposes the Rust core to the Python ecosystem, critical for multi-language support and accessibility.
- docs/CONFIGURATION.md — Comprehensive configuration reference for all model types and backend options; essential for understanding supported architectures and tuning parameters.
- docs/GETTING_STARTED.md — Primary onboarding documentation covering installation, basic usage, and common workflows across all interfaces.
🛠️How to make changes
Add Support for a New LLM Architecture
- Create a model-specific inference struct in mistralrs-core/src/models/{model_name}/mod.rs, following an existing architecture (e.g., mistralrs-core/src/models/mistral/mod.rs)
- Implement the Model trait with forward() and cache requirements in the new module
- Register the new model in the pipeline loader/factory in mistralrs-core/src/pipeline.rs
- Add a model variant to the SelectedModel enum in mistralrs-core/src/lib.rs if creating a new model family
- Create a documentation file docs/{MODEL_NAME}.md with architecture details and example configurations
- Add configuration examples to chat_templates/{model_name}.json for chat formatting, if applicable
Add a New HTTP API Endpoint
- Define the request/response structs in mistralrs-server-core/src/request.rs or mistralrs-server-core/src/response.rs
- Implement a handler function in mistralrs-server-core/src/handlers.rs following the existing axum patterns
- Register the route in mistralrs-server/src/main.rs within the axum Router setup
- Add endpoint documentation to docs/HTTP.md with curl examples
- Update the OpenAPI/schema definitions in mistralrs-server-core/src/lib.rs if present, or add to the API specification
Add a New Quantization Method
- Create the quantization implementation in mistralrs-quant/src/{quantization_name}.rs
- Implement the Quantizer trait with encode/decode and dtype methods (mistralrs-quant/src/lib.rs)
- Register the quantization type in the quantization factory/dispatcher (mistralrs-quant/src/lib.rs)
- Add a configuration variant to the QuantizationConfig enum in mistralrs-core/src/lib.rs
- Document the method in docs/QUANTS.md with performance characteristics and usage examples
Add Python SDK Features
- Extend the PyO3 module struct in mistralrs-pyo3/src/lib.rs with new #[pymethods]
- Define Python class wrappers around Rust types using #[pyclass] and the PyO3 macros (mistralrs-pyo3/src/lib.rs)
- Update type conversions and error handling in the PyO3 bridge layer (mistralrs-pyo3/src/lib.rs)
- Add Python examples and documentation to docs/PYTHON_SDK.md
🔧Why these technologies
- Rust + Candle (HuggingFace) — Memory-safe systems language with zero-cost abstractions; Candle provides GPU kernels optimized for transformers, eliminating Python GIL and C++ complexity for inference
- Axum (HTTP framework) — Async, composable router with minimal overhead; native Rust error handling and type safety for OpenAI-compatible REST API
- PyO3 (Python bindings) — Zero-copy bindings between Rust and Python; allows researchers/practitioners to use Rust performance with Python familiarity
- CUDA/cuDNN backend support — Leverages Candle's GPU kernels for 10-100x speedup on inference; optional at compile-time to support CPU fallback
- Paged attention + quantization — Paged attention (via the mistralrs-paged-attn crate) reduces memory fragmentation during long-sequence, multi-user serving, while fine-grained quantization (ISQ, MXFP4, UQFF) trades model size against speed and accuracy; see Concepts to learn below
🪤Traps & gotchas
- Quantization format: Models require UQFF (the mistral.rs universal quantized format) — you can't directly use safetensors quantized weights; the mistralrs quantize command converts them first.
- CUDA/Metal optional: Default builds are CPU-only; enable acceleration with cargo features (cuda, metal). A missing CUDA driver falls back to CPU silently but slowly.
- Chat template matching: The model template is auto-selected by Hugging Face model ID regex; custom models may require manual template specification via the --chat-template flag.
- Device mapping: For multi-GPU setups, device-mapper logic is implicit in Candle and not easily configurable via CLI flags.
- Python version: PyO3 builds require Python 3.8+ development headers; on some systems you need the python3-dev or python3-devel package.
- Streaming latency: Token streaming uses SSE; the client must handle Connection: keep-alive properly or timeouts occur.
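The streaming trap is easier to reason about with the wire format in front of you. Assuming the server follows the usual OpenAI SSE convention — `data: <json>` lines separated by blank lines, terminated by a `data: [DONE]` sentinel — a minimal consumer looks like this (a sketch, not a production client):

```shell
# Strip SSE framing: keep only the payload of `data:` lines and stop at
# the [DONE] sentinel. In practice, feed it from `curl -N` (no buffering)
# with a request body containing "stream": true.
parse_sse() {
  sed -n 's/^data: //p' | while IFS= read -r chunk; do
    [ "$chunk" = "[DONE]" ] && break
    printf '%s\n' "$chunk"
  done
}
```

A real client would additionally send keep-alives and reconnect on timeout, which is exactly the gotcha above.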
🏗️Architecture
💡Concepts to learn
- Paged Attention — mistral.rs uses paged attention (mistralrs-paged-attn crate) to reduce memory fragmentation during long sequence generation; critical for multi-user server scenarios and large batch sizes
- Quantization (ISQ, MXFP4, UQFF) — Mistral.rs lets you choose exact quantization strategy (int8, int4, MXFP4 with custom kernels, UQFF format); understanding tradeoffs between model size, speed, and accuracy is core to using this tool effectively
- Token Streaming (Server-Sent Events) — Long-running inference responses are streamed via SSE /v1/chat/completions; client-side handling of partial tokens and connection management differs from batch inference
- Chat Templates (Jinja2 Prompt Engineering) — Different models expect different prompt formats; mistral.rs uses Jinja2 templates to auto-format user messages, tool-use, system prompts without manual engineering per-model
- Device Abstraction Layer (CPU/CUDA/Metal) — Candle tensors are device-agnostic; mistral.rs runtime selects CPU/CUDA/Metal automatically; understanding this trait boundary helps when optimizing for specific hardware or debugging device-mismatch panics
- Agentic Tool Loop (Server-Side Agent) — Mistral.rs can handle tool-use in a server-side loop (agent retries until tool_calls terminate); differs from client-side parsing of tool_use tokens; see AGENTS.md for MCP and HTTP tool dispatch
- Multimodal Embedding (Vision/Audio/Video) — Gemma 4 and Qwen 3.5 support multimodal input; mistral.rs encodes images/video/audio to embeddings before feeding to LLM; architecture differs from text-only and affects memory/latency budgets
🔗Related repos
- huggingface/candle — Core tensor computation engine and model definitions that mistral.rs builds on; breaking changes here cascade directly
- openai/gpt-4 — API compatibility target: mistral.rs exposes a /v1/chat/completions endpoint to be a drop-in OpenAI replacement
- oobabooga/text-generation-webui — Competing web UI for local LLM inference; mistral.rs offers a lighter-weight single-binary alternative with quantization tuning
- LM-Sys/FastChat — Related inference serving framework with chat templates; mistral.rs borrowed its chat template architecture (JSON/Jinja2) from the FastChat ecosystem
- ollama/ollama — The simplest-to-use competitor to `mistralrs run -m <model>`; ollama emphasizes ease while mistral.rs emphasizes control and multimodality
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive integration tests for chat template rendering across all supported models
The repo has 15+ chat templates (chatml.json, llama3.json, mistral.json, gemma3n.jinja, etc.) in chat_templates/ but there's no visible test suite validating that each template correctly formats prompts for their respective models. This is critical for ensuring chat compatibility across different LLMs and preventing silent formatting bugs that degrade inference quality.
- [ ] Create mistralrs-core/tests/chat_templates_test.rs with test cases for each .json and .jinja template
- [ ] Add parametrized tests validating template rendering for model families: Llama, Mistral, Gemma, Phi, DeepSeek
- [ ] Test edge cases: multi-turn conversations, tool calls (mistral_small_tool_call.jinja, deepseek_tool_call.jinja), system prompts
- [ ] Add integration with CI pipeline (GitHub Actions already exists in .github/workflows/) to run these tests on each commit
Add feature flag documentation and validation tests for conditional compilation paths
The workspace has many specialized crates (mistralrs-paged-attn, mistralrs-quant, mistralrs-vision, mistralrs-audio, mistralrs-mcp) that are likely gated behind Cargo features. docs/CARGO_FEATURES.md likely documents these, but there's no test coverage ensuring that builds with different feature combinations don't break. This is essential for users choosing minimal vs. full installations.
- [ ] Create a new test file mistralrs-core/tests/feature_combinations_test.rs using conditional compilation (#[cfg(feature = '...')]) to validate key feature interactions
- [ ] Add a new GitHub Actions workflow .github/workflows/feature_test.yaml that matrix-tests common feature combinations (e.g., cuda+vision, cpu-only, quant-only)
- [ ] Document in docs/CARGO_FEATURES.md which features are mutually exclusive and which are safe to combine
- [ ] Update Makefile with a test-features target to support local feature testing
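The matrix in the checklist above can be prototyped locally before wiring it into CI. A hedged sketch — the feature names are taken from elsewhere in this document and may not match Cargo.toml exactly, and the default mode only prints the commands rather than running them:

```shell
# Dry-run feature-matrix check. By default each cargo invocation is
# printed (via `echo cargo ...`); set CARGO=cargo to actually build.
# The feature combinations are illustrative, not verified against Cargo.toml.
feature_matrix() {
  local cargo_cmd="${CARGO:-echo cargo}"
  local combos=("" "cuda" "metal" "cuda,flash-attn")
  local f
  for f in "${combos[@]}"; do
    if [ -z "$f" ]; then
      $cargo_cmd check --workspace || return 1
    else
      $cargo_cmd check --workspace --features "$f" || return 1
    fi
  done
}
```

Run `feature_matrix` to preview the commands, or `CARGO=cargo feature_matrix` to execute them for real; a CI workflow would express the same combinations as a build matrix.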
Implement missing device mapping tests for multi-device inference scenarios
docs/DEVICE_MAPPING.md exists but likely lacks test coverage validating that models correctly distribute across CPU/GPU/Metal devices as documented. Given the workspace includes DISTRIBUTED infrastructure (docs/DISTRIBUTED/NCCL.md, RING.md), this is critical for production deployments.
- [ ] Create mistralrs-core/tests/device_mapping_test.rs with tests for: single GPU, multi-GPU NCCL ring topology (referencing docs/DISTRIBUTED/RING.md), CPU offloading, heterogeneous device setups
- [ ] Add platform-specific tests: cuda tests (already has build_cuda_all.yaml), metal tests (for Apple Silicon given candle-metal-kernels dependency), CPU fallback tests
- [ ] Add integration tests validating distributed inference with mock multi-node setup or documented local NCCL ring topology
- [ ] Update docs/DEVICE_MAPPING.md with tested example configurations and expected performance characteristics
🌿Good first issues
- Add missing chat template for DeepSeek-Coder variants: chat_templates/ has deepseek_tool_call.jinja but no standard deepseek.jinja2; contribute a template by examining DeepSeek tokenizer docs and testing against mistralrs serve
- Expand test coverage for mistralrs-quant: quantization pipeline has no dedicated unit tests; add tests in mistralrs-quant/src/tests/ for UQFF round-trip encoding/decoding and ISQ quantization correctness on toy tensors
- Document device-mapper behavior for multi-GPU inference: docs/CONFIGURATION.md lacks examples of how to shard models across GPUs; add a new file docs/DEVICE_MAPPING.md with concrete mistralrs run examples for 2+ GPU setups
⭐Top contributors
- @EricLBuehler — 79 commits
- @glaziermag — 4 commits
- @setoelkahfi — 4 commits
- @haricot — 2 commits
- @dependabot[bot] — 2 commits
📝Recent commits
- 2d4ba4f — Add fast CUDA MMQ GGUF kernels (#2109) (EricLBuehler)
- 8415ec4 — Add fast CUDA MMVQ GGUF kernels (#2104) (EricLBuehler)
- 2b838b2 — fix(gemma4): tweak quantization pattern for lm head (EricLBuehler)
- 9196ed3 — fix(gemma4): no paged-attn cache cases (#2091) (EricLBuehler)
- 93605b8 — feat(gemma4,cuda): optimized fused moe decode path (#2090) (EricLBuehler)
- a6633f4 — feat(gemma4): ~10% faster moe decode through fused moe decode kernels (#2080) (EricLBuehler)
- f2fe3a4 — feat(gemma4): 3.5-5.5x faster moe prefill for quantized cuda case (#2077) (EricLBuehler)
- 49ba0e9 — fix(gemma4): tool call and masking fix (#2073) (EricLBuehler)
- 7f1c985 — fix(tools): fixes and cleanup for tool agentic (#2069) (EricLBuehler)
- d1df29e — feat(docs): improved docs, guides, correctness (#2065) (EricLBuehler)
🔒Security observations
The mistral.rs codebase demonstrates reasonable security hygiene with a multi-stage Docker build, minimal runtime dependencies, and organized structure. However, there are notable concerns: (1) the use of 'rust:latest' in the Dockerfile undermines reproducibility and version control, (2) the incomplete Dockerfile prevents full verification, (3) the container runs as root without a non-root user directive, and (4) dependency versions are not pinned to exact versions, allowing for uncontrolled updates. No SQL injection, XSS, or hardcoded credentials were identified in the provided file structure. Recommendations focus on stricter version pinning, proper container user configuration, and completing the Dockerfile for analysis.
- High · Using 'rust:latest' base image in Docker build stage — Dockerfile, line 3. The Dockerfile uses 'rust:latest' as the base image for the builder stage. 'latest' tags are not pinned to a specific version, which can lead to unexpected changes, missed security patches, or breaking builds, violating the principle of reproducible, deterministic builds. Fix: pin the Rust version to a specific release tag, e.g. 'FROM rust:1.88-bookworm' or the appropriate stable version matching workspace.rust-version.
- Medium · Incomplete Dockerfile: truncated COPY command — Dockerfile, final lines. The Dockerfile appears truncated, ending with an incomplete COPY command ('COPY --chmod=755 --from=builder /mistralrs/target/re'), so not all security configurations in the runtime stage can be verified. Fix: complete the Dockerfile, verify all binaries are copied with appropriate permissions, and provide the full build context for security analysis.
- Medium · Missing security headers and health checks — Dockerfile. The Dockerfile includes no HEALTHCHECK instruction and does not document exposed ports. For a server application (mistralrs-server), this complicates container orchestration and security monitoring. Fix: add a HEALTHCHECK directive and explicit EXPOSE directives, with a health check command relevant to mistralrs-server.
- Low · Overly permissive binary permissions — Dockerfile, COPY commands. Binaries are copied with --chmod=755, granting execute permission to owner, group, and others. Necessary for execution, but more permissive than strictly required. Fix: consider --chmod=750, or keep 755 with proper user/group ownership (via a USER directive) to restrict execution to necessary users.
- Low · Missing USER directive in Docker image — Dockerfile, runtime stage. With no USER directive, containers run as root by default, violating the principle of least privilege. Fix: add 'RUN useradd -m -u 1000 mistralrs' and 'USER mistralrs' to run as a non-root user.
- Low · No SBOM or security scanning in CI/CD — .github/workflows/. The CI/CD workflows show no evidence of container image scanning, dependency vulnerability scanning, or SBOM generation. Fix: integrate tools like Trivy, Snyk, or docker scout into the pipeline to scan dependencies and images.
- Low · Broad workspace build without pinned versions — Cargo.toml, Dockerfile line 9. The Dockerfile builds all workspace members with '--workspace --exclude mistralrs-pyo3', and dependency versions in Cargo.toml use flexible specifications (e.g. '0.10.2', '0.8.8'), allowing minor/patch updates that may introduce vulnerabilities. Fix: pin critical security-sensitive dependencies to exact versions using the '=' operator (e.g. '=0.10.2') and update them through a controlled process.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.