rust-ml/linfa
A Rust machine learning framework.
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 1d ago
- ✓ 42+ active contributors
- ✓ Distributed ownership (top contributor 33% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/rust-ml/linfa) — paste at the top of your README.md; it renders inline like a shields.io badge.
Preview: social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/rust-ml/linfa on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: rust-ml/linfa
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rust-ml/linfa shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 1d ago
- 42+ active contributors
- Distributed ownership (top contributor 33% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live rust-ml/linfa
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/rust-ml/linfa.
What it runs against: a local clone of rust-ml/linfa — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rust-ml/linfa | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rust-ml/linfa. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rust-ml/linfa.git
#   cd linfa
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rust-ml/linfa and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rust-ml/linfa(\.git)?\b" \
  && ok "origin remote is rust-ml/linfa" \
  || miss "origin remote is not rust-ml/linfa (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. Note: the Apache LICENSE file
# spells out "Apache License ... Version 2.0" rather than the SPDX id,
# so we grep for the long form, with the Cargo manifest as a fallback.
if { grep -qiE "Apache License" LICENSE 2>/dev/null \
     && grep -qiE "Version 2\.0" LICENSE 2>/dev/null; } \
   || grep -qE 'Apache-2\.0' Cargo.toml 2>/dev/null; then
  ok "license is Apache-2.0"
else
  miss "license drift — was Apache-2.0 at generation time"
fi

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rust-ml/linfa"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Linfa is a Rust machine learning framework providing a scikit-learn-inspired toolkit for classical ML algorithms and preprocessing. It includes Naive Bayes, K-Means, DBSCAN, Gaussian Mixture Models, ensemble methods, elastic net regression, and FTRL optimization — all with native Rust performance and ndarray-based matrix operations.

It is a monorepo structured as a Cargo workspace, with the core linfa crate at the root (traits, hyperparameters, utilities) and algorithm-specific subcrates under algorithms/ (linfa-bayes, linfa-clustering, linfa-ensemble, etc.). Each algorithm crate contains src/, examples/, and benches/ directories. The root Cargo.toml defines workspace members and shared dependencies (ndarray, num-traits, rand). Feature flags control optional BLAS backends (openblas-static, intel-mkl-system, netlib-system).
👥Who it's for
Rust systems engineers and data scientists building ML pipelines who need production-grade classical algorithms without Python dependencies; contributors are ML algorithm implementers targeting the Rust-ML ecosystem.
🌱Maturity & risk
Actively maintained (version 0.8.1, recent CI/CD workflows present). Multiple tested/benchmarked algorithm categories, comprehensive GitHub Actions setup (testing, linting, benching, docs), and workspace organization across 10+ algorithm crates indicate solid foundation. Still pre-1.0, suggesting API stability not yet guaranteed.
Moderate risk: a monorepo with many interdependent crates (the core linfa crate depends on ndarray/ndarray-linalg, which require BLAS/LAPACK via feature flags). Dependence on external linear algebra libraries (OpenBLAS, Intel MKL, netlib) adds platform-specific build complexity. Single-maintainer risk is typical of open-source ML projects; breaking changes are possible before 1.0.
Active areas of work
No specific recent commits visible in file list, but CI/CD pipeline (.github/workflows/) shows active testing, benching, code quality, and documentation workflows. README indicates status tracking per algorithm (tested/benchmarked/partial-fit). The codebase appears in steady-state maintenance with focus on correctness and performance benchmarking rather than rapid feature development.
🚀Get running
```bash
git clone https://github.com/rust-ml/linfa.git
cd linfa
cargo build
cargo test
cargo test --workspace   # test all algorithm crates
```
For specific algorithms: `cargo build -p linfa-clustering` or `cargo test -p linfa-bayes`.
Daily commands:
No server to start; this is a library. Run examples: `cargo run --example kmeans --manifest-path algorithms/linfa-clustering/Cargo.toml`. Run tests: `cargo test --all`. Run benchmarks: `cargo bench --features benchmarks --manifest-path algorithms/linfa-clustering/Cargo.toml`. Documentation: `cargo doc --open`.
🗺️Map of the codebase
- Cargo.toml: Workspace definition; specifies all algorithm crates, shared dependencies, and feature gates for BLAS/LAPACK backends.
- algorithms/linfa-clustering/src: Largest algorithm module; demonstrates architectural patterns used across other algorithms (hyperparams, error handling, example implementations like K-Means, DBSCAN, OPTICS).
- algorithms/linfa-bayes/src/base_nb.rs: Shared base implementation for Naive Bayes variants; shows trait composition pattern likely reused in other algorithm crates.
- .github/workflows/testing.yml: Defines test matrix (multiple Rust versions, all workspace members); indicates expected compatibility and CI requirements.
- CONTRIBUTE.md: Developer contribution guidelines; essential before submitting PRs to understand code standards and workflow.
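The workspace wiring described above looks roughly like this — a hypothetical, abbreviated sketch, not the real manifest (the actual root Cargo.toml lists 10+ members, more shared dependencies, and feature flags that forward to ndarray-linalg; version numbers here are illustrative):

```toml
# Hypothetical, abbreviated sketch of the root workspace manifest.
[workspace]
members = [
    "algorithms/linfa-bayes",
    "algorithms/linfa-clustering",
    # ... one entry per algorithm crate under algorithms/
]

[dependencies]
ndarray = "0.16"
num-traits = "0.2"
rand = "0.8"

[features]
# BLAS backends are opt-in and mutually exclusive.
openblas-static = []
intel-mkl-system = []
netlib-system = []
```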
🛠️How to make changes
Start in algorithms/[algorithm-name]/src/ for algorithm logic. Core trait definitions and hyperparameter patterns live in the root crate's src/ (inferred from the workspace structure). Add tests alongside implementations (src/lib.rs typically has a `#[cfg(test)] mod tests`). Add examples in algorithms/[algorithm-name]/examples/[name].rs. Run `cargo fmt` and `cargo clippy` before committing (enforced by CI). See CONTRIBUTE.md for contribution guidelines.
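The tests-next-to-implementation layout mentioned above typically looks like this. A minimal, std-only sketch — `euclidean_sq` is a hypothetical stand-in, not a real linfa function:

```rust
// Hypothetical helper standing in for real algorithm logic.
pub fn euclidean_sq(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Unit tests live in the same file, compiled only under `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn distance_to_self_is_zero() {
        assert_eq!(euclidean_sq(&[1.0, 2.0], &[1.0, 2.0]), 0.0);
    }
}
```

Running `cargo test -p <crate>` picks these modules up per crate without touching the rest of the workspace.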
🪤Traps & gotchas
BLAS/LAPACK feature flags are mutually exclusive (enable only one of netlib-static, openblas-static, intel-mkl-static, or none). Missing a feature flag or choosing the wrong one causes linking errors. The ndarray-linalg version (0.17) may have breaking changes relative to ndarray (0.16). Windows requires careful BLAS setup (note in Cargo.toml: pprof is not available on Windows). Examples assume datasets are available via the linfa-datasets crate with features enabled (see dev-dependencies).
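For a downstream project, selecting exactly one backend looks roughly like this — a hypothetical consumer `Cargo.toml`; the version number and feature names are taken from the text above and may drift:

```toml
[dependencies]
# Pick exactly one BLAS backend feature; enabling two causes link errors,
# and enabling none falls back to the pure-Rust paths where available.
linfa = { version = "0.8", features = ["openblas-static"] }
linfa-clustering = "0.8"
```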
💡Concepts to learn
- Naive Bayes classification (Bernoulli, Gaussian, Multinomial variants) — Linfa provides three implementations; understanding when to use each (text classification vs. continuous features) is critical for text/categorical data tasks.
- K-Means clustering and initialization strategies — Core unsupervised algorithm in linfa; implementation must handle k-means++ initialization and convergence criteria correctly to avoid poor local optima.
- DBSCAN density-based clustering — Linfa includes both standard DBSCAN and approximated variants (appx_dbscan with spatial grid); understanding epsilon and minPts hyperparameters is essential for arbitrary-shaped cluster detection.
- Elastic Net regularization (L1+L2 penalties) — Linfa's elasticnet crate combines Lasso and Ridge regression; balancing alpha/l1_ratio prevents overfitting better than either alone.
- FTRL (Follow The Regularized Leader) for online learning — Linfa's ftrl crate enables incremental/partial-fit updates; critical for streaming data or models that must update without retraining on full dataset.
- Gaussian Mixture Models (GMM) and EM algorithm — Probabilistic clustering in linfa; EM convergence and covariance matrix conditioning directly impact clustering quality.
- Hyperparameter builder pattern in Rust — Linfa uses fluent builder API across all algorithm crates (hyperparams.rs files); understanding this pattern is essential for configuring any algorithm.
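The fluent builder idea behind those `hyperparams.rs` files can be sketched in plain, std-only Rust. This is a hypothetical simplification — linfa's real crates use a similar params → validate → fit flow, but with different types and trait bounds:

```rust
// Hypothetical, simplified sketch of a checked hyperparameter builder.
#[derive(Debug, Clone)]
pub struct KMeansParams {
    n_clusters: usize,
    max_iterations: u64,
    tolerance: f64,
}

impl KMeansParams {
    pub fn new(n_clusters: usize) -> Self {
        Self { n_clusters, max_iterations: 300, tolerance: 1e-4 }
    }

    // Each setter consumes and returns `self`, enabling fluent chaining.
    pub fn max_iterations(mut self, n: u64) -> Self {
        self.max_iterations = n;
        self
    }

    pub fn tolerance(mut self, tol: f64) -> Self {
        self.tolerance = tol;
        self
    }

    // Validation runs once, at the end, so no invalid intermediate
    // state can ever reach the fitting code.
    pub fn check(self) -> Result<ValidKMeansParams, String> {
        if self.n_clusters == 0 {
            return Err("n_clusters must be > 0".into());
        }
        if self.tolerance <= 0.0 {
            return Err("tolerance must be positive".into());
        }
        Ok(ValidKMeansParams(self))
    }
}

// Newtype serving as compile-time proof that validation has happened.
pub struct ValidKMeansParams(KMeansParams);

impl ValidKMeansParams {
    pub fn n_clusters(&self) -> usize {
        self.0.n_clusters
    }
}
```

Usage is a chain ending in validation, e.g. `KMeansParams::new(3).tolerance(1e-5).check()?`; any fit function that accepts only the validated newtype cannot be handed unchecked parameters.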
🔗Related repos
- rust-ndarray/ndarray — foundational dependency; provides the n-dimensional array abstractions linfa algorithms use for matrix operations.
- rust-lang/packed_simd — sibling project for vectorized computation; relevant for future performance optimization of matrix operations in linfa.
- huggingface/candle — alternative Rust ML framework with GPU support; a direct competitor for similar workloads, but with a deep-learning focus.
- rust-ml/linfa-datasets — companion crate (listed in the workspace) providing benchmark datasets (iris, winequality, diabetes) used in linfa examples and tests.
- pola-rs/polars — DataFrame library often used alongside linfa for data preprocessing and feature engineering before ML pipeline execution.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive benchmarking suite for clustering algorithms with criterion integration
The repo has benching.yml workflow and criterion dependency configured, but only 3 benchmark files exist (k_means.rs, dbscan.rs, gaussian_mixture.rs) while OPTICS algorithm lacks benchmarks. New contributors can add criterion benchmarks for OPTICS and expand existing benchmarks to cover edge cases (empty datasets, single points, high dimensions). This directly supports the 'benchmarks' feature flag in Cargo.toml and helps track performance regressions.
- [ ] Create algorithms/linfa-clustering/benches/optics.rs with criterion benchmarks for OPTICS algorithm
- [ ] Expand algorithms/linfa-clustering/benches/dbscan.rs to include appx_dbscan variant benchmarks
- [ ] Add benchmark comparison documentation in algorithms/linfa-clustering/README.md with results and performance notes
- [ ] Ensure benchmarks run in CI via .github/workflows/benching.yml
Implement missing unit tests for OPTICS algorithm and AppxDBSCAN clustering module
While other clustering modules have test files (appx_dbscan/cells_grid/tests.rs, appx_dbscan/clustering/tests.rs, etc.), the OPTICS module at algorithms/linfa-clustering/src/optics/ lacks a tests.rs file. Given the complexity of the OPTICS algorithm and its reachability-distance calculations, comprehensive unit tests are critical. This improves code reliability and serves as documentation.
- [ ] Create algorithms/linfa-clustering/src/optics/tests.rs with unit tests for reachability distance calculations
- [ ] Add test cases for edge cases: single point, identical points, sparse clusters, noise detection
- [ ] Verify clustering consistency between OPTICS and DBSCAN results on shared test datasets
- [ ] Run tests locally and ensure they pass via cargo test in the linfa-clustering crate
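The quantity under test in the first checklist item is simple to state: OPTICS defines reachability-distance(p, o) = max(core-distance(o), dist(o, p)). A std-only sketch of a testable helper — hypothetical names; linfa's internal API will differ:

```rust
// Hypothetical helper mirroring the OPTICS definition:
// reachability-distance(p, o) = max(core_distance(o), dist(o, p)).
pub fn reachability_distance(core_dist_o: f64, dist_o_p: f64) -> f64 {
    core_dist_o.max(dist_o_p)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn close_neighbour_is_clamped_to_core_distance() {
        // A point inside o's core radius is "reached" at the core distance.
        assert_eq!(reachability_distance(2.0, 0.5), 2.0);
    }

    #[test]
    fn far_neighbour_uses_actual_distance() {
        assert_eq!(reachability_distance(2.0, 3.5), 3.5);
    }
}
```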
Add serialization (serde) support documentation and feature-gated tests for all algorithm modules
The root Cargo.toml has a 'serde' feature that gates serde_crate dependency, but there's no evidence of serde derive implementations or tests across algorithm crates (linfa-bayes, linfa-clustering, etc.). Contributing serde support with proper feature gating and tests would enable ML models to be serialized/deserialized, a common production requirement. This requires updating multiple Cargo.tomls and adding comprehensive tests.
- [ ] Add #[derive(Serialize, Deserialize)] to key fitted model structs in algorithms/linfa-bayes/src/gaussian_nb.rs, algorithms/linfa-clustering/src/k_means/algorithm.rs
- [ ] Add 'serde' feature to each algorithm crate's Cargo.toml with conditional serde dependency
- [ ] Create integration tests in each algorithm crate testing model serialization/deserialization round-trips
- [ ] Document serde feature usage in algorithms/linfa-bayes/README.md and algorithms/linfa-clustering/README.md with examples
🌿Good first issues
- Add missing tests for appx_dbscan/cells_grid/tests.rs and appx_dbscan/counting_tree/tests.rs, which exist at module level but likely have incomplete coverage; run `cargo tarpaulin` to identify gaps.
- Document hyperparameter tuning patterns: the algorithms/linfa-*/src/hyperparams.rs files use a builder pattern but lack examples in the algorithm-specific READMEs; add example code showing a hyperparameter selection workflow.
- Create integration examples for multi-step pipelines (e.g., normalize data → cluster → evaluate silhouette score); currently only single-algorithm examples exist in the examples/ directories.
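The silhouette evaluation mentioned in the last item reduces, per sample, to s = (b − a) / max(a, b), where a is the mean distance to points in the sample's own cluster and b the mean distance to the nearest other cluster. A std-only sketch of that final step — a hypothetical helper, not a linfa API:

```rust
// Per-sample silhouette score: s = (b - a) / max(a, b), in [-1, 1].
// `a`: mean intra-cluster distance; `b`: mean distance to the nearest
// other cluster. Both are assumed non-negative and precomputed.
pub fn silhouette_sample(a: f64, b: f64) -> f64 {
    if a == 0.0 && b == 0.0 {
        // Degenerate case (e.g. duplicate points); conventionally 0.
        return 0.0;
    }
    (b - a) / a.max(b)
}
```

Values near 1 mean a tight, well-separated sample; values below 0 suggest the sample sits closer to a neighbouring cluster than to its own.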
⭐Top contributors
- @relf — 33 commits
- @YuhanLiin — 13 commits
- @oojo12 — 10 commits
- @bytesnake — 3 commits
- @oglego — 2 commits
📝Recent commits
- 8167a62 — Fix docs action: build only when not on linfa repo (relf)
- e69cd94 — Linting (relf)
- 1abc88f — feat: add symmetric mean absolute percentage error (sMAPE) (#437) (oglego)
- 12c6c73 — fix: realign PreprocessingError variants with error strings (#434) (oglego)
- 17f8696 — Fixes #393: label ordering in binary logistic regression (#432) (espenloov)
- c7c2af5 — Add generic ResidualChain composing method (#430) (Feiyang472)
- b1f9ddb — Relax required test score (relf)
- 2197362 — Update to Zola 0.22 (relf)
- 1de164b — feat(linfa-tsne): update to bhtsne 0.5.4 (#429) (AnthonyMichaelTDM)
- 4484a55 — Bump all the crates' version to 0.8.1 (relf)
🔒Security observations
The linfa Rust ML framework demonstrates a generally solid security posture. No critical vulnerabilities or hardcoded secrets were identified in the provided configuration. The main concerns are around optional features (BLAS, serde, datasets) that require careful validation at usage boundaries, and the lack of a SECURITY.md policy file for vulnerability reporting. Dependency versions are reasonable, though the recent thiserror 2.0 adoption and the exact pin on sprs warrant a closer look.
- Medium · Outdated thiserror dependency — Cargo.toml (root), dependencies.thiserror. The codebase uses thiserror 2.0, a major version that may contain breaking changes or deprecations and is less battle-tested than 1.x. Fix: evaluate the stability of thiserror 2.0 for your use cases; if stability is critical, consider pinning to a tested 1.x version or auditing the breaking changes that affect error handling.
- Low · Pinned dependency version for sprs — Cargo.toml (root), dependencies.sprs. The sprs dependency is pinned to an exact version (=0.11.2) with default features disabled. Pinning ensures reproducibility, but an exact pin without a range blocks security patches. Fix: evaluate whether the exact pin is necessary; consider a range like '0.11.2' or '>=0.11.2, <0.12' to allow patch updates while maintaining stability.
- Low · Optional BLAS dependencies not validated — Cargo.toml (root), features section. Multiple BLAS feature flags (netlib-static, netlib-system, openblas-static, openblas-system, intel-mkl-static, intel-mkl-system) depend on external native libraries. If these are compiled from source or downloaded, there is a supply-chain risk if they are not validated. Fix: document which BLAS versions are tested and supported, and consider verifying checksums of downloaded native libraries in CI/CD pipelines.
- Low · Serde deserialization without validation — Cargo.toml (root), features.serde and [dependencies.serde_crate]. Optional serde support allows serialization/deserialization of ML models and data structures; without validation, deserializing untrusted input can bypass invariants. Fix: if serde is enabled, implement custom deserialization (serde(rename, deserialize_with)) for sensitive types, validate all deserialized data before use, and document the security implications in the README.
- Low · Development dependencies include datasets — Cargo.toml (root), dev-dependencies.linfa-datasets. The dev-dependencies include linfa-datasets with features like 'winequality', 'iris', and 'diabetes'. If these datasets are fetched from external sources at compile time or runtime, they are susceptible to man-in-the-middle attacks or malicious data injection. Fix: verify that linfa-datasets validates downloaded data (checksums/signatures); for production use, embed datasets or fetch from verified sources only, and document data provenance.
- Low · No security policy defined — repository root (missing SECURITY.md). The repository does not appear to have a SECURITY.md file, which makes it difficult for security researchers to responsibly report vulnerabilities. Fix: create a SECURITY.md in the repository root with instructions for reporting vulnerabilities privately, preferred contact methods, and expected response timeframes.
- Informational · Workspace member dependencies not fully visible — Cargo.toml, workspace members. The workspace includes algorithms/* and datasets subdirectories whose individual Cargo.toml files were not available for review, potentially hiding dependency-chain issues. Fix: review all member crate manifests for consistent dependency versions and unused dependencies, and run 'cargo audit' across the workspace for known CVEs.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.