rust-ml/linfa
A Rust machine learning framework.
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 1d ago
- ✓ 42+ active contributors
- ✓ Distributed ownership (top contributor 33% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/rust-ml/linfa) — paste at the top of your README.md; it renders inline like a shields.io badge.
Preview: social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/rust-ml/linfa on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: rust-ml/linfa
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rust-ml/linfa shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 1d ago
- 42+ active contributors
- Distributed ownership (top contributor 33% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live rust-ml/linfa
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/rust-ml/linfa.
What it runs against: a local clone of rust-ml/linfa — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rust-ml/linfa | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rust-ml/linfa. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rust-ml/linfa.git
#   cd linfa
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rust-ml/linfa and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rust-ml/linfa(\.git)?\b" \
  && ok "origin remote is rust-ml/linfa" \
  || miss "origin remote is not rust-ml/linfa (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. Note: the Apache LICENSE file
# spells out "Apache License ... Version 2.0" rather than the SPDX id,
# so we grep for the long form, with the Cargo manifest as a fallback.
if { grep -qiE "Apache License" LICENSE 2>/dev/null \
     && grep -qiE "Version 2\.0" LICENSE 2>/dev/null; } \
   || grep -qE 'Apache-2\.0' Cargo.toml 2>/dev/null; then
  ok "license is Apache-2.0"
else
  miss "license drift — was Apache-2.0 at generation time"
fi

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rust-ml/linfa"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Linfa is a Rust machine learning framework providing a scikit-learn-inspired toolkit for classical ML algorithms and preprocessing. It includes Naive Bayes, K-Means, DBSCAN, Gaussian Mixture Models, ensemble methods, elastic net regression, and FTRL optimization — all with native Rust performance and ndarray-based matrix operations.

It is a monorepo structured as a Cargo workspace, with the core linfa crate at the root (traits, hyperparameters, utilities) and algorithm-specific subcrates under algorithms/ (linfa-bayes, linfa-clustering, linfa-ensemble, etc.). Each algorithm crate contains src/, examples/, and benches/ directories. The root Cargo.toml defines workspace members and shared dependencies (ndarray, num-traits, rand). Feature flags control optional BLAS backends (openblas-static, intel-mkl-system, netlib-system).
👥Who it's for
Rust systems engineers and data scientists building ML pipelines who need production-grade classical algorithms without Python dependencies; contributors are ML algorithm implementers targeting the Rust-ML ecosystem.
🌱Maturity & risk
Actively maintained (version 0.8.1, recent CI/CD workflows present). Multiple tested/benchmarked algorithm categories, comprehensive GitHub Actions setup (testing, linting, benching, docs), and workspace organization across 10+ algorithm crates indicate solid foundation. Still pre-1.0, suggesting API stability not yet guaranteed.
Moderate risk: a monorepo with many interdependent crates (the core linfa crate depends on ndarray/ndarray-linalg, which require BLAS/LAPACK via feature flags). Dependence on external linear algebra libraries (OpenBLAS, Intel MKL, netlib) adds platform-specific build complexity. Single-maintainer risk is typical of open-source ML projects; breaking changes are possible before 1.0.
Active areas of work
No specific recent commits visible in file list, but CI/CD pipeline (.github/workflows/) shows active testing, benching, code quality, and documentation workflows. README indicates status tracking per algorithm (tested/benchmarked/partial-fit). The codebase appears in steady-state maintenance with focus on correctness and performance benchmarking rather than rapid feature development.
🚀Get running
```bash
git clone https://github.com/rust-ml/linfa.git
cd linfa
cargo build
cargo test
cargo test --workspace   # test all algorithm crates
```
For specific algorithms: `cargo build -p linfa-clustering` or `cargo test -p linfa-bayes`.
Daily commands:
No server to start; this is a library. Run examples: `cargo run --example kmeans --manifest-path algorithms/linfa-clustering/Cargo.toml`. Run tests: `cargo test --all`. Run benchmarks: `cargo bench --features benchmarks --manifest-path algorithms/linfa-clustering/Cargo.toml`. Documentation: `cargo doc --open`.
🗺️Map of the codebase
- Cargo.toml: Workspace definition; specifies all algorithm crates, shared dependencies, and feature gates for BLAS/LAPACK backends.
- algorithms/linfa-clustering/src: Largest algorithm module; demonstrates architectural patterns used across other algorithms (hyperparams, error handling, example implementations like K-Means, DBSCAN, OPTICS).
- algorithms/linfa-bayes/src/base_nb.rs: Shared base implementation for Naive Bayes variants; shows trait composition pattern likely reused in other algorithm crates.
- .github/workflows/testing.yml: Defines test matrix (multiple Rust versions, all workspace members); indicates expected compatibility and CI requirements.
- CONTRIBUTE.md: Developer contribution guidelines; essential before submitting PRs to understand code standards and workflow.
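The workspace wiring described above looks roughly like this — a hypothetical, abbreviated sketch, not the real manifest (the actual root Cargo.toml lists 10+ members, more shared dependencies, and feature flags that forward to ndarray-linalg; version numbers here are illustrative):

```toml
# Hypothetical, abbreviated sketch of the root workspace manifest.
[workspace]
members = [
    "algorithms/linfa-bayes",
    "algorithms/linfa-clustering",
    # ... one entry per algorithm crate under algorithms/
]

[dependencies]
ndarray = "0.16"
num-traits = "0.2"
rand = "0.8"

[features]
# BLAS backends are opt-in and mutually exclusive.
openblas-static = []
intel-mkl-system = []
netlib-system = []
```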
🛠️How to make changes
Start in algorithms/[algorithm-name]/src/ for algorithm logic. Core trait definitions and hyperparameter patterns live in the root crate's src/ (inferred from the workspace structure). Add tests alongside implementations (src/lib.rs typically has a `#[cfg(test)] mod tests`). Add examples in algorithms/[algorithm-name]/examples/[name].rs. Run `cargo fmt` and `cargo clippy` before committing (enforced by CI). See CONTRIBUTE.md for contribution guidelines.
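The tests-next-to-implementation layout mentioned above typically looks like this. A minimal, std-only sketch — `euclidean_sq` is a hypothetical stand-in, not a real linfa function:

```rust
// Hypothetical helper standing in for real algorithm logic.
pub fn euclidean_sq(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Unit tests live in the same file, compiled only under `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn distance_to_self_is_zero() {
        assert_eq!(euclidean_sq(&[1.0, 2.0], &[1.0, 2.0]), 0.0);
    }
}
```

Running `cargo test -p <crate>` picks these modules up per crate without touching the rest of the workspace.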
🪤Traps & gotchas
BLAS/LAPACK feature flags are mutually exclusive (enable only one of netlib-static, openblas-static, intel-mkl-static, or none). Missing a feature flag or choosing the wrong one causes linking errors. The ndarray-linalg version (0.17) may have breaking changes relative to ndarray (0.16). Windows requires careful BLAS setup (note in Cargo.toml: pprof is not available on Windows). Examples assume datasets are available via the linfa-datasets crate with features enabled (see dev-dependencies).
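For a downstream project, selecting exactly one backend looks roughly like this — a hypothetical consumer `Cargo.toml`; the version number and feature names are taken from the text above and may drift:

```toml
[dependencies]
# Pick exactly one BLAS backend feature; enabling two causes link errors,
# and enabling none falls back to the pure-Rust paths where available.
linfa = { version = "0.8", features = ["openblas-static"] }
linfa-clustering = "0.8"
```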
💡Concepts to learn
- Naive Bayes classification (Bernoulli, Gaussian, Multinomial variants) — Linfa provides three implementations; understanding when to use each (text classification vs. continuous features) is critical for text/categorical data tasks.
- K-Means clustering and initialization strategies — Core unsupervised algorithm in linfa; implementation must handle k-means++ initialization and convergence criteria correctly to avoid poor local optima.
- DBSCAN density-based clustering — Linfa includes both standard DBSCAN and approximated variants (appx_dbscan with spatial grid); understanding epsilon and minPts hyperparameters is essential for arbitrary-shaped cluster detection.
- Elastic Net regularization (L1+L2 penalties) — Linfa's elasticnet crate combines Lasso and Ridge regression; balancing alpha/l1_ratio prevents overfitting better than either alone.
- FTRL (Follow The Regularized Leader) for online learning — Linfa's ftrl crate enables incremental/partial-fit updates; critical for streaming data or models that must update without retraining on full dataset.
- Gaussian Mixture Models (GMM) and EM algorithm — Probabilistic clustering in linfa; EM convergence and covariance matrix conditioning directly impact clustering quality.
- Hyperparameter builder pattern in Rust — Linfa uses fluent builder API across all algorithm crates (hyperparams.rs files); understanding this pattern is essential for configuring any algorithm.
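The fluent builder idea behind those `hyperparams.rs` files can be sketched in plain, std-only Rust. This is a hypothetical simplification — linfa's real crates use a similar params → validate → fit flow, but with different types and trait bounds:

```rust
// Hypothetical, simplified sketch of a checked hyperparameter builder.
#[derive(Debug, Clone)]
pub struct KMeansParams {
    n_clusters: usize,
    max_iterations: u64,
    tolerance: f64,
}

impl KMeansParams {
    pub fn new(n_clusters: usize) -> Self {
        Self { n_clusters, max_iterations: 300, tolerance: 1e-4 }
    }

    // Each setter consumes and returns `self`, enabling fluent chaining.
    pub fn max_iterations(mut self, n: u64) -> Self {
        self.max_iterations = n;
        self
    }

    pub fn tolerance(mut self, tol: f64) -> Self {
        self.tolerance = tol;
        self
    }

    // Validation runs once, at the end, so no invalid intermediate
    // state can ever reach the fitting code.
    pub fn check(self) -> Result<ValidKMeansParams, String> {
        if self.n_clusters == 0 {
            return Err("n_clusters must be > 0".into());
        }
        if self.tolerance <= 0.0 {
            return Err("tolerance must be positive".into());
        }
        Ok(ValidKMeansParams(self))
    }
}

// Newtype serving as compile-time proof that validation has happened.
pub struct ValidKMeansParams(KMeansParams);

impl ValidKMeansParams {
    pub fn n_clusters(&self) -> usize {
        self.0.n_clusters
    }
}
```

Usage is a chain ending in validation, e.g. `KMeansParams::new(3).tolerance(1e-5).check()?`; any fit function that accepts only the validated newtype cannot be handed unchecked parameters.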
🔗Related repos
- rust-ndarray/ndarray — foundational dependency; provides the n-dimensional array abstractions linfa algorithms use for matrix operations.
- rust-lang/packed_simd — sibling project for vectorized computation; relevant for future performance optimization of matrix operations in linfa.
- huggingface/candle — alternative Rust ML framework with GPU support; a direct competitor for similar workloads, but with a deep-learning focus.
- rust-ml/linfa-datasets — companion crate (listed in the workspace) providing benchmark datasets (iris, winequality, diabetes) used in linfa examples and tests.
- pola-rs/polars — DataFrame library often used alongside linfa for data preprocessing and feature engineering before ML pipeline execution.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive benchmarking suite for clustering algorithms with criterion integration
The repo has benching.yml workflow and criterion dependency configured, but only 3 benchmark files exist (k_means.rs, dbscan.rs, gaussian_mixture.rs) while OPTICS algorithm lacks benchmarks. New contributors can add criterion benchmarks for OPTICS and expand existing benchmarks to cover edge cases (empty datasets, single points, high dimensions). This directly supports the 'benchmarks' feature flag in Cargo.toml and helps track performance regressions.
- [ ] Create algorithms/linfa-clustering/benches/optics.rs with criterion benchmarks for OPTICS algorithm
- [ ] Expand algorithms/linfa-clustering/benches/dbscan.rs to include appx_dbscan variant benchmarks
- [ ] Add benchmark comparison documentation in algorithms/linfa-clustering/README.md with results and performance notes
- [ ] Ensure benchmarks run in CI via .github/workflows/benching.yml
Implement missing unit tests for OPTICS algorithm and AppxDBSCAN clustering module
While other clustering modules have test files (appx_dbscan/cells_grid/tests.rs, appx_dbscan/clustering/tests.rs, etc.), the OPTICS module at algorithms/linfa-clustering/src/optics/ lacks a tests.rs file. Given the complexity of the OPTICS algorithm and its reachability-distance calculations, comprehensive unit tests are critical. This improves code reliability and serves as documentation.
- [ ] Create algorithms/linfa-clustering/src/optics/tests.rs with unit tests for reachability distance calculations
- [ ] Add test cases for edge cases: single point, identical points, sparse clusters, noise detection
- [ ] Verify clustering consistency between OPTICS and DBSCAN results on shared test datasets
- [ ] Run tests locally and ensure they pass via cargo test in the linfa-clustering crate
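The quantity under test in the first checklist item is simple to state: OPTICS defines reachability-distance(p, o) = max(core-distance(o), dist(o, p)). A std-only sketch of a testable helper — hypothetical names; linfa's internal API will differ:

```rust
// Hypothetical helper mirroring the OPTICS definition:
// reachability-distance(p, o) = max(core_distance(o), dist(o, p)).
pub fn reachability_distance(core_dist_o: f64, dist_o_p: f64) -> f64 {
    core_dist_o.max(dist_o_p)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn close_neighbour_is_clamped_to_core_distance() {
        // A point inside o's core radius is "reached" at the core distance.
        assert_eq!(reachability_distance(2.0, 0.5), 2.0);
    }

    #[test]
    fn far_neighbour_uses_actual_distance() {
        assert_eq!(reachability_distance(2.0, 3.5), 3.5);
    }
}
```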
Add serialization (serde) support documentation and feature-gated tests for all algorithm modules
The root Cargo.toml has a 'serde' feature that gates serde_crate dependency, but there's no evidence of serde derive implementations or tests across algorithm crates (linfa-bayes, linfa-clustering, etc.). Contributing serde support with proper feature gating and tests would enable ML models to be serialized/deserialized, a common production requirement. This requires updating multiple Cargo.tomls and adding comprehensive tests.
- [ ] Add #[derive(Serialize, Deserialize)] to key fitted model structs in algorithms/linfa-bayes/src/gaussian_nb.rs, algorithms/linfa-clustering/src/k_means/algorithm.rs
- [ ] Add 'serde' feature to each algorithm crate's Cargo.toml with conditional serde dependency
- [ ] Create integration tests in each algorithm crate testing model serialization/deserialization round-trips
- [ ] Document serde feature usage in algorithms/linfa-bayes/README.md and algorithms/linfa-clustering/README.md with examples
🌿Good first issues
- Add missing tests for appx_dbscan/cells_grid/tests.rs and appx_dbscan/counting_tree/tests.rs, which exist at module level but likely have incomplete coverage; run `cargo tarpaulin` to identify gaps.
- Document hyperparameter tuning patterns: the algorithms/linfa-*/src/hyperparams.rs files use a builder pattern but lack examples in the algorithm-specific READMEs; add example code showing a hyperparameter selection workflow.
- Create integration examples for multi-step pipelines (e.g., normalize data → cluster → evaluate silhouette score); currently only single-algorithm examples exist in the examples/ directories.
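The silhouette evaluation mentioned in the last item reduces, per sample, to s = (b − a) / max(a, b), where a is the mean distance to points in the sample's own cluster and b the mean distance to the nearest other cluster. A std-only sketch of that final step — a hypothetical helper, not a linfa API:

```rust
// Per-sample silhouette score: s = (b - a) / max(a, b), in [-1, 1].
// `a`: mean intra-cluster distance; `b`: mean distance to the nearest
// other cluster. Both are assumed non-negative and precomputed.
pub fn silhouette_sample(a: f64, b: f64) -> f64 {
    if a == 0.0 && b == 0.0 {
        // Degenerate case (e.g. duplicate points); conventionally 0.
        return 0.0;
    }
    (b - a) / a.max(b)
}
```

Values near 1 mean a tight, well-separated sample; values below 0 suggest the sample sits closer to a neighbouring cluster than to its own.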
⭐Top contributors
- @relf — 33 commits
- @YuhanLiin — 13 commits
- @oojo12 — 10 commits
- @bytesnake — 3 commits
- @oglego — 2 commits
📝Recent commits
- 8167a62 — Fix docs action: build only when not on linfa repo (relf)
- e69cd94 — Linting (relf)
- 1abc88f — feat: add symmetric mean absolute percentage error (sMAPE) (#437) (oglego)
- 12c6c73 — fix: realign PreprocessingError variants with error strings (#434) (oglego)
- 17f8696 — Fixes #393: label ordering in binary logistic regression (#432) (espenloov)
- c7c2af5 — Add generic ResidualChain composing method (#430) (Feiyang472)
- b1f9ddb — Relax required test score (relf)
- 2197362 — Update to Zola 0.22 (relf)
- 1de164b — feat(linfa-tsne): update to bhtsne 0.5.4 (#429) (AnthonyMichaelTDM)
- 4484a55 — Bump all the crates' version to 0.8.1 (relf)
🔒Security observations
The linfa Rust ML framework demonstrates a generally solid security posture. No critical vulnerabilities or hardcoded secrets were identified in the provided configuration. The main concerns are around optional features (BLAS, serde, datasets) that require careful validation at usage boundaries, and the lack of a SECURITY.md policy file for vulnerability reporting. Dependency versions are reasonable, though the recent thiserror 2.0 adoption and the exact pin on sprs warrant a closer look.
- Medium · Outdated thiserror dependency — Cargo.toml (root), dependencies.thiserror. The codebase uses thiserror 2.0, a major version that may contain breaking changes or deprecations and is less battle-tested than 1.x. Fix: evaluate the stability of thiserror 2.0 for your use cases; if stability is critical, consider pinning to a tested 1.x version or auditing the breaking changes that affect error handling.
- Low · Pinned dependency version for sprs — Cargo.toml (root), dependencies.sprs. The sprs dependency is pinned to an exact version (=0.11.2) with default features disabled. Pinning ensures reproducibility, but an exact pin without a range blocks security patches. Fix: evaluate whether the exact pin is necessary; consider a range like '0.11.2' or '>=0.11.2, <0.12' to allow patch updates while maintaining stability.
- Low · Optional BLAS dependencies not validated — Cargo.toml (root), features section. Multiple BLAS feature flags (netlib-static, netlib-system, openblas-static, openblas-system, intel-mkl-static, intel-mkl-system) depend on external native libraries. If these are compiled from source or downloaded, there is a supply-chain risk if they are not validated. Fix: document which BLAS versions are tested and supported, and consider verifying checksums of downloaded native libraries in CI/CD pipelines.
- Low · Serde deserialization without validation — Cargo.toml (root), features.serde and [dependencies.serde_crate]. Optional serde support allows serialization/deserialization of ML models and data structures; without validation, deserializing untrusted input can bypass invariants. Fix: if serde is enabled, implement custom deserialization (serde(rename, deserialize_with)) for sensitive types, validate all deserialized data before use, and document the security implications in the README.
- Low · Development dependencies include datasets — Cargo.toml (root), dev-dependencies.linfa-datasets. The dev-dependencies include linfa-datasets with features like 'winequality', 'iris', and 'diabetes'. If these datasets are fetched from external sources at compile time or runtime, they are susceptible to man-in-the-middle attacks or malicious data injection. Fix: verify that linfa-datasets validates downloaded data (checksums/signatures); for production use, embed datasets or fetch from verified sources only, and document data provenance.
- Low · No security policy defined — repository root (missing SECURITY.md). The repository does not appear to have a SECURITY.md file, which makes it difficult for security researchers to responsibly report vulnerabilities. Fix: create a SECURITY.md in the repository root with instructions for reporting vulnerabilities privately, preferred contact methods, and expected response timeframes.
- Informational · Workspace member dependencies not fully visible — Cargo.toml, workspace members. The workspace includes algorithms/* and datasets subdirectories whose individual Cargo.toml files were not available for review, potentially hiding dependency-chain issues. Fix: review all member crate manifests for consistent dependency versions and unused dependencies, and run 'cargo audit' across the workspace for known CVEs.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.