RepoPilot

rushter/MLAlgorithms

Minimal and clean examples of machine learning algorithms implementations

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit today
  • 24+ active contributors
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Concentrated ownership — top contributor handles 62% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant: "RepoPilot: Healthy"

```markdown
[![RepoPilot: Healthy](https://repopilot.app/api/badge/rushter/mlalgorithms)](https://repopilot.app/r/rushter/mlalgorithms)
```

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/rushter/mlalgorithms on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: rushter/MLAlgorithms

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rushter/MLAlgorithms shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit today
  • 24+ active contributors
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Concentrated ownership — top contributor handles 62% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live rushter/MLAlgorithms repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/rushter/MLAlgorithms.

What it runs against: a local clone of rushter/MLAlgorithms — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rushter/MLAlgorithms | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches a relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>rushter/MLAlgorithms</code></summary>
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rushter/MLAlgorithms. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rushter/MLAlgorithms.git
#   cd MLAlgorithms
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rushter/MLAlgorithms and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rushter/MLAlgorithms(\.git)?\b" \
  && ok "origin remote is rushter/MLAlgorithms" \
  || miss "origin remote is not rushter/MLAlgorithms (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in mla/__init__.py mla/base/base.py mla/neuralnet/nnet.py \
         mla/ensemble/tree.py mla/linear_models.py; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rushter/MLAlgorithms"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

MLAlgorithms is a collection of 15+ minimal, cleanly written Python implementations of machine learning algorithms from scratch — deep learning (MLP, CNN, RNN, LSTM), tree ensembles (Random Forest, GBM), clustering (K-Means, GMM), dimensionality reduction (PCA, t-SNE), and reinforcement learning (Deep Q-learning) — designed for educational understanding rather than production performance. The layout is flat: core algorithms live directly in mla/ (e.g., mla/linear_models.py, mla/kmeans.py, mla/gaussian_mixture.py); neural networks sit in mla/neuralnet/, ensembles in mla/ensemble/, and metrics in mla/metrics/; runnable examples mirroring each algorithm live in examples/ (e.g., examples/kmeans.py exercises mla/kmeans.py). Base classes are in mla/base/, bundled datasets in mla/datasets/.

👥Who it's for

ML students, researchers, and engineers who want to understand algorithm internals by reading clear, minimal reference implementations without the abstraction layers of scikit-learn or TensorFlow; people implementing algorithms from scratch in interviews or building intuition before using optimized libraries.

🌱Maturity & risk

Educational project with no visible release tags; a CI pipeline exists (.github/workflows/python-app.yml) but its health is unknown. Examples are runnable, but formal test coverage is thin — only mla/metrics/tests/ is visible in the top 60 files. The maintenance signals above show a commit as recent as today, and the project structure is stable. Verdict: actively maintained for teaching, not production-ready.

Moderate risk: thin dependency stack (numpy, scipy, autograd, gym) but no lock file visible; concentrated ownership — the top contributor (rushter) handles 62% of recent commits; no visible issue-tracker or PR-velocity data; educational code may have subtle numerical bugs not caught by tests; autograd has been unmaintained since ~2018. Not suitable for production ML pipelines.

Active areas of work

No milestone or roadmap data is visible, but the recent commits below show light ongoing maintenance: merged community PRs (e.g., #96 adding a GELU activation), ruff formatting passes, and GitHub Actions updates. The project looks stable and maintained in caretaker mode rather than under active feature development.

🚀Get running

```bash
git clone https://github.com/rushter/MLAlgorithms && cd MLAlgorithms
pip install scipy numpy matplotlib scikit-learn seaborn autograd gym
python setup.py develop
```

Daily commands: python -m examples.linear_models (substitute any example module: kmeans, gaussian_mixture, nnet_mlp, rl_deep_q_learning, etc.). Or via Docker: docker build -t mlalgorithms . && docker run --rm -it mlalgorithms bash, then python -m examples.linear_models inside the container.
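To sanity-check the install from Python, a minimal fit/predict session. This is a hypothetical sketch: the class name, constructor arguments, and export path are assumptions; verify them against mla/linear_models.py and examples/linear_models.py.

```python
# Hypothetical usage sketch; class name and constructor args are assumed.
# Check mla/linear_models.py and examples/linear_models.py for the real API.
import numpy as np
from mla.linear_models import LinearRegression  # assumed export

X = np.random.randn(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * np.random.randn(100)

model = LinearRegression()       # hyperparameters omitted; see the source
model.fit(X, y)
predictions = model.predict(X)   # same fit/predict shape across estimators
print(predictions[:5])
```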

🗺️Map of the codebase

  • mla/__init__.py — Package entry point that exposes all algorithm implementations; understand this to know what's available.
  • mla/base/base.py — Abstract base class defining the fit/predict interface used across all estimators; all algorithms inherit from here.
  • mla/neuralnet/nnet.py — Core neural network framework with layer composition, forward/backward pass, and training loop; heaviest implementation.
  • mla/ensemble/tree.py — Decision tree implementation used by Random Forest and Gradient Boosting; foundational for ensemble methods.
  • mla/linear_models.py — Linear and logistic regression implementations; simplest algorithms that establish the coding conventions.
  • examples/linear_models.py — Example usage of core algorithms showing fit/predict patterns; templates for understanding API.
  • requirements.txt — NumPy, SciPy, autograd dependencies; understanding these is essential for following gradient computations.

🛠️How to make changes

Add a new supervised learning algorithm

  1. Create a new module in mla/, following the conventions of existing ones like mla/linear_models.py (mla/your_algorithm.py)
  2. Import BaseModel and implement fit() and predict() methods following the shared interface — see the sketch after this list (mla/base/base.py)
  3. Add hyperparameter initialization in __init__() and use numpy/autograd for computations (mla/your_algorithm.py)
  4. Export the class in mla/__init__.py so it's available as mla.YourAlgorithm (mla/__init__.py)
  5. Create an example in examples/your_algorithm.py demonstrating usage with synthetic or real data (examples/your_algorithm.py)
  6. Add unit tests in mla/tests/ to verify accuracy on toy datasets (mla/tests/test_your_algorithm.py)
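A minimal sketch of step 2, assuming the BaseModel contract is the plain fit/predict interface described above (the actual hooks in mla/base/base.py may differ; verify before copying):

```python
# Hypothetical skeleton; the base-class name and hooks are taken from this
# doc ("BaseModel" with fit/predict). Verify against mla/base/base.py.
import numpy as np
from mla.base.base import BaseModel  # assumed import path

class MeanRegressor(BaseModel):
    """Toy estimator: always predicts the training-set mean."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.mean_)
```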

Add a new neural network layer

  1. Create the layer class in mla/neuralnet/layers/basic.py (or convnet.py / recurrent/*.py for specialized layers) (mla/neuralnet/layers/basic.py)
  2. Implement forward(X) for inference and backward(dL_dY) for gradient computation using autograd or manual derivatives — a sketch follows this list (mla/neuralnet/layers/basic.py)
  3. Store parameters (weights, biases) and gradients as attributes and expose them through get_parameters() (mla/neuralnet/layers/basic.py)
  4. Register the layer in mla/neuralnet/__init__.py and wire it into the NNet class in nnet.py (mla/neuralnet/nnet.py)
  5. Create or update an example in examples/ (e.g., examples/nnet_mlp.py) to show the layer in action (examples/nnet_mlp.py)
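A sketch of the forward/backward contract from step 2, using a parameter-free layer with manual derivatives (method names are modeled on the steps above; mirror whatever the existing layers in mla/neuralnet/layers/basic.py actually implement):

```python
# Illustrative only. Match the real layer interface in
# mla/neuralnet/layers/basic.py before adding a layer like this.
import numpy as np

class ReLU:
    """Parameter-free activation layer with manual derivatives."""

    def forward(self, X):
        self._mask = X > 0           # cache for the backward pass
        return X * self._mask

    def backward(self, dL_dY):
        # dL/dX = dL/dY * dY/dX, and dY/dX is 1 where X > 0, else 0.
        return dL_dY * self._mask

    def get_parameters(self):
        return []                    # nothing to train
```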

Add a new distance metric or kernel function

  1. Add the function to mla/metrics/distance.py for distance-based algorithms (KNN, K-Means) or mla/svm/kernerls.py for SVM (mla/metrics/distance.py)
  2. Implement the metric as a function taking (X1, X2) and returning a distance/similarity matrix — see the sketch after this list (mla/metrics/distance.py)
  3. Update the algorithm (e.g., mla/knn.py or mla/svm/svm.py) to accept the metric as a parameter in __init__() (mla/knn.py)
  4. Add test cases in mla/metrics/tests/test_metrics.py to verify the metric on known values (mla/metrics/tests/test_metrics.py)
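A minimal pairwise metric in the (X1, X2) -> matrix form described in step 2 (purely illustrative; follow the conventions already used in mla/metrics/distance.py):

```python
# Illustrative pairwise metric: returns an (n1, n2) distance matrix.
import numpy as np

def euclidean_distance_matrix(X1, X2):
    """D[i, j] = ||X1[i] - X2[j]||_2 for row vectors of X1 and X2."""
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, computed without loops.
    sq1 = np.sum(X1**2, axis=1)[:, None]   # (n1, 1)
    sq2 = np.sum(X2**2, axis=1)[None, :]   # (1, n2)
    sq = sq1 - 2.0 * (X1 @ X2.T) + sq2
    return np.sqrt(np.maximum(sq, 0.0))    # clamp tiny negatives from rounding
```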

Add a new ensemble algorithm

  1. Create a new file in mla/ensemble/ (e.g., mla/ensemble/your_ensemble.py) inheriting from BaseModel (mla/ensemble/your_ensemble.py)
  2. Implement aggregation logic using mla/ensemble/tree.py as a base learner, or combine existing algorithms (mla/ensemble/tree.py)
  3. Follow the fit() and predict() pattern, storing a list of base learners and weights/voting rules — a bagging sketch follows this list (mla/ensemble/your_ensemble.py)
  4. Export in mla/ensemble/__init__.py and mla/__init__.py (mla/ensemble/__init__.py)
  5. Create an example in examples/ demonstrating the ensemble on a classification or regression task (examples/your_ensemble.py)
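The aggregation pattern from step 3 might look like this bagging-style sketch (hypothetical: the base-learner factory is a stand-in for, e.g., trees from mla/ensemble/tree.py, whose real constructor may differ):

```python
# Hypothetical bagging ensemble; the base-learner API is a stand-in.
import numpy as np

class BaggedRegressor:
    def __init__(self, make_learner, n_estimators=10, seed=0):
        self.make_learner = make_learner   # factory returning a fresh base learner
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(seed)
        self.learners = []

    def fit(self, X, y):
        n = X.shape[0]
        for _ in range(self.n_estimators):
            idx = self.rng.integers(0, n, size=n)   # bootstrap sample
            learner = self.make_learner()
            learner.fit(X[idx], y[idx])
            self.learners.append(learner)
        return self

    def predict(self, X):
        # Average the base learners' predictions (use voting for classification).
        return np.mean([m.predict(X) for m in self.learners], axis=0)
```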

🪤Traps & gotchas

  • Autograd (the dependency used for gradient computation) has been unmaintained since 2018 and may break on newer Python versions.
  • No explicit Python version constraint appears in the provided data.
  • The Docker build requires a running docker daemon.
  • Examples expect matplotlib display capability and may fail in headless environments.
  • mla/datasets contains pre-downloaded MNIST binary files (t10k-images-idx3-ubyte, etc.); importing mla.datasets loads them, and these large binary files are not shown in git history.
  • No validation that algorithms handle edge cases (empty arrays, singular matrices, etc.).

🏗️Architecture

💡Concepts to learn

  • Backpropagation — Central to training neural networks in mla/neuralnet/; understanding chain rule and gradient flow is essential to modifying RNN and CNN implementations
  • Gradient Boosting (stage-wise additive models) — mla/ensemble/gbm.py implements this powerful ensemble technique; core concept for understanding how weak learners are iteratively combined via residual fitting
  • K-Means clustering (Lloyd's algorithm) — mla/kmeans.py is a canonical unsupervised learning example; demonstrates convergence criteria, centroid updates, and distance metric choice (a minimal iteration sketch follows this list)
  • Expectation-Maximization (EM) — Underlying algorithm for mla/gaussian_mixture.py and mla/rbm.py; learning probabilistic models via latent variable optimization
  • t-Distributed Stochastic Neighbor Embedding (t-SNE) — mla/tsne.py implements this nonlinear dimensionality reduction; demonstrates probability distribution matching and gradient descent on a non-convex objective
  • Support Vector Machines (SVM) with kernel methods — mla/svm/ implements RBF, poly, and linear kernels; shows how kernel trick enables nonlinear classification without explicit feature mapping
  • Recurrent Neural Networks (RNN/LSTM) — mla/neuralnet/ contains RNN and LSTM cells with backpropagation-through-time (BPTT); essential for understanding sequential data and vanishing gradient problem
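To make the K-Means item concrete, one Lloyd iteration in plain NumPy (an independent illustration, not the code in mla/kmeans.py):

```python
# One iteration of Lloyd's algorithm; illustration, not mla/kmeans.py itself.
import numpy as np

def lloyd_step(X, centroids):
    """Assign points to the nearest centroid, then recompute centroids."""
    # Distances from every point to every centroid: shape (n_points, k).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(d, axis=1)
    new_centroids = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(len(centroids))
    ])
    return new_centroids, labels  # iterate until centroids stop moving
```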
Related repos

  • scikit-learn/scikit-learn — Production-grade ML library with the same algorithm coverage; this repo is the 'understand the guts' alternative
  • d2l-ai/d2l-en — Dive into Deep Learning textbook with interactive code examples; similar educational philosophy but broader DL focus
  • numpy/numpy — Core dependency; understanding NumPy array operations is essential to reading MLAlgorithms source code
  • HIPS/autograd — Automatic differentiation library used for gradient-based algorithms in this repo; unmaintained but still functional
  • fchollet/deep-learning-with-python-notebooks — Companion notebooks teaching deep learning concepts with Keras; complementary high-level view to MLAlgorithms' from-scratch approach

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for mla/ensemble/ (random_forest.py and gbm.py)

The repo has minimal test coverage. mla/metrics/tests/test_metrics.py exists but there are no tests for the ensemble methods despite them being core implementations. Random Forest and GBM are complex algorithms where bugs easily hide. This would catch regressions and serve as executable documentation for contributors learning these algorithms.

  • [ ] Create mla/ensemble/tests/__init__.py and mla/ensemble/tests/test_ensemble.py
  • [ ] Add tests for RandomForest: initialization, fit(), predict(), feature importance calculation with synthetic datasets — see the sketch after this checklist
  • [ ] Add tests for GBM: initialization, fit(), predict(), loss function convergence with simple 1D regression cases
  • [ ] Add tests for tree.py DecisionTree edge cases: max_depth limiting, min_samples_split, pure node handling
  • [ ] Run tests in python-app.yml workflow to ensure CI integration
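A starting-point test for the first two checklist items. The import path and constructor arguments are assumptions; adjust them to whatever mla/ensemble/random_forest.py actually exports:

```python
# Hypothetical pytest starting point. Verify the import path and the
# constructor arguments against mla/ensemble/random_forest.py before use.
import numpy as np
from mla.ensemble.random_forest import RandomForestClassifier  # assumed path

def test_forest_predict_shape():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 4))
    y = (X[:, 0] > 0).astype(int)        # trivially separable toy labels

    model = RandomForestClassifier(n_estimators=5)  # args assumed
    model.fit(X, y)
    preds = model.predict(X)

    assert preds.shape[0] == X.shape[0]
    # A forest should beat chance on a separable toy problem.
    assert np.mean((preds > 0.5).astype(int) == y) > 0.8
```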

Add missing unit tests for mla/neuralnet/ layer implementations (basic.py, convnet.py, recurrent/)

The neural network module is the most complex in the repo with multiple layer types (Dense, Conv, LSTM, RNN) but has zero test coverage despite forward/backward pass correctness being critical. Tests should validate gradient computations and shape transformations, which are common failure points in educational implementations.

  • [ ] Create mla/neuralnet/tests/__init__.py and mla/neuralnet/tests/test_layers.py
  • [ ] Add tests for mla/neuralnet/layers/basic.py: Dense layer forward pass shape, backward pass gradients using numerical gradient checking — see the sketch after this checklist
  • [ ] Add tests for mla/neuralnet/layers/convnet.py: Conv2D forward/backward with known small inputs, stride/padding handling
  • [ ] Add tests for mla/neuralnet/layers/recurrent/: LSTM and RNN cell state management, sequence length handling
  • [ ] Add gradient checking tests using autograd to validate backprop implementations
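Numerical gradient checking, named in the checklist above, compares an analytic gradient against a central-difference estimate. A self-contained version, shown on a plain function, that can be pointed at a layer's backward() output:

```python
# Generic central-difference gradient check. Compare its output against a
# layer's analytic backward() to validate backprop implementations.
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Estimate the elementwise gradient of scalar-valued f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        orig = x[i]
        x[i] = orig + eps
        f_plus = f(x)
        x[i] = orig - eps
        f_minus = f(x)
        x[i] = orig                      # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

# Self-test: d/dx of sum(tanh(x)) is 1 - tanh(x)^2.
x = np.random.randn(3, 4)
assert np.allclose(numerical_grad(lambda v: np.sum(np.tanh(v)), x),
                   1 - np.tanh(x) ** 2, atol=1e-6)
```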

Verify and complete the SVM implementation (mla/svm/ is referenced inconsistently)

The README lists 'Support vector machine (SVM) with kernels (Linear, Poly, RBF)' as implemented, and other sections of this doc reference mla/svm/, but the file listing used for this analysis did not include the module. Confirm whether mla/svm/ exists in the current tree first; if it is genuinely missing or incomplete, that gap blocks users trying to learn SVM internals from this repo.

  • [ ] Create mla/svm/__init__.py and mla/svm/svm.py with an SVM class
  • [ ] Implement SVM using quadratic programming (scipy.optimize) with support for Linear, Polynomial, and RBF kernels in mla/svm/kernels.py — kernel sketches follow this checklist
  • [ ] Add fit() method using SMO algorithm or cvxopt for the quadratic program
  • [ ] Add predict() method with kernel transformations
  • [ ] Create examples/svm.py demonstrating usage on a toy classification dataset (similar to other examples in the repo)
  • [ ] Add unit tests in mla/svm/tests/test_svm.py validating kernel computations and predictions
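The kernel functions themselves are standard formulas and can be written up front (hyperparameter names like gamma, degree, and coef0 are illustrative defaults, not the repo's API):

```python
# Standard kernel functions; gamma/degree/coef0 names are illustrative.
import numpy as np

def linear_kernel(x1, x2):
    return np.dot(x1, x2)

def polynomial_kernel(x1, x2, degree=3, coef0=1.0):
    return (np.dot(x1, x2) + coef0) ** degree

def rbf_kernel(x1, x2, gamma=0.1):
    # k(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))
```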

🌿Good first issues

  • Add comprehensive unit tests for mla/ensemble/tree.py and mla/ensemble/random_forest.py in a new mla/tests/ directory (currently only mla/metrics/tests/ exists); start with test_tree_split_criterion and test_forest_predict_shape
  • Document the mathematical derivation of gradient updates in mla/ensemble/gbm.py (add docstrings showing loss function, gradient formula, and update rule); currently code has no math comments
  • Add mla/naive_bayes.py implementation (README lists it as implemented but no file visible in top 60); implement fit() / predict() for Gaussian Naive Bayes with examples/naive_bayes.py example


📝Recent commits

  • 3fe473a — Merge pull request #96 from Adi-474u/add-gelu-activation (rushter)
  • 1fe46f6 — Add GELU activation function (Adi-474u)
  • 8fb8bb4 — Format using ruff (rushter)
  • 4e93532 — Format using ruff (rushter)
  • edf9a13 — Update github actions (rushter)
  • 035e489 — Merge pull request #88 from paulinder/patch-1 (rushter)
  • d20d313 — fix gradient descent optimization algorithm link (paulinder)
  • 3c8e16b — Merge pull request #79 from Shihab-Shahriar/master (rushter)
  • ef5aca0 — Apply suggestions from code review (rushter)
  • 2378011 — Style fixes (rushter)

🔒Security observations

This is an educational machine learning algorithms repository with a reasonable security posture for its intended purpose. The primary concerns are outdated dependencies containing known vulnerabilities and lack of input validation in data loading mechanisms. The Docker configuration could be more specific, and CI/CD pipeline lacks automated security scanning. No critical vulnerabilities such as hardcoded secrets, injection points, or exposed credentials were detected. The codebase appears to follow secure development practices overall, but dependency management and automated security checks should be prioritized.

  • Medium · Outdated Dependency Versions — setup.py or requirements.txt (dependencies specification). The dependency file specifies very old minimum versions (numpy>=1.11.1, scipy>=0.18.0, scikit-learn>=0.18, matplotlib>=1.5.1). These versions are from 2016-2017 and contain known security vulnerabilities. Modern versions should be used. Fix: Update all dependencies to current stable versions. Use pip-audit or similar tools to identify and resolve known CVEs. Consider using dependency pinning with specific versions rather than loose version constraints.
  • Medium · Missing Input Validation in ML Pipeline — mla/datasets/base.py and mla/datasets/data loading modules. The codebase implements machine learning algorithms without apparent input sanitization or validation. While this is an educational project, datasets loaded from files (MNIST, Nietzsche text) could be exploited if malicious data is provided, potentially causing denial of service or unexpected behavior. Fix: Implement input validation for all data loading functions. Validate data shapes, types, and ranges. Add checks for file integrity and size limits before processing.
  • Low · Unpinned Docker Base Image — Dockerfile (line 1: FROM python:3). The Dockerfile uses 'python:3' without pinning a specific Python version, so the base image changes unpredictably as the tag is updated, potentially introducing breaking changes or security issues in the runtime environment. Fix: Specify a version like 'FROM python:3.11-slim' or 'FROM python:3.11-alpine'. Use slim or alpine variants to reduce attack surface. Regularly update to the latest patch version.
  • Low · No Security Scanning in CI/CD — .github/workflows/python-app.yml. The GitHub Actions workflow (python-app.yml) does not include security scanning, dependency vulnerability checks, or SAST (Static Application Security Testing) steps. Fix: Add security scanning steps to the CI/CD pipeline: use 'pip-audit' for dependency vulnerabilities, add Bandit for static security analysis, and consider using GitHub's built-in security features like Dependabot.
  • Low · Missing MANIFEST.in Validation — MANIFEST.in. The presence of MANIFEST.in file without visibility into its contents could potentially include or exclude sensitive files inadvertently during package distribution. Fix: Review and validate MANIFEST.in contents. Ensure it explicitly excludes sensitive files like .env, credentials, or test data. Document what files are included in distributions.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
