sjwhitworth/golearn

Item: sjwhitworth/golearn
Rating: 5
Author: RepoPilot

Machine Learning for Go

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓ 18 active contributors
✓ Distributed ownership (top contributor 42% of recent commits)
✓ MIT licensed
✓ CI configured
✓ Tests present
⚠ Stale — last commit 2y ago

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/sjwhitworth/golearn)](https://repopilot.app/r/sjwhitworth/golearn)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: sjwhitworth/golearn

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/sjwhitworth/golearn shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

18 active contributors
Distributed ownership (top contributor 42% of recent commits)
MIT licensed
CI configured
Tests present
⚠ Stale — last commit 2y ago

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live sjwhitworth/golearn repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/sjwhitworth/golearn.

What it runs against: a local clone of sjwhitworth/golearn — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in sjwhitworth/golearn | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 873 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>sjwhitworth/golearn</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of sjwhitworth/golearn. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/sjwhitworth/golearn.git
#   cd golearn
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of sjwhitworth/golearn and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "sjwhitworth/golearn(\.git)?\b" \
  && ok "origin remote is sjwhitworth/golearn" \
  || miss "origin remote is not sjwhitworth/golearn (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 873 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~843d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/sjwhitworth/golearn"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

GoLearn is a batteries-included machine learning library for Go. It implements supervised and unsupervised learning algorithms (KNN, decision trees, clustering via DBSCAN and EM) behind a scikit-learn-style Fit/Predict interface. It provides core data structures such as Instances (a data-frame-like table) in base/data.go, CSV/ARFF parsing, and evaluation metrics, so an ML workflow can run end to end without leaving Go.

The structure is monolithic: base/ is the core (data structures, serialization, utilities), alongside clustering/ (DBSCAN, EM, Gaussian Mixture), evaluation/ (metrics), and knn/ (classifier). Data flows from CSV/ARFF into Instances (base/data.go), then into an estimator (e.g., knn.KnnClassifier), then through Predict into evaluation.GetConfusionMatrix. The examples/ directory provides runnable entry points.
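To make the Fit/Predict flow concrete, here is a minimal standard-library sketch of the pattern. It does not use GoLearn: the `Classifier` interface, the `ToyKNN` type, and the `[][]float64` row representation are all assumptions made for illustration, and GoLearn's own types and signatures may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Classifier is a hypothetical Fit/Predict contract for illustration only;
// it is not GoLearn's actual interface definition.
type Classifier interface {
	Fit(X [][]float64, y []string)
	Predict(X [][]float64) []string
}

// ToyKNN stores the training set and votes among the K nearest rows.
// It assumes K is no larger than the number of training rows.
type ToyKNN struct {
	K int
	X [][]float64
	y []string
}

// Fit just memorises the training data, as k-NN has no training step.
func (m *ToyKNN) Fit(X [][]float64, y []string) { m.X, m.y = X, y }

// euclidean returns the straight-line distance between two equal-length rows.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// Predict labels each query row by majority vote of its K nearest neighbours.
func (m *ToyKNN) Predict(X [][]float64) []string {
	out := make([]string, len(X))
	for i, row := range X {
		// Sort training indices by distance to the query row.
		idx := make([]int, len(m.X))
		for j := range idx {
			idx[j] = j
		}
		sort.SliceStable(idx, func(a, b int) bool {
			return euclidean(row, m.X[idx[a]]) < euclidean(row, m.X[idx[b]])
		})
		// Count votes; the first label to reach the highest count wins ties.
		votes := map[string]int{}
		best := ""
		for _, j := range idx[:m.K] {
			votes[m.y[j]]++
			if votes[m.y[j]] > votes[best] {
				best = m.y[j]
			}
		}
		out[i] = best
	}
	return out
}

func main() {
	var clf Classifier = &ToyKNN{K: 3}
	clf.Fit(
		[][]float64{{1, 1}, {1, 2}, {2, 1}, {8, 8}, {8, 9}, {9, 8}},
		[]string{"a", "a", "a", "b", "b", "b"},
	)
	fmt.Println(clf.Predict([][]float64{{1.5, 1.5}, {8.5, 8.5}})) // [a b]
}
```

Because the caller only depends on the interface, a different estimator can be swapped in without touching the code that calls Fit and Predict.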

👥Who it's for

Go developers building ML pipelines or data science applications who want native Go ML without cgo bindings to Python/C++ libraries. Users range from practitioners prototyping classifiers on datasets like Iris (examples/knnclassifier) to those needing embedded ML in Go services.

🌱Maturity & risk

The repo has CI via Travis, code coverage tracking via Codecov, and broad test coverage (most base/*.go files have corresponding *_test.go files), but the last commit is about two years old, so treat it as stable rather than actively maintained. The project is production-capable but relatively niche: it has lower GitHub visibility than scikit-learn or TensorFlow Go bindings, though the core ML implementations are solid.

Low dependency count (gonum/matrix, gonum/blas, gonum/lapack for linear algebra, rocketlaunchr/dataframe-go for DataFrames) with mature upstream libraries. Risk factors: a long gap since the last commit (about 843 days at generation time), potential API instability (no v1.0 tag visible), and some algorithms may lack optimization compared to C++/Python counterparts. Ownership is reasonably distributed: the top contributor accounts for 42% of recent commits.

Active areas of work

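The most recent commits are merged community fixes (allowing missing values when parsing CSVs, a corrected dataset path in an example, a random forest model-loading fix), and the last commit is about two years old. The repo structure shows past investment in clustering algorithms (DBSCAN, EM with test data files like gaussian_mixture.csv) and core base/ utilities, but no area shows sustained current development.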

🚀Get running

git clone https://github.com/sjwhitworth/golearn.git && cd golearn && go mod download && go test ./... to verify the build.

Daily commands: cd examples/knnclassifier && go run knnclassifier_iris.go (per README) to run a KNN classifier on Iris data.

🗺️Map of the codebase

base/data.go: Defines the core Instances interface and DenseInstances struct—the fundamental data structure for all ML workflows in GoLearn.
base/classifier.go: Defines the Classifier interface (Fit/Predict)—the contract all estimators must follow, mirroring scikit-learn's design.
base/attributes.go: Handles attribute metadata (feature names, types like Categorical, Float)—essential for interpreting data loaded from ARFF/CSV.
base/csv.go: CSV parsing logic (ParseCSVToInstances function shown in README)—primary user entry point for loading datasets.
base/arff.go: ARFF format parsing—standard format for Weka/ML datasets, with parallel test coverage in base/arff_test.go.
clustering/dbscan.go: DBSCAN clustering implementation—one of the main unsupervised algorithms with supporting test file dbscan_test.go.
base/serialize.go: Model serialization logic—critical for saving/loading trained estimators across sessions.
base/mat.go: Matrix abstraction wrapping Gonum—bridges high-level ML code to low-level linear algebra.

🛠️How to make changes

Start in base/ for data structure changes (attributes, instances, serialization). To add an ML algorithm, create a new package (e.g., svm/) and implement the Classifier interface (base/classifier.go). Add tests next to implementation files (*_test.go pattern). Examples in examples/ show how to wire algorithms together.
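As a rough illustration of what adding an estimator involves, the sketch below defines a baseline classifier against a stand-in interface. The `Classifier` method signatures and the `MajorityClassifier` type are hypothetical and are not copied from base/classifier.go; check the real interface before implementing against it.

```go
package main

import (
	"errors"
	"fmt"
)

// Classifier is a stand-in for the Fit/Predict contract. The method
// signatures are assumptions for this sketch, not GoLearn's definitions.
type Classifier interface {
	Fit(rows [][]float64, labels []string) error
	Predict(rows [][]float64) ([]string, error)
}

// MajorityClassifier is a hypothetical baseline estimator: it always
// predicts the most frequent training label.
type MajorityClassifier struct {
	label  string
	fitted bool
}

// Compile-time check that the new type satisfies the interface.
var _ Classifier = (*MajorityClassifier)(nil)

// Fit records the most frequent label in the training data.
func (m *MajorityClassifier) Fit(rows [][]float64, labels []string) error {
	if len(labels) == 0 || len(rows) != len(labels) {
		return fmt.Errorf("fit: got %d rows and %d labels", len(rows), len(labels))
	}
	m.label = ""
	counts := map[string]int{}
	for _, l := range labels {
		counts[l]++
		// Strict > keeps the first label to reach the top count on ties.
		if counts[l] > counts[m.label] {
			m.label = l
		}
	}
	m.fitted = true
	return nil
}

// Predict returns the stored majority label for every input row.
func (m *MajorityClassifier) Predict(rows [][]float64) ([]string, error) {
	if !m.fitted {
		return nil, errors.New("predict: call Fit first")
	}
	out := make([]string, len(rows))
	for i := range out {
		out[i] = m.label
	}
	return out, nil
}

func main() {
	m := &MajorityClassifier{}
	if err := m.Fit([][]float64{{0}, {1}, {2}}, []string{"x", "y", "y"}); err != nil {
		panic(err)
	}
	pred, _ := m.Predict([][]float64{{5}, {6}})
	fmt.Println(pred) // [y y]
}
```

The `var _ Classifier = ...` line costs nothing at runtime and turns a missing or mistyped method into a compile error, which is useful when a new package has to track an interface defined elsewhere.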

🪤Traps & gotchas

No explicit env vars or config files found in the structure. gonum/blas and gonum/lapack are fetched by go mod, but on systems without Fortran/BLAS libraries, performance may degrade. ARFF parsing expects specific format compliance (base/arff.go). Instances implementations such as DenseInstances are mutable, so concurrent access across goroutines requires external synchronization. Categorical attributes require a pre-defined domain (base/categorical.go) before fitting estimators.
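One common way to provide that external synchronization is to wrap the shared data in a type guarded by a read-write mutex. The sketch below uses only the standard library; `SafeTable` and `fillConcurrently` are hypothetical names and stand in for whatever mutable dataset is being shared.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeTable is a hypothetical wrapper (not a GoLearn type) that guards a
// mutable table of rows with a read-write mutex.
type SafeTable struct {
	mu   sync.RWMutex
	rows [][]float64
}

// Append takes the write lock because it mutates the shared slice.
func (t *SafeTable) Append(row []float64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rows = append(t.rows, row)
}

// Len takes the read lock, so many readers can proceed in parallel.
func (t *SafeTable) Len() int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.rows)
}

// fillConcurrently appends n rows from n goroutines and waits for all of them.
func fillConcurrently(t *SafeTable, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v float64) {
			defer wg.Done()
			t.Append([]float64{v})
		}(float64(i))
	}
	wg.Wait()
}

func main() {
	var t SafeTable
	fillConcurrently(&t, 100)
	fmt.Println(t.Len()) // 100
}
```

Running the test suite with `go test -race` is a cheap way to confirm that no unguarded access remains.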

💡Concepts to learn

ARFF format (Attribute-Relation File Format) — GoLearn natively parses ARFF (base/arff.go) for compatibility with Weka datasets; understanding this format is essential for loading ML benchmarks into GoLearn.
Sparse vs. Dense matrix representations — GoLearn implements both dense (base/dense.go) and sparse (base/mat.go) matrices; choosing the right representation affects memory usage and algorithm performance in high-dimensional data.
DBSCAN (Density-Based Spatial Clustering) — Implemented in clustering/dbscan.go as a key unsupervised learning algorithm; understanding density-based clustering helps distinguish it from K-means for non-convex clusters.
Expectation-Maximization (EM) algorithm — GoLearn includes EM-based Gaussian Mixture clustering (clustering/em.go); essential for probabilistic clustering and understanding latent variable models.
K-Nearest Neighbors (KNN) with distance metrics — Core algorithm in knn/; the README example passes "euclidean" as the distance function and "linear" as the neighbour-search strategy. Understanding metric choice is critical for classification performance.
Confusion matrix and precision/recall metrics — GoLearn's evaluation/ package (GetConfusionMatrix, GetSummary) is the standard way to assess classifier performance; foundational for understanding ML model quality.
Scikit-learn's Estimator interface (Fit/Predict pattern) — GoLearn mirrors scikit-learn's design (base/classifier.go); adopting this interface makes swapping algorithms trivial and familiar to Python ML practitioners moving to Go.
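For the confusion matrix and precision/recall concept, a small worked example may help. This is a standard-library sketch of the arithmetic only; `ConfusionMatrix`, `BuildConfusionMatrix`, and `PrecisionRecall` are hypothetical names and do not reproduce the evaluation/ package's API.

```go
package main

import "fmt"

// ConfusionMatrix maps actual label -> predicted label -> count.
type ConfusionMatrix map[string]map[string]int

// BuildConfusionMatrix tallies (actual, predicted) pairs.
// It assumes both slices have the same length.
func BuildConfusionMatrix(actual, predicted []string) ConfusionMatrix {
	cm := ConfusionMatrix{}
	for i := range actual {
		if cm[actual[i]] == nil {
			cm[actual[i]] = map[string]int{}
		}
		cm[actual[i]][predicted[i]]++
	}
	return cm
}

// PrecisionRecall returns precision and recall for one class.
// precision = TP / (everything predicted as class)
// recall    = TP / (everything that actually is class)
func PrecisionRecall(cm ConfusionMatrix, class string) (precision, recall float64) {
	tp := float64(cm[class][class])
	var predictedAs, actualIs float64
	for _, row := range cm {
		predictedAs += float64(row[class]) // column sum
	}
	for _, n := range cm[class] {
		actualIs += float64(n) // row sum
	}
	if predictedAs > 0 {
		precision = tp / predictedAs
	}
	if actualIs > 0 {
		recall = tp / actualIs
	}
	return precision, recall
}

func main() {
	actual := []string{"a", "a", "a", "a", "b", "b"}
	predicted := []string{"a", "a", "b", "b", "b", "b"}
	cm := BuildConfusionMatrix(actual, predicted)
	p, r := PrecisionRecall(cm, "a")
	fmt.Printf("precision=%.2f recall=%.2f\n", p, r) // precision=1.00 recall=0.50
}
```

In this data every row predicted as "a" really is "a" (precision 1.0), but only two of the four actual "a" rows were found (recall 0.5), which is why the two numbers are reported separately.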

🔗Related repos

go-echarts/go-echarts — Visualization library for Go ML results; GoLearn outputs metrics (confusion matrices) that pair well with echarts for dashboards.
go-gota/gota — Alternative tabular data library for Go; users comparing data frame implementations may evaluate both Gota and rocketlaunchr/dataframe-go used here.
gonum/gonum — Core dependency providing matrix/linear algebra primitives (gonum/matrix, gonum/blas); GoLearn is a high-level abstraction over Gonum.
tensorflow/tensorflow — Competing ML library with Go bindings; GoLearn targets classical algorithms while TensorFlow handles deep learning, different use cases.
kniren/gota — Historical data frame library for Go ML; comparison point for understanding why GoLearn chose rocketlaunchr/dataframe-go.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive test coverage for base/conversion.go

The base/conversion.go file lacks a corresponding _test.go file. This module appears to handle data type conversions which are critical for ML pipelines. Given the presence of conversion-related utilities in base/util_attributes.go and base/util_instances.go, conversion.go likely contains important transformation logic that needs test coverage to prevent regressions.

[ ] Create base/conversion_test.go with unit tests for all exported functions in conversion.go
[ ] Test edge cases for type conversions (nil values, empty datasets, type mismatches)
[ ] Add benchmarks for conversion operations to track performance
[ ] Ensure test coverage reaches >80% for base/conversion.go

Migrate from Travis CI to GitHub Actions with coverage reporting

The repo uses .travis.yml (legacy Travis CI) and contains coverage.sh but lacks a modern GitHub Actions workflow. This prevents: proper integration with GitHub's native PR checks, dependency update automation, and consistent multi-version Go testing. The go.mod shows go 1.15 support but no CI validates against newer Go versions (1.17+).

[ ] Create .github/workflows/test.yml with matrix testing for Go 1.15, 1.18, 1.19, 1.20
[ ] Configure codecov integration in the workflow to replace/supplement current coverage reporting
[ ] Add separate workflow for dependency updates using dependabot or renovate
[ ] Update README.md to reference GitHub Actions badge instead of Travis CI

Add integration tests for clustering algorithms with validation datasets

The clustering/ directory has test CSV files (dbscan.csv, gaussian_mixture.csv, gaussian_mixture_labels.csv) and even a Python reference implementation (gaussian_mixture.py, gen_test.py) but lacks comprehensive integration tests that validate outputs against known results. Current tests (cluster_test.go, dbscan_test.go, em_test.go) appear to lack golden-file or reference output validation.

[ ] Create clustering/integration_test.go with golden-file testing against CSV datasets
[ ] Add validation that DBSCAN clustering output matches expected labels in dbscan_labels.csv
[ ] Add Gaussian Mixture Model output validation against gaussian_mixture_labels.csv
[ ] Cross-validate results against the Python reference implementation in gen_test.py
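One detail worth considering for the label-validation items above: cluster IDs are arbitrary, so an implementation can produce the correct partition while numbering the clusters differently from a reference file. The sketch below compares two labelings up to renaming of IDs. `SameClustering` is a hypothetical helper, and whether the reference CSVs in clustering/ need this tolerance is an assumption to check against the actual files.

```go
package main

import "fmt"

// SameClustering reports whether two label slices describe the same
// partition, ignoring how the cluster IDs are numbered. It requires a
// one-to-one mapping between the IDs in got and the IDs in want.
func SameClustering(got, want []int) bool {
	if len(got) != len(want) {
		return false
	}
	gotToWant := map[int]int{}
	wantToGot := map[int]int{}
	for i := range got {
		g, w := got[i], want[i]
		// A got-ID already mapped to a different want-ID means a cluster was split.
		if m, ok := gotToWant[g]; ok && m != w {
			return false
		}
		// A want-ID already mapped to a different got-ID means clusters were merged.
		if m, ok := wantToGot[w]; ok && m != g {
			return false
		}
		gotToWant[g] = w
		wantToGot[w] = g
	}
	return true
}

func main() {
	want := []int{0, 0, 1, 1, 2}
	fmt.Println(SameClustering([]int{2, 2, 0, 0, 1}, want)) // true: only the IDs differ
	fmt.Println(SameClustering([]int{0, 1, 1, 1, 2}, want)) // false: the second point changed cluster
}
```

Note that this treats a noise label (for example -1 in some DBSCAN conventions) as just another ID; a stricter test could require noise points to match exactly.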

🌿Good first issues

Add unit test coverage for base/conversion.go—no matching *_test.go file exists, and this handles type conversions critical to data pipeline reliability. (Medium)
Implement cross-validation helpers beyond train/test split (e.g., k-fold CV in base/util_instances.go)—multiple examples in examples/ would benefit from this, and scikit-learn users expect it. (Medium)
Add a linear regression estimator (e.g., new package regression/) following the Classifier interface pattern—currently clustering/ and knn/ exist, but no regression suite, a common ML requirement. (High)
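For the k-fold idea, the core of such a helper is splitting row indices into folds. The sketch below shows only that step, using plain integers rather than GoLearn's data types; `KFoldIndices` is a hypothetical name, and it is worth checking what the evaluation/ package already offers before adding anything new.

```go
package main

import "fmt"

// KFoldIndices splits row indices 0..n-1 into k contiguous folds whose
// sizes differ by at most one. Shuffle the rows first if order matters.
func KFoldIndices(n, k int) ([][]int, error) {
	if k < 2 || k > n {
		return nil, fmt.Errorf("need 2 <= k <= n, got k=%d n=%d", k, n)
	}
	folds := make([][]int, k)
	start := 0
	for f := 0; f < k; f++ {
		size := n / k
		if f < n%k {
			size++ // spread the remainder over the first folds
		}
		for i := start; i < start+size; i++ {
			folds[f] = append(folds[f], i)
		}
		start += size
	}
	return folds, nil
}

func main() {
	folds, err := KFoldIndices(7, 3)
	if err != nil {
		panic(err)
	}
	// Each fold serves once as the test set; the rest form the training set.
	for f, test := range folds {
		fmt.Printf("fold %d: test rows %v\n", f, test)
	}
}
```

With 7 rows and 3 folds this yields test sets of sizes 3, 2, and 2, covering every row exactly once.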

⭐Top contributors

@Yushgoel — 42 commits
@Sentimentron — 28 commits
@5teven1in — 6 commits
@AlekSi — 3 commits
@Soypete — 3 commits

📝Recent commits

74ae077 — Merge pull request #284 from sshehata/master (Sentimentron)
0f33e2f — feat: allow missing values when parsing csvs (samyshehata)
a8b69c2 — Merge pull request #269 from EliDavis3D/patch-1 (Sentimentron)
0ae13fe — Example now pulls from correct filepath for dataset (EliDavis3D)
947ee72 — Merge pull request #268 from louisguitton/fix-typo-hello-world (Sentimentron)
d0cad66 — Fix typo in hello world example (louisguitton)
00d4cfd — Merge pull request #266 from wonyonyon/patch-1 (Sentimentron)
093beec — Update go.sum (wonyonyon)
cde96fa — Merge pull request #261 from Oliveirakun/fix-model-load (Sentimentron)
d33eb47 — Fix random forest model loading (Oliveirakun)

🔒Security observations

The GoLearn repository has moderate security concerns primarily centered on outdated dependencies (from 2018) and Docker configuration issues. The most critical risks are the use of unversioned dependencies that may contain known vulnerabilities and the lack of version pinning in the Dockerfile. The codebase itself appears to be a machine learning library without obvious injection risks, SQL operations, or embedded credentials. However, dependency management practices require significant improvement. No explicit security policy or vulnerability disclosure mechanism is documented. Immediate actions should include updating all dependencies to current versions, pinning specific Docker base image and tool versions, and implementing automated dependency scanning.

High · Outdated and Potentially Vulnerable Dependencies — go.mod. The go.mod file specifies outdated versions of critical dependencies, particularly github.com/gonum/blas and github.com/gonum/matrix with dates from 2018. These dependencies may contain known security vulnerabilities. Additionally, github.com/rocketlaunchr/dataframe-go version 0.0.0-20201007021539 is a pre-release snapshot version that may not receive security updates. Fix: Update all dependencies to their latest stable versions. Run 'go get -u' and 'go mod tidy' to upgrade to current releases. Review security advisories for these packages at pkg.go.dev and ensure all transitive dependencies are up-to-date.
High · Insecure Base Image in Dockerfile — Dockerfile. The Dockerfile uses 'FROM alpine' without specifying a version tag. This results in pulling the latest Alpine image automatically, which may introduce breaking changes or security vulnerabilities without explicit control. Production builds should pin specific image versions. Fix: Specify a specific Alpine version tag, e.g., 'FROM alpine:3.18' or the latest long-term support version. Regularly update and rebuild images with the latest patched version.
Medium · Unverified Package Installation from Git — Dockerfile. The Dockerfile uses 'go get' to fetch github.com/gonum/blas and github.com/sjwhitworth/golearn without version pinning or checksum verification. This could be vulnerable to man-in-the-middle attacks or supply chain compromises. Fix: Use go mod tidy and go.mod with explicit version specifications. Verify package integrity using 'go get -u' with checksums validated by go.sum. Consider using private module proxies for additional security.
Medium · Build Tools Installed Without Version Constraints — Dockerfile. The Dockerfile installs build tools (gcc, make, perl, linux-headers, musl-dev) without specifying versions. This could lead to unpredictable builds and potential security issues if vulnerable tool versions are installed. Fix: Pin specific versions of build tools in the apk add command, e.g., 'apk add make=4.x.x gcc=x.x.x'. Document the exact versions used for reproducible builds.
Low · Missing SECURITY.md or Security Policy — Repository root. No SECURITY.md file is present in the repository root to define vulnerability disclosure procedures. This makes it unclear how security issues should be reported privately. Fix: Create a SECURITY.md file that outlines responsible disclosure practices and contact information for reporting security vulnerabilities.
Low · Outdated Go Version Target — go.mod. The go.mod specifies 'go 1.15', a release that went out of support in August 2021 when Go 1.17 shipped. While not directly a vulnerability, building with an outdated Go toolchain may miss security patches and language improvements. Fix: Update to 'go 1.21' or the latest stable Go version. Test thoroughly to ensure compatibility with the updated runtime.
Low · No Supply Chain Security Mechanisms — Repository configuration. The repository does not appear to use any supply chain security measures such as SBOM generation, signed commits, or dependency vulnerability scanning (SLSA framework). Fix: Implement automated dependency scanning using GitHub Dependabot or similar tools. Enable branch protection and require signed commits. Consider generating SBOMs for releases.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
