
cjlin1/libsvm

LIBSVM -- A Library for Support Vector Machines

Healthy

Healthy across all four use cases

Use as dependency: Healthy (weakest axis)

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 4mo ago
  • 27+ active contributors
  • Distributed ownership (top contributor 22% of recent commits)
  • BSD-3-Clause licensed
  • CI configured
  • Slowing — last commit 4mo ago
  • No test directory detected

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/cjlin1/libsvm)](https://repopilot.app/r/cjlin1/libsvm)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: cjlin1/libsvm

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/cjlin1/libsvm shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • Last commit 4mo ago
  • 27+ active contributors
  • Distributed ownership (top contributor 22% of recent commits)
  • BSD-3-Clause licensed
  • CI configured
  • ⚠ Slowing — last commit 4mo ago
  • ⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live cjlin1/libsvm repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/cjlin1/libsvm.

What it runs against: a local clone of cjlin1/libsvm — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in cjlin1/libsvm | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-3-Clause | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 160 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>cjlin1/libsvm</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of cjlin1/libsvm. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/cjlin1/libsvm.git
#   cd libsvm
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of cjlin1/libsvm and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "cjlin1/libsvm(\.git)?\b" \
  && ok "origin remote is cjlin1/libsvm" \
  || miss "origin remote is not cjlin1/libsvm (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(BSD-3-Clause)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"BSD-3-Clause\"" package.json 2>/dev/null) \
  && ok "license is BSD-3-Clause" \
  || miss "license drift — was BSD-3-Clause at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "svm.h" \
  && ok "svm.h" \
  || miss "missing critical file: svm.h"
test -f "svm.cpp" \
  && ok "svm.cpp" \
  || miss "missing critical file: svm.cpp"
test -f "svm-train.c" \
  && ok "svm-train.c" \
  || miss "missing critical file: svm-train.c"
test -f "svm-predict.c" \
  && ok "svm-predict.c" \
  || miss "missing critical file: svm-predict.c"
test -f "python/libsvm/svm.py" \
  && ok "python/libsvm/svm.py" \
  || miss "missing critical file: python/libsvm/svm.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 160 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~130d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/cjlin1/libsvm"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
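
For agent loops that want more than the exit code, the script's `ok:` / `FAIL:` lines can be parsed directly. The following is a minimal sketch, assuming the script's stdout has already been captured as a string; `summarize_checks` is a hypothetical helper name and is not part of RepoPilot.

```python
def summarize_checks(output: str) -> dict:
    """Split captured verify-script output into passed and failed messages."""
    passed, failed = [], []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("ok:"):
            passed.append(line[len("ok:"):].strip())
        elif line.startswith("FAIL:"):
            failed.append(line[len("FAIL:"):].strip())
    # Any FAIL line means the artifact should be treated as stale.
    return {"passed": passed, "failed": failed, "stale": bool(failed)}


if __name__ == "__main__":
    sample = "ok:   svm.h\nFAIL: missing critical file: svm.cpp\n"
    print(summarize_checks(sample))
```

An agent could refuse to edit whenever `stale` is true and surface the `failed` messages to the user.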

TL;DR

LIBSVM is a widely used C/C++ library that implements Support Vector Machines (SVMs) for classification, regression, and one-class learning. It supports C-SVC and nu-SVC classification, epsilon-SVR and nu-SVR regression, and one-class SVM, all solved with an SMO-type decomposition method. The library ships command-line tools (svm-train, svm-predict, svm-scale) and interfaces for Java, Python, and MATLAB, plus helper scripts for parameter selection (tools/grid.py, tools/easy.py).

The layout is monolithic and organized by language: svm.h and svm.cpp contain the core solver; svm-train.c, svm-predict.c, and svm-scale.c are standalone CLI tools; the java/, python/, and matlab/ directories either wrap the core C library or reimplement it. The Python binding uses ctypes to call the compiled C code; the Java source is generated from the svm.m4 template.

👥Who it's for

Machine learning researchers and practitioners building SVM-based classifiers or regressors who need a battle-tested, production-grade implementation. Data scientists using Python (via python/libsvm) or Java (via java/libsvm) for rapid experimentation. System integrators who need a lightweight, language-agnostic SVM solver they can embed or call from C/C++.

🌱Maturity & risk

Production-ready and stable. The codebase is long-established (it comes from National Taiwan University's ML lab) and ships pre-built Windows binaries, multiple language bindings, CI via GitHub Actions (wheel.yml), and thorough documentation. The last commit was about 4 months ago, which fits a mature, feature-complete project rather than an abandoned one.

Low risk for core functionality. The verdict signals report distributed ownership, but the repository sits under a single account (cjlin1) and updates are infrequent, so new bugs or compatibility issues may wait a while for a response. The M4 templating in java/libsvm/svm.m4 adds a build step that could break on modern toolchains. No automated test suite is visible in the file list; correctness appears to rest on manual testing and user feedback.

Active areas of work

Minimal active development. The most recent commits are release housekeeping (version bumps and rebuilt binaries for 3.36 and 3.37), updates to the wheel.yml packaging workflow, and small Python interface fixes. The project appears to be in maintenance mode: accepting bug fixes but not pursuing new features.

🚀Get running

Clone and build on Unix: `git clone https://github.com/cjlin1/libsvm.git && cd libsvm && make`. This compiles the svm-train, svm-predict, and svm-scale executables; `make lib` additionally builds the shared library (libsvm.so.<version>). For Python: `cd python && make`, or `pip install -e .`. For Java: `cd java && make` builds libsvm.jar. Test with `./svm-train` (run without arguments, it prints usage) or try the included heart_scale sample dataset.

Daily commands: For CLI: `./svm-train -t 2 heart_scale heart_scale.model` trains an RBF SVM; `./svm-predict heart_scale heart_scale.model output` writes predictions to the file output. For Python: `python -c "from libsvm.svmutil import *; y, x = svm_read_problem('heart_scale'); m = svm_train(y, x); svm_predict(y, x, m)"`. For Java: `java -cp java/libsvm.jar svm_train` shows usage (the command-line classes sit in the default package; only the library classes are in package libsvm).

🗺️Map of the codebase

  • svm.h — Core SVM library header defining structures (svm_problem, svm_parameter, svm_model, svm_node) and public API that all implementations depend on.
  • svm.cpp — Main SVM training and prediction implementation containing the SMO-type decomposition solver and the kernel computations; the computational heart of the library.
  • svm-train.c — Command-line training interface that parses arguments and orchestrates the training pipeline; entry point for most users.
  • svm-predict.c — Command-line prediction interface that loads models and applies them to test data; critical user-facing tool.
  • python/libsvm/svm.py — Python binding to C library via ctypes; bridges the high-level Python API to native SVM implementation.
  • java/libsvm/svm.java — Pure Java port of SVM algorithm; provides self-contained SVM training/prediction without C dependencies.
  • tools/easy.py — High-level automation tool that orchestrates data scaling, parameter selection, and model training—recommended entry point for new users.

🛠️How to make changes

Add a New Kernel Function

  1. Add the kernel ID to the kernel_type enum (LINEAR, POLY, RBF, SIGMOID, PRECOMPUTED) and any new parameters to the svm_parameter structure (svm.h)
  2. Implement the kernel computation in the Kernel class, covering both the training-time kernel function and the static k_function used at prediction time (svm.cpp)
  3. Add parameter parsing in parse_command_line() so the -t option accepts the new kernel type (svm-train.c)
  4. Mirror the kernel implementation in Java if supporting JVM usage; edit java/libsvm/svm.m4, from which svm.java is generated (java/libsvm/svm.java)
  5. Update the grid.py parameter ranges to cover the new kernel type (tools/grid.py)
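
As a rough illustration of the kernel computation step, the sketch below shows what a kernel evaluates on LIBSVM-style sparse vectors: a merge walk over index-sorted (index, value) pairs. It is written in Python rather than the C++ of svm.cpp, and `laplacian_kernel` is a hypothetical new kernel chosen only for illustration; none of these function names come from the library.

```python
import math


def sparse_dot(x, y):
    """Dot product of two sparse vectors given as index-sorted (index, value) lists."""
    i = j = 0
    total = 0.0
    while i < len(x) and j < len(y):
        if x[i][0] == y[j][0]:
            total += x[i][1] * y[j][1]
            i += 1
            j += 1
        elif x[i][0] < y[j][0]:
            i += 1
        else:
            j += 1
    return total


def rbf_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2), using ||x - y||^2 = x.x + y.y - 2 x.y."""
    sq = sparse_dot(x, x) + sparse_dot(y, y) - 2.0 * sparse_dot(x, y)
    return math.exp(-gamma * sq)


def l1_distance(x, y):
    """Sum of |x_k - y_k| over the union of indices (absent entries count as 0)."""
    i = j = 0
    total = 0.0
    while i < len(x) or j < len(y):
        if j >= len(y) or (i < len(x) and x[i][0] < y[j][0]):
            total += abs(x[i][1])
            i += 1
        elif i >= len(x) or y[j][0] < x[i][0]:
            total += abs(y[j][1])
            j += 1
        else:
            total += abs(x[i][1] - y[j][1])
            i += 1
            j += 1
    return total


def laplacian_kernel(x, y, gamma):
    """Hypothetical new kernel for illustration: exp(-gamma * ||x - y||_1)."""
    return math.exp(-gamma * l1_distance(x, y))
```

Both walks assume ascending indices, which is why unsorted input gives wrong values instead of an error.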

Add Custom Preprocessing Pipeline

  1. Create new utility function in commonutil.py for feature transformation (python/libsvm/commonutil.py)
  2. Extend svmutil.py with wrapper combining preprocessing and training (python/libsvm/svmutil.py)
  3. Update easy.py to call preprocessing before scaling and training (tools/easy.py)
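
A preprocessing step of this kind could look like the sketch below, which linearly rescales each feature to [-1, 1]. It assumes dense rows held in memory and is not svm-scale's implementation; the names `fit_ranges` and `scale_row` are hypothetical. The point it illustrates is that ranges are learned from training data once and then reused unchanged on test data.

```python
def fit_ranges(rows):
    """Return one (min, max) pair per feature column of a dense row list."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature from [min, max] to [lower, upper]."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            # Assumption of this sketch: a constant feature maps to 0.0.
            scaled.append(0.0)
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled


if __name__ == "__main__":
    train = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
    ranges = fit_ranges(train)
    # Test rows reuse the training ranges, so values may fall outside [-1, 1].
    print(scale_row([12.0, 10.0], ranges))
```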

Implement Automatic Model Selection

  1. Add new parameter grid definition in grid.py covering C and kernel hyperparameters (tools/grid.py)
  2. Implement cross-validation loop that trains models and evaluates on validation folds (tools/grid.py)
  3. Integrate into easy.py workflow to automatically select best parameters (tools/easy.py)
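
The grid-plus-cross-validation loop described in these steps can be sketched as follows. The exponent ranges mirror what are understood to be grid.py's defaults for log2(C) and log2(gamma), which should be treated as an assumption; `score` is a hypothetical callback standing in for a cross-validation run, and nothing here reproduces grid.py itself.

```python
import itertools
import math


def exp2_range(begin, end, step):
    """Inclusive exponent range, e.g. (-5, 15, 2) gives -5, -3, ..., 15."""
    values = []
    e = begin
    while (step > 0 and e <= end) or (step < 0 and e >= end):
        values.append(e)
        e += step
    return values


def grid_search(score, log2c=(-5, 15, 2), log2g=(3, -15, -2)):
    """Return (best_score, C, gamma) over an exponential grid."""
    best = None
    for lc, lg in itertools.product(exp2_range(*log2c), exp2_range(*log2g)):
        c, g = 2.0 ** lc, 2.0 ** lg
        s = score(c, g)  # hypothetical: cross-validation accuracy for (C, gamma)
        if best is None or s > best[0]:
            best = (s, c, g)
    return best


if __name__ == "__main__":
    # Toy score with a single peak, used only to exercise the search loop.
    def toy(c, g):
        return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 5) ** 2)

    print(grid_search(toy))
```

In a real workflow the callback would train with cross-validation for each (C, gamma) pair, which is the expensive part and the reason the grid is coarse and exponential.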

Support New Input Data Format

  1. Add parser function in commonutil.py or new utility to convert format to libsvm sparse vector format (python/libsvm/commonutil.py)
  2. Update svmutil.py to accept the new format in the svm_read_problem() wrapper (python/libsvm/svmutil.py)
  3. Document format in tools/README with example conversion (tools/README)
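
A converter of the kind described above might look like this minimal sketch, which turns one dense row into a line of the libsvm sparse text format (`label index:value ...`, 1-based ascending indices, zeros omitted). The function name `to_libsvm_line` is hypothetical.

```python
def to_libsvm_line(label, features):
    """Format one dense row as 'label index:value ...', skipping zero entries."""
    parts = [f"{label:g}"]
    # enumerate(start=1) yields 1-based indices in ascending order.
    for index, value in enumerate(features, start=1):
        if value != 0:
            # Note: the 'g' format keeps about 6 significant digits.
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


if __name__ == "__main__":
    print(to_libsvm_line(1, [0.5, 0, 2]))
    print(to_libsvm_line(-1, [0, 0, 0.25]))
```

Because indices are produced by a single ascending pass, the output satisfies the sorted-index requirement by construction.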

🔧Why these technologies

  • C/C++ Core (svm.cpp) — Provides optimal computational performance for iterative solver (SMO) and large-scale kernel matrix operations; critical for practical SVM training on real datasets.
  • ctypes Python FFI (python/libsvm/svm.py) — Calls the compiled C library directly, so there is no CPython extension module to write or maintain; the pre-built wheels produced by wheel.yml let pip users install without a C compiler while still using the fast C implementation.
  • Pure Java Port (java/libsvm/svm.java) — Provides self-contained SVM implementation for JVM environments without native library dependencies; improves portability across platforms lacking C toolchains.
  • Sparse Vector Format (libsvm data format) — Efficient representation for high-dimensional sparse problems common in NLP/text classification; reduces memory footprint and training time compared to dense formats.
  • Kernel Caching (svm.cpp) — Keeps recently used kernel matrix columns in a size-limited, least-recently-used cache so they are not recomputed on every solver iteration; an important optimization because kernel evaluations typically dominate training time.
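
To make the kernel caching idea concrete, here is a minimal sketch of a bounded least-recently-used cache for kernel matrix columns. It is an illustration in Python under simplified assumptions (the limit is a column count, not bytes) and is not the cache in svm.cpp; `KernelColumnCache` and `compute_column` are hypothetical names.

```python
from collections import OrderedDict


class KernelColumnCache:
    """Bounded LRU cache mapping a column index to its kernel values."""

    def __init__(self, compute_column, max_columns):
        self._compute = compute_column  # hypothetical callback: index -> list of values
        self._max = max_columns
        self._columns = OrderedDict()
        self.misses = 0

    def get(self, i):
        if i in self._columns:
            self._columns.move_to_end(i)  # mark as most recently used
            return self._columns[i]
        self.misses += 1
        column = self._compute(i)
        self._columns[i] = column
        if len(self._columns) > self._max:
            self._columns.popitem(last=False)  # evict the least recently used
        return column


if __name__ == "__main__":
    cache = KernelColumnCache(lambda i: [i * j for j in range(3)], max_columns=2)
    for i in (0, 1, 0, 2, 1):
        cache.get(i)
    print(cache.misses)
```

The solver tends to revisit a small working set of columns, which is the access pattern an LRU policy rewards.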

⚖️Trade-offs already made

  • Libsvm as library vs. standalone tool-only system

    • Why: Library design maximizes reusability; enables integration into larger ML pipelines and different language ecosystems.
    • Consequence: Added complexity in C/Java implementations; requires bindings maintenance across platforms. Alternative would be simpler as CLI-only tool.
  • Sequential SMO solver vs. distributed/parallel training

    • Why: SMO is simple, deterministic, and works well for datasets fitting in RAM; avoids network overhead and synchronization complexity.
    • Consequence: Does not scale to multi-node distributed training; single-machine CPU-bound for very large datasets. Users must pre-sample data or parallelize externally.
  • Fixed kernel types (RBF, polynomial, sigmoid, linear) vs. custom kernel plugins

    • Why: Pre-built kernels cover most practical use cases; reduces API surface and maintenance burden.
    • Consequence: Users with domain-specific kernels must modify source code or use precomputed kernel workaround; less flexible than kernel plugin architecture.
  • In-memory model format vs. streaming/incremental learning

    • Why: Simpler API and faster serialization; standard for batch SVM usage where data fits in RAM.
    • Consequence: Cannot train on streaming data or datasets larger than RAM; requires data resampling for online learning scenarios.
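
The precomputed-kernel workaround mentioned in the kernel trade-off can be sketched as follows. Per the LIBSVM README, each training row in that mode has the form `<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)` and is trained with `-t 4`. The helper name `precomputed_rows` and the toy dot-product kernel are assumptions made for illustration.

```python
def precomputed_rows(labels, vectors, kernel):
    """Build training lines for LIBSVM's precomputed-kernel input format."""
    lines = []
    for i, (label, xi) in enumerate(zip(labels, vectors), start=1):
        entries = [f"0:{i}"]  # index 0 carries the instance's serial number
        for j, xj in enumerate(vectors, start=1):
            entries.append(f"{j}:{kernel(xi, xj):g}")
        lines.append(f"{label:g} " + " ".join(entries))
    return lines


if __name__ == "__main__":
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    for line in precomputed_rows([1, -1], [[1, 0], [0, 2]], dot):
        print(line)
```

This keeps custom kernels outside the library, at the cost of materializing an L-by-L matrix, so it suits small to medium datasets.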

🚫Non-goals (don't propose these)

  • Does not support GPU acceleration or distributed training
  • Does not provide deep learning or neural network functionality
  • Does not handle missing values or categorical features automatically
  • Does not include built-in hyperparameter optimization beyond grid search (tools/grid.py, driven by tools/easy.py)
  • Not designed for real-time streaming classification (batch prediction only)
  • Does not provide visualization beyond the toy 2D interactive demo

🪤Traps & gotchas

  • M4 template regeneration: java/libsvm/svm.java is generated from java/libsvm/svm.m4. Direct edits to svm.java are lost on rebuild; edit the .m4 file instead.
  • 1-based sparse indices: feature indices start at 1, not 0. The exception is the precomputed kernel format, where index 0 holds the instance's serial number.
  • Sorted indices: libsvm expects feature indices in ascending order within each instance. Data passed through the library API is not checked, and unsorted input can silently produce wrong results; tools/checkdata.py can check a data file beforehand.
  • Python ctypes: the binding needs a compiled libsvm shared library (.so/.dll) in its load path. python/Makefile handles this, but custom installs may fail.
  • Java: java/libsvm/svm.java is a pure Java port, not a wrapper around a native library, so fixes made in svm.cpp do not reach Java users until they are ported.
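
The index rules above are easy to check before training. The following is a minimal sketch of a per-line validator for the libsvm text format; `check_libsvm_line` is a hypothetical helper, not a function from the library or from tools/checkdata.py.

```python
def check_libsvm_line(line, precomputed=False):
    """Return a list of problems found in one 'label idx:val ...' line."""
    tokens = line.split()
    if not tokens:
        return ["empty line"]
    problems = []
    try:
        float(tokens[0])
    except ValueError:
        problems.append(f"label {tokens[0]!r} is not a number")
    # Index 0 is only legal in the precomputed-kernel format.
    min_index = 0 if precomputed else 1
    previous = None
    for token in tokens[1:]:
        index_text, _, value_text = token.partition(":")
        try:
            index = int(index_text)
            float(value_text)
        except ValueError:
            problems.append(f"malformed pair {token!r}")
            continue
        if index < min_index:
            problems.append(f"index {index} is below {min_index}")
        if previous is not None and index <= previous:
            problems.append(
                f"index {index} is not greater than previous index {previous}"
            )
        previous = index
    return problems


if __name__ == "__main__":
    print(check_libsvm_line("1 1:0.5 3:2"))
    print(check_libsvm_line("1 3:2 1:0.5"))
```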

🏗️Architecture

💡Concepts to learn

  • scikit-learn/scikit-learn — scikit-learn wraps libsvm's SVM classes (sklearn.svm.SVC/SVR) as a backend and is the de facto way most Python users access SVMs today; good alternative if you want integrated preprocessing and hyperparameter tuning
  • JetBrains-Research/kotlin-libsvm — Kotlin wrapper and JVM alternative for LIBSVM; relevant if you're building JVM applications and want better language ergonomics than raw Java bindings
  • pytorch/pytorch — PyTorch includes torch.nn.functional with SVM-like loss functions (hinge loss, margin loss) if you want to build SVM-like models in a deep learning framework
  • nicolargo/libsvm-official-mirror — Maintained mirror of LIBSVM on GitHub; useful if cjlin1/libsvm is hard to reach or you need CI/GitHub integration
  • cjlin1/liblinear — By the same author as LIBSVM; a faster alternative for linear classification and regression when you don't need kernels; shares similar API design

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for Python interface (python/libsvm/)

The Python interface lacks formal unit tests. Currently, there's no test/ directory or pytest configuration. Given that Python is a primary interface for ML practitioners, adding tests for svm.py, svmutil.py, and commonutil.py would catch regressions, validate cross-platform behavior, and improve contributor confidence. This is especially important since the wheel.yml workflow exists but has no test stage.

  • [ ] Create tests/ directory at repo root with Python test structure
  • [ ] Write unit tests for python/libsvm/svm.py covering model training/prediction/serialization
  • [ ] Write unit tests for python/libsvm/svmutil.py covering utility functions
  • [ ] Add pytest configuration (pytest.ini or setup.cfg)
  • [ ] Update .github/workflows/wheel.yml to run pytest before building wheels
  • [ ] Update python/README with testing instructions

Add Java unit tests and update CI pipeline for java/libsvm/

The Java interface (java/libsvm/) has multiple classes (svm.java, svm_parameter.java, svm_model.java, etc.) but no corresponding test suite. The wheel.yml workflow only handles Python/wheel building. Adding Java tests with JUnit would validate the Java binding works correctly across versions, especially important since java/libsvm.jar is a pre-compiled binary that needs verification.

  • [ ] Create java/test/ directory structure with JUnit 4 or 5 setup
  • [ ] Write test classes for java/libsvm/svm.java covering SVM operations
  • [ ] Write test classes for java/libsvm/svm_parameter.java parameter validation
  • [ ] Add java/pom.xml or java/build.gradle for dependency management
  • [ ] Create new .github/workflows/java.yml to compile and run tests on PR
  • [ ] Update java/README with testing instructions

Add integration tests validating cross-language consistency (C/Python/Java/MATLAB)

LIBSVM supports multiple language bindings (C, Python, Java, MATLAB) but there's no validation that they produce identical results on the same dataset. This is critical for a multi-language library—a bug in Python bindings should be caught before release. The heart_scale sample file exists but isn't used in any formal test suite. Adding cross-language integration tests would catch binding bugs and ensure API consistency.

  • [ ] Create tests/integration/ directory with test datasets (use heart_scale and others)
  • [ ] Write integration test script (bash/Python) that trains identical models in C, Python, and Java
  • [ ] Compare outputs (predictions, accuracy, model parameters) across all languages
  • [ ] Add .github/workflows/integration.yml to run these tests on PR
  • [ ] Document expected behavior in tests/integration/README.md
  • [ ] Include Java compilation step to ensure java/libsvm/ builds correctly in CI
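
The comparison step in this checklist could start from something like the sketch below. It assumes each interface has written one predicted value per line, as svm-predict does without probability output; `compare_predictions` is a hypothetical helper name.

```python
def compare_predictions(a_lines, b_lines, tol=1e-6):
    """Return 0-based line numbers where two prediction outputs disagree."""
    if len(a_lines) != len(b_lines):
        raise ValueError("prediction outputs have different lengths")
    mismatches = []
    for n, (a, b) in enumerate(zip(a_lines, b_lines)):
        # float() tolerates surrounding whitespace and '1' versus '1.0'.
        if abs(float(a) - float(b)) > tol:
            mismatches.append(n)
    return mismatches


if __name__ == "__main__":
    c_out = ["1", "-1", "1"]
    py_out = ["1.0", "-1.0", "1.0"]
    print(compare_predictions(c_out, py_out))
```

A tolerance is used because regression outputs may differ in the last digits across languages even when the models agree.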

🌿Good first issues

  • Add automated unit tests for core solver: svm.cpp has no test suite visible in the file list. Create tests/test_solver.c using a testing framework (CUnit or similar) to verify SMO convergence, kernel computations, and edge cases like single-class data. This improves reliability and makes future contributions safer.
  • Document M4 templating process: java/libsvm/svm.m4 is opaque to most contributors. Add a tools/BUILD_JAVA.md explaining how to regenerate java/libsvm/svm.java and why the template exists. This reduces friction for Java-focused maintainers and prevents accidental direct edits to generated code.
  • Add input validation and error handling to python/libsvm/svm.py: ctypes calls to C code provide no bounds checking. Wrap svm_train and svm_predict to validate that feature indices are sorted and 1-indexed, catching common data format errors early. This reduces cryptic silent failures and improves the experience for Python users.

📝Recent commits

  • 6b90713 — Update libsvm.jar for 3.37 release (chcwww)
  • effcfa4 — Update the version number to 3.37 (chcwww)
  • f4cc58f — update wheel.yml to support pre-built wheels for newer python versions and various platforms on pypi (chcwww)
  • 9b13bae — Align naming of prediction results in python interface (wjustin784)
  • f95bfa0 — add file path and file object examples for svm_read_problem() in python/README (chcwww)
  • c8c2148 — fix bug that can't run predict() without scipy installed in python interface (chcwww)
  • 65367d0 — Update OS and Artifact settings for building wheel (wjustin784)
  • 72a48b8 — Update Windows binaries and libsvm.jar for 3.36 release (wjustin784)
  • f972c4e — Update the version number to 3.36 (wjustin784)
  • d8a2465 — sort the column indices of the csr matrix to meet LIBSVM data requirement (chcwww)

🔒Security observations

  • High · Unvalidated External Input in Python Tools — python/libsvm/svmutil.py, tools/easy.py, tools/grid.py, tools/subset.py, tools/checkdata.py. The Python tools (tools/easy.py, tools/grid.py, tools/subset.py, tools/checkdata.py) process user-supplied data files without apparent input validation. These tools parse SVM data files and could be vulnerable to malformed input attacks, buffer overflows, or denial of service attacks through specially crafted input files. Fix: Implement strict input validation and sanitization for all data file parsing. Add checks for file size limits, format validation, and handle exceptions gracefully. Consider using a robust parser library with built-in protections.
  • High · Native Code Execution via C Extensions — svm.cpp, svm-train.c, svm-predict.c, matlab/svmtrain.c, matlab/svmpredict.c, python/setup.py (builds C extensions). The repository contains multiple C/C++ implementations (svm.cpp, svm-train.c, svm-predict.c, MATLAB/Python bindings) that interact with user-supplied data. Without proper bounds checking and input validation in the C code, there is potential for buffer overflows, format string vulnerabilities, or other memory safety issues. Fix: Conduct a thorough security code review of all C/C++ code focusing on buffer management, array bounds checking, and input validation. Consider using static analysis tools (e.g., Clang Static Analyzer, Coverity). Enable compiler security flags (-fstack-protector-all, -D_FORTIFY_SOURCE=2).
  • High · Lack of Input Validation in Data Parsing — matlab/libsvmread.c, matlab/libsvmwrite.c, python/libsvm/svmutil.py, python/libsvm/commonutil.py. The libsvm data format parser (evident from libsvmread/libsvmwrite in matlab and Python implementations) does not appear to have comprehensive validation. Malformed label values, feature indices, or feature values could lead to crashes or unexpected behavior. Fix: Implement comprehensive input validation including: range checking for labels and feature values, validation of feature indices, rejection of negative indices, strict format enforcement, and error handling with informative error messages.
  • Medium · Pre-built Binaries Without Verification — windows/*.exe, windows/*.dll, windows/*.mexw64, libsvm.jar. The repository contains pre-built binaries (windows/svm-train.exe, windows/svm-predict.exe, windows/libsvm.dll, etc.) without checksums or signature verification mechanisms. These binaries could be tampered with or replaced with malicious versions. Fix: Provide cryptographic checksums (SHA-256) for all pre-built binaries. Implement code signing for binary releases. Consider removing pre-built binaries from the repository and directing users to build from source or download from a verified distribution channel.
  • Medium · Missing Security Headers in Documentation — README, FAQ.html. The FAQ.html file in the root directory may contain outdated security information or lack HTTPS enforcement guidance. The README points to an external URL (http://www.csie.ntu.edu.tw/~cjlin/libsvm) without HTTPS. Fix: Update all external URLs to use HTTPS. Add security documentation advising users on safe installation practices, input validation, and secure usage patterns.
  • Medium · Insufficient Bounds Checking in Model Serialization — matlab/svm_model_matlab.c, matlab/svm_model_matlab.h, svm.cpp (model I/O functions). The svm_model serialization and deserialization logic (used in svm_model_matlab.c and Python implementations) could be vulnerable to attacks through crafted model files. Insufficient validation of model file format could lead to out-of-bounds access or integer overflows. Fix: Add strict validation when loading model files. Check for file size consistency, validate array dimensions before allocation, implement size limit checks, and use safe integer arithmetic to prevent overflows.
  • Medium · Build System Security Concerns — Makefile, python/setup.py. The Makefile and setup.py don't show explicit security-related build flags. The compilation may not be using compiler protections against common vulnerabilities like stack-based buffer overflows or undefined behavior exploitation. Fix: Enable compiler hardening flags such as those named in the native-code finding above (-fstack-protector-all, -D_FORTIFY_SOURCE=2).

LLM-derived; treat as a starting point, not a security audit.
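
For the pre-built binaries finding, checksum verification is simple to script. The sketch below computes a SHA-256 digest with Python's standard hashlib; `sha256_of` is a hypothetical helper name, and the repository does not currently publish reference checksums to compare against.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A consumer would compare the result for a file such as java/libsvm.jar against a digest obtained through a separate trusted channel.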


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
