lightgbm-org/LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 3d ago
- ✓17 active contributors
- ✓Distributed ownership (top contributor 44% of recent commits)
- ✓MIT licensed
- ✓CI configured
- ✓Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste the snippet below at the top of your README.md. It renders inline like a shields.io badge and live-updates from the latest cached analysis.
[](https://repopilot.app/r/lightgbm-org/lightgbm)
Sharing https://repopilot.app/r/lightgbm-org/lightgbm on X, Slack, or LinkedIn auto-renders a social card (1200×630).
Onboarding: lightgbm-org/LightGBM
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify them against the real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/lightgbm-org/LightGBM shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 3d ago
- 17 active contributors
- Distributed ownership (top contributor 44% of recent commits)
- MIT licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live lightgbm-org/LightGBM
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/lightgbm-org/LightGBM.
What it runs against: a local clone of lightgbm-org/LightGBM — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in lightgbm-org/LightGBM | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 33 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of lightgbm-org/LightGBM. If you don't
# have one yet, run these first:
#
# git clone https://github.com/lightgbm-org/LightGBM.git
# cd LightGBM
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of lightgbm-org/LightGBM and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "lightgbm-org/LightGBM(\.git)?\b" \
&& ok "origin remote is lightgbm-org/LightGBM" \
|| miss "origin remote is not lightgbm-org/LightGBM (artifact may be from a fork)"
# 2. License matches what RepoPilot saw (standard MIT text contains "MIT License")
(grep -qiE "MIT License" LICENSE 2>/dev/null \
|| grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
&& ok "license is MIT" \
|| miss "license drift — was MIT at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
&& ok "default branch master exists" \
|| miss "default branch master no longer exists"
# 4. Critical files exist
test -f "CMakeLists.txt" \
&& ok "CMakeLists.txt" \
|| miss "missing critical file: CMakeLists.txt"
test -f ".github/workflows/python_package.yml" \
&& ok ".github/workflows/python_package.yml" \
|| miss "missing critical file: .github/workflows/python_package.yml"
test -f "R-package/DESCRIPTION" \
&& ok "R-package/DESCRIPTION" \
|| miss "missing critical file: R-package/DESCRIPTION"
test -f ".ci/setup.sh" \
&& ok ".ci/setup.sh" \
|| miss "missing critical file: .ci/setup.sh"
test -f ".github/CODEOWNERS" \
&& ok ".github/CODEOWNERS" \
|| miss "missing critical file: .github/CODEOWNERS"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 33 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~3d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/lightgbm-org/LightGBM"
exit 1
fi
Each check prints a line starting with ok: or FAIL:. The script exits non-zero if anything failed (1 for stale claims, 2 if it is not run inside a git repository), so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
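A minimal sketch of how an agent loop might branch on the script's exit status, assuming the script above has been saved as verify.sh (a hypothetical filename). It relies only on the exit codes the script itself uses: 0 when every check passes, 1 when any check printed FAIL, and 2 when it is not run inside a git working tree. The function name verify_status is hypothetical.

```bash
#!/usr/bin/env bash
# Map the verification script's exit status to a short label.
# Assumption: the script lives at the path given as $1 (default ./verify.sh).
verify_status() {
  local script="${1:-./verify.sh}" rc=0
  bash "$script" >/dev/null 2>&1 || rc=$?
  case $rc in
    0) echo "verified" ;;    # every check printed ok:
    1) echo "stale" ;;       # at least one check printed FAIL:
    2) echo "not-a-repo" ;;  # precondition failed: not inside a git working tree
    *) echo "unknown" ;;
  esac
}
```

Run it from inside the clone, since the script inspects the current working tree, for example: [ "$(verify_status)" = "stale" ] && echo "regenerate the artifact before editing".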
⚡TL;DR
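LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks. Its C++ core is built with CMake and exposed through Python and R packages, with optional GPU (CUDA/OpenCL) acceleration.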
👥Who it's for
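Machine learning practitioners who use LightGBM through its Python or R packages, and contributors working on the C++ core, the language bindings, or the CI setup.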
🌱Maturity & risk
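Actively maintained: the last commit was 3 days before generation, 17 contributors are active, and the top contributor authored 44% of recent commits.
MIT licensed, with CI and tests in place; no critical CVEs were flagged.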
Active areas of work
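Mostly CI maintenance (dependency bumps, trimming macOS jobs, updating R-package CI jobs, moving CI images off azurecr.io), plus documentation updates, C++ const-correctness cleanups, and Python-package fixes such as the Booster.refit() warning fix.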
🚀Get running
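Follow the installation instructions in the README. The C++ core builds with CMake (CMakeLists.txt), and the Python and R packages are built on top of it.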
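Before following the README, a small preflight sketch can confirm that the toolchain this document mentions is on PATH. The tool list in the usage line (git, cmake, and a C++ compiler invoked as c++) is an assumption drawn from the CMake-based C++ core described here; the README is authoritative. The function name check_tools is hypothetical.

```bash
#!/usr/bin/env bash
# Print "missing: <tool>" for each argument not found on PATH.
# Returns 0 only when every listed tool is available.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Example: check_tools git cmake c++ || echo "install the missing tools first".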
🗺️Map of the codebase
- CMakeLists.txt: Main build configuration orchestrating compilation for C++, Python, R, and GPU support; essential for understanding the project's architecture and dependencies.
- .github/workflows/python_package.yml: Defines the Python package CI/CD pipeline; critical for understanding how releases and testing work for the primary Python API.
- R-package/DESCRIPTION: R package metadata and dependencies; required for contributors working on R bindings and package maintenance.
- .ci/setup.sh: Bootstrap script for CI environments; shows how dependencies are installed and the build environment is prepared across platforms.
- .github/CODEOWNERS: Defines ownership and review responsibilities for different parts of the codebase; essential for navigating contribution workflows.
- CONTRIBUTING.md: Contribution guidelines and development workflow; every new contributor should read these before submitting changes.
- LICENSE: Legal framework (MIT License) governing all contributions and usage; foundational for understanding project obligations.
🧩Components & responsibilities
- CMakeLists.txt: Orchestrates platform detection, dependency discovery, and build target generation for the C++ core, the Python/R bindings, and optional GPU (CUDA/OpenCL) support.
🛠️How to make changes
Add a new CI/CD workflow for a language or platform
- Create a new GitHub Actions workflow file in .github/workflows/ (.github/workflows/your_workflow_name.yml)
- Define the build matrix, dependencies, and test commands, following the patterns of existing workflows such as cpp.yml or python_package.yml (.github/workflows/your_workflow_name.yml)
- If the setup needs complex logic, add corresponding setup/build scripts in the .ci/ directory (.ci/test-your_platform.sh)
- Update CODEOWNERS to assign reviewers for the new platform-specific files (.github/CODEOWNERS)
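A sketch of the first step as a bash helper that scaffolds a workflow file under a given repository root and refuses to overwrite an existing one. The helper name new_workflow, the placeholder job, and the .ci/test-<name>.sh script it calls are all hypothetical; adapt the job from existing workflows such as python_package.yml.

```bash
#!/usr/bin/env bash
# Scaffold <repo_root>/.github/workflows/<name>.yml, refusing to overwrite.
# Hypothetical helper: the generated job is a placeholder, not a real LightGBM workflow.
new_workflow() {
  local repo_root="$1" name="$2"
  local path="$repo_root/.github/workflows/$name.yml"
  if [ -e "$path" ]; then
    echo "exists: $path" >&2
    return 1
  fi
  mkdir -p "$repo_root/.github/workflows"
  cat > "$path" <<EOF
name: $name
on:
  pull_request:
  push:
    branches: [master]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true
      - name: run platform tests
        # hypothetical per-platform script in .ci/
        run: bash .ci/test-$name.sh
EOF
  echo "$path"
}
```

Example: new_workflow . your_workflow_name prints the created path. Refusing to overwrite keeps the helper safe to rerun.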
Add a new R demo or example
- Create a new .R file in the R-package/demo/ directory (R-package/demo/your_demo_name.R)
- Register the demo in the index file with a description (R-package/demo/00Index)
- Follow the pattern of existing demos such as basic_walkthrough.R or cross_validation.R (R-package/demo/basic_walkthrough.R)
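A demo file that is missing from 00Index is easy to overlook, so here is a sketch of a check that lists .R files in a demo directory with no matching index entry. It assumes each 00Index line starts with the demo name (the file name without .R) followed by whitespace and a description; the function name unregistered_demos is hypothetical.

```bash
#!/usr/bin/env bash
# List demo names (file name minus .R) that have no entry in <demo_dir>/00Index.
# Assumes 00Index lines look like "<demo_name>   <description>".
unregistered_demos() {
  local demo_dir="$1" f name
  for f in "$demo_dir"/*.R; do
    [ -e "$f" ] || continue   # no .R files: the glob stays unexpanded
    name=$(basename "$f" .R)
    # awk exits 0 only if some line's first field equals the demo name
    if ! awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' \
        "$demo_dir/00Index" 2>/dev/null; then
      echo "$name"
    fi
  done
}
```

Example: unregistered_demos R-package/demo prints nothing when every demo is registered.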
Update Python or R dependencies for testing
- For Python, modify the requirements file that matches the test scope (.ci/pip-envs/requirements-latest.txt)
- For R, add new Imports or Suggests entries to the DESCRIPTION file (R-package/DESCRIPTION)
- If using conda, add the corresponding conda environment specification (.ci/conda-envs/ci-core.txt)
Contribute code changes following project standards
- Read the contribution workflow and development setup (CONTRIBUTING.md)
- Set up pre-commit hooks to enforce code quality locally (.pre-commit-config.yaml)
- Follow the editor and linting standards defined in the config files (.editorconfig)
- Submit a PR and make sure it passes the relevant CI/CD workflows (.github/workflows/)
🔧Why these technologies
- C++ — Core gradient boosting engine requiring high performance and low-level optimizations for decision tree training on large datasets
- CMake — Cross-platform build system supporting Windows, macOS, Linux with GPU backends (CUDA, OpenCL) and multiple language bindings
- GitHub Actions — Native CI/CD integration for automated testing across multiple platforms, languages, and dependency versions at scale
- Python & R — Primary user-facing APIs for the ML ecosystem; allows users to leverage LightGBM in their preferred statistical/ML environment
- CUDA/OpenCL — GPU acceleration support for training large-scale models; optional but critical for performance-sensitive deployments
⚖️Trade-offs already made
- Single C++ core with multiple language bindings rather than per-language reimplementation
  - Why: Reduces maintenance burden and ensures algorithm consistency across Python, R, and other languages
  - Consequence: Language-specific features are limited to wrappers; deep algorithmic customization requires C++ knowledge
- Multi-language test matrices (Python latest/oldest, R on multiple versions) in CI/CD
  - Why: Ensures backward compatibility across the dependency versions users may have installed
  - Consequence: Longer CI/CD runtimes and higher infrastructure cost, in exchange for a stronger stability guarantee
- Optional GPU support (CUDA/OpenCL) via build flags rather than mandatory
  - Why: Reduces dependency burden for CPU-only users and supports diverse hardware (NVIDIA, AMD, Intel)
  - Consequence: Adds build complexity and requires conditional compilation paths that must be tested separately
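The "optional via build flags" choice shows up as CMake options. A sketch that maps a backend name to the flag to pass at configure time, assuming LightGBM's USE_GPU (OpenCL) and USE_CUDA options; confirm the option names in CMakeLists.txt before relying on them. The helper name gpu_cmake_flag is hypothetical.

```bash
#!/usr/bin/env bash
# Print the CMake flag that enables a given training backend.
# Assumes the USE_GPU (OpenCL) and USE_CUDA options defined in CMakeLists.txt.
gpu_cmake_flag() {
  case "$1" in
    cpu) ;;                        # CPU-only build: no extra flag
    opencl) echo "-DUSE_GPU=ON" ;;
    cuda) echo "-DUSE_CUDA=ON" ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

Example: cmake -B build -S . $(gpu_cmake_flag cuda). The substitution is left unquoted on purpose so the CPU case adds no argument at all.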
🚫Non-goals (don't propose these)
- Real-time model serving or production inference optimization (focus is on training)
- Platform-agnostic abstraction (C++ core bindings expose platform-specific features like GPU)
- Automatic hyperparameter tuning (users must implement or use external AutoML frameworks)
- A separate missing-value imputation step (LightGBM handles missing values natively during training)
- Native support for streaming data (assumes batch training; online learning not a design goal)
🪤Traps & gotchas
No repository-specific traps were identified in this analysis; standard debugging applies.
🏗️Architecture
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive tests for the SWIG-generated bindings
The repo has a .github/workflows/swig.yml workflow but lacks dedicated test coverage for the SWIG-generated bindings. LightGBM uses SWIG to generate its Java bindings; the Python and R packages call the C API directly rather than through SWIG. Integration tests should verify that the generated bindings correctly expose core functionality, which is critical for keeping the API consistent across language interfaces.
- [ ] Examine the existing test structure in the tests/ directory to understand current test patterns
- [ ] Create a tests/swig/ directory with binding validation tests
- [ ] Add tests that verify the SWIG-generated Java bindings match the C API (parameter validation, return types, exception handling)
- [ ] Add tests that verify the R package bindings handle edge cases (beyond the existing R-package tests)
- [ ] Integrate the new tests into .github/workflows/swig.yml so they run on every SWIG-related change
Create missing OpenCL platform-specific CI workflow and platform validation
The repo has .ci/install-opencl.ps1 for Windows and mentions OpenCL in workflows, but lacks a dedicated OpenCL validation workflow similar to .github/workflows/cuda.yml. Given that OpenCL is an important acceleration path, there should be a dedicated workflow that tests OpenCL builds on compatible platforms, plus pre-commit checks to validate OpenCL code paths aren't broken.
- [ ] Create .github/workflows/opencl.yml following the pattern of cuda.yml
- [ ] Add a matrix strategy testing OpenCL builds on Linux (with Intel/AMD OpenCL implementations) and Windows
- [ ] Create .ci/check-opencl-builds.sh to validate OpenCL-specific compilation flags and runtime behavior
- [ ] Add OpenCL device availability checks and fallback testing to .ci/test.sh
- [ ] Update .github/workflows/optional_checks.yml to include OpenCL validation if it's an optional check
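For the device availability and fallback item, a sketch of the decision logic. It assumes clinfo-style text on stdin that contains a "Number of platforms" line; that output format is an assumption to verify against the clinfo version used in CI. It prints opencl when at least one platform is reported and cpu otherwise, including when no input arrives. The function name opencl_or_cpu is hypothetical.

```bash
#!/usr/bin/env bash
# Decide between the OpenCL and CPU test paths from clinfo-style output on stdin.
opencl_or_cpu() {
  local n
  # Last field of the first "Number of platforms" line, e.g. "...platforms   1" -> 1
  n=$(awk '/^Number of platforms/ {print $NF; exit}')
  if [ "${n:-0}" -gt 0 ] 2>/dev/null; then
    echo "opencl"
  else
    echo "cpu"   # no line found, zero platforms, or a non-numeric value
  fi
}
```

Example: backend=$(clinfo 2>/dev/null | opencl_or_cpu). If clinfo is not installed, the empty input falls back to cpu.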
Add missing CI validation for Python distribution artifacts and wheel compatibility
While .ci/check-python-dists.sh exists, the repo lacks a dedicated workflow to validate that generated wheels are compatible across Python versions and platforms before release. The .ci/pip-envs/requirements-oldest.txt and requirements-latest.txt suggest version compatibility testing, but there's no workflow verifying wheel ABI compatibility, platform tags, or installation success across the full matrix.
- [ ] Create .github/workflows/python_wheel_validation.yml that builds wheels for multiple Python versions (3.8-3.12+)
- [ ] Add a validation step that tests wheel installation across platform configurations (Linux glibc versions, macOS versions, Windows)
- [ ] Create a .ci/validate-wheel-compat.py script that inspects wheel metadata (tags, dependencies, ABI markers) and verifies its correctness
- [ ] Test backward compatibility by installing the generated wheels in both the requirements-oldest.txt and requirements-latest.txt environments
- [ ] Add checks for wheel size anomalies and missing symbols using auditwheel (Linux) and delocate (macOS)
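The first thing a tag check inspects is the tag triple in the wheel filename, which the wheel format (PEP 427) places in the last three dash-separated fields: {python tag}-{abi tag}-{platform tag}.whl. A shell sketch of that parsing, written in bash for consistency with the script above rather than as the proposed Python script; the filenames in the test are illustrative, not real LightGBM releases, and wheel_tags is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print the python/abi/platform tags from a wheel filename (PEP 427 layout:
# {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl).
wheel_tags() {
  local base="${1##*/}"   # drop any directory prefix
  case "$base" in
    *.whl) ;;
    *) echo "not a wheel: $1" >&2; return 1 ;;
  esac
  base="${base%.whl}"
  local plat="${base##*-}"; base="${base%-*}"
  local abi="${base##*-}";  base="${base%-*}"
  local py="${base##*-}"
  echo "python=$py abi=$abi platform=$plat"
}
```

Because the tags are always the last three fields, the optional build tag and any dashes-free version string do not affect the result.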
🌿Good first issues
Check the issue tracker.
⭐Top contributors
- @jameslamb — 44 commits
- @StrikerRUS — 19 commits
- @wagner-austin — 9 commits
- @dependabot[bot] — 7 commits
- @daguirre11 — 6 commits
📝Recent commits
- 2a675f2: [ci]: Bump the ci-dependencies group with 3 updates (#7252) (dependabot[bot])
- 33e90a5: [ci] update to r-lib/actions v2.12.0, align 'pip' and 'python' to same interpreter in 'build-python.sh' (#7249) (jameslamb)
- 9545905: [ci] cut some macOS jobs (#7223) (jameslamb)
- a898cfc: [docs] add LightGBM-MoE to external repositories list (#7247) (kyo219)
- 6d7d06e: [c++] mark a few more read-only methods const (#7228) (jameslamb)
- 0c4c50a: [docs] add Michael Mayer to CODEOWNERS and docs (#7239) (jameslamb)
- 2ccb9fd: [python-package] fix misleading redundant parameter warnings in Booster.refit() (#7124) (arjunprakash027)
- 4472f39: [ci] adapt to scikit-learn ClassifierChain changes, fix {fs} install, work around pyarrow type-checking issues (#7236) (jameslamb)
- 9fed960: [ci] [R-package] drop 'icc' test job, update clang and GCC r-hub container jobs (#7222) (jameslamb)
- f72ac26: [ci] remove uses of azurecr.io for CI images (#7199) (jameslamb)
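The commit subjects follow a leading [area] prefix convention ([ci], [docs], [c++], [python-package], [R-package]). To refresh the "active areas" picture from a local clone, here is a sketch that tallies those prefixes from subjects on stdin, for example from git log --format=%s -n 100 (the count is arbitrary). The function name tally_commit_areas is hypothetical.

```bash
#!/usr/bin/env bash
# Count leading [tag] prefixes in commit subjects read from stdin, one per line.
# Output: "<count> <tag>" lines, most frequent first.
tally_commit_areas() {
  awk '
    match($0, /^(\[[^]]+\][ \t]*)+/) {
      prefix = substr($0, 1, RLENGTH)
      # peel off each [tag] in the leading run, e.g. "[ci] [R-package] "
      while (match(prefix, /\[[^]]+\]/)) {
        counts[substr(prefix, RSTART, RLENGTH)]++
        prefix = substr(prefix, RSTART + RLENGTH)
      }
    }
    END { for (t in counts) print counts[t], t }
  ' | sort -k1,1nr -k2
}
```

Example: git log --format=%s -n 100 | tally_commit_areas. A commit tagged with two areas counts once toward each.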
🔒Security observations
The LightGBM project demonstrates a foundational security posture with proper vulnerability disclosure policies and modern CI/CD infrastructure. However, several areas require improvement: the vulnerability response process lacks specific SLA commitments, dependency management practices need clarification, and security documentation could be more comprehensive. The project benefits from automated workflows and Dependabot integration but should strengthen artifact verification, implement stricter dependency pinning in CI/CD pipelines, and provide alternative security contact methods. No critical or high-severity vulnerabilities were identified based on the provided file structure, but deeper analysis of actual implementation code is recommended, particularly around input validation in Python/R bindings and C++ core.
- Low · Incomplete SECURITY.md disclosure policy (SECURITY.md). The SECURITY.md file states 'This project is staffed exclusively by volunteers' but does not define response timeframes for vulnerability reports, which can leave reporters with unclear expectations and delay remediation. Fix: Add response-time commitments (e.g., 'We will acknowledge reports within 7 days') and a coordinated-disclosure timeline (e.g., '90 days to patch before public disclosure').
- Low · Reliance on external GitHub security features (SECURITY.md). The reporting process relies entirely on GitHub's private vulnerability reporting feature, which creates a dependency on a third-party platform; if the feature has issues or the account is compromised, reports could be lost. Fix: Provide an alternative security contact email address for researchers who prefer to report outside GitHub.
- Low · Missing security headers documentation (repository root). No documentation was found on security headers, HTTPS enforcement, or secure communication practices for the project's infrastructure (documentation sites, CI/CD pipelines). Fix: Document HTTPS enforcement, security headers for all web properties, and secure CI/CD configuration guidelines.
- Low · CI/CD pipeline dependency management (.ci/ and .github/workflows/ directories). The CI/CD scripts (.ci/*.sh, .github/workflows/*.yml) show no visible evidence of dependency pinning or lock files in the partial file structure analyzed, which could expose the project to supply-chain attacks. Fix: Pin versions of all external dependencies in CI/CD workflows, use lock files where the ecosystem supports them, and keep auditing dependencies with Dependabot (see .github/dependabot.yml).
- Low · Build artifact handling (.ci/download-artifacts.sh). The implementation of .ci/download-artifacts.sh was not reviewed, so there is a potential risk of downloading untrusted or tampered artifacts. Fix: Verify downloaded artifacts with cryptographic checksums (SHA-256) or signatures, and validate the source of every artifact.
- Low · Unreviewed .gitignore (.gitignore). The .gitignore file was listed but not provided for analysis, so it is unknown whether it prevents accidental commits of sensitive files such as .env files, API keys, or build artifacts. Fix: Verify that .gitignore includes entries for .env*, secrets, *.key, *.pem, build directories, and temporary files, and add pre-commit hooks that block secret commits.
LLM-derived; treat as a starting point, not a security audit.
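For the artifact-handling finding, here is a sketch of SHA-256 verification that a download script could apply to each file it fetches. Where the expected digest comes from (for example, a checksum published next to the artifact) is an assumption, and the helper name verify_sha256 is hypothetical. It uses sha256sum when available and falls back to shasum -a 256, which ships with macOS.

```bash
#!/usr/bin/env bash
# Compare a file's SHA-256 digest against an expected hex digest.
# Prints "ok: <file>" on a match; prints FAIL to stderr and returns 1 otherwise.
verify_sha256() {
  local file="$1" expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')   # accept upper-case hex
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$file" 2>/dev/null | awk '{print $1}')   # macOS fallback
  fi
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "ok: $file"
  else
    echo "FAIL: checksum mismatch for $file" >&2   # also covers a missing file
    return 1
  fi
}
```

Example: verify_sha256 artifact.tar.gz "$EXPECTED_SHA256" || exit 1, where EXPECTED_SHA256 is a placeholder for the published digest.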
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.