ggml-org/ggml
Tensor library for machine learning
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 2d ago
- ✓ 51+ active contributors
- ✓ Distributed ownership (top contributor 22% of recent commits)
- ✓ MIT licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste it into your README; it live-updates from the latest cached analysis.
Paste [Healthy](https://repopilot.app/r/ggml-org/ggml) at the top of your README.md; it renders inline like a shields.io badge.
Social card preview (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/ggml-org/ggml on X, Slack, or LinkedIn.
Onboarding: ggml-org/ggml
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale; STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ggml-org/ggml shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 2d ago
- 51+ active contributors
- Distributed ownership (top contributor 22% of recent commits)
- MIT licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live ggml-org/ggml repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/ggml-org/ggml.
What it runs against: a local clone of ggml-org/ggml — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in ggml-org/ggml | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 32 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ggml-org/ggml. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/ggml-org/ggml.git
#   cd ggml
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of ggml-org/ggml and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "ggml-org/ggml(\.git)?\b" \
  && ok "origin remote is ggml-org/ggml" \
  || miss "origin remote is not ggml-org/ggml (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "CMakeLists.txt" \
  && ok "CMakeLists.txt" \
  || miss "missing critical file: CMakeLists.txt"
test -f "README.md" \
  && ok "README.md" \
  || miss "missing critical file: README.md"
test -f "examples/simple/simple-ctx.cpp" \
  && ok "examples/simple/simple-ctx.cpp" \
  || miss "missing critical file: examples/simple/simple-ctx.cpp"
test -f "examples/gpt-2/main-backend.cpp" \
  && ok "examples/gpt-2/main-backend.cpp" \
  || miss "missing critical file: examples/gpt-2/main-backend.cpp"
test -f "cmake/common.cmake" \
  && ok "cmake/common.cmake" \
  || miss "missing critical file: cmake/common.cmake"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 32 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~2d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/ggml-org/ggml"
  exit 1
fi
```
Each check prints `ok:` or `FAIL:`. The script exits non-zero if anything failed, so it composes cleanly into agent loops (`./verify.sh || regenerate-and-retry`).
⚡TL;DR
GGML is a low-level C/C++ tensor library optimized for machine learning inference, with a focus on integer quantization support and cross-platform hardware acceleration (CPU, CUDA, Metal, WebGPU). It powers inference engines in projects like llama.cpp and whisper.cpp by providing automatic differentiation, optimized operators, and zero runtime memory allocations. Single monorepo structure: core tensor engine in src/ (likely, inferred from CMakeLists.txt), examples/ contains runnable demos (gpt-2/, gpt-j/, mnist/, magika/), cmake/ holds build configuration, docs/ contains format specs (gguf.md). Backend-agnostic design with pluggable compute backends (CUDA kernels in cuda/, Metal in metal/, etc).
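The core workflow is small enough to show in full. Below is a minimal sketch of the context-based API that examples/simple/simple-ctx.cpp demonstrates, assuming the classic ggml.h entry points; the graph-construction helpers have shifted across versions, so treat the exact calls as illustrative rather than definitive:

```c
#include "ggml.h"
#include <stdio.h>

int main(void) {
    // One pre-allocated arena; all tensors and graph nodes live inside it,
    // which is how ggml achieves zero allocations at compute time.
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 2);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 2);
    ggml_set_f32(a, 2.0f);   // fill every element
    ggml_set_f32(b, 3.0f);

    // Ops only declare graph nodes; nothing is computed until the graph runs.
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

    printf("c[0,0] = %.1f\n", ggml_get_f32_1d(c, 0));  // 2*3 + 2*3 = 12.0
    ggml_free(ctx);
    return 0;
}
```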
👥Who it's for
Machine learning engineers and systems developers building efficient inference engines who need a dependency-free tensor computation backend with quantization support and broad hardware compatibility—particularly those targeting edge devices or production deployments where model size and speed matter.
🌱Maturity & risk
Actively developed but explicitly in flux—the README notes that core development happens in downstream repos (llama.cpp, whisper.cpp), so stability is conditional. Strong CI/CD presence (.github/workflows/) and zero third-party dependencies suggest production readiness for inference, but the scattered development pattern creates unpredictability.
Split governance is the main risk: active work in llama.cpp and whisper.cpp repos means ggml core may have uncoordinated changes or delayed integration. The 3M+ lines of C and 8M+ of C++ suggest complexity, but lack of visible issue tracker metrics in provided data makes backlog assessment difficult. Broad hardware support (CUDA, Metal, WebGPU) increases maintenance surface.
Active areas of work
The README explicitly states development is happening in llama.cpp and whisper.cpp repos, so this repo may be receiving backports or sync changes from there. CI pipeline in .github/workflows/ci.yml runs on commits, and release automation exists (.github/workflows/release.yml), but exact recent activity is not visible in provided file list.
🚀Get running
```bash
git clone https://github.com/ggml-org/ggml
cd ggml
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
mkdir build && cd build
cmake .. && cmake --build . --config Release -j 8
```
Daily commands:
After the build, run inference: `../examples/gpt-2/download-ggml-model.sh 117M && ./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "prompt"`. See examples/gpt-2/README.md and examples/gpt-j/README.md for model-specific setup.
🗺️Map of the codebase
- CMakeLists.txt — Root build configuration defining how GGML compiles across platforms with backend support (Metal, CUDA, etc.); essential for understanding project architecture.
- README.md — Project manifesto and feature overview that clarifies GGML's scope (low-level tensor ops, quantization, no deps) and development status.
- examples/simple/simple-ctx.cpp — Canonical minimal example showing tensor allocation, graph building, and inference; the reference implementation for the core API.
- examples/gpt-2/main-backend.cpp — Production-level example using backend abstraction for GPU/CPU inference, demonstrating recommended patterns for contributors.
- cmake/common.cmake — Shared CMake utilities for backend detection and compiler flags across all build targets.
- docs/gguf.md — File format specification for GGML serialized models; critical for understanding quantization and model loading.
- CONTRIBUTING.md — Development guidelines, code style, and PR process expectations for contributors.
🛠️How to make changes
Add a new quantization format
- Define the quantization scheme constants and bit-packing logic in the core GGML header (likely src/ggml.h or similar; not visible in the file list but referenced by examples) (examples/gpt-2/quantize.cpp); a block-layout sketch follows this list
- Update GGUF serialization to recognize the new type in the format specification (docs/gguf.md)
- Add a conversion utility in the example scripts, following the pattern of existing converters (examples/gpt-2/convert-h5-to-ggml.py)
- Update CMakeLists.txt to compile the quantization codepath if it is backend-specific (CMakeLists.txt)
- Add a test case in examples/simple to verify quantized tensor inference (examples/simple/simple-ctx.cpp)
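To make the first step concrete, here is a sketch of what a block-quantized format looks like. block_q8_x, QK8_X, and quantize_row_q8_x_ref are hypothetical names modeled on ggml's existing 8-bit block formats; real formats store the scale as a 16-bit float rather than a full float:

```c
#include <math.h>
#include <stdint.h>

// Hypothetical block format: 32 weights stored as one scale + 32 int8 values.
// Modeled on ggml's existing 8-bit block quantization; names are illustrative.
#define QK8_X 32

typedef struct {
    float  d;           // per-block scale (real formats use a 16-bit float here)
    int8_t qs[QK8_X];   // quantized weights
} block_q8_x;

// Reference (scalar) quantizer: one block per QK8_X input floats.
static void quantize_row_q8_x_ref(const float * x, block_q8_x * y, int64_t k) {
    for (int64_t i = 0; i < k / QK8_X; i++) {
        float amax = 0.0f;                       // absolute max in this block
        for (int j = 0; j < QK8_X; j++) {
            const float v = fabsf(x[i*QK8_X + j]);
            if (v > amax) amax = v;
        }
        const float d  = amax / 127.0f;          // map [-amax, amax] to [-127, 127]
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK8_X; j++) {
            y[i].qs[j] = (int8_t) roundf(x[i*QK8_X + j] * id);
        }
    }
}
```

Dequantization is the inverse: each weight is recovered as `qs[j] * d`, which is why the per-block scale dominates the format's accuracy/size trade-off.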
Add support for a new hardware backend
- Create backend detection logic in cmake/common.cmake (or a new cmake/FindNEWBACKEND.cmake) (cmake/common.cmake)
- Conditionally set compiler flags and link libraries in the root CMakeLists.txt (CMakeLists.txt)
- Create a reference inference example using the backend abstraction pattern from gpt-2/main-backend.cpp (examples/gpt-2/main-backend.cpp); an interface sketch follows this list
- Document backend-specific build flags and performance tuning in the README or a new backend guide (README.md)
- Add a workflow step to ci.yml to test the new backend in CI (.github/workflows/ci.yml)
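The abstraction those steps target is essentially a table of function pointers that the graph scheduler calls through. A simplified sketch of the pattern (the struct and field names here are illustrative, not ggml's exact ggml_backend_i definition):

```c
#include <stdbool.h>
#include <stddef.h>

struct ggml_cgraph;   // opaque here; the real definitions live in ggml.h
struct ggml_tensor;

// Simplified sketch of the pluggable-backend pattern; names are illustrative.
struct my_backend_i {
    const char * (*get_name)     (void * backend_ctx);

    // Buffer management: each backend owns its device memory.
    void *       (*alloc_buffer) (void * backend_ctx, size_t size);
    void         (*free_buffer)  (void * backend_ctx, void * buffer);

    // Graph execution: the scheduler hands the backend a whole compute graph.
    bool         (*graph_compute)(void * backend_ctx, struct ggml_cgraph * graph);

    // Capability query: lets the scheduler fall back to CPU for unsupported ops.
    bool         (*supports_op)  (void * backend_ctx, const struct ggml_tensor * op);
};
```

A new backend implements this table plus device init/teardown; the rest of the library stays unchanged, which is what keeps the CUDA, Metal, and WebGPU paths decoupled.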
Add a new model training example
- Create a new subdirectory under examples/ (e.g., examples/mymodel/) with its own CMakeLists.txt (examples/mnist/CMakeLists.txt)
- Implement the main training binary using GGML's autodiff and optimizer APIs, referencing mnist-train.cpp (examples/mnist/mnist-train.cpp)
- Add a dataset download/preparation script following the examples/gpt-2/download-model.sh pattern (examples/gpt-2/download-model.sh)
- Create a model conversion script from the source format to GGML/GGUF (examples/gpt-2/convert-h5-to-ggml.py)
- Write a README.md in the example directory with usage instructions and expected outputs (examples/mnist/README.md)
Extend the Python API
- Add C function declarations to the FFI layer in cffi.py, following the pattern of existing tensor ops (examples/python/ggml/cffi.py)
- Implement the Python wrapper in `__init__.py`, exposing the new functionality with type hints (examples/python/ggml/__init__.py)
- Update the `__init__.pyi` stub file to match the new API signatures (examples/python/ggml/__init__.pyi)
- Add a test case demonstrating the new API in test_tensor.py (examples/python/test_tensor.py)
🔧Why these technologies
- C/C++ core library — Ensures zero-copy, minimal-allocation tensor ops with fine-grained hardware control for embedded and server inference.
- CMake build system — Enables portable cross-platform compilation with optional backend detection (Metal, CUDA, NCCL) without vendoring dependencies.
- Python bindings (CFFI) — Provides Pythonic model conversion and research iteration while keeping compute-critical path in native code.
- GGUF format (docs/gguf.md) — Standardizes quantized model serialization with embedded metadata, enabling interop across inference frameworks.
- Hardware abstraction backends — Decouples graph execution from hardware; same code runs on Metal, CUDA, or CPU with compile-time selection.
⚖️Trade-offs already made
- No runtime third-party dependencies
  - Why: Simplifies deployment, avoids dependency hell, and reduces binary size for embedded systems.
  - Consequence: The core library implements algorithms from scratch (e.g., matrix ops, quantization) rather than delegating to BLAS; it may lag behind specialized libraries in peak performance tuning.
- Compile-time backend selection (Metal/CUDA/CPU) (see the sketch after this list)
  - Why: Reduces binary bloat and eliminates runtime dispatch overhead for single-target deployments.
  - Consequence: Users must rebuild GGML to switch backends; no runtime polymorphism across GPU/CPU in a single binary.
- Focus on inference + training examples rather than framework-level abstractions
  - Why: Keeps the core tensor library lightweight and composable; delegates model definition to user code.
  - Consequence: Developers must build compute graphs by hand; there is no eager, Autograd-style frontend, so differentiation and training loops are wired up explicitly, as the examples show.
- Python API via CFFI instead of tight C++ bindings
  - Why: Avoids Python version brittleness and ABI instability; CFFI works across CPython, PyPy, etc.
  - Consequence: Less ergonomic than pybind11 for complex OOP patterns; requires manual struct mirroring in Python.
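As a concrete illustration of the second trade-off, backend choice resolves at preprocessing time. The GGML_USE_* macros below are the kind the build system defines; the helper function itself is illustrative, not ggml's API:

```c
#include <stdio.h>

// Sketch of the compile-time dispatch pattern the trade-off describes.
static const char * select_backend(void) {
#if defined(GGML_USE_CUDA)
    return "CUDA";        // compiled in only when the build enables CUDA
#elif defined(GGML_USE_METAL)
    return "Metal";       // compiled in only when the build enables Metal
#else
    return "CPU";         // always-available fallback
#endif
}

int main(void) {
    // Switching backends means recompiling, not flipping a runtime flag.
    printf("backend: %s\n", select_backend());
    return 0;
}
```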
🚫Non-goals (don't propose these)
- Not a high-level ML framework (e.g., PyTorch, TensorFlow)
🪤Traps & gotchas
- Python 3.10 is specifically pinned in the README (not 3.11+), and activating the venv before pip install is essential; skipping it causes build-time import failures.
- GGUF model download scripts are bash-specific (download-ggml-model.sh), so Windows contributors need WSL or Git Bash.
- CMake backend flags (-DGGML_METAL=ON, -DGGML_CUDA=ON, etc., mirrored as LLAMA_*-prefixed options in llama.cpp) are not explicitly documented in the provided file list.
- The split development model means breaking changes in llama.cpp may not immediately sync here.
🏗️Architecture
💡Concepts to learn
- Integer Quantization (INT8/INT4) — GGML's core strength is efficient low-precision inference; understanding quantization is essential to why models run 4-10x smaller and faster on consumer hardware
- Operator Fusion & Memory Layout — GGML achieves 'zero runtime allocations' through pre-allocated static buffers and fused ops; contributors need to understand tensor stride conventions and packed memory formats
- Automatic Differentiation (Backpropagation via Tape) — GGML implements AD for training and optimization, using a computation graph tape; essential for understanding the ggml_tensor abstraction and forward/backward passes (see the tape sketch after this list)
- Hardware Abstraction / Backend Pattern — GGML supports CPU, CUDA, Metal, WebGPU through pluggable backend interfaces; new contributors adding hardware support must understand this abstraction layer
- GGUF File Format — GGML's custom model serialization format (documented in docs/gguf.md) encodes quantization metadata, architecture, and weights; understanding this is critical for model loading and conversion scripts
- Thread-Safe Context & Scratch Buffers — GGML's ggml_context API manages pre-allocated tensor graphs and temporary work space; the main-ctx.cpp and main-backend.cpp examples show different context lifetime patterns
- Batch Processing & Scheduling — main-batched.cpp and main-sched.cpp suggest GGML supports dynamic batching and execution scheduling; critical for production inference throughput optimization
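The tape concept is worth internalizing before reading ggml's graph code. Here is a freestanding sketch under illustrative names (no ggml API involved) showing how a recorded op list yields gradients on a reverse walk; ggml's cgraph has the same shape, with nodes recording an op plus its source tensors:

```c
#include <stdio.h>

// Freestanding sketch of tape-based reverse-mode autodiff. Names illustrative.
enum op { OP_LEAF, OP_ADD, OP_MUL };

struct node {
    enum op op;
    int     src0, src1;  // tape indices of the operands (unused for leaves)
    float   val, grad;
};

static float forward(struct node * t, int n) {
    for (int i = 0; i < n; i++) {
        if (t[i].op == OP_ADD) t[i].val = t[t[i].src0].val + t[t[i].src1].val;
        if (t[i].op == OP_MUL) t[i].val = t[t[i].src0].val * t[t[i].src1].val;
    }
    return t[n-1].val;
}

static void backward(struct node * t, int n) {
    t[n-1].grad = 1.0f;                    // seed: d(out)/d(out) = 1
    for (int i = n - 1; i >= 0; i--) {     // reverse pass accumulates gradients
        if (t[i].op == OP_ADD) {
            t[t[i].src0].grad += t[i].grad;
            t[t[i].src1].grad += t[i].grad;
        } else if (t[i].op == OP_MUL) {
            t[t[i].src0].grad += t[i].grad * t[t[i].src1].val;
            t[t[i].src1].grad += t[i].grad * t[t[i].src0].val;
        }
    }
}

int main(void) {
    // y = a*b + a with a = 2, b = 3  =>  y = 8, dy/da = b + 1 = 4, dy/db = a = 2
    struct node t[4] = {
        { OP_LEAF, -1, -1, 2.0f, 0.0f },   // [0] a
        { OP_LEAF, -1, -1, 3.0f, 0.0f },   // [1] b
        { OP_MUL,   0,  1, 0.0f, 0.0f },   // [2] a*b
        { OP_ADD,   2,  0, 0.0f, 0.0f },   // [3] a*b + a
    };
    printf("y = %.1f\n", forward(t, 4));
    backward(t, 4);
    printf("dy/da = %.1f, dy/db = %.1f\n", t[0].grad, t[1].grad);
    return 0;
}
```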
🔗Related repos
- ggerganov/llama.cpp — Primary downstream consumer and active development hub for GGML; the inference engine for LLaMA-family models where ggml is the core tensor backend
- ggerganov/whisper.cpp — Second major downstream user, leveraging GGML for speech-to-text inference with quantization; shares architecture patterns with llama.cpp
- openai/whisper — Original Whisper model that whisper.cpp ports to GGML; understanding the base model gives contributor context
- pytorch/pytorch — Architectural inspiration for tensor semantics and operator definitions, though GGML is purpose-built for inference-only efficiency
- nomic-ai/gpt4all — User-facing application built on GGML and llama.cpp; demonstrates end-to-end inference workflow and model distribution patterns
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for ggml backend implementations
The repo has multiple backend implementations (Metal, CUDA, CPU) referenced in cmake and examples, but there's no visible test suite structure in the file listing. Adding a tests/ directory with backend-specific unit tests would ensure correctness across hardware targets and catch regressions early. This is critical for a tensor library where numerical accuracy matters. A minimal sketch of such a test follows the checklist.
- [ ] Create tests/ directory structure mirroring src/ layout
- [ ] Add unit tests for core tensor operations (matmul, add, mul) in tests/test_ops.cpp
- [ ] Add backend-specific tests in tests/backends/test_metal.cpp, test_cuda.cpp, test_cpu.cpp
- [ ] Add CMake configuration in CMakeLists.txt to build and run tests via 'cmake --build . --target test'
- [ ] Reference existing examples like examples/mnist/mnist-train.cpp to validate tensor operations work correctly
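A minimal version of the core-ops test could look like the following, using the context API on the CPU path. It assumes the classic ggml.h entry points (graph helpers vary across versions); real backend tests would add harness wiring, device setup, and tolerance handling for lower-precision types:

```c
#include "ggml.h"
#include <assert.h>
#include <math.h>
#include <string.h>

// Sketch: verify a 2x2 matmul against a hand-computed reference on the CPU path.
int main(void) {
    struct ggml_init_params params = { 4 * 1024 * 1024, NULL, false };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 2);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 2);
    const float av[4] = { 1, 2, 3, 4 };   // rows: (1,2) and (3,4)
    const float bv[4] = { 5, 6, 7, 8 };   // rows: (5,6) and (7,8)
    memcpy(a->data, av, sizeof(av));
    memcpy(b->data, bv, sizeof(bv));

    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, 1);

    // ggml_mul_mat contracts along dim 0: result[n][m] = dot(row m of a, row n of b).
    const float expect[4] = { 1*5 + 2*6, 3*5 + 4*6, 1*7 + 2*8, 3*7 + 4*8 };
    for (int i = 0; i < 4; i++) {
        assert(fabsf(ggml_get_f32_1d(c, i) - expect[i]) < 1e-5f);
    }
    ggml_free(ctx);
    return 0;
}
```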
Add GitHub Actions workflow for multi-platform quantization testing
The repo has examples for quantization (examples/gpt-2/quantize.cpp, examples/gpt-j/quantize.cpp) and claims 'Integer quantization support' as a feature, but ci/run.sh and .github/workflows/ci.yml are not shown in detail. A dedicated workflow testing quantization across Linux, macOS, and Windows would validate this core feature works consistently.
- [ ] Create .github/workflows/quantization-test.yml
- [ ] Add job matrix for ubuntu-latest, macos-latest, windows-latest
- [ ] Build quantize example: 'cmake --build . --target quantize'
- [ ] Download small test model via examples/gpt-2/download-ggml-model.sh
- [ ] Run quantization and verify output file size reduction matches expected ratios
- [ ] Add artifact upload for quantized models to workflow summary
Add documentation for backend selection and performance tuning in docs/
The repo supports multiple backends (Metal, CUDA, CPU) with examples like examples/perf-metal/perf-metal.cpp, but there's no visible backend selection guide. New contributors won't know how to enable/disable backends, benchmark them, or choose which to use. This documentation gap slows onboarding.
- [ ] Create docs/backends.md explaining each backend's requirements (CUDA SDK version, Metal iOS/macOS, CPU-only fallback)
- [ ] Add cmake flags reference showing -DGGML_METAL=ON, -DGGML_CUDA=ON, etc.
- [ ] Add performance tuning section referencing examples/perf-metal/perf-metal.cpp and how to run similar benchmarks
- [ ] Create docs/contributing-backends.md with minimal example of adding a new backend (mirroring structure of cmake/FindNCCL.cmake pattern)
- [ ] Link to this from CONTRIBUTING.md
🌿Good first issues
- Add Python bindings or ctypes wrapper in a new python/ directory for tensor operations, since the codebase lacks visible Python integration beyond model converters
- Create examples/README.md with a table comparing the different example implementations (main-ctx.cpp vs main-backend.cpp vs main-sched.cpp) and when to use each, addressing the confusion visible in the gpt-2/ directory
- Extend docs/gguf.md with concrete binary format examples (hex dumps of a small quantized tensor) and add validation examples in examples/gguf-validator/ to make the format spec testable (a minimal metadata-reader sketch follows this list)
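For the gguf-validator idea, a metadata-reader sketch is a natural starting point. It assumes ggml's public GGUF API (gguf_init_from_file and friends); the header location and exact signatures have moved between versions, so verify them against your checkout:

```c
#include "ggml.h"
#include "gguf.h"     // in older trees the gguf API lives in ggml.h instead
#include <stdint.h>
#include <stdio.h>

// Sketch: dump the metadata of a GGUF file without loading tensor data.
int main(int argc, char ** argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct gguf_init_params params = {
        /*.no_alloc =*/ true,   // metadata only; do not allocate tensor data
        /*.ctx      =*/ NULL,
    };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        fprintf(stderr, "not a valid GGUF file: %s\n", argv[1]);
        return 1;
    }

    printf("version:  %u\n",   (unsigned)  gguf_get_version(ctx));
    printf("kv pairs: %lld\n", (long long) gguf_get_n_kv(ctx));
    for (int64_t i = 0; i < gguf_get_n_kv(ctx); i++) {
        printf("  %s\n", gguf_get_key(ctx, i));
    }
    printf("tensors:  %lld\n", (long long) gguf_get_n_tensors(ctx));

    gguf_free(ctx);
    return 0;
}
```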
⭐Top contributors
- @ggerganov — 22 commits
- @Constannnnnt — 5 commits
- @JohannesGaessler — 5 commits
- @yomaytk — 4 commits
- @SharmaRithik — 4 commits
📝Recent commits
- 3f3fe90 — requirements: sync requirements.txt with llama.cpp versions (#1479) (taronaeo)
- ac6f7b4 — ggml : bump version to 0.11.0 (#1478) (ggerganov)
- 5bb7236 — sync : llama.cpp (ggerganov)
- ccab936 — llama : add option to save memory in device buffers (llama/22679) (ggerganov)
- 9d48a12 — ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (llama/22631) (AlrIsmail)
- efeb567 — kleidiai : update to v1.24.0 and use release archive (llama/22549) (chaxu01)
- 523ea9e — CUDA: use fastdiv for batch index split in get_rows (llama/22650) (leonardHONG)
- ae72070 — vulkan: delete dead GGML_VK_MAX_NODES def (llama/22621) (Atomic-Germ)
- 390b432 — ggml-webgpu: add layer norm ops (llama/22406) (Constannnnnt)
- fa9604a — fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus entirely) (llama/22533) (lucyknada)
🔒Security observations
The GGML tensor library demonstrates generally good security practices with no third-party dependencies during runtime and cross-platform focus. However, there are moderate concerns around Python script handling of external model files, shell script input validation in download utilities, and CMake build-time security. The project lacks formal security disclosure policies and vulnerability management documentation. Primary recommendations include auditing Python deserialization patterns, hardening shell scripts against command injection, implementing dependency verification in CMake, and establishing a formal security response process. The active development status across multiple repositories suggests ongoing security attention is necessary.
- Medium · Python Script Execution Without Input Validation — examples/gpt-2/*.py, examples/gpt-j/*.py, examples/magika/*.py. Multiple Python conversion scripts (convert-cerebras-to-ggml.py, convert-ckpt-to-ggml.py, convert-h5-to-ggml.py, etc.) are present in the examples directory. These scripts likely process external model files and could be vulnerable to arbitrary code execution if they deserialize untrusted data using pickle or similar unsafe methods. Fix: Audit all Python scripts for unsafe deserialization (pickle, yaml.load, etc.). Use safer alternatives like json or implement strict input validation. Add documentation warning users to only process trusted model files.
- Medium · Shell Script Without Input Validation — ci/run.sh, examples/gpt-2/download-*.sh, examples/gpt-j/download-*.sh. The ci/run.sh script and download scripts (download-ggml-model.sh, download-model.sh) execute shell commands that may process external input without proper validation, potentially enabling command injection attacks. Fix: Review all shell scripts for proper input sanitization. Use quoted variables, avoid eval, and validate all external inputs before use in commands.
- Medium · CMake Configuration Potential Security Issues — CMakeLists.txt, cmake/FindNCCL.cmake, cmake/common.cmake. CMake files (CMakeLists.txt, cmake/*.cmake) execute external commands and process dependencies. Without proper validation, this could lead to supply chain attacks or execution of malicious code during build time. Fix: Implement integrity verification for all external dependencies. Pin versions explicitly. Use checksums or cryptographic verification for downloaded files. Review custom find modules for security.
- Low · No SECURITY.md or Security Policy — Repository root. The repository lacks a SECURITY.md file or documented security policy for vulnerability disclosure. This makes it difficult for security researchers to report vulnerabilities responsibly. Fix: Create a SECURITY.md file documenting the vulnerability disclosure process and security contact information.
- Low · Missing Dependency Lock File — requirements.txt (not provided in listing). The requirements.txt file for Python dependencies is referenced but not included in the provided file listing. Without a lock file or pinned versions, dependencies may not be reproducibly built, increasing supply chain risk. Fix: Maintain a requirements.lock or poetry.lock file with pinned versions. Use hash verification where possible.
- Low · Active Development Status — README.md. The README explicitly states 'this project is under active development' with work split across multiple repositories (llama.cpp, whisper.cpp). This increases the risk of security issues being present or unpatched. Fix: Implement regular security audits and automated dependency scanning. Consider establishing a security release process.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.