RepoPilot

ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

GO

Healthy across the board

  • Last commit 2d ago
  • 5 active contributors
  • Distributed ownership (top contributor 49%)
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Small team — 5 top contributors

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Embed this verdict

[![RepoPilot: GO](https://repopilot.app/api/badge/ggml-org/whisper.cpp)](https://repopilot.app/r/ggml-org/whisper.cpp)

Paste into your README — the badge live-updates from the latest cached analysis.

Onboarding doc

Onboarding: ggml-org/whisper.cpp

Generated by RepoPilot · 2026-05-05 · Source

Verdict

GO — Healthy across the board

  • Last commit 2d ago
  • 5 active contributors
  • Distributed ownership (top contributor 49%)
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Small team — 5 top contributors

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

TL;DR

whisper.cpp is a high-performance C/C++ port of OpenAI's Whisper automatic speech recognition (ASR) model, enabling fully offline, on-device speech-to-text without Python or heavy ML-framework dependencies. The entire model — encoder, decoder, and inference logic — lives in just two files, include/whisper.h and src/whisper.cpp, backed by the ggml tensor library. It supports quantized inference and Metal/CUDA/Vulkan GPU acceleration, and runs on hardware from NVIDIA GPUs to Raspberry Pi and iPhones. The repo is a flat-ish monorepo: language bindings live under bindings/ (Go in bindings/go/, Java in bindings/java/), runnable examples under examples/ (iOS, Android, WASM, CLI), and GPU-specific Dockerfiles under .devops/. The build is managed via CMakeLists.txt (primary), with a legacy Makefile retained.

Who it's for

Systems engineers and application developers who need to embed offline speech recognition into C/C++, Go, Java, iOS (Objective-C), Android, or WebAssembly applications without shipping a Python runtime or PyTorch dependency. Also targets ML inference researchers optimizing Whisper for edge/mobile hardware.

Maturity & risk

The project is at stable release v1.8.1 with an active CI pipeline defined in .github/workflows/build.yml covering MSVC, MinGW, CUDA, Vulkan, and WASM targets. It has bindings for Go, Java, Ruby, and Objective-C, test files like bindings/go/pkg/whisper/context_test.go, and Docker images in .devops/. Verdict: production-ready and actively developed.

The core C++ implementation has minimal external dependencies (ggml is vendored/submoduled), reducing supply chain risk significantly. However, the project is tightly coupled to the ggml-org/ggml library internals, meaning ggml breaking changes can ripple through. The GPU backends (CUDA via .devops/cublas.Dockerfile, Vulkan via .devops/main-vulkan.Dockerfile, MUSA/Intel) each have separate build paths that may lag behind the CPU path in testing coverage.

Active areas of work

Active work includes Voice Activity Detection (VAD) support (explicitly listed in the README feature list), ongoing GPU backend expansion (Moore Threads MUSA via .devops/main-musa.Dockerfile, Intel SYCL via README_sycl.md), and WebAssembly example CI in .github/workflows/examples-wasm.yml. The Go bindings CI (.github/workflows/bindings-go.yml) and Ruby bindings CI are also actively maintained.

Get running

git clone https://github.com/ggml-org/whisper.cpp.git && cd whisper.cpp

Download a model

bash ./models/download-ggml-model.sh base.en

Build with CMake

cmake -B build && cmake --build build --config Release

Run transcription on a sample

./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav

Or use the legacy Makefile

make && ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav

Daily commands:

CPU (CMake)

cmake -B build -DGGML_CUDA=OFF && cmake --build build -j$(nproc)
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav

CUDA GPU

cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc)

Go bindings

cd bindings/go && make whisper && go test ./...

Docker (CUDA)

docker build -f .devops/cublas.Dockerfile -t whisper-cublas .

Map of the codebase

  • src/whisper.cpp: Entire Whisper model implementation: encoder, decoder, beam search, sampling, and all inference logic — the heart of the project.
  • include/whisper.h: The public C API surface; all language bindings (Go, Java, Ruby, Objective-C) are generated against this header.
  • CMakeLists.txt: Primary build system entry point; controls all backend flags (GGML_CUDA, GGML_METAL, GGML_VULKAN) and target definitions.
  • bindings/go/whisper.go: CGo bridge layer translating Go calls into the C API defined in whisper.h.
  • bindings/go/pkg/whisper/context.go: High-level Go context object wrapping whisper inference lifecycle — the main entry point for Go consumers.
  • .devops/cublas.Dockerfile: Canonical Dockerfile for CUDA+cuBLAS GPU inference builds; reference for NVIDIA deployment.
  • .github/workflows/build.yml: CI matrix covering MSVC, MinGW, Linux, CUDA, and Vulkan builds — defines the supported build configurations.
  • models/download-ggml-model.sh: Script to fetch quantized GGML model weights; required before any inference can run.

How to make changes

Core model logic: edit src/whisper.cpp and include/whisper.h. To add a new C API function, declare it in include/whisper.h and implement it in src/whisper.cpp; every language binding is generated against that header, so bindings may need matching updates. For Go binding changes, edit bindings/go/whisper.go and bindings/go/pkg/whisper/context.go. For new examples, add a directory under examples/ and wire it into CMakeLists.txt. For Metal GPU kernels, look in the .metal shader sources; for CUDA kernels, see the .cu files tracked in the repo.

Traps & gotchas

  1. Models must be in GGML binary format (not original OpenAI .pt files); use models/download-ggml-model.sh or convert with models/convert-pt-to-ggml.py.
  2. The Go bindings require CGo and a compiled libwhisper — running go test without first running make whisper in bindings/go/ will fail with linker errors.
  3. CUDA builds require setting -DGGML_CUDA=ON explicitly; it is OFF by default.
  4. On Apple Silicon, Metal is auto-enabled but requires the Xcode command-line tools; if they are missing, the build silently falls back to CPU.
  5. The bindings/go/go.mod module path is github.com/ggerganov/whisper.cpp (the old org name), not ggml-org — this matters for go get paths.

Concepts to learn

  • GGML Tensor Format & Quantization — All model weights are stored and computed in GGML's custom binary format with Q4/Q5/Q8 integer quantization — understanding this is prerequisite to loading models or modifying inference precision.
  • Mel Spectrogram (STFT) — Whisper's audio preprocessing converts raw PCM waveforms into 80-channel log-Mel spectrograms before encoding — this transform is implemented in whisper.cpp and must be understood to modify audio input handling.
  • Transformer Encoder-Decoder Architecture — Whisper uses a standard encoder-decoder Transformer: the audio encoder and autoregressive text decoder are both implemented from scratch in src/whisper.cpp without any framework.
  • Beam Search Decoding — Token generation uses beam search with configurable beam size — the decoding loop in src/whisper.cpp implements this, and tuning it affects both speed and transcription accuracy.
  • CGo FFI (Foreign Function Interface) — The Go bindings in bindings/go/whisper.go use CGo to call into the C API in whisper.h — understanding CGo's memory model (C.CString, C.free) is essential to safely extending the Go bindings.
  • Metal Performance Shaders (Apple Silicon) — On macOS/iOS, matrix multiplications are offloaded to Apple's GPU via Metal compute shaders — the Metal backend enables the real-time iPhone demo shown in the README.
  • Voice Activity Detection (VAD) — VAD segments audio into speech/non-speech regions before passing to Whisper, reducing hallucinations on silence and improving throughput — a recently added feature in this repo with active development.

Related repos

  • ggml-org/ggml — The underlying tensor computation library that whisper.cpp is built on — changes here directly affect whisper.cpp's backend.
  • openai/whisper — The original Python/PyTorch Whisper implementation that this repo ports; model weights and architecture originate here.
  • ggml-org/llama.cpp — Sister project by the same organization using the identical ggml+C++ inference pattern for LLMs — shares build patterns, quantization code, and contributor base.
  • k2-fsa/sherpa-onnx — Alternative cross-platform offline ASR framework targeting the same embedded/mobile use cases with ONNX runtime instead of ggml.
  • alphacep/vosk-api — Another offline speech recognition C API library solving the same problem (embedded ASR without cloud dependency) as a direct alternative.

PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add unit tests for bindings/go/pkg/whisper/context.go covering error paths and edge cases

The existing test file bindings/go/pkg/whisper/context_test.go likely covers the happy path, but the context.go file wraps low-level C calls that have many failure modes (nil context, invalid audio length, unsupported language codes, etc.). Adding targeted tests for these error paths would prevent regressions in the Go binding layer, which is a first-class supported binding with its own CI workflow (bindings-go.yml).

  • [ ] Audit bindings/go/pkg/whisper/context.go to enumerate all exported methods and their documented error returns
  • [ ] Open bindings/go/pkg/whisper/context_test.go and identify which error paths (e.g., ProcessWithPrompts with nil samples, SetLanguage with invalid locale string, ResetTimings before any processing) are not yet covered
  • [ ] Add table-driven tests using testify (already a declared dependency in go.mod) for each uncovered error path, using the existing jfk.wav sample in bindings/go/samples/ for positive cases
  • [ ] Add a test that verifies the model correctly returns ErrNotInitialized (or equivalent) when methods are called on a closed/nil context, referencing bindings/go/pkg/whisper/interface.go for the expected error contract
  • [ ] Run go test ./bindings/go/pkg/whisper/... locally and confirm coverage increases; update bindings-go.yml to fail the workflow if coverage drops below a defined threshold

Add a GitHub Actions CI workflow for the Java bindings (.github/workflows/bindings-java.yml)

The repository has dedicated CI workflows for Go bindings (bindings-go.yml) and Ruby bindings (bindings-ruby.yml), but there is no corresponding workflow for the Java bindings located in bindings/java/. The Java binding has a full Gradle build system (gradlew, build.gradle, settings.gradle) already in place, making it straightforward to wire up. Without CI, breakage in the Java binding can go undetected across commits.

  • [ ] Create .github/workflows/bindings-java.yml mirroring the structure of .github/workflows/bindings-go.yml
  • [ ] Add a job that checks out the repo, sets up JDK (e.g., actions/setup-java@v4 with temurin distribution), and builds the native whisper.cpp library using the CMakeLists.txt at the repo root as a prerequisite step
  • [ ] Add a step that runs ./gradlew build inside bindings/java/ to compile the Java sources and run any existing tests
  • [ ] Trigger the workflow on push and pull_request events filtered to paths: ['bindings/java/**', 'CMakeLists.txt', 'include/whisper.h', 'src/whisper.cpp'] to avoid unnecessary runs
  • [ ] Verify the workflow passes on a branch before opening the PR; document the new workflow in the bindings/java/README.md
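Under those assumptions, a skeleton for the workflow might look like the following — job names, JDK version, and path filters are guesses to validate against the repo's existing workflows:

```yaml
# Sketch of .github/workflows/bindings-java.yml (hypothetical contents)
name: Bindings Tests (Java)
on:
  push:
    paths: ['bindings/java/**', 'CMakeLists.txt', 'include/**', 'src/**']
  pull_request:
    paths: ['bindings/java/**', 'CMakeLists.txt', 'include/**', 'src/**']

jobs:
  java:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      # build the native whisper library first; the Java binding loads it
      - run: cmake -B build && cmake --build build --config Release
      - run: ./gradlew build
        working-directory: bindings/java
```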

Split bindings/go/whisper.go into focused sub-files to reduce cognitive load and improve navigability

The file bindings/go/whisper.go is the single CGO bridge file for the entire Go binding. In a CGO-heavy file this typically contains: C preamble/includes, type conversions, low-level wrapper functions, and possibly higher-level helpers — all mixed together. Splitting it into whisper_cgo.go (C preamble + raw CGO calls), whisper_convert.go (Go↔C type conversion helpers), and whisper_params.go (already partially separated as params.go) would make each concern independently reviewable and reduce the risk of merge conflicts as the binding evolves.

  • [ ] Read through bindings/go/whisper.go and bindings/go/params.go in full and group all symbols by concern: (1) CGO import block and raw C-call wrappers, (2) Go type conversions and

Good first issues

  1. Add missing unit tests for bindings/go/pkg/whisper/model.go — model_test.go exists but likely has sparse coverage of error paths like loading a corrupt model file.
  2. Add a bindings/go/pkg/whisper/context_test.go test case specifically for the VAD (Voice Activity Detection) code path, a recently added feature with probable gaps in Go-level test coverage.
  3. Document the WGSL Vulkan shader files — the .github/workflows/build.yml Vulkan build exists, but there is no README or inline documentation explaining the WGSL/GLSL compute shader architecture for contributors unfamiliar with GPU compute.

Recent commits

  • 4bf7336 — talk-llama : sync llama.cpp (ggerganov)
  • 18162bc — cmake : add FindNCCL.cmake (ggml/0) (ggerganov)
  • 8384aa8 — sync : ggml (ggerganov)
  • bbdaa21 — ggml : remove obsolete rms_norm.wgsl (ggml/0) (ggerganov)
  • a5a8496 — ggml : remove obsoloete wgsl templates (ggml/0) (ggerganov)
  • 28f8534 — ggml : bump version to 0.10.2 (ggml/1474) (ggerganov)
  • 4861a3e — hexagon: hmx flash attention (llama/22347) (njsyw1997)
  • f2ce24f — hexagon: enable non-contiguous row tensor support for unary ops (llama/22574) (aparmp-quic)
  • 9623c12 — ggml-webgpu: Fix vectorized handling in mul-mat and mul-mat-id (llama/22578) (yomaytk)
  • 95053f6 — vulkan: Support asymmetric FA in coopmat2 path (llama/21753) (jeffbolznv)

Security observations

  • Medium · Outdated or Unverified Go Dependencies — bindings/go/go.mod. The go.mod file specifies dependencies such as github.com/go-audio/wav v1.1.0 and github.com/stretchr/testify v1.9.0 without pinning to specific commit hashes or verified checksums beyond the go.sum file. While go.sum provides integrity checks, the indirect dependencies (go-audio/audio, go-audio/riff, go-difflib, yaml.v3) may not be regularly audited for vulnerabilities. Transitive dependency vulnerabilities can propagate into the binary. Fix: Regularly audit dependencies using 'govulncheck ./...' or 'go list -m all | nancy sleuth'. Pin dependencies and review changelogs for security patches. Use a software composition analysis (SCA) tool in CI/CD pipelines.
  • Medium · Dockerfile Uses Mutable Base Image Tags — .devops/cublas.Dockerfile, .devops/main.Dockerfile, .devops/main-cuda.Dockerfile, .devops/main-intel.Dockerfile, .devops/main-musa.Dockerfile, .devops/main-vulkan.Dockerfile. Dockerfiles in .devops/ (e.g., cublas.Dockerfile, main.Dockerfile, main-cuda.Dockerfile, etc.) likely reference base images using mutable tags (e.g., 'latest' or version tags without digest pinning). This can lead to supply chain attacks where a compromised upstream image is pulled silently during builds, introducing malicious code into the final image. Fix: Pin base images to specific SHA256 digests (e.g., FROM nvidia/cuda@sha256:<digest>) rather than using floating tags like 'latest'. Integrate image signing verification (e.g., Cosign/Sigstore) in your CI pipeline.
  • Medium · Potential Unsafe Memory Operations in C/C++ Core — Root C/C++ source files (implied by project description and CMakeLists.txt). The project is primarily implemented in C/C++ (whisper.cpp). C/C++ codebases are inherently susceptible to memory safety vulnerabilities including buffer overflows, use-after-free, heap corruption, and integer overflows, especially when processing untrusted audio input files (WAV parsing, model loading). Since the library processes external user-supplied audio data, any flaw in parsing logic could be exploited for arbitrary code execution or denial of service. Fix: Employ static analysis tools (cppcheck, clang-tidy, Coverity) and dynamic analysis (AddressSanitizer, MemorySanitizer, Valgrind, libFuzzer) as part of CI. Enforce bounds checking and input validation on all external data (model files, audio input). Consider enabling compiler hardening flags: -fstack-protector-strong, -D_FORTIFY_SOURCE=2, -Wformat-security, -pie, -fPIE.
  • Medium · Insecure Model File Loading Without Integrity Verification — bindings/go/examples/go-model-download/main.go, bindings/go/examples/go-model-download/context.go, bindings/go/pkg/whisper/model.go. The application loads Whisper model files from disk (as seen in go-model-download example and model loading code). If model files are downloaded over an unverified channel or their integrity is not checked against a known-good hash before loading, an attacker with filesystem access or a man-in-the-middle position could substitute a malicious model file. Loading a crafted model could trigger vulnerabilities in the C parsing code. Fix: Verify model file integrity using cryptographic checksums (SHA-256) published by OpenAI/ggml-org before loading. Enforce HTTPS for model downloads. Validate model file headers and size bounds before passing to the C library.
  • Low · Gradle Wrapper JAR Binary in Version Control — bindings/java/gradle/wrapper/gradle-wrapper.jar. The file bindings/java/gradle/wrapper/gradle-wrapper.jar is a pre-compiled binary JAR committed directly into the repository. Binary blobs in version control are difficult to audit, and if this JAR is tampered with (e.g., via a compromised contributor account or supply chain attack), it could execute malicious code during the build process on developer machines and CI systems. Fix: Verify the SHA-256 checksum of the gradle-wrapper.jar against the official Gradle distribution checksums and document it in
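For the mutable-base-image observation above, digest pinning is a one-line change per Dockerfile. The image name and tag below are illustrative (check the actual FROM lines in .devops/), and the digest is deliberately left as a placeholder to resolve with docker buildx imagetools inspect:

```dockerfile
# Before (mutable tag — a compromised upstream could replace it silently):
#   FROM nvidia/cuda:12.4.0-devel-ubuntu22.04
# After (immutable digest pin — <digest> is a placeholder to fill in):
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04@sha256:<digest>
```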

LLM-derived; treat as a starting point, not a security audit.

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.