ggerganov/kbd-audio

Item: ggerganov/kbd-audio
Rating: 3
Author: RepoPilot

🎤⌨️ Acoustic keyboard eavesdropping

Mixed

Stale — last commit 3y ago

worst of 4 axes

Use as dependencyMixed

last commit was 3y ago; top contributor handles 99% of recent commits

Fork & modifyHealthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn fromHealthy

Documented and popular — useful reference codebase to read through.

Deploy as-isHealthy

No critical CVEs, sane security posture — runnable as-is.

✓2 active contributors
✓MIT licensed
✓CI configured

Show 4 more →

✓Tests present
⚠Stale — last commit 3y ago
⚠Small team — 2 contributors active in recent commits
⚠Single-maintainer risk — top contributor 99% of recent commits

What would change the summary?

→Use as dependency Mixed → Healthy if: 1 commit in the last 365 days

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Forkable](https://repopilot.app/api/badge/ggerganov/kbd-audio?axis=fork)](https://repopilot.app/r/ggerganov/kbd-audio)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/ggerganov/kbd-audio on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: ggerganov/kbd-audio

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ggerganov/kbd-audio shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Stale — last commit 3y ago

2 active contributors
MIT licensed
CI configured
Tests present
⚠ Stale — last commit 3y ago
⚠ Small team — 2 contributors active in recent commits
⚠ Single-maintainer risk — top contributor 99% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live ggerganov/kbd-audio repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/ggerganov/kbd-audio.

What it runs against: a local clone of ggerganov/kbd-audio — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in ggerganov/kbd-audio | Confirms the artifact applies here, not a fork | | 2 | License is still MIT | Catches relicense before you depend on it | | 3 | Default branch master exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 1240 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>ggerganov/kbd-audio</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ggerganov/kbd-audio. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/ggerganov/kbd-audio.git
#   cd kbd-audio
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of ggerganov/kbd-audio and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "ggerganov/kbd-audio(\\.git)?\\b" \\
  && ok "origin remote is ggerganov/kbd-audio" \\
  || miss "origin remote is not ggerganov/kbd-audio (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"MIT\"" package.json 2>/dev/null) \\
  && ok "license is MIT" \\
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
  && ok "default branch master exists" \\
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "keytap.cpp" \\
  && ok "keytap.cpp" \\
  || miss "missing critical file: keytap.cpp"
test -f "keytap2.cpp" \\
  && ok "keytap2.cpp" \\
  || miss "missing critical file: keytap2.cpp"
test -f "audio-logger.h" \\
  && ok "audio-logger.h" \\
  || miss "missing critical file: audio-logger.h"
test -f "common.h" \\
  && ok "common.h" \\
  || miss "missing critical file: common.h"
test -f "subbreak.h" \\
  && ok "subbreak.h" \\
  || miss "missing critical file: subbreak.h"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 1240 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1210d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/ggerganov/kbd-audio"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

kbd-audio is a collection of C++ tools (keytap, keytap2, keytap3) that recover text from keyboard audio by analyzing acoustic signatures of keypresses captured via microphone. The core innovation is acoustic side-channel attack on mechanical keyboards: by processing audio spectrograms and applying n-gram statistical analysis, it can guess what text was typed without seeing the screen. Monolithic C++ application organized by tool version: core shared utilities in common.cpp/common.h and audio-logger.cpp, with parallel implementations keytap.cpp, keytap2.cpp, keytap3.cpp for each algorithm version. GUI variants use SDL2 (keytap-gui.cpp, keytap2-gui.cpp, keytap3-gui.cpp), web versions compile to Emscripten in build-em/. N-gram data stored as binary files (data/ggwords-*-gram.dat.binary) and text files (data/english_trigrams.txt, etc).

👥Who it's for

Security researchers, penetration testers, and keyboard enthusiasts interested in understanding acoustic side-channel vulnerabilities in mechanical keyboards. Also appeals to CTF competitors and developers building audio signal processing tools.

🌱Maturity & risk

Active and mature: three major tool versions (keytap/keytap2/keytap3) with online deployments, CI/CD via GitHub Actions (build.yml), comprehensive n-gram datasets in data/, and web-based GUIs. Actively maintained with algorithmic improvements between versions, though no formal test suite visible in file list.

Single maintainer (ggerganov), no visible unit test files or test/ directory, and no explicit dependency management visible beyond CMake modules (FindFFTW.cmake, FindSDL2.cmake). Risk is moderate: mature algorithms with stable CLI interfaces, but lack of automated tests means regressions could slip through. No breaking changes apparent across keytap versions.

Active areas of work

Keytap3 is the latest focus with full automation (keytap3.cpp, keytap3-gui.cpp, keytap3-multi.cpp) and improved n-gram statistics. Web GUI deployment active (keytap3-gui-tmpl.html). No specific PR or milestone data visible, but file structure shows recent additions in keytap3 variants and corresponding Emscripten builds.

🚀Get running

Clone and build: git clone https://github.com/ggerganov/kbd-audio && cd kbd-audio && mkdir build && cd build && cmake .. && make. Requires CMake, C++17 compiler, SDL2, and FFTW. No package.json; pure CMake project. Run keytap3 on a WAV file: ./keytap3 <audio.wav>.

Daily commands: cd build && make && ./keytap3 recording.wav processes a WAV file. For GUI: ./keytap3-gui launches SDL2 window. Web version compiles via build-em/keytap-gui/compile.sh. Audio input captured via audio-logger.cpp (platform-specific microphone access, implementation details in audio-logger.h).

🗺️Map of the codebase

keytap.cpp — Primary entry point for the keytap tool that performs keyboard key detection from audio; must understand core audio analysis algorithm
keytap2.cpp — Statistical n-gram based keyboard recovery tool; represents alternative approach to keytap without training data requirements
audio-logger.h — Core audio capture abstraction; all audio ingestion flows through this interface
common.h — Shared utilities and data structures used across multiple tools (audio processing, frequency analysis)
subbreak.h — Key segmentation algorithm that breaks audio streams into individual keystroke events; foundational to detection pipeline
constants.h — Centralized frequency thresholds and detection parameters used across all analysis tools
CMakeLists.txt — Build configuration defining compilation targets for keytap, keytap2, keytap3, and GUI variants; required for development setup

🛠️How to make changes

Add a new keystroke detection algorithm variant

Create new segmentation header (e.g., subbreak4.h) that inherits segmentation logic from subbreak3.h, implementing improved silence detection thresholds in constants.h (subbreak4.h)
Implement key classification in new file keytap4.cpp that calls common.cpp spectral analysis and key-detector.cpp, using the new subbreak4 segmentation (keytap4.cpp)
Create CLI wrapper in keytap4-gui.cpp that extends common-gui.cpp for UI bindings and audio-logger.h for device input (keytap4-gui.cpp)
Add build targets in CMakeLists.txt for keytap4 and keytap4-gui executables, linking to FFTW and SDL2 via cmake/FindFFTW.cmake (CMakeLists.txt)
Add test validation in test-subbreak3.cpp (or new test file) to compare segmentation accuracy against baseline keytap3 on reference audio samples (test-subbreak3.cpp)

Add a new n-gram language model for a different language

Generate or source trigram, quadgram, and quintgram frequency tables and place in data/ directory (e.g., data/french_trigrams.txt, data/french_quadgrams.txt) (data/french_trigrams.txt)
Create precompiled binary model by running compress-n-grams.cpp with your new text files, outputting to data/ggwords-4-gram-fr.dat.binary (compress-n-grams.cpp)
Modify keytap2.cpp to load the new language model based on a command-line flag (e.g., --language fr) that selects the appropriate binary file from data/ (keytap2.cpp)
Update keytap2-gui.cpp to expose a language dropdown selector in the UI that passes the selected model path to keytap2 analysis routine (keytap2-gui.cpp)

Add frequency-based tuning parameters for different keyboard types

Define new keyboard profile struct in constants.h with frequency ranges, attack times, and decay thresholds specific to mechanical vs. membrane keyboards (constants.h)
Modify common.cpp spectral analysis functions to accept a keyboard_profile parameter and apply profile-specific filtering and gain adjustment (common.cpp)
Update key-detector.cpp to use the active keyboard profile when scoring candidate keys, adjusting frequency importance weights per profile (key-detector.cpp)
Extend keytap-gui.cpp and keytap3-gui.cpp to include dropdown menus for keyboard type selection (Mechanical/Membrane/Chiclet) that update the active profile in real time (keytap-gui.cpp)

Add web-based streaming analysis (extend keytap3-app.cpp)

Modify keytap3-app.cpp to accept WebSocket or HTTP streaming input from a client, decoding incoming audio chunks via dr_wav.h codec (keytap3-app.cpp)
Integrate subbreak3.h segmentation logic to detect keystroke boundaries on the incoming stream and invoke key-detector.cpp for real-time classification (subbreak3.h)
Stream detected keys and confidence scores back to the client via WebSocket messages; update index-keytap3-app-tmpl.html frontend to display live results (index-keytap3-app-tmpl.html)
Add CMakeLists.txt build target for streaming keytap3 server binary with WebSocket library (e.g., libwebsockets), linking common.cpp and audio-logger.h (CMakeLists.txt)

🔧Why these technologies

C++ with FFTW for FFT computation — Real-time acoustic analysis requires fast Fourier transforms on audio; FFTW is highly optimized for DSP workloads and is the standard in signal processing
SDL2 for cross-platform GUI and — undefined

🪤Traps & gotchas

FFTW must be installed system-wide or CMake configure will fail (FindFFTW.cmake is non-standard); no vcpkg/conan integration. Audio-logger.cpp contains platform-specific code (Windows/Linux/macOS microphone APIs) not visible in headers—linking may fail if targeting unsupported OS. N-gram binary files in data/ are not generated by build; they must exist or keytap3 will crash at runtime. Emscripten builds in build-em/ require separate emsdk installation and manual compile.sh execution—not integrated into main CMake. SDL2 is optional for CLI but required for GUI tools; no clear build-time guard in CMakeLists.txt visible from file list.

🏗️Architecture

💡Concepts to learn

Spectrogram and STFT (Short-Time Fourier Transform) — Core of keytap: audio is converted to time-frequency representation via overlapping FFT windows to visualize keyboard keystroke signatures across frequency bands
N-gram Language Model — keytap3 uses 3-to-6-grams (data/ggwords-*-gram.dat.binary) to statistically infer which letters are likely given acoustic cluster observations, replacing keytap's manual training
Acoustic Side-Channel Attack — The threat model: extracting sensitive information (typed text) from physical audio leakage rather than electromagnetic or timing channels
Feature Extraction from Audio — keytap clusters raw spectrograms into key 'fingerprints' by grouping similar acoustic signatures; this labeling problem is core to the algorithm's accuracy
Viterbi Algorithm (Hidden Markov Model inference) — Likely used implicitly in keytap3 for optimal sequence decoding: given noisy acoustic observations and n-gram constraints, find most probable text path
Emscripten WebAssembly Compilation — build-em/ shows how C++ keytap is cross-compiled to browser-runnable WASM (keytap3.ggerganov.com); key for online deployment without server backend
Binary Serialization of Language Models — data/ggwords-*-gram.dat.binary are packed n-gram frequency tables; understanding format is needed to extend language support or debug model loading

keyboardvaults/keyboardvaults-demo — Alternative keyboard eavesdropping research; overlaps with acoustic side-channel threat model
ggerganov/whisper.cpp — Same author's speech-to-text engine using similar audio FFT/spectrogram pipeline; shares signal processing patterns
Thalhammer/LanguageModels — Standalone n-gram and language model tools; kbd-audio's data/ggwords files could be validated or extended using this
trezor/trezor-firmware — Hardware wallet security; relevant context for understanding side-channel attacks on input devices
ggerganov/ollama — Same author's AI inference engine; demonstrates expertise in performance optimization and cross-platform C++ deployment

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add unit tests for subbreak3.cpp audio segmentation logic

The repo contains three versions of subbreak functionality (subbreak.cpp, subbreak2.cpp, subbreak3.cpp) with test-subbreak3.cpp existing but appearing incomplete. There are no automated tests in the CI pipeline (.github/workflows/build.yml) validating the core audio segmentation algorithms. This is critical since keytap's accuracy depends on correct keystroke detection and separation.

[ ] Expand test-subbreak3.cpp with comprehensive test cases for edge cases (short keystroke bursts, overlapping sounds, silence detection)
[ ] Add CMake test targets in CMakeLists.txt for building and running subbreak tests
[ ] Integrate test execution into .github/workflows/build.yml to run on every PR
[ ] Document expected behavior for subbreak3 algorithm in README.md with examples

Refactor common audio processing logic from keytap*.cpp into audio-processing utility module

The repo has multiple keytap variants (keytap.cpp, keytap2.cpp, keytap3.cpp) plus GUI versions, creating significant code duplication for audio feature extraction, FFT processing, and clustering. Extracting shared logic into a dedicated audio-processing library would reduce maintenance burden and make the codebase more modular.

[ ] Analyze keytap.cpp, keytap2.cpp, keytap3.cpp to identify common patterns (FFT operations, n-gram matching, frequency analysis)
[ ] Create new files: audio-processor.h and audio-processor.cpp with extracted functions
[ ] Update keytap*.cpp files to use the new audio-processor module instead of duplicated code
[ ] Update CMakeLists.txt to link the new module and ensure all variants still compile and pass existing functionality

Add integration tests and benchmarks for keytap accuracy against sample audio files

The repo includes sample data (data/english_trigrams.txt, etc.) and HTML demo templates (index-keytap3-app-tmpl.html) but no automated tests validating end-to-end keytap accuracy. Adding sample WAV files with known keystroke sequences and accuracy benchmarks would catch regressions and help contributors verify improvements.

[ ] Create test audio samples in WAV format (using record.cpp or record-full.cpp) with documented keystroke sequences
[ ] Add test cases in a new tests/ directory that run keytap3 against these samples and verify keystroke detection accuracy
[ ] Integrate benchmark execution into .github/workflows/build.yml to track accuracy metrics across commits
[ ] Document the testing methodology in TESTING.md with instructions for contributors to generate their own test audio

🌿Good first issues

Add unit tests for common.cpp spectral analysis functions (FFT output validation, spectrogram correctness)—currently no test/ directory; tests would catch regressions in signal processing.
Document n-gram binary file format and add Python script to regenerate data/ggwords-*-gram.dat.binary from text files—currently undocumented; blocks future language model updates.
Refactor audio-logger.cpp platform detection into separate cpp files (audio-logger-win.cpp, audio-logger-linux.cpp, audio-logger-macos.cpp)—reduces conditional compilation clutter and improves testability.

⭐Top contributors

Click to expand

@ggerganov — 99 commits
@herrhotzenplotz — 1 commits

📝Recent commits

Click to expand

a918207 — record : fix crash on MacOS (close #47) (ggerganov)
277e9c7 — Update README.md (ggerganov)
bd03e0a — Update README.md (ggerganov)
efc999b — Fix link in README.md (ggerganov)
7706c48 — keytap3-gui : new stable version of the GUI (ggerganov)
df87502 — Update README.md (ggerganov)
cac8409 — keytap3 : multi-thread wasm version (ggerganov)
c139e16 — keytap3 : remove low-similarity keys (ggerganov)
88186ce — keytap3 : find optimal cutoff freq for filter (ggerganov)
390aa9e — keytap3 : unify default parameters + multi-thread beam search (ggerganov)

🔒Security observations

This codebase presents a significant security and privacy concern due to its core functionality - acoustic keyboard eavesdropping. While the code quality appears reasonable, the inherent purpose of inferring sensitive information (passwords, messages) from audio recordings poses substantial risks to confidentiality. Additionally, there are moderate concerns around input validation in audio processing, potential XSS in GUI components, and dependency management. The project requires strong ethical guidelines, responsible disclosure practices, and

High · Acoustic Eavesdropping Capability - Inherent Security Risk — keytap.cpp, keytap2.cpp, keytap3.cpp, keytap-gui.cpp, keytap2-gui.cpp, keytap3-gui.cpp. The entire codebase is designed to perform acoustic keyboard eavesdropping - inferring typed keystrokes from audio analysis. While not a code vulnerability per se, this represents a significant security and privacy threat. The tools (keytap, keytap2, keytap3) can recover sensitive information (passwords, messages, credentials) from ambient audio without user knowledge or consent. Fix: This tool should only be used for authorized security research with proper disclosure and consent. Organizations should be aware of this attack vector and implement acoustic countermeasures. Consider adding prominent warnings in documentation about responsible use.
Medium · Hardcoded Data Files and Language Models — data/ directory, constants.h, compress-n-grams.cpp. The codebase includes hardcoded n-gram data files (english_trigrams.txt, english_quadgrams.txt, english_quintgrams.txt) and binary language models in the data/ directory. While not secrets, these static resources could be targeted for manipulation or may contain outdated/biased linguistic data. Fix: Document the sources and versions of linguistic data. Consider implementing checksum verification for data files. Provide a mechanism to update language models independently of code releases.
Medium · Missing Input Validation in Audio Processing — audio-logger.cpp, audio-logger.h, dr_wav.h, record-full.cpp, play-full.cpp. Files like audio-logger.cpp, record.cpp, and audio processing tools (play-full.cpp, view-full-gui.cpp) handle raw audio input and WAV file parsing (dr_wav.h). There's potential for buffer overflow or malformed audio file handling without visible bounds checking. Fix: Add explicit bounds checking for audio buffer operations. Validate WAV file headers and chunk sizes before processing. Use safe memory functions and consider using AddressSanitizer during development and testing.
Medium · GUI Applications May Have XSS Vulnerabilities — index-keytap2-gui-tmpl.html, index-keytap3-gui-tmpl.html, index-keytap3-app-tmpl.html, style.css. The codebase includes HTML template files (index-keytap2-gui-tmpl.html, index-keytap3-gui-tmpl.html, index-keytap3-app-tmpl.html) for web-based GUI. These may be vulnerable to XSS if user input is not properly sanitized or if the GUI files are served over HTTP. Fix: Ensure all user inputs in GUI applications are properly HTML-escaped. Use HTTPS for any web-based deployment. Apply Content Security Policy headers. Review template files for unsafe script injection points.
Low · No Visible Dependency Version Pinning — CMakeLists.txt, cmake/ directory. No package manager files (package.json, requirements.txt, Cargo.toml, etc.) are visible in the repository structure. CMakeLists.txt references external dependencies (FFTW, SDL2) but versions may not be pinned, potentially leading to compatibility issues or security updates being missed. Fix: Specify explicit versions for all external dependencies. Use dependency lock files when available. Regularly audit and update dependencies for security patches.
Low · Insufficient Error Handling in Audio Recording — audio-logger.cpp, audio-logger.h, record.cpp, record-full.cpp. Audio recording and logging functionality (audio-logger.cpp, record.cpp) may not have comprehensive error handling for microphone access failures, permission issues, or device disconnections. Fix: Implement comprehensive error handling for audio device access. Log permission errors appropriately. Provide user-friendly error messages when microphone access is denied or devices are unavailable.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

ggerganov/kbd-audio

Embed the "Forkable" badge

Onboarding doc

Onboarding: ggerganov/kbd-audio

🤖Agent protocol

🎯Verdict

✅Verify before trusting

⚡TL;DR

👥Who it's for

🌱Maturity & risk

Active areas of work

🚀Get running

🗺️Map of the codebase

🛠️How to make changes

Add a new keystroke detection algorithm variant

Add a new n-gram language model for a different language

Add frequency-based tuning parameters for different keyboard types

Add web-based streaming analysis (extend keytap3-app.cpp)

🔧Why these technologies

🪤Traps & gotchas

🏗️Architecture

💡Concepts to learn

🔗Related repos

🪄PR ideas

Add unit tests for subbreak3.cpp audio segmentation logic

Refactor common audio processing logic from keytap*.cpp into audio-processing utility module

Add integration tests and benchmarks for keytap accuracy against sample audio files

🌿Good first issues

⭐Top contributors

Top contributors

📝Recent commits

Recent commits

🔒Security observations

👉Where to read next