ggerganov/kbd-audio
🎤⌨️ Acoustic keyboard eavesdropping
Stale: last commit 3y ago
Worst of 4 axes: last commit was 3y ago; top contributor handles 99% of recent commits.
Has a license, tests, and CI: a clean foundation to fork and modify.
Documented and popular: a useful reference codebase to read through.
No critical CVEs and a sane security posture: runnable as-is.
- ✓ 2 active contributors
- ✓ MIT licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Stale: last commit 3y ago
- ⚠ Small team: 2 contributors active in recent commits
- ⚠ Single-maintainer risk: top contributor has 99% of recent commits
What would change the summary?
- Use as dependency: Mixed → Healthy if there is 1 commit in the last 365 days
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: ggerganov/kbd-audio
Generated by RepoPilot · 2026-05-09 · Source
Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ggerganov/kbd-audio shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything, but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
Verdict
WAIT: Stale, last commit 3y ago
- 2 active contributors
- MIT licensed
- CI configured
- Tests present
- ⚠ Stale: last commit 3y ago
- ⚠ Small team: 2 contributors active in recent commits
- ⚠ Single-maintainer risk: top contributor has 99% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live ggerganov/kbd-audio
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale; regenerate it at
repopilot.app/r/ggerganov/kbd-audio.
What it runs against: a local clone of ggerganov/kbd-audio. The script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in ggerganov/kbd-audio | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 1240 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ggerganov/kbd-audio. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/ggerganov/kbd-audio.git
#   cd kbd-audio
#
# Then paste this script. Every check is read-only; no mutations.

# Keep going after a failed check so every result is reported.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of ggerganov/kbd-audio and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "ggerganov/kbd-audio(\.git)?\b" \
  && ok "origin remote is ggerganov/kbd-audio" \
  || miss "origin remote is not ggerganov/kbd-audio (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift: was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "keytap.cpp" \
  && ok "keytap.cpp" \
  || miss "missing critical file: keytap.cpp"
test -f "keytap2.cpp" \
  && ok "keytap2.cpp" \
  || miss "missing critical file: keytap2.cpp"
test -f "audio-logger.h" \
  && ok "audio-logger.h" \
  || miss "missing critical file: audio-logger.h"
test -f "common.h" \
  && ok "common.h" \
  || miss "missing critical file: common.h"
test -f "subbreak.h" \
  && ok "subbreak.h" \
  || miss "missing critical file: subbreak.h"

# 5. Repo recency (whole days since the last commit on the current branch)
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 1240 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1210d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures), safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/ggerganov/kbd-audio"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
TL;DR
kbd-audio is a collection of C++ tools (keytap, keytap2, keytap3) that recover text from keyboard audio by analyzing the acoustic signatures of keypresses captured via microphone. The core idea is an acoustic side-channel attack on mechanical keyboards: by processing audio spectrograms and applying n-gram statistical analysis, the tools guess what was typed without seeing the screen. The codebase is a monolithic C++ application organized by tool version: shared utilities live in common.cpp/common.h and audio-logger.cpp, with parallel implementations in keytap.cpp, keytap2.cpp, and keytap3.cpp, one per algorithm version. GUI variants use SDL2 (keytap-gui.cpp, keytap2-gui.cpp, keytap3-gui.cpp), and web versions are compiled with Emscripten under build-em/. N-gram data is stored as binary files (data/ggwords-*-gram.dat.binary) and text files (data/english_trigrams.txt, etc.).
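To make the n-gram step concrete, here is a minimal sketch of how a candidate decoding could be ranked with trigram log-probabilities. The table type, the function name `scoreTrigrams`, and the floor count for unseen trigrams are illustrative assumptions; none of them is taken from the repo.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical trigram table: maps a 3-letter string to its corpus count.
using TrigramCounts = std::unordered_map<std::string, double>;

// Sum of log-probabilities of all overlapping trigrams in `text`.
// Unseen trigrams get a small floor count (assumed value) so the score stays finite.
double scoreTrigrams(const std::string& text, const TrigramCounts& counts,
                     double total, double floorCount = 0.01) {
    double score = 0.0;
    for (std::size_t i = 0; i + 3 <= text.size(); ++i) {
        auto it = counts.find(text.substr(i, 3));
        const double c = (it != counts.end()) ? it->second : floorCount;
        score += std::log(c / total);
    }
    return score;
}
```

A higher (less negative) score means the candidate looks more like the language the table was built from, so a decoder can compare competing guesses for the same keystroke sequence.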
Who it's for
Security researchers, penetration testers, and keyboard enthusiasts interested in understanding acoustic side-channel vulnerabilities in mechanical keyboards. Also appeals to CTF competitors and developers building audio signal processing tools.
Maturity & risk
Mature but no longer active: three major tool versions (keytap/keytap2/keytap3) with online deployments, CI via GitHub Actions (build.yml), n-gram datasets in data/, and web-based GUIs. The algorithms improved from version to version, but the last commit was about 3 years ago (see the Verdict). No formal test suite is visible in the file list beyond test-subbreak3.cpp.
Single maintainer (ggerganov), no test/ directory, and no explicit dependency management beyond CMake modules (FindFFTW.cmake, FindSDL2.cmake). Risk is moderate: the algorithms are mature and the CLI interfaces look stable, but the lack of automated tests means regressions could slip through. No breaking changes are apparent across keytap versions.
Active areas of work
keytap3 is the most recent focus, with full automation (keytap3.cpp, keytap3-gui.cpp, keytap3-multi.cpp), improved n-gram statistics, and a web GUI (keytap3-gui-tmpl.html). No PR or milestone data is visible; the file structure and commit log show the latest additions landing in the keytap3 variants and their Emscripten builds.
Get running
Clone and build: git clone https://github.com/ggerganov/kbd-audio && cd kbd-audio && mkdir build && cd build && cmake .. && make. Requires CMake, a C++17 compiler, SDL2, and FFTW. There is no package.json; this is a pure CMake project. Run keytap3 on a WAV file: ./keytap3 <audio.wav>.
Daily commands:
cd build && make && ./keytap3 recording.wav processes a WAV file. For the GUI, ./keytap3-gui opens an SDL2 window. The web version is compiled via build-em/keytap-gui/compile.sh. Audio input is captured via audio-logger.cpp (platform-specific microphone access; see audio-logger.h for the interface).
Map of the codebase
- keytap.cpp: Primary entry point for the keytap tool, which detects keyboard keys from audio; read it to understand the core audio analysis algorithm
- keytap2.cpp: Statistical n-gram based keyboard recovery tool; an alternative approach to keytap that does not require training data
- audio-logger.h: Core audio capture abstraction; all audio ingestion flows through this interface
- common.h: Shared utilities and data structures used across multiple tools (audio processing, frequency analysis)
- subbreak.h: Key segmentation algorithm that breaks audio streams into individual keystroke events; foundational to the detection pipeline
- constants.h: Centralized frequency thresholds and detection parameters used across all analysis tools
- CMakeLists.txt: Build configuration defining compilation targets for keytap, keytap2, keytap3, and the GUI variants; required for development setup
How to make changes
Add a new keystroke detection algorithm variant
- Create a new segmentation header (e.g., subbreak4.h) that builds on the segmentation logic in subbreak3.h, with improved silence detection thresholds defined in constants.h (subbreak4.h)
- Implement key classification in a new file keytap4.cpp that calls the spectral analysis in common.cpp and key-detector.cpp, using the new subbreak4 segmentation (keytap4.cpp)
- Create a GUI front end in keytap4-gui.cpp that builds on common-gui.cpp for UI bindings and audio-logger.h for device input (keytap4-gui.cpp)
- Add build targets in CMakeLists.txt for the keytap4 and keytap4-gui executables, linking to FFTW and SDL2 via the CMake find modules (e.g., cmake/FindFFTW.cmake) (CMakeLists.txt)
- Add test validation in test-subbreak3.cpp (or a new test file) that compares segmentation accuracy against the keytap3 baseline on reference audio samples (test-subbreak3.cpp)
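As a starting point for the segmentation step, the sketch below shows one simple way to find keystroke onsets with an energy threshold. It is an assumption-laden illustration, not the logic in subbreak3.h: the function name `detectOnsets`, the mean-absolute-amplitude measure, and the refractory skip are all hypothetical choices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the sample index at which each detected keystroke starts.
// A keystroke is reported when the mean absolute amplitude of a window exceeds
// `threshold`; scanning then jumps ahead by `refractory` samples (or one window,
// whichever is larger) so a single keypress is not reported twice.
std::vector<std::size_t> detectOnsets(const std::vector<float>& samples,
                                      std::size_t window, float threshold,
                                      std::size_t refractory) {
    std::vector<std::size_t> onsets;
    if (window == 0 || samples.size() < window) return onsets;
    std::size_t i = 0;
    while (i + window <= samples.size()) {
        float energy = 0.0f;
        for (std::size_t k = 0; k < window; ++k) {
            const float s = samples[i + k];
            energy += (s < 0.0f ? -s : s);
        }
        energy /= static_cast<float>(window);
        if (energy > threshold) {
            onsets.push_back(i);
            i += (refractory > window ? refractory : window);
        } else {
            i += window;
        }
    }
    return onsets;
}
```

The window size, threshold, and refractory period would naturally live next to the other detection parameters; tuning them per keyboard is the part a new variant would actually need to improve.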
Add a new n-gram language model for a different language
- Generate or source trigram, quadgram, and quintgram frequency tables and place them in the data/ directory (e.g., data/french_trigrams.txt, data/french_quadgrams.txt) (data/french_trigrams.txt)
- Build a precompiled binary model by running compress-n-grams.cpp on the new text files, writing to data/ggwords-4-gram-fr.dat.binary (compress-n-grams.cpp)
- Modify keytap2.cpp to load the new language model based on a new command-line flag (e.g., a hypothetical --language fr) that selects the matching binary file from data/ (keytap2.cpp)
- Update keytap2-gui.cpp to expose a language dropdown in the UI that passes the selected model path to the keytap2 analysis routine (keytap2-gui.cpp)
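Before wiring a new language into the tools, it can help to sanity-check the text tables. The sketch below parses a frequency table under the assumption that each line has the form `<ngram> <count>`; that layout, and the function name `loadNgrams`, are assumptions to verify against the actual files in data/.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Reads lines of the assumed form "<ngram> <count>" and returns a count table.
// Lines that fail to parse, have a non-positive count, or whose n-gram length
// differs from `n` are skipped. Duplicate n-grams have their counts summed.
std::unordered_map<std::string, long long> loadNgrams(std::istream& in, std::size_t n) {
    std::unordered_map<std::string, long long> table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string gram;
        long long count = 0;
        if (!(fields >> gram >> count)) continue;
        if (gram.size() != n || count <= 0) continue;
        for (char& c : gram) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        table[gram] += count;
    }
    return table;
}
```

Note that the length check counts bytes, so accented letters encoded as multi-byte UTF-8 would be rejected; a real French model would need proper decoding or a transliteration step first.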
Add frequency-based tuning parameters for different keyboard types
- Define a new keyboard profile struct in constants.h with frequency ranges, attack times, and decay thresholds specific to mechanical vs. membrane keyboards (constants.h)
- Modify the spectral analysis functions in common.cpp to accept a keyboard_profile parameter and apply profile-specific filtering and gain adjustment (common.cpp)
- Update key-detector.cpp to use the active keyboard profile when scoring candidate keys, adjusting frequency importance weights per profile (key-detector.cpp)
- Extend keytap-gui.cpp and keytap3-gui.cpp with dropdown menus for keyboard type selection (Mechanical/Membrane/Chiclet) that update the active profile in real time (keytap-gui.cpp)
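One possible shape for such a profile is sketched below. The struct fields, the `profileFor` lookup, and every numeric value are placeholders chosen for illustration, not measured keyboard characteristics or existing repo code.

```cpp
#include <string>

// Hypothetical profile; every number used below is a placeholder, not a measurement.
struct KeyboardProfile {
    std::string name;
    float freqMinHz;    // lower edge of the band treated as informative
    float freqMaxHz;    // upper edge of that band
    float minKeyGapMs;  // minimum spacing between two distinct keypresses
};

inline KeyboardProfile profileFor(const std::string& type) {
    if (type == "mechanical") return {"mechanical", 100.0f, 8000.0f, 40.0f};
    if (type == "membrane")   return {"membrane",   100.0f, 4000.0f, 60.0f};
    return {"default", 100.0f, 6000.0f, 50.0f};
}

// Maps the profile's frequency band onto FFT bin indices for a given
// sample rate and FFT size, clamping the upper edge to the Nyquist bin.
inline void bandToBins(const KeyboardProfile& p, int sampleRate, int fftSize,
                       int& firstBin, int& lastBin) {
    const float binHz = static_cast<float>(sampleRate) / static_cast<float>(fftSize);
    firstBin = static_cast<int>(p.freqMinHz / binHz);
    lastBin  = static_cast<int>(p.freqMaxHz / binHz);
    const int nyquistBin = fftSize / 2;
    if (lastBin > nyquistBin) lastBin = nyquistBin;
    if (firstBin > lastBin) firstBin = lastBin;
}
```

Keeping the profile as plain data makes it easy to pass by const reference into the analysis functions and to switch from a GUI dropdown without touching the scoring code.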
Add web-based streaming analysis (extend keytap3-app.cpp)
- Modify keytap3-app.cpp to accept WebSocket or HTTP streaming input from a client, decoding incoming audio chunks via the dr_wav.h codec (keytap3-app.cpp)
- Integrate the subbreak3.h segmentation logic to detect keystroke boundaries on the incoming stream and invoke key-detector.cpp for real-time classification (subbreak3.h)
- Stream detected keys and confidence scores back to the client via WebSocket messages; update the index-keytap3-app-tmpl.html frontend to display live results (index-keytap3-app-tmpl.html)
- Add a CMakeLists.txt build target for the streaming keytap3 server binary with a WebSocket library (e.g., libwebsockets), linking common.cpp and audio-logger.h (CMakeLists.txt)
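Streaming input arrives in chunks of arbitrary size, while segmentation and spectral analysis usually want fixed-size frames. The sketch below shows one way to bridge the two; the class name `FrameBuffer` and its interface are hypothetical and independent of any particular WebSocket library.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper that accumulates arbitrarily sized audio chunks and
// hands back fixed-size frames; consecutive frames overlap by frameSize - hop samples.
class FrameBuffer {
public:
    FrameBuffer(std::size_t frameSize, std::size_t hop)
        : frameSize_(frameSize), hop_(hop) {}

    // Appends a newly received chunk to the pending samples.
    void push(const std::vector<float>& chunk) {
        pending_.insert(pending_.end(), chunk.begin(), chunk.end());
    }

    // Returns true and fills `frame` when at least one full frame is buffered.
    bool pop(std::vector<float>& frame) {
        if (hop_ == 0 || pending_.size() < frameSize_) return false;
        frame.assign(pending_.begin(),
                     pending_.begin() + static_cast<std::ptrdiff_t>(frameSize_));
        const std::size_t drop = hop_ < pending_.size() ? hop_ : pending_.size();
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(drop));
        return true;
    }

private:
    std::size_t frameSize_;
    std::size_t hop_;
    std::deque<float> pending_;
};
```

A receive loop would call `push` for each decoded chunk and then drain `pop` until it returns false, handing each frame to the segmentation stage.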
Why these technologies
- C++ with FFTW for FFT computation: real-time acoustic analysis requires fast Fourier transforms on audio; FFTW is a highly optimized, widely used FFT library for DSP workloads
- SDL2 for cross-platform GUI and …
Traps & gotchas
- FFTW must be installed system-wide or the CMake configure step will fail (the repo carries its own FindFFTW.cmake); there is no vcpkg/conan integration.
- audio-logger.cpp contains platform-specific code (Windows/Linux/macOS microphone APIs) that is not visible in the headers, so linking may fail when targeting an unsupported OS.
- The n-gram binary files in data/ are not generated by the build; they must exist or keytap3 will crash at runtime.
- Emscripten builds in build-em/ require a separate emsdk installation and manual compile.sh execution; they are not integrated into the main CMake build.
- SDL2 is optional for the CLI tools but required for the GUI tools; no clear build-time guard in CMakeLists.txt is visible from the file list.
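Since missing data files are described as a runtime crash, a small pre-flight check can turn that into a readable error. The sketch below is a generic helper, not existing repo code; the function name `missingFiles` is hypothetical and the caller supplies whatever paths the tool is about to load.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Returns the subset of `paths` that cannot be opened for reading.
std::vector<std::string> missingFiles(const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const auto& p : paths) {
        std::ifstream f(p, std::ios::binary);
        if (!f.good()) missing.push_back(p);
    }
    return missing;
}
```

Calling this at startup and printing the returned paths before exiting gives the user an actionable message instead of a crash deep inside model loading.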
Concepts to learn
- Spectrogram and STFT (Short-Time Fourier Transform): core of keytap; audio is converted to a time-frequency representation via overlapping FFT windows, exposing keystroke signatures across frequency bands
- N-gram Language Model: keytap3 uses 3-to-6-grams (data/ggwords-*-gram.dat.binary) to statistically infer which letters are likely given acoustic cluster observations, replacing keytap's manual training
- Acoustic Side-Channel Attack: the threat model; extracting sensitive information (typed text) from physical audio leakage rather than electromagnetic or timing channels
- Feature Extraction from Audio: keytap clusters raw spectrograms into key "fingerprints" by grouping similar acoustic signatures; this labeling problem is core to the algorithm's accuracy
- Viterbi Algorithm (Hidden Markov Model inference): possibly relevant to keytap3's sequence decoding (the commit log mentions beam search); given noisy acoustic observations and n-gram constraints, find the most probable text path
- Emscripten WebAssembly Compilation: build-em/ shows how the C++ keytap tools are cross-compiled to browser-runnable WASM (keytap3.ggerganov.com); key for online deployment without a server backend
- Binary Serialization of Language Models: data/ggwords-*-gram.dat.binary are packed n-gram frequency tables; understanding the format is needed to extend language support or debug model loading
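For readers new to the first concept, the sketch below computes the magnitude spectrum of a single frame: a Hann window followed by a naive O(N^2) DFT. It is a teaching example only; the function name `frameMagnitudes` is hypothetical, and a real pipeline would use an FFT library such as FFTW instead of the double loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one frame: Hann window followed by a naive O(N^2) DFT.
// Returns N/2 + 1 bins, from DC up to the Nyquist frequency.
std::vector<double> frameMagnitudes(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    std::vector<double> mags;
    if (n < 2) return mags;
    const double pi = std::acos(-1.0);
    mags.resize(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            // Hann window tapers the frame edges to reduce spectral leakage.
            const double w = 0.5 - 0.5 * std::cos(2.0 * pi * t / (n - 1));
            const double angle = 2.0 * pi * k * t / n;
            re += w * frame[t] * std::cos(angle);
            im -= w * frame[t] * std::sin(angle);
        }
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}
```

A spectrogram is obtained by sliding this over the signal with overlapping frames and stacking the resulting magnitude vectors in time order.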
Related repos
- keyboardvaults/keyboardvaults-demo: Alternative keyboard eavesdropping research; overlaps with the acoustic side-channel threat model
- ggerganov/whisper.cpp: Same author's speech-to-text engine using a similar audio FFT/spectrogram pipeline; shares signal processing patterns
- Thalhammer/LanguageModels: Standalone n-gram and language model tools; kbd-audio's data/ggwords files could be validated or extended using this
- trezor/trezor-firmware: Hardware wallet security; relevant context for understanding side-channel attacks on input devices
- ggerganov/llama.cpp: Same author's AI inference engine; demonstrates expertise in performance optimization and cross-platform C++ deployment
PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add unit tests for subbreak3.cpp audio segmentation logic
The repo contains three versions of subbreak functionality (subbreak.cpp, subbreak2.cpp, subbreak3.cpp) with test-subbreak3.cpp existing but appearing incomplete. There are no automated tests in the CI pipeline (.github/workflows/build.yml) validating the core audio segmentation algorithms. This is critical since keytap's accuracy depends on correct keystroke detection and separation.
- [ ] Expand test-subbreak3.cpp with comprehensive test cases for edge cases (short keystroke bursts, overlapping sounds, silence detection)
- [ ] Add CMake test targets in CMakeLists.txt for building and running subbreak tests
- [ ] Integrate test execution into .github/workflows/build.yml to run on every PR
- [ ] Document expected behavior for subbreak3 algorithm in README.md with examples
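Segmentation tests need input whose keystroke positions are known in advance. The sketch below generates such a signal; the function name `makeSyntheticClicks` and the burst shape (alternating sign with an exponential decay) are arbitrary test choices, not a model of a real keypress.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a silent buffer and writes a short decaying burst at each requested onset.
// Bursts that would run past the end of the buffer are truncated.
std::vector<float> makeSyntheticClicks(std::size_t totalSamples,
                                       const std::vector<std::size_t>& onsets,
                                       std::size_t burstLen = 64) {
    std::vector<float> out(totalSamples, 0.0f);
    for (std::size_t onset : onsets) {
        for (std::size_t k = 0; k < burstLen && onset + k < totalSamples; ++k) {
            const float decay =
                std::exp(-4.0f * static_cast<float>(k) / static_cast<float>(burstLen));
            // Alternating sign under a decaying envelope.
            out[onset + k] = (k % 2 == 0 ? 1.0f : -1.0f) * decay;
        }
    }
    return out;
}
```

A test can then run the segmentation code on the generated buffer and assert that the reported onsets fall within a small tolerance of the requested ones.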
Refactor common audio processing logic from keytap*.cpp into audio-processing utility module
The repo has multiple keytap variants (keytap.cpp, keytap2.cpp, keytap3.cpp) plus GUI versions, creating significant code duplication for audio feature extraction, FFT processing, and clustering. Extracting shared logic into a dedicated audio-processing library would reduce maintenance burden and make the codebase more modular.
- [ ] Analyze keytap.cpp, keytap2.cpp, keytap3.cpp to identify common patterns (FFT operations, n-gram matching, frequency analysis)
- [ ] Create new files: audio-processor.h and audio-processor.cpp with extracted functions
- [ ] Update keytap*.cpp files to use the new audio-processor module instead of duplicated code
- [ ] Update CMakeLists.txt to link the new module and ensure all variants still compile and pass existing functionality
Add integration tests and benchmarks for keytap accuracy against sample audio files
The repo includes sample data (data/english_trigrams.txt, etc.) and HTML demo templates (index-keytap3-app-tmpl.html) but no automated tests validating end-to-end keytap accuracy. Adding sample WAV files with known keystroke sequences and accuracy benchmarks would catch regressions and help contributors verify improvements.
- [ ] Create test audio samples in WAV format (using record.cpp or record-full.cpp) with documented keystroke sequences
- [ ] Add test cases in a new tests/ directory that run keytap3 against these samples and verify keystroke detection accuracy
- [ ] Integrate benchmark execution into .github/workflows/build.yml to track accuracy metrics across commits
- [ ] Document the testing methodology in TESTING.md with instructions for contributors to generate their own test audio
Good first issues
- Add unit tests for common.cpp spectral analysis functions (FFT output validation, spectrogram correctness). There is currently no test/ directory; tests would catch regressions in signal processing.
- Document the n-gram binary file format and add a Python script to regenerate data/ggwords-*-gram.dat.binary from text files. The format is currently undocumented, which blocks future language model updates.
- Refactor audio-logger.cpp platform detection into separate .cpp files (audio-logger-win.cpp, audio-logger-linux.cpp, audio-logger-macos.cpp). This reduces conditional compilation clutter and improves testability.
Top contributors
- @ggerganov: 99 commits
- @herrhotzenplotz: 1 commit
Recent commits
- a918207: record : fix crash on MacOS (close #47) (ggerganov)
- 277e9c7: Update README.md (ggerganov)
- bd03e0a: Update README.md (ggerganov)
- efc999b: Fix link in README.md (ggerganov)
- 7706c48: keytap3-gui : new stable version of the GUI (ggerganov)
- df87502: Update README.md (ggerganov)
- cac8409: keytap3 : multi-thread wasm version (ggerganov)
- c139e16: keytap3 : remove low-similarity keys (ggerganov)
- 88186ce: keytap3 : find optimal cutoff freq for filter (ggerganov)
- 390aa9e: keytap3 : unify default parameters + multi-thread beam search (ggerganov)
Security observations
This codebase presents a significant security and privacy concern due to its core functionality: acoustic keyboard eavesdropping. While the code quality appears reasonable, the inherent purpose of inferring sensitive information (passwords, messages) from audio recordings poses substantial risks to confidentiality. Additionally, there are moderate concerns around input validation in audio processing, potential XSS in GUI components, and dependency management. The project requires strong ethical guidelines, responsible disclosure practices, and …
- High · Acoustic Eavesdropping Capability - Inherent Security Risk. Files: keytap.cpp, keytap2.cpp, keytap3.cpp, keytap-gui.cpp, keytap2-gui.cpp, keytap3-gui.cpp. The entire codebase is designed to perform acoustic keyboard eavesdropping, inferring typed keystrokes from audio analysis. While not a code vulnerability per se, this represents a significant security and privacy threat. The tools (keytap, keytap2, keytap3) can recover sensitive information (passwords, messages, credentials) from ambient audio without user knowledge or consent. Fix: This tool should only be used for authorized security research with proper disclosure and consent. Organizations should be aware of this attack vector and implement acoustic countermeasures. Consider adding prominent warnings in documentation about responsible use.
- Medium · Hardcoded Data Files and Language Models. Files: data/ directory, constants.h, compress-n-grams.cpp. The codebase includes hardcoded n-gram data files (english_trigrams.txt, english_quadgrams.txt, english_quintgrams.txt) and binary language models in the data/ directory. While not secrets, these static resources could be targeted for manipulation or may contain outdated/biased linguistic data. Fix: Document the sources and versions of linguistic data. Consider implementing checksum verification for data files. Provide a mechanism to update language models independently of code releases.
- Medium · Missing Input Validation in Audio Processing. Files: audio-logger.cpp, audio-logger.h, dr_wav.h, record-full.cpp, play-full.cpp. Files like audio-logger.cpp, record.cpp, and the audio processing tools (play-full.cpp, view-full-gui.cpp) handle raw audio input and WAV file parsing (dr_wav.h). There is potential for buffer overflows or mishandled malformed audio files, since no bounds checking is visible. Fix: Add explicit bounds checking for audio buffer operations. Validate WAV file headers and chunk sizes before processing. Use safe memory functions and consider using AddressSanitizer during development and testing.
- Medium · GUI Applications May Have XSS Vulnerabilities. Files: index-keytap2-gui-tmpl.html, index-keytap3-gui-tmpl.html, index-keytap3-app-tmpl.html, style.css. The codebase includes HTML template files for the web-based GUI. These may be vulnerable to XSS if user input is not properly sanitized or if the GUI files are served over HTTP. Fix: Ensure all user inputs in GUI applications are properly HTML-escaped. Use HTTPS for any web-based deployment. Apply Content Security Policy headers. Review template files for unsafe script injection points.
- Low · No Visible Dependency Version Pinning. Files: CMakeLists.txt, cmake/ directory. No package manager files (package.json, requirements.txt, Cargo.toml, etc.) are visible in the repository structure. CMakeLists.txt references external dependencies (FFTW, SDL2) but versions may not be pinned, potentially leading to compatibility issues or missed security updates. Fix: Specify explicit versions for all external dependencies. Use dependency lock files when available. Regularly audit and update dependencies for security patches.
- Low · Insufficient Error Handling in Audio Recording. Files: audio-logger.cpp, audio-logger.h, record.cpp, record-full.cpp. Audio recording and logging functionality (audio-logger.cpp, record.cpp) may not have comprehensive error handling for microphone access failures, permission issues, or device disconnections. Fix: Implement comprehensive error handling for audio device access. Log permission errors appropriately. Provide user-friendly error messages when microphone access is denied or devices are unavailable.
LLM-derived; treat as a starting point, not a security audit.
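As an illustration of the "validate WAV file headers" suggestion, the sketch below checks only the 12-byte RIFF/WAVE preamble of a buffer before it is handed to a parser. The function names are hypothetical, chunk-level validation is deliberately left out, and this is not a description of what dr_wav.h already does internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a little-endian 32-bit value; the caller guarantees 4 readable bytes.
inline std::uint32_t readLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Minimal sanity check of a RIFF/WAVE container preamble.
bool looksLikeWav(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < 12) return false;
    if (std::memcmp(bytes.data(), "RIFF", 4) != 0) return false;
    if (std::memcmp(bytes.data() + 8, "WAVE", 4) != 0) return false;
    const std::uint32_t riffSize = readLE32(bytes.data() + 4);
    // The RIFF size field counts everything after the first 8 bytes, so it must
    // cover at least the "WAVE" tag and must not exceed what the buffer holds.
    return riffSize >= 4 && static_cast<std::size_t>(riffSize) <= bytes.size() - 8;
}
```

This strict size check would reject files from streaming writers that leave the size field as a placeholder, so a real tool might downgrade that case to a warning.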
Where to read next
- Open issues: current backlog
- Recent PRs: what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals; see the live page for receipts. Re-run on a new commit to refresh.