coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained: safe to depend on.
Has a license, tests, and CI: clean foundation to fork and modify.
Documented and popular: useful reference codebase to read through.
No critical CVEs, sane security posture: runnable as-is.
- ⚠ Stale: last commit 2y ago
- ✓ 16 active contributors
- ✓ Distributed ownership (top contributor 44% of recent commits)
- ✓ MPL-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: coqui-ai/TTS
Generated by RepoPilot · 2026-06-21 · Source
🎯 Verdict
GO: Healthy across all four use cases
- 16 active contributors
- Distributed ownership (top contributor 44% of recent commits)
- MPL-2.0 licensed
- CI configured
- Tests present
- ⚠ Stale: last commit 2y ago
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
⚡ TL;DR
Coqui TTS is a deep learning toolkit for Text-to-Speech synthesis supporting 1100+ languages with pretrained models (Glow-TTS, Tacotron2, FastPitch, XTTS, Bark). It provides both inference APIs and training pipelines for building custom TTS systems, with production-grade vocoder support (WaveGrad, HiFi-GAN, Univnet) and streaming capability achieving <200ms latency.
The package structure is monolithic: TTS/ contains the core API (api.py), model implementations (TTS/models/), training bins (TTS/bin/train_tts.py, train_vocoder.py), data utilities (TTS/datasets/), and the config system (TTS/config/). TTS/.models.json acts as a model registry. Training workflows are orchestrated via the TTS/bin/ CLI scripts. Recipes in recipes/ljspeech provide end-to-end fine-tuning examples. Tests are split across workflows organized by component (inference, vocoder, XTTS).
👥 Who it's for
ML researchers and speech engineers building multilingual TTS systems, developers needing production TTS inference via TTS/api.py, data scientists curating audio datasets via TTS/bin tools, and practitioners fine-tuning XTTS on custom voices using recipes in recipes/ljspeech.
🌱 Maturity & risk
Production-ready, though commit activity has stalled (the last commit is about two years old). The repository shows extensive CI coverage across 13 GitHub Actions workflows (tts_tests.yml, vocoder_tests.yml, xtts_tests.yml, zoo_tests*.yml), comprehensive test suites (data_tests, inference_tests, aux_tests), and its latest feature releases include XTTS v2 with 16 languages and streaming support. Engagement is signaled by the Discord community (1037+ members) and by multiple model architectures battle-tested in production.
Moderate maintenance risk: monolithic codebase (2.99M Python lines) without clear internal module boundaries; heavy dependency on external model zoos (.models.json) and Hugging Face for model hosting creates version compatibility surface; TTS/bin/*.py training scripts lack comprehensive documentation on hyperparameter tuning, increasing user support burden. No obvious single-maintainer dependency but ecosystem relies on Facebook Fairseq and external vocoder implementations (Suno Bark, Tortoise TTS integration) whose compatibility may drift.
Active areas of work
The most recent work covers XTTS v2 (16 languages, improved TTS quality), streaming inference optimization (<200ms latency target), refinement of the fine-tuning recipes, and integration of external models (Bark, Tortoise with faster inference, 1100 Fairseq MMS models). Multiple concurrent test workflows suggest ongoing validation of new model architectures and dataset handling.
Get running
git clone https://github.com/coqui-ai/TTS.git
cd TTS
pip install -e .
# Test basic inference (set gpu=True if a CUDA device is available)
python -c "from TTS.api import TTS; tts = TTS(model_name='tts_models/en/ljspeech/tacotron2-DDC', gpu=False); tts.tts_to_file(text='Hello world', file_path='output.wav')"
Daily commands:
- Inference server: python TTS/server/server.py (also installed as the tts-server command), or programmatically via from TTS.api import TTS followed by tts.tts_to_file().
- Training: python TTS/bin/train_tts.py --config_path config.json (requires dataset config in TTS/datasets/).
- Dataset analysis: python TTS/bin/find_unique_chars.py --metadata_file metadata.csv.
- CLI synthesis: python TTS/bin/synthesize.py --model_name tts_models/en/ljspeech/glow-tts --text 'test'.
- Tests and linting: see the Makefile for targets such as make test and make lint.
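The CLI synthesis command above can also be assembled from Python when scripting batch jobs. This is a minimal sketch, assuming the `tts` console script is on the PATH and accepts `--text`, `--model_name`, and `--out_path`; `build_tts_command` is a hypothetical helper, not part of the library.

```python
import shlex


def build_tts_command(text, model_name, out_path="output.wav"):
    """Return the argv list for one `tts` CLI call (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]


cmd = build_tts_command("Hello world", "tts_models/en/ljspeech/glow-tts")
# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list avoids shell quoting problems in the input text.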
🗺️ Map of the codebase
- TTS/api.py: Main public API entry point for TTS inference; all text-to-speech requests funnel through here.
- TTS/model.py: Core model abstraction and lifecycle management (loading, inference, caching) for all TTS backends.
- TTS/tts/__init__.py: TTS module initialization and model registry; defines which TTS architectures are available (Tacotron2, VITS, XTTS, Bark, etc.).
- TTS/config/shared_configs.py: Shared configuration base classes and patterns used across all TTS and vocoder models.
- TTS/bin/train_tts.py: Main training entrypoint for TTS models; orchestrates dataset loading, model training, and checkpointing.
- TTS/bin/synthesize.py: Command-line synthesis tool demonstrating the full inference pipeline for end-users.
- TTS/__init__.py: Package initialization; exports public API and version information.
🛠️ How to make changes
Add a New TTS Architecture (e.g., new FastSpeech variant)
- Create a new config class in TTS/tts/configs/ inheriting from the shared_configs.py base classes (e.g., BaseTTSConfig) (TTS/tts/configs/my_new_tts_config.py)
- Implement the model class in TTS/tts/models/ with forward() and inference() methods, matching the model.py interface (TTS/tts/models/my_new_tts.py)
- Register the new model in TTS/tts/__init__.py by adding imports and updating the model factory logic (TTS/tts/__init__.py)
- Create a training script TTS/bin/train_<model_name>.py following the pattern in train_tts.py, with dataset and optimizer setup (TTS/bin/train_my_new_tts.py)
- Test inference via TTS/api.py by passing the new model name to the TTS() constructor (TTS/api.py)
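The config, model, and registration steps can be pictured with a dependency-free sketch. Everything here is illustrative: `BaseTTSConfigSketch`, `MyNewTTS`, `MODEL_REGISTRY`, and `setup_model` are hypothetical stand-ins, and the real project uses its own config base classes and factory logic rather than plain dataclasses.

```python
from dataclasses import dataclass


@dataclass
class BaseTTSConfigSketch:
    """Stand-in for a shared base config; field names are assumptions."""
    model: str = "base"
    sample_rate: int = 22050
    characters: str = "abcdefghijklmnopqrstuvwxyz "


@dataclass
class MyNewTTSConfig(BaseTTSConfigSketch):
    """Config for the new architecture: overrides and extends the base."""
    model: str = "my_new_tts"
    hidden_channels: int = 192


class MyNewTTS:
    """Skeleton model exposing the inference() entry point."""

    def __init__(self, config):
        self.config = config

    def inference(self, text):
        # Placeholder for real synthesis: map known characters to token ids.
        vocab = self.config.characters
        return [vocab.index(ch) for ch in text.lower() if ch in vocab]


# Registration step: the model name maps to its config and model classes.
MODEL_REGISTRY = {"my_new_tts": (MyNewTTSConfig, MyNewTTS)}


def setup_model(name):
    """Build a model with its default config by registry name."""
    config_cls, model_cls = MODEL_REGISTRY[name]
    return model_cls(config_cls())
```

A name-keyed registry keeps the inference entry point unchanged when a new architecture is added, which is the property the steps above aim for.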
Add Support for a New Language or Phoneme Set
- Extract unique characters/phonemes from your dataset using bin/find_unique_chars.py or bin/find_unique_phonemes.py (TTS/bin/find_unique_chars.py)
- Update the TTS model config (e.g., TTS/tts/configs/xtts_config.py) to include the new characters/phonemes in its character_list (TTS/tts/configs/xtts_config.py)
- Retrain the model using TTS/bin/train_tts.py with a dataset containing the new language (TTS/bin/train_tts.py)
- Validate via TTS/api.py by testing synthesis with text in the new language (TTS/api.py)
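The first two steps amount to collecting the character inventory of the transcripts and comparing it with what the model config already covers. A minimal sketch of that idea follows; it is not the code of the bin/ scripts, and both function names are hypothetical.

```python
def find_unique_chars(transcripts):
    """Return the sorted set of characters used across all transcripts."""
    chars = set()
    for line in transcripts:
        chars.update(line)
    return sorted(chars)


def missing_chars(transcripts, vocabulary):
    """Characters present in the data but absent from the model vocabulary."""
    return sorted(set(find_unique_chars(transcripts)) - set(vocabulary))


german = ["grüße aus berlin", "guten tag"]
print(missing_chars(german, "abcdefghijklmnopqrstuvwxyz "))
```

Any character reported as missing would need to be added to the config's character set before retraining, otherwise it is typically dropped or mapped to an unknown token.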
Add Fine-Tuning Support for Voice Cloning (e.g., speaker adaptation)
- Review the existing speaker encoder in TTS/encoder/models/base_encoder.py and TTS/encoder/configs/speaker_encoder_config.py (TTS/encoder/models/base_encoder.py)
- Create a fine-tuning script (follow the pattern in TTS/demos/xtts_ft_demo/utils/gpt_train.py) that loads a pre-trained TTS model and adapts speaker embeddings (TTS/demos/xtts_ft_demo/utils/gpt_train.py)
- Integrate fine-tuned checkpoint loading into TTS/api.py by allowing a checkpoint path to be passed to the TTS() constructor (TTS/api.py)
- Test the fine-tuned model via TTS/bin/synthesize.py with the --model_path and --config_path flags (TTS/bin/synthesize.py)
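Speaker adaptation usually starts from one embedding per reference clip. A common convention, shown here as a sketch and not confirmed as this repository's method, is to average the per-clip embeddings and L2-normalize the result, then compare speakers by cosine similarity. All function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mean_speaker_embedding(clip_embeddings):
    """Average per-clip embeddings into one unit-length speaker vector."""
    count = len(clip_embeddings)
    dim = len(clip_embeddings[0])
    mean = [sum(emb[i] for emb in clip_embeddings) / count for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))
```

Averaging over several clips tends to smooth out per-utterance variation, which is why more than one reference recording is usually recommended for voice cloning.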
Add a New Evaluation Metric or Dataset Processor
- Create a new utility script in TTS/bin/ (e.g., TTS/bin/compute_new_metric.py) that processes model outputs and datasets (TTS/bin/compute_statistics.py)
- Reference dataset loading patterns from TTS/bin/train_tts.py to ensure compatibility (TTS/bin/train_tts.py)
- Add the script to the CLI via Makefile or setup.py entry points to make it discoverable (Makefile)
Why these technologies
- PyTorch: Deep learning framework for model training, inference, and checkpointing across heterogeneous hardware (CPU/GPU/TPU).
- HuggingFace Transformers & Hub: Centralized model registry and distribution; enables pre-trained weights and community contributions.
- Gradio: Rapid web UI generation for interactive TTS demos without writing frontend code.
- faster_whisper: Dependency for potential audio-to-text integration or voice-to-voice cloning pipelines.
- Python CLI (argparse): Minimal dependency overhead for training, inference, and data preprocessing scripts.
⚖️ Trade-offs already made
- Model-agnostic abstraction (TTS/model.py) instead of architecture-specific inference
  - Why: Enables rapid addition of new TTS architectures without duplicating inference logic.
  - Consequence: Slight performance overhead from polymorphic dispatch, mitigated by caching.
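The trade-off can be illustrated with a toy version of the pattern: one abstract interface, several backends, and a cache so each backend is constructed only once. This is a sketch under assumed names (`BaseModelSketch`, `load_model`, `synthesize`); it does not reproduce the contents of TTS/model.py.

```python
from abc import ABC, abstractmethod
from functools import lru_cache


class BaseModelSketch(ABC):
    """Common interface every backend implements."""

    @abstractmethod
    def inference(self, text):
        ...


class EchoModel(BaseModelSketch):
    def inference(self, text):
        return f"echo:{text}"


class UpperModel(BaseModelSketch):
    def inference(self, text):
        return f"upper:{text.upper()}"


_BACKENDS = {"echo": EchoModel, "upper": UpperModel}


@lru_cache(maxsize=None)
def load_model(name):
    """Construct a backend once; later calls return the cached instance."""
    return _BACKENDS[name]()


def synthesize(name, text):
    """Architecture-agnostic entry point: dispatch through the interface."""
    return load_model(name).inference(text)
```

The per-call cost is one dictionary-backed cache lookup plus a virtual method call, which is small next to model inference; the expensive part (construction and weight loading in a real system) happens once per backend.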
🪤 Traps & gotchas
- XTTS requires fairseq and additional dependencies not listed in the main setup.py (check extras_require for [xtts]).
- Model downloads from Hugging Face default to ~/.local/share/tts_models/, so ensure there is enough disk space.
- Training expects metadata.csv with a specific column format (see the BaseDataset class in TTS/datasets/dataset.py).
- Phoneme encoding requires the language-specific g2p_en or gruut libraries, which are not auto-installed.
- Vocoder models (HiFi-GAN) must match the training mel-spec config (hop_length, n_fft); mismatches cause runtime errors.
- GPU computation defaults to float32; mixed precision requires explicit config.
- Some model paths are hardcoded in .models.json, so offline usage requires manual model caching.
- Streaming XTTS uses the stateful gpt_cond_len parameter; context resets between batches.
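The vocoder mismatch trap is cheap to catch before synthesis by comparing the audio settings of the two configs. A minimal sketch, assuming both configs have been loaded into plain dictionaries; the key names in `AUDIO_KEYS` are assumptions and may differ from the project's actual config fields.

```python
# Keys that must agree between the acoustic model and the vocoder.
# These names are illustrative; check the real config files.
AUDIO_KEYS = ("sample_rate", "n_fft", "hop_length", "num_mels")


def audio_config_mismatches(tts_audio, vocoder_audio, keys=AUDIO_KEYS):
    """Return {key: (tts_value, vocoder_value)} for every differing key."""
    mismatches = {}
    for key in keys:
        tts_value = tts_audio.get(key)
        vocoder_value = vocoder_audio.get(key)
        if tts_value != vocoder_value:
            mismatches[key] = (tts_value, vocoder_value)
    return mismatches


tts_cfg = {"sample_rate": 22050, "n_fft": 1024, "hop_length": 256, "num_mels": 80}
voc_cfg = {"sample_rate": 22050, "n_fft": 1024, "hop_length": 200, "num_mels": 80}
print(audio_config_mismatches(tts_cfg, voc_cfg))
```

An empty result means the listed settings agree; any entry points at the exact parameter to fix before pairing the two models.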
🏗️ Architecture
💡 Concepts to learn
- Mel-spectrogram frontend: Core audio feature representation used by all TTS models; understanding mel-scale frequency warping, hop_length, and n_fft is essential for troubleshooting vocoder mismatches and training stability
- Teacher forcing in seq2seq TTS: Tacotron2 and similar models use teacher forcing during training but autoregressive inference; understanding this gap prevents error accumulation bugs in streaming scenarios
- Glow-based normalizing flows: Glow-TTS achieves faster training/inference via invertible transformations; understanding flow-based models explains why Glow requires different sampling strategies than Tacotron
- Multi-speaker speaker embedding (d-vectors): XTTS and multi-speaker models use speaker embeddings for voice cloning; understanding the embedding space enables fine-tuning on custom speakers via TTS/bin/compute_embeddings.py
- Autoregressive decoding with sampling: The XTTS GPT decoder uses temperature/top-k sampling; understanding sampling strategies explains inference variability and how to control diversity vs determinism
- Duration prediction and alignment: FastPitch predicts phoneme durations explicitly; alignment mechanisms between text and mel-frames are critical for text-to-speech quality and debugging timing issues
- Adversarial vocoder training (GAN-based): HiFi-GAN and WaveGrad use discriminator losses; understanding vocoder GAN stability is essential when reproducing training or debugging audio artifacts
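For the mel-spectrogram item, two small formulas carry most of the intuition: the mel warping of frequency and the number of frames a clip produces for a given hop_length. The sketch below uses the widely cited HTK-style mel formula and the frame count of a centered STFT; other mel variants exist, and nothing here is taken from the repository's audio code.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale: roughly linear at low and logarithmic at high frequencies."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def num_frames(num_samples, hop_length):
    """Frame count of a centered STFT: one frame per hop, plus one."""
    return num_samples // hop_length + 1


# One second of 22050 Hz audio at two different hop lengths.
print(num_frames(22050, 256), num_frames(22050, 200))
```

Because the frame count depends on hop_length, a vocoder trained with a different hop length receives spectrograms with the wrong time resolution, which is the root of the mismatch noted in the traps section.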
Related repos
- mozilla/TTS: Historical predecessor/inspiration; Coqui TTS is the maintained fork after Mozilla discontinued their TTS project
- facebookresearch/fairseq: Dependency for MMS multilingual models and g2p phoneme encoders; Coqui integrates 1100 fairseq models via recipes
- suno-ai/bark: Alternative TTS model integrated into Coqui; users can choose between XTTS and Bark for different use cases
- jik876/hifi-gan: Vocoder implementation used by Coqui's synthesis pipeline; critical dependency for mel-to-waveform generation
- jaywalnut310/glow-tts: Model architecture reference; Coqui includes an optimized Glow-TTS implementation as a fast alternative to Tacotron2
PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for TTS/api.py with multiple model backends
The TTS/api.py is the main public API but there are no dedicated integration tests in the test suite (based on the workflow files). Currently, tests are split by component (tts_tests.yml, vocoder_tests.yml, xtts_tests.yml, zoo_tests*.yml) but no unified API-level tests. This PR would add end-to-end tests covering common inference patterns across different models (Glow-TTS, XTTS, Bark, Tortoise) to catch regressions early.
- [ ] Create tests/test_api_integration.py with fixtures for major model types
- [ ] Test TTS().tts() method with different model backends and output formats
- [ ] Add tests for speaker/emotion conditioning where applicable
- [ ] Integrate into a new workflow .github/workflows/api_integration_tests.yml
- [ ] Document expected model download behavior and caching in tests
Add missing unit tests for TTS/encoder/ models and losses
The encoder module (TTS/encoder/models/base_encoder.py, TTS/encoder/losses.py, TTS/encoder/dataset.py) exists but there are no dedicated test files in the visible structure. Given the complexity of speaker and emotion encoding, this component needs isolated unit tests to ensure correctness of embedding generation and loss calculations.
- [ ] Create tests/encoder/test_base_encoder.py with forward pass and output shape validation
- [ ] Create tests/encoder/test_encoder_losses.py covering speaker verification and emotion classification losses
- [ ] Create tests/encoder/test_encoder_dataset.py for data loading and augmentation
- [ ] Add encoder tests to .github/workflows/aux_tests.yml or create dedicated .github/workflows/encoder_tests.yml
- [ ] Include validation that embeddings have expected dimensionality for downstream TTS use
Add CI validation for TTS/demos/xtts_ft_demo code quality and imports
The XTTS fine-tuning demo (TTS/demos/xtts_ft_demo/) is actively promoted in the README but has no CI validation. The demo code (xtts_demo.py, utils/gpt_train.py, utils/formatter.py) could have import errors or syntax issues that only surface when users try to run it. This PR adds linting and import-check tests specific to demo code.
- [ ] Create tests/test_demo_imports.py that imports all modules from TTS/demos/xtts_ft_demo/
- [ ] Add pylint/flake8 checks for demo code to .github/workflows/style_check.yml
- [ ] Validate that demo requirements.txt doesn't conflict with main package dependencies
- [ ] Document demo setup instructions in a TTS/demos/xtts_ft_demo/README_SETUP.md
- [ ] Add CI step to ensure demo scripts are executable and have proper entry points
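The import-check idea in the last PR can be prototyped with the standard library alone. A minimal sketch: `failed_imports` is a hypothetical helper, and the module names a real test would pass (for example the demo modules under TTS/demos/xtts_ft_demo/) are assumptions to confirm against the repository layout.

```python
import importlib


def failed_imports(module_names):
    """Try to import each module; return {name: error summary} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # syntax errors and missing deps both count
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Demonstration with one real and one missing module.
print(failed_imports(["json", "no_such_module_for_demo"]))
```

In a test file, asserting that the returned dictionary is empty gives a single failure message listing every broken module instead of stopping at the first one.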
🌿 Good first issues
- Add test coverage for TTS/bin/extract_tts_spectrograms.py: the file exists but has no visible test in the workflows; create tests/test_spectrograms.py with synthetic audio fixtures
- Document a hyperparameter tuning guide for TTS/tts/configs/glow_tts_config.py and TTS/tts/configs/fast_pitch_config.py with concrete examples (learning_rate, batch_size impact); add it to docs/ as a tutorial
- Implement missing validation in TTS/datasets/dataset.py for the metadata.csv schema: add schema validation with helpful error messages for common mistakes (missing speaker_id column, invalid audio paths)
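The schema validation item could start from something like the sketch below. It assumes an LJSpeech-style file: pipe-delimited rows whose first column is an audio file id (without extension) and whose second column is the transcript. That layout, the `.wav` extension, and the function name are assumptions, not the project's confirmed format.

```python
import csv
import os


def validate_metadata(metadata_path, wav_dir, min_columns=2, delimiter="|"):
    """Return a list of human-readable problems found in a metadata file."""
    errors = []
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if len(row) < min_columns:
                errors.append(
                    f"line {line_no}: expected at least {min_columns} columns, got {len(row)}"
                )
                continue
            wav_path = os.path.join(wav_dir, row[0] + ".wav")
            if not os.path.isfile(wav_path):
                errors.append(f"line {line_no}: missing audio file {wav_path}")
            if not row[1].strip():
                errors.append(f"line {line_no}: empty transcript")
    return errors
```

Collecting every problem before raising lets users fix a dataset in one pass; a multi-speaker variant would add a check for the speaker column mentioned in the issue.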
Top contributors
- @erogol: 44 commits
- @WeberJulian: 16 commits
- @eginhard: 11 commits
- @Edresson: 7 commits
- @omahs: 6 commits
Recent commits
- dbf1a08: Update generic_utils.py (#3561) (Nick Potafiy)
- 5dcc16d: Bug fix in MP3 and FLAC compute length on TTSDataset (#3092) (Edresson)
- 55c7063: Merge pull request #3423 from idiap/fix-aux-tests (erogol)
- 99fee6f: build: use Trainer>=0.0.36 (eginhard)
- 186cafb: Merge pull request #3412 from coqui-ai/reuben/docs-studio-refs (erogol)
- 3991d83: Merge branch 'dev' into reuben/docs-studio-refs (erogol)
- fa28f99: Update to v0.22.0 (erogol)
- 8c1a8b5: Merge pull request #3405 from coqui-ai/studio_speakers (erogol)
- 0859e9f: Remove Coqui Studio references (reuben)
- 9f325b1: fixup! Fix aux unit tests (eginhard)
Security observations
The codebase has moderate security concerns primarily centered on outdated dependencies with known vulnerabilities (Gradio and faster-whisper). The Dockerfile configuration lacks some hardening best practices including incomplete cache cleanup and unversioned package installations. The web server component (Gradio-based) requires careful review for XSS and injection vulnerabilities given the outdated dependency version. Immediate action should be taken to update Gradio to current versions and implement Docker image hardening practices. The static analysis does not reveal hardcoded credentials or obvious SQL injection patterns, but runtime security depends heavily on proper dependency patching and secure Gradio configuration.
- High · Outdated Gradio Dependency with Known Vulnerabilities (Dependencies/Package file: gradio==4.7.1). gradio==4.7.1 is significantly outdated. Current versions of Gradio (5.x+) contain critical security patches for XSS vulnerabilities, authentication bypass issues, and other exploits. This version was released in late 2023 and lacks security updates from 2024+. Fix: Update Gradio to the latest stable version (5.x or higher). Review release notes for breaking changes and test thoroughly with the application.
- Medium · Outdated faster-whisper Dependency (Dependencies/Package file: faster_whisper==0.9.0). faster_whisper==0.9.0 is outdated. Current versions include bug fixes and security improvements. The pinned version from early 2023 may contain unpatched vulnerabilities. Fix: Update to the latest stable version of faster-whisper. Review the changelog for security-related fixes and test compatibility.
- Medium · Missing pip Cache Cleanup in Multi-Stage Docker Build (Dockerfile: lines with RUN pip3 and 'RUN make install'). The Dockerfile cleans the pip cache only once after PyTorch installation, but subsequent pip installs (via 'make install') may create new cache artifacts containing sensitive build data in the final image layer. Fix: Add 'RUN rm -rf /root/.cache/pip' after the 'RUN make install' command to ensure complete cache cleanup in the final image.
- Medium · Unrestricted Base Image (Dockerfile: base image declaration and apt-get upgrade). The Dockerfile uses 'ARG BASE=nvidia/cuda:11.8.0-base-ubuntu22.04' without version pinning for Ubuntu packages. The initial 'apt-get upgrade -y' applies all available updates, which could introduce breaking changes or unexpected behavior. Fix: Pin specific package versions in apt-get install commands. Replace 'apt-get upgrade -y' with 'apt-get update' only, or pin specific versions for critical packages like espeak-ng.
- Low · ENTRYPOINT Configuration May Mask Execution Errors (Dockerfile: ENTRYPOINT and CMD lines). The Dockerfile uses ENTRYPOINT ["tts"] with CMD ["--help"], which may mask container startup errors if the 'tts' command is not properly installed or has issues. Fix: Consider using a shell wrapper as ENTRYPOINT to provide better error handling and logging. Alternatively, use the form 'ENTRYPOINT ["python", "-m", "TTS.bin.synthesize"]' if applicable.
- Low · No HEALTHCHECK Defined (Dockerfile: missing HEALTHCHECK instruction). The Dockerfile does not include a HEALTHCHECK instruction, making it difficult to detect container runtime failures in orchestrated environments. Fix: Add a HEALTHCHECK instruction to verify the TTS service is functional. Example: 'HEALTHCHECK --interval=30s CMD tts --version || exit 1'
- Low · Potential Path Traversal Risk in Server Configuration (TTS/server/ directory structure). The server component includes static files and templates. Without examining the actual server.py implementation, there's a standard risk of path traversal attacks if file serving is implemented insecurely. Fix: Review TTS/server/server.py to ensure proper path validation for static file serving. Use safe path joining (os.path.join with normalization) and reject requests with '../' patterns.
LLM-derived; treat as a starting point, not a security audit.
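The path traversal item describes a generic defense that is easy to show in isolation. The sketch below uses pathlib rather than the os.path.join approach named in the finding, resolves the requested path, and rejects anything that lands outside the static root. `resolve_static_path` is a hypothetical helper and says nothing about how TTS/server/server.py currently serves files.

```python
from pathlib import Path


def resolve_static_path(base_dir, requested):
    """Resolve a requested file under base_dir, rejecting escapes from it."""
    base = Path(base_dir).resolve()
    candidate = (base / requested).resolve()
    # Accept the root itself or any path that has the root as an ancestor.
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes static root: {requested}")
    return candidate
```

Resolving before comparing handles `..` segments, absolute paths, and symlinks in one step, which is more robust than searching the raw string for `../`.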
Where to read next
- Open issues: current backlog
- Recent PRs: what's actively shipping
- Source on GitHub
🤖 Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/coqui-ai/TTS shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything, but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live coqui-ai/TTS
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale; regenerate it at
repopilot.app/r/coqui-ai/TTS.
What it runs against: a local clone of coqui-ai/TTS. The script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in coqui-ai/TTS | Confirms the artifact applies here, not a fork |
| 2 | License is still MPL-2.0 | Catches relicense before you depend on it |
| 3 | Default branch dev exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 661 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of coqui-ai/TTS. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/coqui-ai/TTS.git
#   cd TTS
#
# Then paste this script. Every check is read-only; no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of coqui-ai/TTS and re-run."
  exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "coqui-ai/TTS(\.git)?\b" \
  && ok "origin remote is coqui-ai/TTS" \
  || miss "origin remote is not coqui-ai/TTS (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(MPL-2\.0)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"MPL-2\.0\"" package.json 2>/dev/null) \
  && ok "license is MPL-2.0" \
  || miss "license drift: was MPL-2.0 at generation time"
# 3. Default branch
git rev-parse --verify dev >/dev/null 2>&1 \
  && ok "default branch dev exists" \
  || miss "default branch dev no longer exists"
# 4. Critical files exist
test -f "TTS/api.py" \
  && ok "TTS/api.py" \
  || miss "missing critical file: TTS/api.py"
test -f "TTS/model.py" \
  && ok "TTS/model.py" \
  || miss "missing critical file: TTS/model.py"
test -f "TTS/tts/__init__.py" \
  && ok "TTS/tts/__init__.py" \
  || miss "missing critical file: TTS/tts/__init__.py"
test -f "TTS/config/shared_configs.py" \
  && ok "TTS/config/shared_configs.py" \
  || miss "missing critical file: TTS/config/shared_configs.py"
test -f "TTS/bin/train_tts.py" \
  && ok "TTS/bin/train_tts.py" \
  || miss "missing critical file: TTS/bin/train_tts.py"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 661 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~631d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures): safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/coqui-ai/TTS"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
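An agent loop that wants more than the exit code can parse the `ok:` and `FAIL:` lines directly. A minimal sketch follows; `summarize_verification` is a hypothetical helper, and treating empty output as untrusted is a design assumption rather than something the script specifies.

```python
def summarize_verification(output):
    """Split verification output into passed and failed checks."""
    passed, failed = [], []
    for line in output.splitlines():
        if line.startswith("ok: "):
            passed.append(line[len("ok: "):])
        elif line.startswith("FAIL: "):
            failed.append(line[len("FAIL: "):])
    # Trust only when at least one check ran and none failed.
    return {"passed": passed, "failed": failed, "trusted": bool(passed) and not failed}


sample = "ok: TTS/api.py\nFAIL: license drift\n\nartifact has 1 stale claim(s)"
print(summarize_verification(sample))
```

The input would typically be the captured stdout of the script, for example from `subprocess.run(["bash", "verify.sh"], capture_output=True, text=True).stdout`, where the verify.sh file name follows the usage shown above.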
Generated by RepoPilot. Verdict based on maintenance signals; see the live page for receipts. Re-run on a new commit to refresh.