rany2/edge-tts

Item: rany2/edge-tts
Rating: 3
Author: RepoPilot

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Mixed

Single-maintainer risk — review before adopting

weakest axis

Use as dependency: Concerns

non-standard license (Other)

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓Last commit 7w ago
✓16 active contributors
✓Other licensed

✓CI configured
✓Tests present
⚠Single-maintainer risk — top contributor 83% of recent commits
⚠Non-standard license (Other) — review terms

What would change the summary?

Use as dependency: Concerns → Mixed if the license terms are clarified

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Forkable](https://repopilot.app/api/badge/rany2/edge-tts?axis=fork)](https://repopilot.app/r/rany2/edge-tts)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: rany2/edge-tts

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rany2/edge-tts shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Single-maintainer risk — review before adopting

Last commit 7w ago
16 active contributors
Other licensed
CI configured
Tests present
⚠ Single-maintainer risk — top contributor 83% of recent commits
⚠ Non-standard license (Other) — review terms

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live rany2/edge-tts repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/rany2/edge-tts.

What it runs against: a local clone of rany2/edge-tts — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rany2/edge-tts | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 76 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>rany2/edge-tts</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rany2/edge-tts. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rany2/edge-tts.git
#   cd edge-tts
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rany2/edge-tts and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rany2/edge-tts(\.git)?\b" \
  && ok "origin remote is rany2/edge-tts" \
  || miss "origin remote is not rany2/edge-tts (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# Note: the first grep only matches a LICENSE file whose first word is "Other".
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "src/edge_tts/communicate.py" \
  && ok "src/edge_tts/communicate.py" \
  || miss "missing critical file: src/edge_tts/communicate.py"
test -f "src/edge_tts/__init__.py" \
  && ok "src/edge_tts/__init__.py" \
  || miss "missing critical file: src/edge_tts/__init__.py"
test -f "src/edge_tts/voices.py" \
  && ok "src/edge_tts/voices.py" \
  || miss "missing critical file: src/edge_tts/voices.py"
test -f "src/edge_tts/data_classes.py" \
  && ok "src/edge_tts/data_classes.py" \
  || miss "missing critical file: src/edge_tts/data_classes.py"
test -f "src/edge_tts/__main__.py" \
  && ok "src/edge_tts/__main__.py" \
  || miss "missing critical file: src/edge_tts/__main__.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 76 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~46d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rany2/edge-tts"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
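
An agent loop that captures the script's output could summarise it along these lines. This is a minimal sketch that relies only on the `ok:` / `FAIL:` line prefixes described above; `VerifyReport` and `parse_verify_output` are hypothetical names, not part of RepoPilot or edge-tts.

```python
from typing import List, NamedTuple


class VerifyReport(NamedTuple):
    """Hypothetical container for the parsed check results."""
    passed: List[str]
    failed: List[str]

    @property
    def stale(self) -> bool:
        # Any FAIL line means the artifact should be regenerated.
        return bool(self.failed)


def parse_verify_output(output: str) -> VerifyReport:
    """Split the script's stdout into passed and failed checks."""
    passed: List[str] = []
    failed: List[str] = []
    for line in output.splitlines():
        if line.startswith("ok:"):
            passed.append(line[len("ok:"):].strip())
        elif line.startswith("FAIL:"):
            failed.append(line[len("FAIL:"):].strip())
    return VerifyReport(passed, failed)


sample = (
    "ok:   origin remote is rany2/edge-tts\n"
    "FAIL: missing critical file: src/edge_tts/voices.py\n"
)
report = parse_verify_output(sample)
print(report.stale, report.failed)
```

Lines without either prefix (such as the final summary line) are ignored, so the exit code of the script remains the authoritative signal.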

</details>

⚡TL;DR

edge-tts is a Python library that exposes Microsoft Edge's online text-to-speech (TTS) service without requiring Microsoft Edge, Windows, or an API key. It provides a programmatic API (via communicate.py) and two CLI tools (the edge-tts and edge-playback commands) that convert text to speech, with support for 100+ voices, SRT subtitle generation, and audio streaming. It works by speaking the reverse-engineered protocol of Edge's TTS endpoint, which gives free access to the service's neural voices.

The code is split into two packages under src/:

edge_tts/ contains the core TTS engine: communicate.py handles async and sync communication with the service, voices.py lists available voices, drm.py handles the protocol's request signing, and submaker.py and srt_composer.py generate subtitles.
edge_playback/ wraps TTS with real-time audio playback via mpv.

The examples/ directory shows async and sync usage patterns, including dynamic voice selection and subtitle streaming. The CLI entry points live in the __main__.py files.

👥Who it's for

Python developers and end users who need TTS synthesis without Microsoft licensing costs, particularly those building voice assistants, accessibility tools, video generation pipelines, or batch audio processing scripts. Contributors are mainly the maintainer, who keeps the Edge protocol compatibility layer working, and users who report when Microsoft changes the TTS service.

🌱Maturity & risk

Actively maintained and usable in production, with caveats. The repo shows consistent code quality (about 63KB of Python, with import sorting, type checking, and linting configured via .isort.cfg, mypy.ini, and pylintrc), runs CI via GitHub Actions (CodeQL and code-quality workflows), and ships both sync and async examples. However, the single-maintainer structure (rany2) and the dependence on reverse-engineered Microsoft endpoints introduce stability risk.

High risk of breaking changes: the entire library depends on Microsoft Edge's undocumented TTS endpoint, which could change without notice and break all functionality. Having few external dependencies is a strength, but the project is effectively maintained by one person (rany2, 83% of recent commits), which creates bus-factor risk. Issue response velocity cannot be judged from the file list alone. The license is reported as Other in the signals above; if its terms are GPLv3-style copyleft, they may deter commercial use, so review them before adopting.

Active areas of work

Active development around async audio generation, streaming with subtitles, and voice catalog management. The examples/ directory has recent additions for async dynamic voice selection (async_audio_gen_with_dynamic_voice_selection.py) and streaming variants. Code quality workflows are active (CodeQL, linting). Specific PR/issue visibility unavailable from file list.

🚀Get running

git clone https://github.com/rany2/edge-tts.git
cd edge-tts
pip install -e .
# Or for CLI only:
pipx install .
# Test with:
edge-tts --list-voices
edge-tts --text "Hello world" --write-media output.mp3 --write-subtitles output.srt

Daily commands:

# As CLI:
edge-tts --text "Hello" --write-media out.mp3
# With playback (requires mpv):
edge-playback --text "Hello"
# As library:
python -c "import asyncio; from edge_tts import Communicate; asyncio.run(Communicate('Hello', 'en-US-AriaNeural').save('out.mp3'))"

🗺️Map of the codebase

src/edge_tts/communicate.py — Core orchestrator that handles the TTS API communication workflow; all text-to-speech requests flow through this module.
src/edge_tts/__init__.py — Public API surface and primary entry point; re-exports the main Communicate class and the exceptions exposed to users.
src/edge_tts/voices.py — Voice enumeration and metadata; critical for understanding available voices and voice selection logic.
src/edge_tts/data_classes.py — Data model definitions for voices, synthesis results, and configuration; used across all TTS operations.
src/edge_tts/__main__.py — Command-line interface entry point; demonstrates primary usage patterns and argument parsing.
src/edge_playback/__main__.py — Playback command-line interface; shows how to integrate TTS with audio streaming and subtitle rendering.

🧩Components & responsibilities

Communicate (src/edge_tts/communicate.py) (asyncio, aiohttp, WebSocket, SSML generation) — Orchestrates the entire synthesis workflow: connects to Edge TTS API via WebSocket, sends SSML requests, streams audio chunks, extracts timing metadata, and yields results to consumers.
- Failure mode: Network errors (connection refused, timeout) → ConnectionError; API errors (invalid SSML, unsupported voice) → UnknownStatusError or exception from Edge service
VoicesManager (src/edge_tts/voices.py) — Voice enumeration and metadata; see the entry for voices.py in the map of the codebase.
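
The consumer side of that workflow can be sketched offline. The chunk shape used here (dicts with a `type` key, `data` bytes for audio, and `offset` / `duration` / `text` for boundary metadata) is an assumption about what the stream yields and should be checked against communicate.py; `split_stream` and `fake_stream` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple

Chunk = Dict[str, Any]


async def split_stream(chunks: AsyncIterator[Chunk]) -> Tuple[bytes, List[Chunk]]:
    """Collect audio bytes and timing metadata from a stream of chunks."""
    audio = bytearray()
    boundaries: List[Chunk] = []
    async for chunk in chunks:
        if chunk["type"] == "audio":
            audio.extend(chunk["data"])
        else:
            # Anything that is not audio is treated as timing metadata.
            boundaries.append(chunk)
    return bytes(audio), boundaries


async def fake_stream() -> AsyncIterator[Chunk]:
    """Hypothetical stand-in for the live service so the sketch runs offline."""
    yield {"type": "audio", "data": b"\x01\x02"}
    yield {"type": "WordBoundary", "offset": 0, "duration": 5_000_000, "text": "Hello"}
    yield {"type": "audio", "data": b"\x03"}


audio, boundaries = asyncio.run(split_stream(fake_stream()))
print(len(audio), [b["text"] for b in boundaries])
```

Keeping audio and metadata in separate accumulators mirrors the split between media output and subtitle output in the CLI (`--write-media` and `--write-subtitles`).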

🛠️How to make changes

Add a new voice or modify voice metadata

Update the voice enumeration in src/edge_tts/voices.py with new voice names and regional variants (src/edge_tts/voices.py)
If adding new voice attributes, extend the Voice dataclass in src/edge_tts/data_classes.py (src/edge_tts/data_classes.py)
Update the --list-voices output handler in src/edge_tts/__main__.py if the voice schema changed (src/edge_tts/__main__.py)

Support a new audio output format

Add format constant to src/edge_tts/constants.py (e.g., AUDIO_FORMATS) (src/edge_tts/constants.py)
Modify the API request builder in src/edge_tts/communicate.py to support the new format parameter (src/edge_tts/communicate.py)
Update CLI argument parsing in src/edge_tts/__main__.py to expose the format option (src/edge_tts/__main__.py)
If format requires special handling, add logic to src/edge_tts/drm.py for decryption/processing (src/edge_tts/drm.py)

Add a new CLI command option

Define the new argument in the argument parser within src/edge_tts/__main__.py (src/edge_tts/__main__.py)
Pass the new argument to the Communicate class instantiation in src/edge_tts/__main__.py (src/edge_tts/communicate.py)
If the option affects API communication, update the request builder in src/edge_tts/communicate.py (src/edge_tts/communicate.py)
Document the option in the README.md with usage examples (README.md)

Implement a new subtitle format (e.g., WebVTT)

Create a new module src/edge_tts/webvtt_composer.py following the pattern of src/edge_tts/srt_composer.py (src/edge_tts/srt_composer.py)
Use src/edge_tts/submaker.py to generate timing metadata (reuse existing logic) (src/edge_tts/submaker.py)
Add format selection logic to src/edge_tts/__main__.py to choose between SRT and WebVTT (src/edge_tts/__main__.py)
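
A WebVTT composer along the lines of the steps above might look like the following minimal sketch. `Cue`, `vtt_timestamp`, and `compose_webvtt` are hypothetical names; the real module would need to consume whatever timing structure submaker.py produces, which is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cue:
    """Hypothetical timing record; start and end are in seconds."""
    start: float
    end: float
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm (WebVTT uses '.' where SRT uses ',')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def compose_webvtt(cues: Iterable[Cue]) -> str:
    """Render cues as a WebVTT document: header first, then one block per cue."""
    blocks = ["WEBVTT"]
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError(f"cue ends before it starts: {cue!r}")
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


print(compose_webvtt([Cue(0.0, 1.25, "Hello"), Cue(1.25, 2.5, "world")]))
```

The main structural differences from SRT are the `WEBVTT` header line, the optional cue index, and the millisecond separator, so most of the existing timing logic should be reusable.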

🔧Why these technologies

asyncio + aiohttp WebSocket — Microsoft Edge's TTS API streams audio and metadata over WebSocket; async I/O prevents blocking during network waits and enables concurrent synthesis jobs.
Dataclasses (data_classes.py) — Type-safe, lightweight structured data for voices and synthesis results; leverages Python's built-in typing for IDE support and validation.
SSML (Speech Synthesis Markup Language) — Allows fine-grained control over prosody (rate, pitch, volume) and pronunciation; required by the Edge TTS API.
SRT subtitle format — Human-readable, widely supported subtitle standard; simple to generate from synthesis metadata timestamps.
argparse CLI — Standard Python library for command-line interfaces; minimal dependencies, sufficient for feature-rich CLI with --write-media, --write-subtitles, etc.
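
To make the SSML point concrete, here is a minimal sketch of wrapping user text in a voice and prosody envelope with XML escaping. The exact element layout, the default values (`+0%`, `+0Hz`), and the `build_ssml` name are assumptions for illustration; the template that communicate.py actually sends may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(
    text: str,
    voice: str,
    rate: str = "+0%",
    pitch: str = "+0Hz",
    volume: str = "+0%",
) -> str:
    """Wrap text in a speak/voice/prosody envelope, escaping XML metacharacters."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} volume={quoteattr(volume)}>"
        f"{escape(text)}"
        "</prosody></voice></speak>"
    )


ssml = build_ssml("Tom & Jerry <3", "en-US-AriaNeural", rate="+10%")
# Round-trip through a parser to confirm the result is well-formed XML.
root = ET.fromstring(ssml)
prosody = root.find(f"{{{SSML_NS}}}voice/{{{SSML_NS}}}prosody")
print(prosody.text)
```

Escaping the text before interpolation keeps characters such as `&` and `<` from breaking the request or being interpreted as markup.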

⚖️Trade-offs already made

Async-first API (the core is built on asyncio)
- Why: Simplifies the codebase and avoids run_loop complexity; WebSocket I/O is inherently async.
- Consequence: Library users typically run in an async context or call asyncio.run(). Mitigated by the sync usage patterns in examples/ and by CLI entry points that handle the event loop.
Direct WebSocket to Edge API (no proxy/cache layer)
- Why: Reduces latency and eliminates extra server infrastructure; direct streaming.
- Consequence: Each synthesis request hits the live Edge service; no offline caching of audio. Rate limits depend on Microsoft's service policies.
Single voice selection (no voice mixing)
- Why: Matches Edge TTS API design; simpler user experience.
- Consequence: Cannot blend multiple voices in a single synthesis. Users must run multiple jobs and post-process audio.
Platform-specific playback (mpv on Unix, win32 API on Windows)
- Why: Leverages OS-native audio subsystems; avoids heavy dependencies like PyAudio.
- Consequence: Requires mpv installation on Linux/macOS; different code paths increase test burden.
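
For the async trade-off above, one way synchronous callers can consume an async stream is to drive it on a private event loop. This is a generic sketch of that bridging technique, not the library's own implementation; `iterate_sync` and `fake_chunks` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the async iterator


def iterate_sync(agen: AsyncIterator[T]) -> Iterator[T]:
    """Yield items from an async iterator in synchronous code."""
    loop = asyncio.new_event_loop()

    async def next_item():
        try:
            return await agen.__anext__()
        except StopAsyncIteration:
            return _DONE

    try:
        while True:
            item = loop.run_until_complete(next_item())
            if item is _DONE:
                break
            yield item
    finally:
        # Finalize any async generators started on this loop, then close it.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()


async def fake_chunks() -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a network stream."""
    for part in (b"ab", b"cd", b"ef"):
        await asyncio.sleep(0)
        yield part


print(b"".join(iterate_sync(fake_chunks())))
```

This approach cannot be used from inside an already-running event loop (for example in a notebook or an async web handler); in that case the caller should await the async API directly.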

🚫Non-goals (don't propose these)

Does not provide offline TTS synthesis (requires live Microsoft Edge API)
Does not handle authentication or API keys (Edge TTS is unauthenticated)
Does not support real-time interactive voice synthesis (streaming is one-way from service)
Does not provide voice cloning or custom voice training
Does not handle watermarking or DRM verification (passthrough only)

🪤Traps & gotchas

Microsoft endpoint volatility: The TTS service URL and protocol in constants.py can change without notice; test against the live service.
DRM/signing drift: drm.py may fail silently if Microsoft changes its request signing scheme; monitor for 401/403 errors.
Voice list staleness: voices.py is manually maintained; running locally may show outdated or missing voices.
Platform-specific playback: edge_playback requires mpv on non-Windows platforms; Windows uses win32_playback.py with a different audio sink.
Async context requirements: The core TTS methods in communicate.py are async; calling them from synchronous code without an event loop, or from inside an already-running loop, may cause event loop issues.
Subtitle timing precision: SRT generation relies on Microsoft's metadata; malformed or missing timing data causes subtitle misalignment.

🏗️Architecture

💡Concepts to learn

WebSocket streaming — Microsoft's TTS service streams audio and metadata over a WebSocket connection; communicate.py parses the incoming messages as they arrive rather than buffering the entire audio.
SSML (Speech Synthesis Markup Language) — TTS requests use SSML tags for voice selection and prosody control, but edge-tts explicitly restricts SSML to match Microsoft's constraints (see communicate.py).
Reverse Engineering HTTP Protocols — The entire project is built on reverse-engineered Microsoft Edge browser API; understanding request/response signing in drm.py is key to maintaining compatibility.
Async/Await Concurrency — All core TTS operations in communicate.py are async-first with asyncio; handling concurrent voice synthesis requires understanding event loops and coroutine patterns.
SubRip (SRT) Subtitle Format — Output subtitles are SRT format (parsed/generated in srt_composer.py); understanding timecode parsing and composition is needed for subtitle feature development.
Data Marshalling & Serialization — Voice metadata and synthesis responses use custom data classes (data_classes.py) with JSON-like serialization; the mapping between Microsoft's API schema and Python types is critical.
Request Signing & API Authentication — Microsoft's TTS endpoint enforces per-request signatures via drm.py; this is not standard OAuth but a custom protocol that can break without warning.
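
The async concurrency concept can be illustrated with a bounded fan-out: several synthesis jobs run concurrently, but a semaphore caps how many hit the service at once. `synthesize_all` is a hypothetical helper, and `fake_synth` stands in for a real call so the sketch runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def synthesize_all(
    texts: Sequence[str],
    synth: Callable[[str], Awaitable[bytes]],
    max_concurrent: int = 3,
) -> List[bytes]:
    """Run synth() for every text, with at most max_concurrent jobs in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text: str) -> bytes:
        async with semaphore:
            return await synth(text)

    # gather() preserves input order regardless of completion order.
    return list(await asyncio.gather(*(worker(t) for t in texts)))


async def demo():
    active = 0
    peak = 0

    async def fake_synth(text: str) -> bytes:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulate network latency
        active -= 1
        return text.upper().encode()

    results = await synthesize_all(["a", "b", "c", "d"], fake_synth, max_concurrent=2)
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

A cap like this is also a simple form of client-side throttling, which matters because every request goes to the live service.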

🔗Related repos

pyttsx3/pyttsx3 — Local TTS engine (not cloud-based) as alternative; no API key needed but limited voice quality compared to Edge.
gtts/gTTS — Google Translate TTS wrapper with similar ease-of-use; different service provider with different voice library.
espeak-ng/espeak-ng — Open-source offline TTS engine; used as fallback when cloud services unavailable.
openai/whisper — Companion tool for speech-to-text; often paired with edge-tts for voice conversation pipelines.
yt-dlp/yt-dlp — Video processing ecosystem; edge-tts frequently used in scripts to add audio tracks to downloaded video content.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for src/edge_tts/communicate.py

The communicate.py module is the core of the library (handles TTS communication), but there are no visible unit tests in the tests/ directory. Currently only shell integration tests exist (001-long-text.sh). Adding unit tests would improve code quality, catch regressions, and make it easier for contributors to modify the core logic safely.

[ ] Create tests/test_communicate.py with unit tests for the Communicate class
[ ] Test key entry points such as Communicate.stream() and Communicate.save(), plus list_voices() from voices.py
[ ] Add mocked network requests to avoid external API calls during testing
[ ] Add test cases for edge cases (empty text, invalid voices, network errors)
[ ] Update .github/workflows/code-quality.yml or create a pytest workflow if missing
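
The mocking step in this checklist could start from a pattern like the one below. It uses the standard library's unittest rather than pytest to stay dependency-free, and it tests a hypothetical helper (`synthesize_to_file`) that receives its client factory as a parameter; the real tests would patch whatever network layer communicate.py uses.

```python
import unittest
from unittest.mock import AsyncMock, MagicMock


async def synthesize_to_file(factory, text: str, voice: str, path: str) -> str:
    """Hypothetical helper under test: build a client and save audio to path."""
    if not text.strip():
        raise ValueError("text must not be empty")
    client = factory(text, voice)
    await client.save(path)
    return path


class SynthesizeToFileTests(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        # The mocked client never opens a network connection.
        self.client = MagicMock()
        self.client.save = AsyncMock()
        self.factory = MagicMock(return_value=self.client)

    async def test_saves_audio_without_network(self):
        path = await synthesize_to_file(self.factory, "Hello", "en-US-AriaNeural", "out.mp3")
        self.assertEqual(path, "out.mp3")
        self.factory.assert_called_once_with("Hello", "en-US-AriaNeural")
        self.client.save.assert_awaited_once_with("out.mp3")

    async def test_rejects_empty_text(self):
        with self.assertRaises(ValueError):
            await synthesize_to_file(self.factory, "   ", "en-US-AriaNeural", "out.mp3")
        self.factory.assert_not_called()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SynthesizeToFileTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Injecting the factory keeps the test independent of the service, which matters here because the live endpoint can change or be unreachable in CI.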

Add unit tests for src/edge_tts/submaker.py and src/edge_tts/srt_composer.py

The subtitle/SRT generation modules (submaker.py and srt_composer.py) are critical for the --write-subtitles feature, but have no corresponding test coverage. These modules handle data transformation and formatting, making them ideal candidates for unit testing with diverse input samples.

[ ] Create tests/test_submaker.py with tests for SubMaker class methods
[ ] Create tests/test_srt_composer.py with tests for SrtComposer class
[ ] Add test cases for different subtitle timing scenarios and edge cases
[ ] Test SRT format compliance (proper newlines, timecode format)
[ ] Add integration test ensuring --write-subtitles generates valid SRT files
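
The SRT compliance check in this list could be built on a small structural validator such as the sketch below. It checks only block numbering, the `HH:MM:SS,mmm --> HH:MM:SS,mmm` timecode line, and that each cue ends after it starts; `validate_srt` is a hypothetical helper, not an existing function in srt_composer.py.

```python
import re
from typing import List

TIMECODE = re.compile(
    r"^(\d{2,}):([0-5]\d):([0-5]\d),(\d{3}) --> (\d{2,}):([0-5]\d):([0-5]\d),(\d{3})$"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def validate_srt(content: str) -> List[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    problems: List[str] = []
    blocks = [b for b in content.strip().split("\n\n") if b.strip()]
    for index, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            problems.append(f"block {index}: expected index, timecode, and text lines")
            continue
        if lines[0].strip() != str(index):
            problems.append(f"block {index}: unexpected index {lines[0].strip()!r}")
        match = TIMECODE.match(lines[1].strip())
        if match is None:
            problems.append(f"block {index}: malformed timecode line")
            continue
        groups = match.groups()
        if _to_ms(*groups[4:]) <= _to_ms(*groups[:4]):
            problems.append(f"block {index}: cue does not end after it starts")
    return problems


good = "1\n00:00:00,000 --> 00:00:01,250\nHello\n\n2\n00:00:01,250 --> 00:00:02,500\nworld\n"
print(validate_srt(good))
```

A validator that returns a list of problems, rather than raising on the first one, makes test failures easier to read when several cues are wrong at once.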

Add GitHub Action workflow for testing across Python versions and platforms

The existing .github/workflows (code-quality.yml, codeql-analysis.yml) don't include a dedicated matrix testing workflow. Adding a Python version matrix test (3.8, 3.9, 3.10, 3.11, 3.12) and running on Linux/macOS/Windows would catch platform-specific bugs early and ensure compatibility claims in the README are validated.

[ ] Create .github/workflows/test-matrix.yml with pytest job
[ ] Configure matrix strategy for python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
[ ] Configure matrix for os: [ubuntu-latest, macos-latest, windows-latest]
[ ] Install edge-tts in editable mode and run pytest on all platforms
[ ] Ensure code-quality.yml and test-matrix.yml don't have overlapping concerns

🌿Good first issues

Add type hints to src/edge_tts/util.py and src/edge_playback/util.py (currently no .pyi files for these utility modules; mypy may be incomplete).
Write unit tests for src/edge_tts/srt_composer.py subtitle composition logic—currently no test files visible in tests/ directory beyond shell integration tests.
Document the DRM protocol in src/edge_tts/drm.py with inline comments explaining the encryption/signing steps, as this is the most opaque part of the codebase.

⭐Top contributors

@rany2 — 83 commits
@scott306lr — 3 commits
@Copilot — 1 commit
@hcgiub001 — 1 commit
@Shoghy — 1 commit

📝Recent commits

4bdb8e4 — Bump edge_tts version to 7.2.8 (rany2)
b1055c8 — perf: cache SSL context at module level to avoid per-call overhead (#469) (Copilot)
9965046 — fix: replace metadata-based offset compensation with CBR byte-count math (#468) (hcgiub001)
b45c019 — Bump edge_tts version to 7.2.7 (rany2)
60453da — Update Voice TypedDict (#449) (Shoghy)
3db3824 — Bump edge_tts version to 7.2.6 (rany2)
2636306 — Switch back to endpoint currently in use by Microsoft Edge on macOS (rany2)
8e8936b — Add --version to print version of edge-tts (rany2)
80544f7 — Bump edge_tts version to 7.2.5 (rany2)
cb6715f — Setup old endpoint again with MUID support (rany2)

🔒Security observations

The edge-tts codebase is a relatively lightweight Python module with moderate security posture. No hardcoded credentials or exposed secrets were found. The primary security concerns relate to external service dependencies, input validation, and potential injection vectors through shell scripts. The project follows good Python packaging practices with linting and code quality workflows. However, the lack of explicit input validation, rate limiting, and SSL/TLS certificate pinning for external API calls presents medium-risk vulnerabilities. Addressing input validation and adding defensive measures against MITM attacks would significantly improve the security profile.

Medium · External Service Dependency Without Verification — src/edge_tts/communicate.py. The codebase relies on Microsoft Edge's online text-to-speech service without apparent SSL/TLS certificate pinning or request signature verification. This could expose users to man-in-the-middle (MITM) attacks if network communication is not properly secured. Fix: Implement certificate pinning for Microsoft Edge TTS service endpoints, validate SSL certificates explicitly, and consider adding request signing/verification mechanisms.
Medium · Potential Command Injection via Shell Commands — build_and_publish.sh, format.sh, lint.sh, tests/001-long-text.sh. The repo ships shell scripts for building, formatting, linting, and testing. If untrusted input were ever passed to these scripts without sanitization, command injection could occur. Fix: Avoid passing untrusted user input to shell scripts. If necessary, use parameterized commands and escape all inputs. Consider using Python subprocess with shell=False.
Medium · Missing Input Validation in Text-to-Speech Processing — src/edge_tts/communicate.py, src/edge_tts/__main__.py. The module processes user-supplied text for speech synthesis. Without proper input validation and sanitization, malicious or oversized inputs could cause denial-of-service, resource exhaustion, or unexpected behavior. Fix: Implement strict input validation for text content, enforce maximum length limits, validate encoding, and handle edge cases gracefully.
Low · No Apparent Rate Limiting — src/edge_tts/communicate.py. The codebase does not show evidence of rate limiting mechanisms for calls to the external TTS service. This could allow abuse or accidental resource exhaustion. Fix: Implement client-side rate limiting with configurable thresholds, request queuing, and backoff strategies to prevent abuse of the external service.
Low · Missing Security Headers Documentation — src/edge_tts/communicate.py. No explicit security headers validation is documented for responses from the Microsoft Edge TTS service. Fix: Document and validate security headers (Content-Type, Content-Length, etc.) in responses from the external service. Implement response validation.
Low · Potential Information Disclosure via Error Messages — src/edge_tts/exceptions.py, src/edge_tts/communicate.py. Error handling in the codebase may expose sensitive information about internal service calls, API structures, or system details to end users. Fix: Implement generic error messages for end users while logging detailed errors server-side. Avoid exposing stack traces or service details in user-facing output.
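
As an illustration of the input-validation finding, a caller could clean and size-limit text before handing it to the library. This is a minimal sketch under stated assumptions: `clean_text` and `split_by_bytes` are hypothetical helpers, the byte limit is chosen by the caller rather than taken from the service, and whitespace runs are collapsed as a side effect of splitting on words.

```python
import unicodedata
from typing import List


def clean_text(text: str) -> str:
    """Drop control characters (except newline and tab) and trim whitespace."""
    kept = (ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    cleaned = "".join(kept).strip()
    if not cleaned:
        raise ValueError("text is empty after cleaning")
    return cleaned


def split_by_bytes(text: str, max_bytes: int) -> List[str]:
    """Split on whitespace so each chunk's UTF-8 encoding fits in max_bytes."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        if len(word.encode("utf-8")) > max_bytes:
            raise ValueError("a single word exceeds max_bytes")
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


print(split_by_bytes(clean_text("  Hi\x00 there\x07 friend "), max_bytes=8))
```

Measuring the encoded length rather than the character count matters for non-ASCII text, where one character can take several bytes.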

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
