RepoPilot

RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!

Healthy

Healthy across all four use cases

HealthyDependency

Permissive license, no critical CVEs, actively maintained — safe to depend on.

HealthyFork & modify

Has a license, tests, and CI — clean foundation to fork and modify.

HealthyLearn from

Documented and popular — useful reference codebase to read through.

HealthyDeploy as-is

No critical CVEs, sane security posture — runnable as-is.

  • Stale — last commit 1y ago
  • 31+ active contributors
  • Distributed ownership (top contributor 33% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:
RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/rvc-project/retrieval-based-voice-conversion-webui)](https://repopilot.app/r/rvc-project/retrieval-based-voice-conversion-webui)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card

This card auto-renders when someone shares https://repopilot.app/r/rvc-project/retrieval-based-voice-conversion-webui on X, Slack, or LinkedIn.

Ask AI about rvc-project/retrieval-based-voice-conversion-webui

Grounded in the actual source code. Pick a starter question or write your own.

Or write your own question →

Onboarding doc

Onboarding: RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Generated by RepoPilot · 2026-06-20 · Source

🎯Verdict

GO — Healthy across all four use cases

  • 31+ active contributors
  • Distributed ownership (top contributor 33% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Stale — last commit 1y ago

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

TL;DR

RVC is a VITS-based voice conversion framework that trains speaker-mimicking models on minimal audio data (≤10 minutes) and performs real-time voice transformation via WebUI or API. It combines retrieval-based speaker embedding with diffusion/vocoder synthesis to achieve sub-100ms latency voice cloning with near-zero training overhead. Monolithic structure: configs/ holds model architecture JSONs (v1/32k.json, v2/48k.json, etc.), assets/ stores pretrained weights and RMVPE/HuBERT embedders, api_.py files expose FastAPI endpoints, and Jupyter notebooks (Retrieval_based_Voice_Conversion_WebUI.ipynb) are primary entry points. Training and inference pipelines are implicit in the main Python modules (not visible in top 60).

👥Who it's for

Voice actors, content creators, and researchers who need fast voice cloning without GPU-heavy training pipelines; also developers building voice-enabled applications via the FastAPI endpoints (api_240604.py, api_231006.py) or Gradio WebUI who want retrieval-based voice control.

🌱Maturity & risk

Actively developed and production-ready: the project has 40k+ Jupyter notebook cells, comprehensive CI/CD (GitHub Actions for Docker, linting, unit tests), versioned model configs (v1 and v2 with multiple sample rates), and dual API versions indicating iterative refinement. However, it's a single-team project from RVC-Project, so maintenance depends on core contributor bandwidth.

High dependency count (fairseq, faiss, onnxruntime, torchcrepe, torchfcpe, librosa 0.9.1 pinned) with Python version constraints (numba==0.56.4, llvmlite==0.39.0 are tightly coupled). No visible test suite in top 60 files; unitest.yml workflow exists but content unknown. GPU/CPU switching logic (onnxruntime conditional on platform) is fragile and platform-specific.

Active areas of work

Anticipating RVCv3 with larger base models and reduced training data requirements; Docker automation via GitHub Actions (docker.yml), localization (genlocale.yml), and format enforcement (pull_format.yml, push_format.yml) suggest active community. Two API versions (231006, 240604) indicate rapid iteration on inference endpoints.

🚀Get running

Clone the repo: git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI && cd Retrieval-based-Voice-Conversion-WebUI. Install dependencies: pip install -r requirements.txt (inferred from conda/pip stacks listed). Start WebUI: python -m gradio Retrieval_based_Voice_Conversion_WebUI.ipynb or run go-web.bat on Windows; real-time mode via go-realtime-gui.bat.

Daily commands: WebUI: python web.py (inferred from file structure; actual entry not in top 60). Real-time: python infer-web.py --listen 0.0.0.0. API: uvicorn api_240604:app --reload --host 0.0.0.0 --port 7865. Docker: docker-compose up (see docker-compose.yml). Set environment vars from .env (likely GPU/device selection).

🗺️Map of the codebase

  • configs/config.py — Central configuration loader that all modules depend on for runtime settings, model paths, and feature flags.
  • api_240604.py — Latest FastAPI entry point defining all REST endpoints for voice conversion inference and training orchestration.
  • gui_v1.py — Gradio-based web UI main entry point; the primary user-facing interface for the entire application.
  • i18n/i18n.py — Internationalization system that loads locale JSON files; required for multi-language support across UI.
  • Retrieval_based_Voice_Conversion_WebUI.ipynb — Primary Colab notebook entry point; documents the full training and inference pipeline for reproducibility.
  • docker-compose.yml — Production deployment configuration specifying container orchestration, volumes, and environment setup.
  • .env — Environment variable defaults (via python-dotenv) that control API keys, model paths, and feature toggles.

🛠️How to make changes

Add a New Voice Conversion Model Architecture

  1. Create model config in configs/v2/ directory (e.g., configs/v2/48k.json) with new architecture hyperparameters. (configs/v2/48k.json)
  2. Implement model class in Python module (typically in a vits/ or models/ folder) and register in config.py. (configs/config.py)
  3. Add pre-trained weights to assets/pretrained_v2/ and update model loader in API. (api_240604.py)
  4. Update notebook inference cells to call new model variant. (Retrieval_based_Voice_Conversion_WebUI_v2.ipynb)

Add a New REST API Endpoint

  1. Define FastAPI route handler in api_240604.py with request/response Pydantic models. (api_240604.py)
  2. Document endpoint path, parameters, and response schema in the route docstring. (api_240604.py)
  3. If endpoint requires UI integration, add corresponding Gradio component in gui_v1.py. (gui_v1.py)
  4. Add endpoint test case (if test suite exists) and update API documentation. (.github/workflows/unitest.yml)

Add Support for a New Language in the UI

  1. Create new JSON locale file in i18n/locale/ (e.g., i18n/locale/es_ES.json) with complete key-value translations. (i18n/locale/en_US.json)
  2. Call i18n.load() in gui_v1.py or api_240604.py with new locale code to register translations. (i18n/i18n.py)
  3. Update README and language selector dropdown to include new locale code. (README.md)
  4. Run locale_diff.py to validate all keys are present in new translation file. (i18n/locale_diff.py)

Deploy to Production with Docker

  1. Customize docker-compose.yml to set resource limits, GPU device mapping, and volume mount paths. (docker-compose.yml)
  2. Update Dockerfile if adding system dependencies or changing Python base image version. (Dockerfile)
  3. Set environment variables in .env for API keys, model paths, and feature flags before docker-compose up. (.env)
  4. Verify CI/CD workflow in .github/workflows/docker.yml runs on push to ensure image builds successfully. (.github/workflows/docker.yml)

🔧Why these technologies

  • VITS (Variational Inference Text-to-Speech) — Core voice conversion backbone; fast synthesis with high quality on limited training data (<10 mins).
  • FAISS (Facebook AI Similarity Search) — Efficient retrieval of speaker embeddings; enables fast k-NN lookup for speaker matching without full model inference.
  • HuBERT (Hidden Unit BERT) — Pre-trained speech encoder for extracting speaker-agnostic acoustic features; reduces dependency on paired data.
  • Gradio — Minimal-code web UI framework; rapid iteration on model demo without custom frontend boilerplate.
  • FastAPI + Uvicorn — Async REST API server; handles concurrent inference requests and supports both web and programmatic access.
  • FFmpeg + pydub — Audio codec handling and format conversion; supports .wav, .mp3, .flac inputs without user preprocessing.
  • RMVPE (Robust Multi-scale Vocal Pitch Estimation) — Pitch detection model alternative to traditional signal processing; improves accuracy on noisy/singing inputs.

⚖️Trade-offs already made

  • Single-entry-point Gradio GUI instead of multi-tab SPA

    • Why: Faster development; no JavaScript framework needed. Minimal technical overhead for users.
    • Consequence: Limited real-time responsiveness; full page refreshes on some interactions. Harder to add advanced UI features (e.g., drag-drop batch processing).
  • CPU FAISS indices rather than GPU-accelerated retrieval

    • Why: Compatibility across hardware; avoids CUDA version lock-in.
    • Consequence: Latency bottleneck for large speaker databases (>100k speakers); index queries ~50–100ms CPU-bound.

🪤Traps & gotchas

HuBERT and RMVPE weights are auto-downloaded but gitignored (assets/*/.gitignore), so first run requires internet. Numba 0.56.4 + llvmlite 0.39.0 are pinned and incompatible with Python 3.11+; use Python 3.10 or rebuild from source. ONNX Runtime GPU variant is conditional on OS (onnxruntime-gpu excluded on macOS)—check sys_platform logic. Config JSON sample rates (32k, 40k, 48k) must match training data; mismatches cause silent or distorted output. .env must define CUDA_VISIBLE_DEVICES or device selection fails silently.

🏗️Architecture

💡Concepts to learn

  • openai/whisper — Audio encoder alternative to HuBERT; used in some voice conversion forks for speaker embedding extraction.
  • coqui-ai/TTS — End-to-end text-to-speech with voice cloning; complements RVC for full voice synthesis pipelines.
  • yl4579/StarGAN-Voice-Conversion — Predecessor GAN-based voice conversion method; RVC improves on this with retrieval-based embeddings and VITS decoder.
  • dipjyoti92/MusicSeparationModel — UVR5 integration (assets/uvr5_weights) for background vocal removal; common preprocessing step in RVC workflows.
  • facebookresearch/fairseq — Dependency providing HuBERT and wav2vec2 embeddings; RVC wraps fairseq models for speaker feature extraction.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for API endpoints (api_231006.py and api_240604.py)

The repo has two API files with different versions but no test coverage visible in workflows. The unitest.yml workflow exists but likely doesn't cover API endpoints. Adding integration tests would ensure both API versions work correctly, handle edge cases (invalid audio, model loading failures), and prevent regressions. This is critical for a production voice conversion service.

  • [ ] Create tests/test_api_endpoints.py with pytest fixtures for mock audio files
  • [ ] Add test cases for voice conversion endpoints in both api_231006.py and api_240604.py
  • [ ] Add test cases for model loading, inference, and error handling
  • [ ] Update .github/workflows/unitest.yml to include pytest coverage reporting
  • [ ] Document API testing procedures in CONTRIBUTING.md

Add GPU/CPU device selection tests to unitest.yml workflow

The dependencies include both onnxruntime and onnxruntime-gpu with platform-specific installation. The current unitest.yml workflow doesn't show device compatibility testing. Adding matrix testing for CPU-only and GPU paths would catch device-related failures early and ensure the codebase works across different hardware setups.

  • [ ] Extend .github/workflows/unitest.yml with strategy matrix for os: [ubuntu-latest, windows-latest] and pytorch-device: [cpu, cuda]
  • [ ] Create tests/test_device_compatibility.py to verify model loading on different devices
  • [ ] Add tests for audio processing pipeline with different device contexts
  • [ ] Document hardware requirements and testing in docs/en/training_tips_en.md

Add audio preprocessing validation and sanitization tests

The repo handles user audio uploads through Gradio (configs/config.py) and processes them with librosa, ffmpeg-python, pydub, and praat-parselmouth. There's no visible test coverage for audio format validation, corruption handling, or edge cases (silent audio, very short clips <10ms, mismatched sample rates). This is critical given the core claim of '<=10 mins of voice data'.

  • [ ] Create tests/test_audio_preprocessing.py with pytest and audio fixtures
  • [ ] Add tests for audio format validation (wav, mp3, m4a handling)
  • [ ] Add tests for edge cases: empty files, corrupted files, very short audio (<1s), mono vs stereo conversion
  • [ ] Add tests for resampling consistency across librosa/ffmpeg-python
  • [ ] Add test for max duration enforcement and minimum duration validation
  • [ ] Document audio format requirements in docs/en/faq_en.md

🌿Good first issues

  • Add unit tests for api_240604.py FastAPI routes (test inference with sample audio, validate JSON response schema, check error handling for missing models)—no visible test files in top 60.
  • Document configuration quirks: create docs/CONFIG_GUIDE.md explaining which configs/v2/*.json values affect inference latency, quality, and VRAM usage; currently implicit in JSON structure.
  • Implement graceful fallback for missing pretrained weights: add auto-download retry logic with progress bars in assets/hubert and assets/rmvpe loaders, replacing cryptic file-not-found errors.

Top contributors

Click to expand

📝Recent commits

Click to expand
  • 7ef1986 — Add warnings for files related to Onnx exports (#2385) (PriesiaMioShirakana)
  • 709bbba — optimize nsf inference (#2387) (yxlllc)
  • 1376ce7 — Fix onnx export & Support TensorRT (#2384) (PriesiaMioShirakana)
  • 9ae3af4 — 移除音频文件路径 Unicode 控制字符 (#2334) (huanzhijin)
  • 3548b4f — Update requirements-ipex.txt (RVC-Boss)
  • 1963b7d — Update requirements-win-for-realtime_vc_gui-dml.txt (RVC-Boss)
  • e954fa9 — Update requirements-win-for-realtime_vc_gui.txt (RVC-Boss)
  • cfc1e24 — fix: dlmodels.sh exits after download single file (#2175) (wellcoming)
  • b405aed — Translate Chinese text in English README (#2172) (Dafterfly)
  • 7e3965c — Update requirements.txt (#2203) (pato-pan)

🔒Security observations

  • High · Outdated and Vulnerable Dependencies — requirements.txt (dependencies list). Multiple dependencies have known vulnerabilities and are outdated. Notably: numba==0.56.4 (security issues), numpy==1.23.5 (outdated), librosa==0.9.1 (has known CVEs), gradio==3.34.0 (security vulnerabilities), fastapi==0.88 (outdated with known issues), and onnxruntime versions without proper pinning. These should be updated to patched versions. Fix: Update all dependencies to their latest stable versions. Use tools like pip-audit or safety to identify and remediate known vulnerabilities. Implement dependency scanning in CI/CD pipeline.
  • High · Missing Input Validation in API Endpoints — api_231006.py, api_240604.py. API files (api_231006.py, api_240604.py) are present but no validation logic is visible in the structure. Voice conversion systems typically process file uploads and audio data. Without visible input validation, there's risk of arbitrary file upload, path traversal, or malicious audio processing. Fix: Implement strict input validation for all API endpoints: validate file types, sizes, and audio formats. Use whitelisting for allowed file extensions. Implement path traversal protection. Sanitize all user inputs before processing.
  • High · Plaintext Secrets in .env File — .env file. The presence of a .env file suggests environment variables containing secrets may be stored. If committed to version control (even with .gitignore), or if exposed in Docker layers, credentials could be compromised. Fix: Ensure .env is in .gitignore (verify current state). Use Docker secrets or external secret management (HashiCorp Vault, AWS Secrets Manager). Never commit .env files. Scan git history for accidentally committed secrets using tools like git-secrets or truffleHog.
  • High · Unrestricted Port Exposure in Docker — Dockerfile (EXPOSE 7865), docker-compose.yml (ports: 7865:7865). The Dockerfile exposes port 7865 and docker-compose.yml maps it to 0.0.0.0:7865. Without authentication or network segmentation, the voice conversion service is accessible to anyone who can reach the host. Fix: Implement authentication/authorization on API endpoints. Use a reverse proxy (nginx) with authentication. Restrict port access via firewall rules. Consider running behind a VPN or bastion host. Add rate limiting to prevent abuse.
  • High · Arbitrary Command Execution via aria2c — Dockerfile (line: RUN aria2c). The Dockerfile contains 'RUN aria2c' with no arguments, which may attempt to download or execute unexpected content. aria2 is a download utility that could be exploited if misused or if downloading from untrusted sources. Fix: Remove the bare 'RUN aria2c' command or specify exact URLs and verify checksums. If aria2 is needed for downloading models, pin specific versions and use checksum verification (sha256sum). Use explicit download URLs only from trusted sources.
  • Medium · No Security Headers in Web UI — Web UI configuration (Gradio setup). The project uses Gradio for the web interface (gradio==3.34.0). There's no visible configuration for security headers (CSP, X-Frame-Options, HSTS, etc.) which could expose the UI to XSS and clickjacking attacks. Fix: Configure Gradio with security headers. Use a reverse proxy (nginx) to add: Content-Security-Policy, X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Strict-Transport-Security. Implement CORS restrictions.
  • Medium · Insecure File Upload Handling — api_231006.py, api_240604.py, and related audio processing modules. Voice conversion requires audio file uploads. The structure suggests file handling in api_*.py files, but without visible validation. Risks include: unrestricted file sizes causing DoS, path traversal attacks, or malicious file processing. Fix: Implement file upload restrictions: max file size limits, allowed MIME types, scan with antivirus/file scanners before processing. Store uploads in isolated directories outside webroot. Use unique filenames to prevent collisions.
  • undefined · undefined — undefined. undefined Fix: undefined

LLM-derived; treat as a starting point, not a security audit.

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/RVC-Project/Retrieval-based-Voice-Conversion-WebUI shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live RVC-Project/Retrieval-based-Voice-Conversion-WebUI repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.

What it runs against: a local clone of RVC-Project/Retrieval-based-Voice-Conversion-WebUI — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in RVC-Project/Retrieval-based-Voice-Conversion-WebUI | Confirms the artifact applies here, not a fork | | 2 | License is still MIT | Catches relicense before you depend on it | | 3 | Default branch main exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 561 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>RVC-Project/Retrieval-based-Voice-Conversion-WebUI</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of RVC-Project/Retrieval-based-Voice-Conversion-WebUI. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
#   cd Retrieval-based-Voice-Conversion-WebUI
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of RVC-Project/Retrieval-based-Voice-Conversion-WebUI and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "RVC-Project/Retrieval-based-Voice-Conversion-WebUI(\\.git)?\\b" \\
  && ok "origin remote is RVC-Project/Retrieval-based-Voice-Conversion-WebUI" \\
  || miss "origin remote is not RVC-Project/Retrieval-based-Voice-Conversion-WebUI (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"MIT\"" package.json 2>/dev/null) \\
  && ok "license is MIT" \\
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \\
  && ok "default branch main exists" \\
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "configs/config.py" \\
  && ok "configs/config.py" \\
  || miss "missing critical file: configs/config.py"
test -f "api_240604.py" \\
  && ok "api_240604.py" \\
  || miss "missing critical file: api_240604.py"
test -f "gui_v1.py" \\
  && ok "gui_v1.py" \\
  || miss "missing critical file: gui_v1.py"
test -f "i18n/i18n.py" \\
  && ok "i18n/i18n.py" \\
  || miss "missing critical file: i18n/i18n.py"
test -f "Retrieval_based_Voice_Conversion_WebUI.ipynb" \\
  && ok "Retrieval_based_Voice_Conversion_WebUI.ipynb" \\
  || miss "missing critical file: Retrieval_based_Voice_Conversion_WebUI.ipynb"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 561 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~531d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/RVC-Project/Retrieval-based-Voice-Conversion-WebUI"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Embed this chat in your README →

Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.

<iframe
  src="https://repopilot.app/embed/rvc-project/retrieval-based-voice-conversion-webui"
  width="100%" height="500"
  style="border:1px solid #d0d7de; border-radius:8px;"
  allow="microphone"
  loading="lazy"
></iframe>