
RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!

Healthy across all four use cases

weakest axis
Use as dependency: Healthy

Permissive license and no critical CVEs, so it is safe to depend on, though the last commit was about a year ago.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • 31+ active contributors
  • Distributed ownership (top contributor 33% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present
  • Stale — last commit 1y ago

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/rvc-project/retrieval-based-voice-conversion-webui)](https://repopilot.app/r/rvc-project/retrieval-based-voice-conversion-webui)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/rvc-project/retrieval-based-voice-conversion-webui on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat sections marked "AI · unverified" as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/RVC-Project/Retrieval-based-Voice-Conversion-WebUI shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • 31+ active contributors
  • Distributed ownership (top contributor 33% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Stale — last commit 1y ago

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live RVC-Project/Retrieval-based-Voice-Conversion-WebUI repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.

What it runs against: a local clone of RVC-Project/Retrieval-based-Voice-Conversion-WebUI — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in RVC-Project/Retrieval-based-Voice-Conversion-WebUI | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 558 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>RVC-Project/Retrieval-based-Voice-Conversion-WebUI</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of RVC-Project/Retrieval-based-Voice-Conversion-WebUI. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
#   cd Retrieval-based-Voice-Conversion-WebUI
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of RVC-Project/Retrieval-based-Voice-Conversion-WebUI and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "RVC-Project/Retrieval-based-Voice-Conversion-WebUI(\.git)?\b" \
  && ok "origin remote is RVC-Project/Retrieval-based-Voice-Conversion-WebUI" \
  || miss "origin remote is not RVC-Project/Retrieval-based-Voice-Conversion-WebUI (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "configs/config.py" \
  && ok "configs/config.py" \
  || miss "missing critical file: configs/config.py"
test -f "api_240604.py" \
  && ok "api_240604.py" \
  || miss "missing critical file: api_240604.py"
test -f "gui_v1.py" \
  && ok "gui_v1.py" \
  || miss "missing critical file: gui_v1.py"
test -f "configs/config.json" \
  && ok "configs/config.json" \
  || miss "missing critical file: configs/config.json"
test -f "i18n/i18n.py" \
  && ok "i18n/i18n.py" \
  || miss "missing critical file: i18n/i18n.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 558 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~528d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/RVC-Project/Retrieval-based-Voice-Conversion-WebUI"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
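
An agent loop that wants more than the exit code can read the `ok:` and `FAIL:` lines directly. The following is a minimal sketch, assuming the script above has been saved as `verify.sh` (the file name used in the sentence above); `summarize_verify_output` and `run_verify` are hypothetical helper names, not part of RepoPilot.

```python
import subprocess


def summarize_verify_output(text):
    """Split verify-script output into passed and failed check descriptions."""
    passed, failed = [], []
    for line in text.splitlines():
        if line.startswith("ok:"):
            passed.append(line[len("ok:"):].strip())
        elif line.startswith("FAIL:"):
            failed.append(line[len("FAIL:"):].strip())
    return passed, failed


def run_verify(script_path="./verify.sh"):
    """Run the script with bash and return (exit_code, passed, failed).

    Assumes the script was saved locally under script_path.
    """
    proc = subprocess.run(
        ["bash", script_path], capture_output=True, text=True
    )
    passed, failed = summarize_verify_output(proc.stdout)
    return proc.returncode, passed, failed
```

An agent could call `run_verify()` once at the start of a session and refuse to edit code while the returned `failed` list is non-empty.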

</details>

TL;DR

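A retrieval-based voice conversion (VC) project: it trains a VC model from 10 minutes or less of voice data and exposes training and inference through a Gradio web UI and a FastAPI server. See architecture tab.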

👥Who it's for

Developers.

🌱Maturity & risk

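MIT licensed, with CI and tests present and 31+ active contributors, but the last commit was about a year ago (roughly 528 days at generation time). See activity metrics.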

Standard open source risks apply.

Active areas of work

Check recent commits.

🚀Get running

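Clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git, cd into Retrieval-based-Voice-Conversion-WebUI, then check the README for installation and launch instructions.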

🗺️Map of the codebase

  • configs/config.py — Central configuration loader that defines all model paths, audio parameters, and feature extraction settings—all training and inference depends on this.
  • api_240604.py — Latest FastAPI server entry point for inference; handles voice conversion requests and model loading—primary production API.
  • gui_v1.py — Gradio-based web UI entry point for both training and inference workflows—main user-facing interface.
  • configs/config.json — Runtime configuration file specifying device, model versions, and feature extractors—must be present and valid for any operation.
  • i18n/i18n.py — Internationalization loader that maps all UI strings—required for multi-language support across web UI.
  • Dockerfile — Container build definition for reproducible deployment—defines the canonical runtime environment.
  • .env — Environment variable configuration for deployment settings—controls secrets and runtime behavior.

🛠️How to make changes

Add a new voice feature extractor

  1. Define the extractor class with a forward() method in a new module (e.g., rvc/modules/extractors/new_extractor.py) (configs/config.py)
  2. Register the extractor in config.py under the hubert_path or rmvpe_path conditionals (configs/config.py)
  3. Update configs/config.json to add a new feature_extractor_type option (configs/config.json)
  4. Modify the inference pipeline (api_240604.py) to use the new extractor when selected (api_240604.py)
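
One way to picture steps 1 to 4 is a small name-to-class registry keyed by the `feature_extractor_type` option. This is a sketch under stated assumptions, not the repo's actual mechanism: `BaseExtractor`, `register_extractor`, `build_extractor`, and `NewExtractor` are hypothetical names, and the placeholder `forward()` works on plain Python lists rather than tensors.

```python
from typing import Dict, List


class BaseExtractor:
    """Hypothetical base class: every extractor exposes forward()."""

    def forward(self, audio: List[float]) -> List[float]:
        raise NotImplementedError


# Hypothetical registry mapping a feature_extractor_type string to a class.
_EXTRACTORS: Dict[str, type] = {}


def register_extractor(name: str):
    """Class decorator that records an extractor under the given name."""
    def decorator(cls):
        if name in _EXTRACTORS:
            raise ValueError(f"extractor already registered: {name}")
        _EXTRACTORS[name] = cls
        return cls
    return decorator


@register_extractor("new_extractor")
class NewExtractor(BaseExtractor):
    FRAME = 4  # samples per frame; an arbitrary value for illustration

    def forward(self, audio: List[float]) -> List[float]:
        # Placeholder feature: mean absolute amplitude of each frame.
        frames = [audio[i:i + self.FRAME] for i in range(0, len(audio), self.FRAME)]
        return [sum(abs(x) for x in f) / len(f) for f in frames]


def build_extractor(config: dict) -> BaseExtractor:
    """Instantiate the extractor named by config["feature_extractor_type"]."""
    name = config["feature_extractor_type"]
    if name not in _EXTRACTORS:
        raise ValueError(f"unknown feature_extractor_type: {name}")
    return _EXTRACTORS[name]()
```

With a registry like this, the inference pipeline would only need to call `build_extractor(config)` instead of branching on each extractor name.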

Add a new API endpoint for voice conversion

  1. Define a new FastAPI route in api_240604.py with @app.post() decorator (api_240604.py)
  2. Accept an audio file (bytes or multipart) and a model_id parameter (api_240604.py)
  3. Load model from configs/inuse/v2 (or v1) directory using existing model loader (api_240604.py)
  4. Run inference pipeline with feature extraction and return WAV audio (api_240604.py)
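
The route itself depends on FastAPI and the repo's model loader, so it is not reproduced here. The "return WAV audio" part of step 4 can be sketched with the standard library alone. This is a minimal illustration, assuming the pipeline yields 16-bit PCM samples as Python ints; `pcm16_to_wav_bytes` and `read_wav_header` are hypothetical helper names.

```python
import io
import struct
import wave


def pcm16_to_wav_bytes(samples, sample_rate, channels=1):
    """Encode 16-bit PCM samples (ints in [-32768, 32767]) as in-memory WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        # "<h" packs each sample as a little-endian signed 16-bit integer.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def read_wav_header(data):
    """Return (channels, sample_rate, frame_count), or None if data is not valid WAV."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getnchannels(), wav.getframerate(), wav.getnframes()
    except (wave.Error, EOFError):
        return None
```

A handler could use `read_wav_header` to reject malformed uploads early and return the bytes from `pcm16_to_wav_bytes` as the response body with an `audio/wav` content type.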

Add a new training tab to the Gradio UI

  1. Create a new Gradio Tab block in gui_v1.py with layout components (inputs, buttons, outputs) (gui_v1.py)
  2. Hook the tab's button events to training functions that call model training scripts (gui_v1.py)
  3. Save trained model to configs/inuse/v2 or configs/inuse/v1 directory (gui_v1.py)
  4. Add corresponding i18n labels in i18n/locale/*.json files for each language (i18n/locale/en_US.json)
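
Step 4 is easy to get partly wrong, since a label added to one locale file can be forgotten in the others. The sketch below checks every `*.json` file in a locale directory for a set of required keys. It assumes each locale file is a flat JSON object mapping label keys to translated strings; `find_missing_labels` is a hypothetical helper, not part of i18n/i18n.py.

```python
import json
from pathlib import Path


def find_missing_labels(locale_dir, required_keys):
    """Return {locale_file_name: [missing keys]} for every *.json in locale_dir.

    Assumes each file is a flat JSON object of label key -> translated text.
    """
    missing = {}
    for path in sorted(Path(locale_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            labels = json.load(f)
        absent = [key for key in required_keys if key not in labels]
        if absent:
            missing[path.name] = absent
    return missing
```

Calling `find_missing_labels("i18n/locale", ["My new tab"])` from the repo root would list the locale files that still lack the new label, with "My new tab" standing in for whatever key the new tab uses.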

Add a new model configuration variant

  1. Create a new JSON file (e.g., configs/v2/16k.json) with sampling_rate, hidden_channels, and feature dimensions (configs/v2/16k.json)
  2. Update configs/config.py to load this config variant when selected (configs/config.py)
  3. Download or train weights and place in assets/pretrained_v2/ (assets/pretrained_v2/.gitignore)
  4. Add UI option in gui_v1.py Dropdown to select the new variant (gui_v1.py)
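
Before wiring a new variant into configs/config.py, it can help to check that the JSON file carries the fields named in step 1. The sketch below is illustrative only: the field names `sampling_rate` and `hidden_channels` come from the step above, but where they sit inside the file is an assumption, so the lookup accepts them either at the top level or one level down. `find_field` and `summarize_variant` are hypothetical names.

```python
import json

# Field names come from step 1; their position in the file is an assumption.
REQUIRED_FIELDS = ("sampling_rate", "hidden_channels")


def find_field(cfg, name):
    """Look for name at the top level, then one level down in nested objects."""
    if name in cfg:
        return cfg[name]
    for value in cfg.values():
        if isinstance(value, dict) and name in value:
            return value[name]
    raise ValueError(f"config variant is missing required field: {name}")


def summarize_variant(text):
    """Parse a config variant and return its required fields as a dict."""
    cfg = json.loads(text)
    if not isinstance(cfg, dict):
        raise ValueError("config variant must be a JSON object")
    summary = {name: find_field(cfg, name) for name in REQUIRED_FIELDS}
    for name, value in summary.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return summary
```

Running this over a new file such as configs/v2/16k.json catches a missing or mistyped field before it surfaces as a failure during training or inference.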

🔧Why these technologies

  • VITS (Variational Inference Text-to-Speech) — Provides fast, high-quality neural vocoding for voice conversion without requiring phoneme alignment at inference time.
  • HuBERT (Hidden-Unit BERT) — State-of-the-art self-supervised voice representation; enables speaker-agnostic feature extraction for retrieval-based matching.
  • FAISS (Facebook AI Similarity Search) — Enables fast k-NN retrieval of similar speaker embeddings from training dataset—core of the 'retrieval-based' approach.
  • Gradio — Rapid web UI prototyping with minimal code; integrates with Jupyter/Colab for easy deployment.
  • FastAPI — Lightweight async Python API server; supports streaming audio responses and concurrent inference requests.
  • PyTorch + fairseq — Standard ML stack for model training and inference; fairseq provides pretrained HuBERT and training utilities.

⚖️Trade-offs already made

  • Retrieval-based approach (FAISS index) vs. fine-tuning model on each speaker

    • Why: Allows training on < 10 minutes of audio per speaker without full retraining; FAISS lookup is ~50ms.
    • Consequence: Quality depends on training dataset diversity; poor retrieval = worse conversion quality. Requires maintaining embeddings index.
  • Separate v1 and v2 model configs (32k/40k/48k sampling rates)

    • Why: v2 uses better feature extractors and supports higher quality; v1 is legacy but still supported.
    • Consequence: Users must choose version upfront; models are not interchangeable. Doubles maintenance burden.
  • Single Gradio UI for both training and inference

    • Why: Simplifies deployment and reduces code duplication; both workflows share model loading logic.
    • Consequence: Tab-based UI can become cluttered; training and inference have different requirements (GPU memory, latency).
  • Pitch (F0) extraction via RMVPE or other pitch detectors

    • Why: Preserves speaker's pitch contour during conversion; separate from timbre extraction.
    • Consequence: Adds ~0.5s latency per inference; requires tuning for noisy/multi-speaker audio.
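
The retrieval trade-off in the first item comes down to a nearest-neighbour lookup: for each query feature vector, find the closest vectors stored from the training data. The sketch below uses a brute-force search in plain Python rather than FAISS, purely to show the operation. `nearest_neighbors` is a hypothetical name and the toy vectors are made up for illustration.

```python
import math


def nearest_neighbors(query, index, k=3):
    """Return the k closest (distance, position) pairs from index to query.

    Brute-force Euclidean search; a FAISS index serves the same purpose
    at much larger scale.
    """
    scored = [(math.dist(query, vec), i) for i, vec in enumerate(index)]
    scored.sort()
    return scored[:k]


# Toy "training" embeddings (invented values, two dimensions for readability).
index = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
matches = nearest_neighbors([0.9, 0.0], index, k=2)
positions = [i for _, i in matches]
```

The quality consequence noted above follows directly: if the stored vectors do not cover the query's neighbourhood, even the nearest match is far away and contributes little useful information.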

🚫Non-goals (don't propose these)

  • Real-time streaming voice conversion (batch processing only; ~1–3s latency per utterance)
  • Automatic speaker diarization or separation (

🪤Traps & gotchas

Standard debugging applies.

🏗️Architecture

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for API endpoints (api_231006.py and api_240604.py)

The repo has two API files with different versions but no visible test coverage in .github/workflows/unitest.yml. The API layer is critical for inference and training workflows. Adding tests would catch regressions when updating FastAPI (currently pinned to 0.88) and validate request/response schemas across both API versions.

  • [ ] Create tests/test_api_231006.py with fixtures for common endpoints (inference, training status, model management)
  • [ ] Create tests/test_api_240604.py with similar coverage for the newer API version
  • [ ] Add test cases for error handling (missing model files, invalid audio formats, GPU/CPU fallback paths)
  • [ ] Update .github/workflows/unitest.yml to run pytest on tests/ directory before deployment
  • [ ] Validate compatibility with uvicorn>=0.21.1 and fastapi==0.88 dependency pins

Add audio format validation and preprocessing tests for audio I/O pipeline

The repo depends on librosa==0.9.1, soundfile>=0.12.1, pydub>=0.25.1, and ffmpeg-python>=0.2.0 for audio handling, but there's no visible test coverage for the audio loading/conversion pipeline. Given the critical nature of audio processing for voice conversion quality, testing edge cases (corrupt files, unsupported formats, extreme sample rates) would prevent silent failures.

  • [ ] Create tests/test_audio_preprocessing.py with fixtures for various audio formats (wav, mp3, flac, m4a)
  • [ ] Add tests for librosa and soundfile integration covering edge cases (mono/stereo, sample rate mismatches, bit depth handling)
  • [ ] Test ffmpeg fallback paths when native libraries fail
  • [ ] Validate resampling correctness with tests/fixtures/sample_audio/ reference files
  • [ ] Update .github/workflows/unitest.yml to include ffmpeg-python compatibility checks

Add missing type hints and create mypy CI workflow for Python codebase

The codebase has complex dependencies (fairseq==0.12.2, torchcrepe==0.0.20, onnxruntime, PyTorch models) but lacks visible type annotations. With multiple Python API versions and model inference paths, adding type hints and a mypy workflow would catch integration bugs early, especially around model I/O types.

  • [ ] Create .github/workflows/mypy.yml to run mypy on configs/, infer.py, train.py, and api_*.py files
  • [ ] Add py.typed marker to the package root
  • [ ] Start with type hints in api_240604.py (newer version) and configs/config.py as reference implementations
  • [ ] Configure mypy.ini to handle fairseq and torch stubs (add plugins for common ML libraries)
  • [ ] Document type hint conventions in CONTRIBUTING.md for future PRs

🌿Good first issues

Check the issue tracker.

📝Recent commits

  • 7ef1986 — Add warnings for files related to Onnx exports (#2385) (PriesiaMioShirakana)
  • 709bbba — optimize nsf inference (#2387) (yxlllc)
  • 1376ce7 — Fix onnx export & Support TensorRT (#2384) (PriesiaMioShirakana)
  • 9ae3af4 — Remove Unicode control characters from audio file paths (#2334) (huanzhijin)
  • 3548b4f — Update requirements-ipex.txt (RVC-Boss)
  • 1963b7d — Update requirements-win-for-realtime_vc_gui-dml.txt (RVC-Boss)
  • e954fa9 — Update requirements-win-for-realtime_vc_gui.txt (RVC-Boss)
  • cfc1e24 — fix: dlmodels.sh exits after download single file (#2175) (wellcoming)
  • b405aed — Translate Chinese text in English README (#2172) (Dafterfly)
  • 7e3965c — Update requirements.txt (#2203) (pato-pan)

🔒Security observations

  • High · Outdated and Vulnerable Dependencies — requirements.txt / dependency specifications. Several pinned dependencies are significantly outdated: numpy==1.23.5 (EOL), numba==0.56.4, librosa==0.9.1, gradio==3.34.0, and fastapi==0.88. Old pins like these miss later security patches and may be affected by known CVEs. Fix: Update dependencies to current stable versions and add automated dependency scanning with a tool such as Dependabot. Pin versions for reproducibility, but prefer bounded ranges like 'numpy>=1.23.5,<2.0' over exact old versions.
  • High · Insecure Pip Installation in Docker — Dockerfile, line with 'curl https://bootstrap.pypa.io/get-pip.py'. The Dockerfile uses 'curl https://bootstrap.pypa.io/get-pip.py | python3.9', which downloads a script and pipes it straight into the interpreter without pinning a version or verifying a checksum. HTTPS protects the transfer itself, but the build still executes whatever the server returns at build time. Fix: Prefer 'python3.9 -m ensurepip' where the ensurepip module is available, or install pip through the system package manager (the package name depends on the distribution).
  • High · Missing .env File Security — .env. The repository includes a '.env' file in the file structure, which typically contains sensitive configuration. If secrets are stored here, they could be exposed in version control or Docker images. Fix: Ensure .env is in .gitignore (verify in .gitignore content). Use environment variables passed at runtime instead of storing in files. For Docker, use secrets management or environment variable injection at container start time.
  • High · Exposed Port Without Authentication — docker-compose.yml, Dockerfile (EXPOSE 7865). Port 7865 is exposed publicly via Docker (ports: 7865:7865) and the application appears to be a Gradio/FastAPI web interface. No authentication mechanism is evident, allowing unauthenticated access to voice conversion capabilities. Fix: Implement authentication (OAuth2, API keys, JWT). Use reverse proxy with authentication. Restrict port binding to localhost (127.0.0.1:7865:7865) in docker-compose.yml. Implement rate limiting and input validation.
  • High · Insecure FastAPI/Gradio Configuration — api_231006.py, api_240604.py, dependencies (fastapi==0.88). FastAPI==0.88 is outdated with known vulnerabilities. The application does not show evidence of security headers, CORS configuration, or input validation middleware that would protect against common web attacks. Fix: Update FastAPI to latest version. Implement CORS properly with specific origins. Add middleware for security headers (X-Content-Type-Options, X-Frame-Options, CSP). Validate all inputs and implement rate limiting.
  • Medium · Dockerfile Runs as Root — Dockerfile. The Dockerfile does not specify a USER directive, meaning the container runs as root. This increases the blast radius if the application is compromised. Fix: Add a non-root user with 'RUN useradd -m -u 1000 rvc', followed by a separate 'USER rvc' instruction before the application starts (USER is a Dockerfile instruction, not a shell command, so it cannot be chained with '&&'). Ensure all files have correct permissions.
  • Medium · Unvetted External Model Downloads — assets/ directories, likely in main application code. The application downloads pre-trained models (hubert, rmvpe, pretrained_v2, uvr5_weights) from external sources. The .gitignore files in these directories suggest they're downloaded at runtime. No integrity verification (checksums) is evident. Fix: Implement checksum verification (SHA256) for all downloaded models. Use HTTPS for downloads. Cache models securely. Document model sources and maintain a manifest of approved models with versions.
  • Medium · Arbitrary File Upload Risk — api_231006.py, api_240604.py (presumed upload endpoints). Voice conversion system implies file upload functionality. Without visible input validation and file type checking, this could lead to arbitrary file upload vulnerabilities, especially with audio processing libraries. Fix: Implement strict file type validation

LLM-derived; treat as a starting point, not a security audit.
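
The checksum fix suggested for model downloads can be sketched with the standard library. This is a minimal example, assuming a manifest that maps local file paths to expected SHA-256 hex digests; the manifest format and the names `sha256_of_file` and `verify_manifest` are hypothetical, not something the repo ships.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model weights never load into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest):
    """Return the paths that are missing or whose SHA-256 differs from the manifest.

    manifest: dict mapping file path -> expected lowercase hex digest.
    """
    bad = []
    for path, expected in manifest.items():
        if not os.path.isfile(path) or sha256_of_file(path) != expected.lower():
            bad.append(path)
    return bad
```

A download step could call `verify_manifest` after fetching weights and refuse to load any file that appears in the returned list.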


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.