YaoFANGUK/video-subtitle-remover
AI-based removal of hard-coded subtitles and text-like watermarks from videos and images, producing cleaned files at the original (lossless) resolution. Runs entirely locally; no third-party API required.
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 4w ago
- ✓ 3 active contributors
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Small team — 3 contributors active in recent commits
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/yaofanguk/video-subtitle-remover) — paste at the top of your README.md; renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/yaofanguk/video-subtitle-remover on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: YaoFANGUK/video-subtitle-remover
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/YaoFANGUK/video-subtitle-remover shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- Last commit 4w ago
- 3 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Small team — 3 contributors active in recent commits
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live YaoFANGUK/video-subtitle-remover
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/YaoFANGUK/video-subtitle-remover.
What it runs against: a local clone of YaoFANGUK/video-subtitle-remover — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in YaoFANGUK/video-subtitle-remover | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | Last commit ≤ 56 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of YaoFANGUK/video-subtitle-remover. If you don't
# have one yet, run these first:
#
# git clone https://github.com/YaoFANGUK/video-subtitle-remover.git
# cd video-subtitle-remover
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of YaoFANGUK/video-subtitle-remover and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "YaoFANGUK/video-subtitle-remover(\.git)?\b" \
  && ok "origin remote is YaoFANGUK/video-subtitle-remover" \
  || miss "origin remote is not YaoFANGUK/video-subtitle-remover (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "Apache License|Apache-2\.0" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 56 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~26d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/YaoFANGUK/video-subtitle-remover"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Video-subtitle-remover (VSR) is an AI-powered tool that removes hard-coded subtitles and text watermarks from videos and images at lossless resolution by detecting text regions and inpainting them with multiple neural network backends (LaMa, ProPainter, STTN). It runs entirely locally without requiring third-party APIs, supporting both single-file and batch processing with automatic or user-defined subtitle region detection. Monolithic Python backend structured as backend/inpaint/ (multiple inpainting strategies: lama_inpaint.py, propainter_inpaint.py, sttn_*_inpaint.py), backend/inpaint/video/ (training core with dataset.py, trainer.py, model definitions), and backend/ffmpeg/ (platform-specific binaries). Entry point is backend/main.py with config in backend/config.py.
👥Who it's for
Video editors, content creators, and developers who need to remove burned-in subtitles or watermarks from video files without quality loss, and who prefer local processing over cloud-based solutions for privacy or cost reasons.
🌱Maturity & risk
Actively maintained with pre-built binaries for Windows (CPU/CUDA 11.8/12.6/12.8/DirectML), Docker support, and cross-platform compatibility (Windows/macOS/Linux). The project has structured CI/CD workflows, multiple hardware acceleration paths, and recent versioning (1.4.0+), indicating production-ready status with ongoing development.
Heavy dependency on PyTorch, PaddleOCR, and FFmpeg with specific version pinning (e.g., onnxruntime-directml locked to 1.20.1 for Windows 10/11 compatibility, per a code comment); GPU memory requirements and CUDA version specificity create deployment friction. A single primary maintainer (YaoFANGUK) and primarily Chinese-language documentation may slow community contributions.
Active areas of work
Recent work includes CUDA 12.8 support (build-windows-cuda-12.8.yml workflow), DirectML GPU acceleration path for AMD/Intel, and stabilization of multiple inpainting backends with version-specific dependency management. The project is actively releasing pre-built binaries and maintaining Docker images across hardware platforms.
🚀Get running
Clone and install: git clone https://github.com/YaoFANGUK/video-subtitle-remover.git && cd video-subtitle-remover. Create a conda environment from .condarc and install dependencies: pip install -r requirements.txt (or use pre-built binaries from Releases). Run: python backend/main.py -i <input_video> -o <output_video> or python backend/main.py -i <input_image> -o <output_image> for image mode.
Daily commands:
CLI (primary): python backend/main.py -i input.mp4 -o output.mp4 (auto-detect subtitles) or with region: python backend/main.py -i input.mp4 -o output.mp4 --rect '[x1,y1,x2,y2]'. Docker: docker run -it --gpus all eritpchy/video-subtitle-remover:1.4.0-cuda12.6 python backend/main.py -i test/test.mp4 -o test/test_no_sub.mp4. Batch images: Specify directory instead of file. Check backend/main.py and backend/config.py for all CLI flags.
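For bulk jobs, the CLI above is easy to drive from a short script. A sketch, assuming the documented `-i`/`-o`/`--rect` flags of backend/main.py (check `python backend/main.py --help` first; `build_cmd` and `batch` are our own names, not part of VSR):

```python
"""Sketch: batch-drive the VSR CLI from Python (flags assumed from the docs above)."""
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_cmd(src: Path, dst: Path, rect: Optional[str] = None) -> list[str]:
    """Assemble one backend/main.py invocation; rect e.g. "[100,800,1180,900]"."""
    cmd = [sys.executable, "backend/main.py", "-i", str(src), "-o", str(dst)]
    if rect is not None:
        cmd += ["--rect", rect]
    return cmd


def batch(in_dir: str, out_dir: str, rect: Optional[str] = None) -> None:
    """Run VSR over every .mp4 in in_dir, writing results to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.mp4")):
        subprocess.run(build_cmd(src, out / src.name, rect), check=True)
```

`check=True` makes a failed removal abort the batch instead of silently continuing.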
🗺️Map of the codebase
- backend/main.py: Entry point: CLI argument parsing, orchestrates text detection, inpainting backend selection, and FFmpeg frame extraction/re-encoding.
- backend/config.py: Configuration and hyperparameters: inpainting model paths, OCR settings, device selection (CUDA/CPU/DirectML), frame processing options.
- backend/inpaint/lama_inpaint.py: LaMa (resolution-agnostic) inpainting implementation; likely the highest-quality backend for subtitle removal based on paper citations.
- backend/inpaint/propainter_inpaint.py: ProPainter backend for video-aware inpainting with temporal consistency; critical for motion-aware subtitle removal.
- backend/inpaint/sttn_det_inpaint.py: STTN with detection-guided inpainting; bridges fast detection and spatio-temporal inpainting for video.
- backend/inpaint/video/core/trainer.py: Training loop for video inpainting models; needed to understand model training pipeline and loss computation.
- backend/inpaint/video/core/dataset.py: Frame dataset loader for video training; defines how frames, masks, and flows are loaded and augmented.
- .github/workflows/build-windows-cuda-12.8.yml: Example CI/CD pipeline showing dependency resolution, CUDA toolkit setup, and artifact packaging for Windows builds.
🛠️How to make changes
Adding inpainting backends: Implement interface in backend/inpaint/__init__.py and create module (e.g., backend/inpaint/new_model_inpaint.py). Text detection/OCR: Edit backend/config.py OCR settings or extend PaddleOCR usage in detection pipeline. Video processing: Modify backend/inpaint/video/core/dataset.py for frame I/O or backend/inpaint/video/core/trainer.py for training logic. Hardware acceleration: Add device-specific code in backend/inpaint/utils/utils.py or create new device handler (e.g., hardware_accelerator.py mentioned in comments).
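As a concrete illustration of the first step, here is a minimal sketch of a new backend. The `BaseInpaint` name and the `inpaint(frame, mask)` signature are assumptions for illustration; check backend/inpaint/__init__.py for the real contract before implementing:

```python
"""Hypothetical shape of a new inpainting backend (interface names assumed)."""
import numpy as np


class BaseInpaint:  # stand-in for whatever backend/inpaint/__init__.py defines
    def inpaint(self, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class MeanFillInpaint(BaseInpaint):
    """Toy backend: fills masked pixels with the mean colour of unmasked ones."""

    def inpaint(self, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # frame: HxWx3, mask: HxW with 0 = keep, nonzero = inpaint.
        assert frame.shape[:2] == mask.shape, "mask must match frame HxW"
        out = frame.copy()
        keep = mask == 0
        fill = frame[keep].mean(axis=0) if keep.any() else 0
        out[~keep] = fill
        return out
```

A real backend would load model weights in `__init__` and run inference here, but the shape contract stays the same.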
🪤Traps & gotchas
CUDA version strictness: .condarc and Dockerfiles pin specific CUDA versions (11.8, 12.6, 12.8); using wrong version causes 'CUDA device not found' even if GPU is present. onnxruntime-directml locked to 1.20.1: Code comment warns against upgrading due to Windows 10/11 compatibility; upgrading silently breaks DirectML inference. FFmpeg bundling: Platform-specific ffmpeg binaries in backend/ffmpeg/win_x64/ (multiple exe files) are required; missing or outdated ffmpeg causes frame extraction to fail. GPU memory: Inpainting models (especially ProPainter) require 6–10 GB VRAM; CPU fallback is very slow. PaddleOCR model download: First run downloads multi-hundred-MB OCR weights on-the-fly; network interruption during setup breaks initialization. Path assumptions: Code may assume relative paths from repo root; running from different directories can break model/ffmpeg lookups.
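The last trap (path assumptions) has a conventional fix: anchor lookups to the repo root instead of the current working directory. A sketch (the helper is our own, not something VSR provides):

```python
"""Sketch: resolve repo resources independent of the caller's CWD."""
from pathlib import Path

# If this module lived under backend/, the repo root is one level up.
REPO_ROOT = Path(__file__).resolve().parent.parent


def resource(relative: str) -> Path:
    """Return an absolute path to a repo resource, failing loudly if missing."""
    path = (REPO_ROOT / relative).resolve()
    if not path.exists():
        raise FileNotFoundError(f"expected {path} inside the repo checkout")
    return path
```

Calling `resource("backend/ffmpeg")` then works the same whether the tool is launched from the repo root or elsewhere.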
💡Concepts to learn
- Inpainting (image/video generative infilling) — Core technique VSR uses: after detecting subtitle regions, neural networks 'hallucinate' plausible content to fill voids. Understanding inpainting quality metrics (LPIPS, FID) is critical to evaluating backend choice.
- Spatio-Temporal Consistency (video) — VSR's STTN and ProPainter backends enforce temporal coherence so inpainted frames don't flicker or drift between frames; essential concept for video-specific inpainting.
- Optical Flow estimation — Tracked in backend/inpaint/video/core/trainer_flow_w_edge.py and loss.py; optical flow guides the network on object motion to maintain visual continuity when filling subtitle regions.
- Mask-guided diffusion / conditional generation — LaMa and other backends use masked regions to condition the generative process, ensuring only subtitle areas are modified while preserving surrounding content.
- Canny edge detection (computer vision) — Visible in backend/inpaint/model/canny/ (custom Canny implementation); used to preserve fine edges and text boundaries when computing inpainting masks, preventing over-smoothing.
- Spectral Normalization (neural network training) — Found in backend/inpaint/utils/spectral_norm.py; stabilizes discriminator training in the GANs used for video inpainting, improving convergence and output quality.
- Platform-specific GPU abstraction (CUDA, DirectML, Metal, CPU) — VSR bundles multiple hardware backends; understanding device detection and fallback logic in backend/config.py is critical for cross-platform deployment.
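The device-fallback pattern named in the last concept can be sketched in a few lines. This is illustrative only; VSR's actual selection logic lives in backend/config.py:

```python
"""Sketch of device selection with graceful fallback (not VSR's real code)."""


def pick_device() -> str:
    # Prefer CUDA when a usable GPU build of PyTorch is present.
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    # Next, DirectML via onnxruntime (the AMD/Intel path on Windows).
    try:
        import onnxruntime as ort
        if "DmlExecutionProvider" in ort.get_available_providers():
            return "dml"
    except ImportError:
        pass
    return "cpu"  # always-available fallback; very slow for inpainting
```

The key design point is that each probe is wrapped in try/except so a missing optional dependency degrades the device choice rather than crashing startup.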
🔗Related repos
- YaoFANGUK/video-subtitle-extractor — Official companion tool (VSE) for extracting subtitles from videos; users often apply VSE to extract text before using VSR to remove it.
- facebookresearch/detectron2 — Underlying object detection library potentially used by PaddleOCR for text region proposals; understanding Detectron2 helps optimize text detection.
- openai/DALL-E-2 — Cited here as an inpainting-quality reference point; LaMa (used in VSR) targets resolution-robust removal.
- PaddlePaddle/PaddleOCR — Direct dependency for text detection and recognition; fork or extend to improve subtitle detection accuracy specific to burned-in video subtitles.
- gmalivenko/pytorch_stylegan2 — Similar local-first AI processing philosophy; reference for how to bundle pre-trained models and avoid cloud APIs.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for inpainting modules
The repo has multiple inpainting implementations (lama, opencv, propainter, sttn) in backend/inpaint/ but no visible test suite. This is critical for a production AI tool where different inpainting algorithms produce different quality results. Testing edge cases (empty masks, full masks, various resolutions) would catch regressions and improve code reliability.
- [ ] Create tests/inpaint/ directory structure mirroring backend/inpaint/
- [ ] Add unit tests for backend/inpaint/lama_inpaint.py covering mask validation and tensor outputs
- [ ] Add tests for backend/inpaint/propainter_inpaint.py with various video frame dimensions
- [ ] Add integration tests for backend/inpaint/video/core/dataset.py to verify frame loading and masking
- [ ] Add GitHub Actions workflow to run pytest on all inpaint modules
Add platform-specific integration tests for FFmpeg wrapper
The repo bundles FFmpeg binaries for Windows (3 versions), macOS, and Linux (backend/ffmpeg/) but lacks tests verifying they work correctly on each platform. The multiple Windows executables suggest known compatibility issues. Tests would prevent regression when updating FFmpeg versions and validate the fs_manifest.csv integrity.
- [ ] Create tests/ffmpeg/ directory with platform detection utilities
- [ ] Add tests verifying backend/ffmpeg/{platform}/ffmpeg executes and returns valid version info
- [ ] Add tests for Windows variant selection logic (ffmpeg_1.exe, ffmpeg_2.exe, ffmpeg_3.exe)
- [ ] Add test to validate fs_manifest.csv checksum/integrity for each binary
- [ ] Add GitHub Actions matrix build testing FFmpeg on Windows, macOS, and Linux runners
Create API documentation and type hints for backend/inpaint interface
The inpainting module is the core of this tool but lacks clear interface documentation. There are 4 different inpaint classes (lama, opencv, propainter, sttn) with likely different input/output signatures. Adding type hints and docstrings would help contributors understand how to add new inpainting algorithms and prevent integration errors.
- [ ] Add comprehensive docstrings and type hints to backend/inpaint/__init__.py defining the inpaint interface contract
- [ ] Add type hints to backend/inpaint/lama_inpaint.py, opencv_inpaint.py, propainter_inpaint.py methods
- [ ] Create docs/INPAINT_API.md documenting the expected signature for custom inpaint implementations
- [ ] Add return type validation tests in tests/ to ensure all inpaint modules return consistent output shapes
- [ ] Document the mask input format and expected numpy/torch tensor requirements
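The return-type validation item in the checklist could start from a contract test like this. `load_backends()` is hypothetical glue; in a real test it would import the lama/opencv/propainter/sttn classes from backend/inpaint/, while here a trivial identity backend stands in so the contract itself is runnable:

```python
"""Sketch of a shape/dtype contract test for inpaint backends (pytest style)."""
import numpy as np


class IdentityInpaint:
    """Stand-in backend: returns the frame unchanged."""

    def inpaint(self, frame, mask):
        return frame.copy()


def load_backends():
    # Real test: return instances of the lama, opencv, propainter, sttn classes.
    return [IdentityInpaint()]


def test_inpaint_preserves_shape_and_dtype():
    frame = np.random.randint(0, 255, (32, 48, 3), dtype=np.uint8)
    mask = np.zeros((32, 48), dtype=np.uint8)  # empty mask: nothing to inpaint
    for backend in load_backends():
        out = backend.inpaint(frame, mask)
        assert out.shape == frame.shape, type(backend).__name__
        assert out.dtype == frame.dtype, type(backend).__name__
```

Running every backend through one parametrised contract is what catches the "different input/output signatures" risk the PR idea describes.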
🌿Good first issues
- Add unit tests for backend/inpaint/utils/utils.py: The utilities module (device detection, tensor conversions, preprocessing) lacks test coverage. Create tests/test_utils.py with fixtures for CPU/GPU device mocking and test utility functions used across all inpainting backends.
- Document OCR region configuration in README_en.md: The --rect '[x1,y1,x2,y2]' flag for custom subtitle regions is mentioned but lacks examples. Add a section with 2–3 concrete examples showing how to measure and pass coordinates for different subtitle placements (bottom-center, sidebar, etc.).
- Extend batch processing to support recursive directory traversal: backend/main.py currently processes images in a flat directory; add a --recursive flag to handle nested folder structures, matching tools like ImageMagick. Modify backend/main.py to enumerate subdirectories when the flag is set.
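The recursive-traversal issue largely boils down to swapping `glob` for `rglob`. A sketch, with an extension set and function name of our own choosing:

```python
"""Sketch for the --recursive good-first-issue: flat or recursive image listing."""
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}  # illustrative set


def find_images(root: str, recursive: bool = False) -> list[Path]:
    """List image files under root; descend into subdirectories if recursive."""
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    return sorted(p for p in candidates if p.suffix.lower() in IMAGE_EXTS)
```

Wiring this behind an `argparse` flag in backend/main.py would keep the flat behaviour as the default.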
⭐Top contributors
- @YaoFANGUK — 68 commits
- @eritpchy — 31 commits
- @Brikarl — 1 commit
📝Recent commits
- 5b780e3 — Document solution for Mac 'bad CPU type' error (YaoFANGUK)
- 577b64a — Add Trendshift badge to README (YaoFANGUK)
- c63f6b5 — Update README.md (YaoFANGUK)
- b6644f3 — Add Trendshift badge to README (YaoFANGUK)
- c189f83 — Add subtitle extraction feature and documentation link (YaoFANGUK)
- 3f3b061 — Fix occasional crash when dragging the progress bar: thread-safety protection for video_cap (YaoFANGUK)
- bdf333c — Fix occasional crash when dragging the progress bar: thread-safety protection for video_cap (YaoFANGUK)
- 0bae013 — Improve subtitle detection algorithm; add multi-language translation support (YaoFANGUK)
- 93d822d — Fix LAMA mode hanging at 100%: frame range extended past the total frame count, deadlocking FramePrefetcher (YaoFANGUK)
- 99788e8 — Image mode locked to the LAMA algorithm; process the full frame when no region is selected (YaoFANGUK)
🔒Security observations
- High · Outdated and potentially vulnerable dependencies — requirements.txt. Several dependencies are loosely specified: opencv-python is pinned (4.11.0.86), but Pillow, scipy, and requests carry no version constraints, so installs may drift onto versions with known CVEs. Fix: pin all dependencies to specific, recent versions; run pip-audit or safety check to identify known vulnerabilities; add automated dependency scanning to the CI pipeline; prefer constraints like scipy>=1.10.0 over leaving versions unspecified.
- High · Embedded binary executables in repository — backend/ffmpeg/. Multiple FFmpeg binaries are committed directly to the repository (backend/ffmpeg/win_x64/ffmpeg_*.exe, backend/ffmpeg/linux_x64/ffmpeg, backend/ffmpeg/macos/ffmpeg). This makes integrity hard to verify, opens a supply-chain attack surface, and leaves binaries that are hard to update when vulnerabilities surface. Fix: remove binaries from git history (BFG or git-filter-branch); download FFmpeg at build/install time from official sources; verify integrity with checksums/GPG signatures; or manage the dependency via Docker instead of committing binaries.
- High · Unspecified PyTorch/TorchVision versions — requirements.txt (comments), Dockerfile. PyTorch and TorchVision are installed separately (see the Dockerfile comments) with no visible version pinning, producing non-deterministic builds and possible installs of vulnerable versions. Fix: pin torch and torchvision explicitly in requirements.txt or the Dockerfile; test against known-good versions; ship a requirements-lock.txt with exact versions for reproducible builds.
- Medium · Potential path traversal in media processing — backend/main.py, backend/inpaint/. The application feeds user-supplied video/image paths to multiple backends (OpenCV, FFmpeg, PaddleOCR). With no visible input validation, unsanitized paths risk traversal attacks. Fix: validate file paths strictly; use os.path.basename() and reject traversal sequences ('..', '//'); resolve to absolute paths and check them against a whitelist of allowed directories; handle errors without exposing system paths.
- Medium · Uncontrolled resource consumption — backend/inpaint/video/, backend/inpaint/propainter_inpaint.py. Video and image processing (especially ProPainter and STTN) can consume significant memory and CPU, and no rate limiting, timeouts, or resource quotas are visible; this is a potential DoS vector. Fix: enforce file-size limits; set timeouts on processing operations; apply resource limits (ulimit, cgroups in Docker); queue long-running tasks; monitor memory usage and degrade gracefully.
- Medium · Hardcoded configuration and potential secrets — backend/config.py, backend/interface/*.ini. Configuration files exist but their contents were not inspected; the INI files under backend/interface/ could hold sensitive settings, and no .env.example or secrets-handling documentation is apparent. Fix: never hardcode API keys, credentials, or sensitive paths; move sensitive configuration to environment variables; add a .env.example documenting required variables; keep .env in .gitignore; use a secrets-management library if needed.
- Medium · PaddleOCR remote model downloads — backend/models/, paddleocr dependency. PaddleOCR (paddleocr==3.4.0) downloads pre-trained models from remote sources; without integrity verification or pinned model versions, this is a supply-chain attack vector. Fix: pin model versions and verify checksums of downloaded weights where possible.
LLM-derived; treat as a starting point, not a security audit.
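The pinning recommendations above can be made concrete with a lock file. The version numbers below are illustrative only, not tested against VSR; generate the real file from a clean environment and audit it:

```text
# requirements-lock.txt — illustrative shape only. Generate the real file with:
#   pip freeze > requirements-lock.txt
# after installing into a clean environment, then run `pip-audit` against it.
opencv-python==4.11.0.86   # already pinned upstream
scipy==1.13.1              # example pin; audit before adopting
Pillow==10.4.0             # example pin; audit before adopting
requests==2.32.3           # example pin; audit before adopting
```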
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.