YaoFANGUK/video-subtitle-remover
AI-based removal of hard-coded subtitles and text-like watermarks from videos and images, producing cleaned files at the original (lossless) resolution. Runs entirely locally; no third-party API required.
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 4w ago
- ✓ 3 active contributors
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Small team — 3 contributors active in recent commits
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/yaofanguk/video-subtitle-remover) — paste at the top of your README.md; renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/yaofanguk/video-subtitle-remover on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: YaoFANGUK/video-subtitle-remover
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/YaoFANGUK/video-subtitle-remover shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- Last commit 4w ago
- 3 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Small team — 3 contributors active in recent commits
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live YaoFANGUK/video-subtitle-remover
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/YaoFANGUK/video-subtitle-remover.
What it runs against: a local clone of YaoFANGUK/video-subtitle-remover — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in YaoFANGUK/video-subtitle-remover | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | Last commit ≤ 56 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of YaoFANGUK/video-subtitle-remover. If you don't
# have one yet, run these first:
#
# git clone https://github.com/YaoFANGUK/video-subtitle-remover.git
# cd video-subtitle-remover
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of YaoFANGUK/video-subtitle-remover and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "YaoFANGUK/video-subtitle-remover(\.git)?\b" \
  && ok "origin remote is YaoFANGUK/video-subtitle-remover" \
  || miss "origin remote is not YaoFANGUK/video-subtitle-remover (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "Apache License|Apache-2\.0" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 56 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~26d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/YaoFANGUK/video-subtitle-remover"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Video-subtitle-remover (VSR) is an AI-powered tool that removes hard-coded subtitles and text watermarks from videos and images at lossless resolution by detecting text regions and inpainting them with multiple neural network backends (LaMa, ProPainter, STTN). It runs entirely locally without requiring third-party APIs, supporting both single-file and batch processing with automatic or user-defined subtitle region detection. Monolithic Python backend structured as backend/inpaint/ (multiple inpainting strategies: lama_inpaint.py, propainter_inpaint.py, sttn_*_inpaint.py), backend/inpaint/video/ (training core with dataset.py, trainer.py, model definitions), and backend/ffmpeg/ (platform-specific binaries). Entry point is backend/main.py with config in backend/config.py.
👥Who it's for
Video editors, content creators, and developers who need to remove burned-in subtitles or watermarks from video files without quality loss, and who prefer local processing over cloud-based solutions for privacy or cost reasons.
🌱Maturity & risk
Actively maintained with pre-built binaries for Windows (CPU/CUDA 11.8/12.6/12.8/DirectML), Docker support, and cross-platform compatibility (Windows/macOS/Linux). The project has structured CI/CD workflows, multiple hardware acceleration paths, and recent versioning (1.4.0+), indicating production-ready status with ongoing development.
Heavy dependency on PyTorch, PaddleOCR, and FFmpeg with specific version pinning (e.g., onnxruntime-directml locked to 1.20.1 for Windows 10/11 compatibility, per a code comment); GPU memory requirements and CUDA version specificity create deployment friction. A single primary maintainer (YaoFANGUK) and primarily Chinese-language documentation may slow community contributions.
Active areas of work
Recent work includes CUDA 12.8 support (build-windows-cuda-12.8.yml workflow), DirectML GPU acceleration path for AMD/Intel, and stabilization of multiple inpainting backends with version-specific dependency management. The project is actively releasing pre-built binaries and maintaining Docker images across hardware platforms.
🚀Get running
Clone and install: git clone https://github.com/YaoFANGUK/video-subtitle-remover.git && cd video-subtitle-remover. Create a conda environment from .condarc and install dependencies: pip install -r requirements.txt (or use pre-built binaries from Releases). Run: python backend/main.py -i <input_video> -o <output_video> or python backend/main.py -i <input_image> -o <output_image> for image mode.
Daily commands:
CLI (primary): python backend/main.py -i input.mp4 -o output.mp4 (auto-detect subtitles) or with region: python backend/main.py -i input.mp4 -o output.mp4 --rect '[x1,y1,x2,y2]'. Docker: docker run -it --gpus all eritpchy/video-subtitle-remover:1.4.0-cuda12.6 python backend/main.py -i test/test.mp4 -o test/test_no_sub.mp4. Batch images: Specify directory instead of file. Check backend/main.py and backend/config.py for all CLI flags.
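For bulk jobs, the CLI above is easy to drive from a short script. A sketch, assuming the documented `-i`/`-o`/`--rect` flags of backend/main.py (check `python backend/main.py --help` first; `build_cmd` and `batch` are our own names, not part of VSR):

```python
"""Sketch: batch-drive the VSR CLI from Python (flags assumed from the docs above)."""
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_cmd(src: Path, dst: Path, rect: Optional[str] = None) -> list[str]:
    """Assemble one backend/main.py invocation; rect e.g. "[100,800,1180,900]"."""
    cmd = [sys.executable, "backend/main.py", "-i", str(src), "-o", str(dst)]
    if rect is not None:
        cmd += ["--rect", rect]
    return cmd


def batch(in_dir: str, out_dir: str, rect: Optional[str] = None) -> None:
    """Run VSR over every .mp4 in in_dir, writing results to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.mp4")):
        subprocess.run(build_cmd(src, out / src.name, rect), check=True)
```

`check=True` makes a failed removal abort the batch instead of silently continuing.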
🗺️Map of the codebase
- backend/main.py: Entry point: CLI argument parsing, orchestrates text detection, inpainting backend selection, and FFmpeg frame extraction/re-encoding.
- backend/config.py: Configuration and hyperparameters: inpainting model paths, OCR settings, device selection (CUDA/CPU/DirectML), frame processing options.
- backend/inpaint/lama_inpaint.py: LaMa (resolution-agnostic) inpainting implementation; likely the highest-quality backend for subtitle removal based on paper citations.
- backend/inpaint/propainter_inpaint.py: ProPainter backend for video-aware inpainting with temporal consistency; critical for motion-aware subtitle removal.
- backend/inpaint/sttn_det_inpaint.py: STTN with detection-guided inpainting; bridges fast detection and spatio-temporal inpainting for video.
- backend/inpaint/video/core/trainer.py: Training loop for video inpainting models; needed to understand model training pipeline and loss computation.
- backend/inpaint/video/core/dataset.py: Frame dataset loader for video training; defines how frames, masks, and flows are loaded and augmented.
- .github/workflows/build-windows-cuda-12.8.yml: Example CI/CD pipeline showing dependency resolution, CUDA toolkit setup, and artifact packaging for Windows builds.
🛠️How to make changes
Adding inpainting backends: Implement interface in backend/inpaint/__init__.py and create module (e.g., backend/inpaint/new_model_inpaint.py). Text detection/OCR: Edit backend/config.py OCR settings or extend PaddleOCR usage in detection pipeline. Video processing: Modify backend/inpaint/video/core/dataset.py for frame I/O or backend/inpaint/video/core/trainer.py for training logic. Hardware acceleration: Add device-specific code in backend/inpaint/utils/utils.py or create new device handler (e.g., hardware_accelerator.py mentioned in comments).
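As a concrete illustration of the first step, here is a minimal sketch of a new backend. The `BaseInpaint` name and the `inpaint(frame, mask)` signature are assumptions for illustration; check backend/inpaint/__init__.py for the real contract before implementing:

```python
"""Hypothetical shape of a new inpainting backend (interface names assumed)."""
import numpy as np


class BaseInpaint:  # stand-in for whatever backend/inpaint/__init__.py defines
    def inpaint(self, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class MeanFillInpaint(BaseInpaint):
    """Toy backend: fills masked pixels with the mean colour of unmasked ones."""

    def inpaint(self, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # frame: HxWx3, mask: HxW with 0 = keep, nonzero = inpaint.
        assert frame.shape[:2] == mask.shape, "mask must match frame HxW"
        out = frame.copy()
        keep = mask == 0
        fill = frame[keep].mean(axis=0) if keep.any() else 0
        out[~keep] = fill
        return out
```

A real backend would load model weights in `__init__` and run inference here, but the shape contract stays the same.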
🪤Traps & gotchas
CUDA version strictness: .condarc and Dockerfiles pin specific CUDA versions (11.8, 12.6, 12.8); using wrong version causes 'CUDA device not found' even if GPU is present. onnxruntime-directml locked to 1.20.1: Code comment warns against upgrading due to Windows 10/11 compatibility; upgrading silently breaks DirectML inference. FFmpeg bundling: Platform-specific ffmpeg binaries in backend/ffmpeg/win_x64/ (multiple exe files) are required; missing or outdated ffmpeg causes frame extraction to fail. GPU memory: Inpainting models (especially ProPainter) require 6–10 GB VRAM; CPU fallback is very slow. PaddleOCR model download: First run downloads multi-hundred-MB OCR weights on-the-fly; network interruption during setup breaks initialization. Path assumptions: Code may assume relative paths from repo root; running from different directories can break model/ffmpeg lookups.
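The last trap (path assumptions) has a conventional fix: anchor lookups to the repo root instead of the current working directory. A sketch (the helper is our own, not something VSR provides):

```python
"""Sketch: resolve repo resources independent of the caller's CWD."""
from pathlib import Path

# If this module lived under backend/, the repo root is one level up.
REPO_ROOT = Path(__file__).resolve().parent.parent


def resource(relative: str) -> Path:
    """Return an absolute path to a repo resource, failing loudly if missing."""
    path = (REPO_ROOT / relative).resolve()
    if not path.exists():
        raise FileNotFoundError(f"expected {path} inside the repo checkout")
    return path
```

Calling `resource("backend/ffmpeg")` then works the same whether the tool is launched from the repo root or elsewhere.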
💡Concepts to learn
- Inpainting (image/video generative infilling) — Core technique VSR uses: after detecting subtitle regions, neural networks 'hallucinate' plausible content to fill voids. Understanding inpainting quality metrics (LPIPS, FID) is critical to evaluating backend choice.
- Spatio-Temporal Consistency (video) — VSR's STTN and ProPainter backends enforce temporal coherence so inpainted frames don't flicker or drift between frames; essential concept for video-specific inpainting.
- Optical Flow estimation — Tracked in backend/inpaint/video/core/trainer_flow_w_edge.py and loss.py; optical flow guides the network on object motion to maintain visual continuity when filling subtitle regions.
- Mask-guided diffusion / conditional generation — LaMa and other backends use masked regions to condition the generative process, ensuring only subtitle areas are modified while preserving surrounding content.
- Canny edge detection (computer vision) — Visible in backend/inpaint/model/canny/ (custom Canny implementation); used to preserve fine edges and text boundaries when computing inpainting masks, preventing over-smoothing.
- Spectral Normalization (neural network training) — Found in backend/inpaint/utils/spectral_norm.py; stabilizes discriminator training in the GANs used for video inpainting, improving convergence and output quality.
- Platform-specific GPU abstraction (CUDA, DirectML, Metal, CPU) — VSR bundles multiple hardware backends; understanding device detection and fallback logic in backend/config.py is critical for cross-platform deployment.
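The device-fallback pattern named in the last concept can be sketched in a few lines. This is illustrative only; VSR's actual selection logic lives in backend/config.py:

```python
"""Sketch of device selection with graceful fallback (not VSR's real code)."""


def pick_device() -> str:
    # Prefer CUDA when a usable GPU build of PyTorch is present.
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    # Next, DirectML via onnxruntime (the AMD/Intel path on Windows).
    try:
        import onnxruntime as ort
        if "DmlExecutionProvider" in ort.get_available_providers():
            return "dml"
    except ImportError:
        pass
    return "cpu"  # always-available fallback; very slow for inpainting
```

The key design point is that each probe is wrapped in try/except so a missing optional dependency degrades the device choice rather than crashing startup.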
🔗Related repos
- YaoFANGUK/video-subtitle-extractor — Official companion tool (VSE) for extracting subtitles from videos; users often apply VSE to extract text before using VSR to remove it.
- facebookresearch/detectron2 — Underlying object detection library potentially used by PaddleOCR for text region proposals; understanding Detectron2 helps optimize text detection.
- openai/DALL-E-2 — Cited here as an inpainting-quality reference point; LaMa (used in VSR) targets resolution-robust removal.
- PaddlePaddle/PaddleOCR — Direct dependency for text detection and recognition; fork or extend to improve subtitle detection accuracy specific to burned-in video subtitles.
- gmalivenko/pytorch_stylegan2 — Similar local-first AI processing philosophy; reference for how to bundle pre-trained models and avoid cloud APIs.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for inpainting modules
The repo has multiple inpainting implementations (lama, opencv, propainter, sttn) in backend/inpaint/ but no visible test suite. This is critical for a production AI tool where different inpainting algorithms produce different quality results. Testing edge cases (empty masks, full masks, various resolutions) would catch regressions and improve code reliability.
- [ ] Create tests/inpaint/ directory structure mirroring backend/inpaint/
- [ ] Add unit tests for backend/inpaint/lama_inpaint.py covering mask validation and tensor outputs
- [ ] Add tests for backend/inpaint/propainter_inpaint.py with various video frame dimensions
- [ ] Add integration tests for backend/inpaint/video/core/dataset.py to verify frame loading and masking
- [ ] Add GitHub Actions workflow to run pytest on all inpaint modules
Add platform-specific integration tests for FFmpeg wrapper
The repo bundles FFmpeg binaries for Windows (3 versions), macOS, and Linux (backend/ffmpeg/) but lacks tests verifying they work correctly on each platform. The multiple Windows executables suggest known compatibility issues. Tests would prevent regression when updating FFmpeg versions and validate the fs_manifest.csv integrity.
- [ ] Create tests/ffmpeg/ directory with platform detection utilities
- [ ] Add tests verifying backend/ffmpeg/{platform}/ffmpeg executes and returns valid version info
- [ ] Add tests for Windows variant selection logic (ffmpeg_1.exe, ffmpeg_2.exe, ffmpeg_3.exe)
- [ ] Add test to validate fs_manifest.csv checksum/integrity for each binary
- [ ] Add GitHub Actions matrix build testing FFmpeg on Windows, macOS, and Linux runners
Create API documentation and type hints for backend/inpaint interface
The inpainting module is the core of this tool but lacks clear interface documentation. There are 4 different inpaint classes (lama, opencv, propainter, sttn) with likely different input/output signatures. Adding type hints and docstrings would help contributors understand how to add new inpainting algorithms and prevent integration errors.
- [ ] Add comprehensive docstrings and type hints to backend/inpaint/__init__.py defining the inpaint interface contract
- [ ] Add type hints to backend/inpaint/lama_inpaint.py, opencv_inpaint.py, propainter_inpaint.py methods
- [ ] Create docs/INPAINT_API.md documenting the expected signature for custom inpaint implementations
- [ ] Add return type validation tests in tests/ to ensure all inpaint modules return consistent output shapes
- [ ] Document the mask input format and expected numpy/torch tensor requirements
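The return-type validation item in the checklist could start from a contract test like this. `load_backends()` is hypothetical glue; in a real test it would import the lama/opencv/propainter/sttn classes from backend/inpaint/, while here a trivial identity backend stands in so the contract itself is runnable:

```python
"""Sketch of a shape/dtype contract test for inpaint backends (pytest style)."""
import numpy as np


class IdentityInpaint:
    """Stand-in backend: returns the frame unchanged."""

    def inpaint(self, frame, mask):
        return frame.copy()


def load_backends():
    # Real test: return instances of the lama, opencv, propainter, sttn classes.
    return [IdentityInpaint()]


def test_inpaint_preserves_shape_and_dtype():
    frame = np.random.randint(0, 255, (32, 48, 3), dtype=np.uint8)
    mask = np.zeros((32, 48), dtype=np.uint8)  # empty mask: nothing to inpaint
    for backend in load_backends():
        out = backend.inpaint(frame, mask)
        assert out.shape == frame.shape, type(backend).__name__
        assert out.dtype == frame.dtype, type(backend).__name__
```

Running every backend through one parametrised contract is what catches the "different input/output signatures" risk the PR idea describes.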
🌿Good first issues
- Add unit tests for backend/inpaint/utils/utils.py: The utilities module (device detection, tensor conversions, preprocessing) lacks test coverage. Create tests/test_utils.py with fixtures for CPU/GPU device mocking and test utility functions used across all inpainting backends.
- Document OCR region configuration in README_en.md: The --rect '[x1,y1,x2,y2]' flag for custom subtitle regions is mentioned but lacks examples. Add a section with 2–3 concrete examples showing how to measure and pass coordinates for different subtitle placements (bottom-center, sidebar, etc.).
- Extend batch processing to support recursive directory traversal: backend/main.py currently processes images in a flat directory; add a --recursive flag to handle nested folder structures, matching tools like ImageMagick. Modify backend/main.py to enumerate subdirectories when the flag is set.
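The recursive-traversal issue largely boils down to swapping `glob` for `rglob`. A sketch, with an extension set and function name of our own choosing:

```python
"""Sketch for the --recursive good-first-issue: flat or recursive image listing."""
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}  # illustrative set


def find_images(root: str, recursive: bool = False) -> list[Path]:
    """List image files under root; descend into subdirectories if recursive."""
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    return sorted(p for p in candidates if p.suffix.lower() in IMAGE_EXTS)
```

Wiring this behind an `argparse` flag in backend/main.py would keep the flat behaviour as the default.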
⭐Top contributors
- @YaoFANGUK — 68 commits
- @eritpchy — 31 commits
- @Brikarl — 1 commit
📝Recent commits
- 5b780e3 — Document solution for Mac 'bad CPU type' error (YaoFANGUK)
- 577b64a — Add Trendshift badge to README (YaoFANGUK)
- c63f6b5 — Update README.md (YaoFANGUK)
- b6644f3 — Add Trendshift badge to README (YaoFANGUK)
- c189f83 — Add subtitle extraction feature and documentation link (YaoFANGUK)
- 3f3b061 — Fix occasional crash when dragging the progress bar: thread-safety protection for video_cap (YaoFANGUK)
- bdf333c — Fix occasional crash when dragging the progress bar: thread-safety protection for video_cap (YaoFANGUK)
- 0bae013 — Improve subtitle detection algorithm; add multi-language translation support (YaoFANGUK)
- 93d822d — Fix LAMA mode hanging at 100%: frame range extended past the total frame count, deadlocking FramePrefetcher (YaoFANGUK)
- 99788e8 — Image mode locked to the LAMA algorithm; process the full frame when no region is selected (YaoFANGUK)
🔒Security observations
- High · Outdated and potentially vulnerable dependencies — requirements.txt. Several dependencies are loosely specified: opencv-python is pinned (4.11.0.86), but Pillow, scipy, and requests carry no version constraints, so installs may drift onto versions with known CVEs. Fix: pin all dependencies to specific, recent versions; run pip-audit or safety check to identify known vulnerabilities; add automated dependency scanning to the CI pipeline; prefer constraints like scipy>=1.10.0 over leaving versions unspecified.
- High · Embedded binary executables in repository — backend/ffmpeg/. Multiple FFmpeg binaries are committed directly to the repository (backend/ffmpeg/win_x64/ffmpeg_*.exe, backend/ffmpeg/linux_x64/ffmpeg, backend/ffmpeg/macos/ffmpeg). This makes integrity hard to verify, opens a supply-chain attack surface, and leaves binaries that are hard to update when vulnerabilities surface. Fix: remove binaries from git history (BFG or git-filter-branch); download FFmpeg at build/install time from official sources; verify integrity with checksums/GPG signatures; or manage the dependency via Docker instead of committing binaries.
- High · Unspecified PyTorch/TorchVision versions — requirements.txt (comments), Dockerfile. PyTorch and TorchVision are installed separately (see the Dockerfile comments) with no visible version pinning, producing non-deterministic builds and possible installs of vulnerable versions. Fix: pin torch and torchvision explicitly in requirements.txt or the Dockerfile; test against known-good versions; ship a requirements-lock.txt with exact versions for reproducible builds.
- Medium · Potential path traversal in media processing — backend/main.py, backend/inpaint/. The application feeds user-supplied video/image paths to multiple backends (OpenCV, FFmpeg, PaddleOCR). With no visible input validation, unsanitized paths risk traversal attacks. Fix: validate file paths strictly; use os.path.basename() and reject traversal sequences ('..', '//'); resolve to absolute paths and check them against a whitelist of allowed directories; handle errors without exposing system paths.
- Medium · Uncontrolled resource consumption — backend/inpaint/video/, backend/inpaint/propainter_inpaint.py. Video and image processing (especially ProPainter and STTN) can consume significant memory and CPU, and no rate limiting, timeouts, or resource quotas are visible; this is a potential DoS vector. Fix: enforce file-size limits; set timeouts on processing operations; apply resource limits (ulimit, cgroups in Docker); queue long-running tasks; monitor memory usage and degrade gracefully.
- Medium · Hardcoded configuration and potential secrets — backend/config.py, backend/interface/*.ini. Configuration files exist but their contents were not inspected; the INI files under backend/interface/ could hold sensitive settings, and no .env.example or secrets-handling documentation is apparent. Fix: never hardcode API keys, credentials, or sensitive paths; move sensitive configuration to environment variables; add a .env.example documenting required variables; keep .env in .gitignore; use a secrets-management library if needed.
- Medium · PaddleOCR remote model downloads — backend/models/, paddleocr dependency. PaddleOCR (paddleocr==3.4.0) downloads pre-trained models from remote sources; without integrity verification or pinned model versions, this is a supply-chain attack vector. Fix: pin model versions and verify checksums of downloaded weights where possible.
LLM-derived; treat as a starting point, not a security audit.
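The pinning recommendations above can be made concrete with a lock file. The version numbers below are illustrative only, not tested against VSR; generate the real file from a clean environment and audit it:

```text
# requirements-lock.txt — illustrative shape only. Generate the real file with:
#   pip freeze > requirements-lock.txt
# after installing into a clean environment, then run `pip-audit` against it.
opencv-python==4.11.0.86   # already pinned upstream
scipy==1.13.1              # example pin; audit before adopting
Pillow==10.4.0             # example pin; audit before adopting
requests==2.32.3           # example pin; audit before adopting
```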
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.