RepoPilot

Tencent-Hunyuan/HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Mixed

Slowing — last commit 6mo ago

Weakest axis: Use as dependency — Concerns

non-standard license (Other); no CI workflows detected

Fork & modify — Healthy

Has a license and tests — a clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 6mo ago
  • 14 active contributors
  • Distributed ownership (top contributor 38% of recent commits)
  • Other licensed
  • Tests present
  • Slowing — last commit 6mo ago
  • Non-standard license (Other) — review terms
  • No CI workflows detected
What would change the summary?
  • Use as dependency: Concerns → Mixed if the license terms are clarified

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:
RepoPilot: Forkable
[![RepoPilot: Forkable](https://repopilot.app/api/badge/tencent-hunyuan/hunyuanvideo?axis=fork)](https://repopilot.app/r/tencent-hunyuan/hunyuanvideo)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/tencent-hunyuan/hunyuanvideo on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: Tencent-Hunyuan/HunyuanVideo

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/Tencent-Hunyuan/HunyuanVideo shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Slowing — last commit 6mo ago

  • Last commit 6mo ago
  • 14 active contributors
  • Distributed ownership (top contributor 38% of recent commits)
  • Other licensed
  • Tests present
  • ⚠ Slowing — last commit 6mo ago
  • ⚠ Non-standard license (Other) — review terms
  • ⚠ No CI workflows detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live Tencent-Hunyuan/HunyuanVideo repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/Tencent-Hunyuan/HunyuanVideo.

What it runs against: a local clone of Tencent-Hunyuan/HunyuanVideo — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in Tencent-Hunyuan/HunyuanVideo | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 197 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>Tencent-Hunyuan/HunyuanVideo</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of Tencent-Hunyuan/HunyuanVideo. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/Tencent-Hunyuan/HunyuanVideo.git
#   cd HunyuanVideo
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of Tencent-Hunyuan/HunyuanVideo and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "Tencent-Hunyuan/HunyuanVideo(\.git)?\b" \
  && ok "origin remote is Tencent-Hunyuan/HunyuanVideo" \
  || miss "origin remote is not Tencent-Hunyuan/HunyuanVideo (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py" \
  && ok "hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py" \
  || miss "missing critical file: hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py"
test -f "hyvideo/modules/models.py" \
  && ok "hyvideo/modules/models.py" \
  || miss "missing critical file: hyvideo/modules/models.py"
test -f "hyvideo/vae/autoencoder_kl_causal_3d.py" \
  && ok "hyvideo/vae/autoencoder_kl_causal_3d.py" \
  || miss "missing critical file: hyvideo/vae/autoencoder_kl_causal_3d.py"
test -f "hyvideo/inference.py" \
  && ok "hyvideo/inference.py" \
  || miss "missing critical file: hyvideo/inference.py"
test -f "sample_video.py" \
  && ok "sample_video.py" \
  || miss "missing critical file: sample_video.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 197 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~167d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/Tencent-Hunyuan/HunyuanVideo"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

HunyuanVideo is a large-scale diffusion-based video generation model framework that converts text prompts into high-quality videos using a systematic approach combining a 3D VAE, text encoders, and a flow-matching diffusion backbone. It implements end-to-end inference pipelines in hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py with support for prompt rewriting, frame interpolation, and efficient causal 3D processing. Modular monorepo structure: hyvideo/modules/ contains reusable components (attention, embedding, normalization layers), hyvideo/vae/ implements the 3D causal VAE autoencoder, hyvideo/diffusion/pipelines/ orchestrates the inference pipeline, hyvideo/text_encoder/ handles prompt encoding, and top-level scripts (sample_video.py, gradio_server.py, hyvideo/inference.py) provide entry points.

👥Who it's for

ML researchers and practitioners building video generation systems, Tencent engineers extending the HunyuanVideo ecosystem, and developers integrating video synthesis into applications via the Hugging Face Diffusers API or local inference.

🌱Maturity & risk

Actively developed and production-oriented: the project shows recent feature additions (HunyuanVideo-1.5 in Nov 2025, Avatar/Custom variants in May 2025), is integrated into the Hugging Face Diffusers ecosystem, has pre-trained weights on Hugging Face, and runs a public playground. This is not experimental — it is a mature Tencent framework with multiple derivative projects.

Dependency risk is moderate: requires torch 2.6.0, diffusers 0.31.0, and transformers 4.46.3 (relatively recent but not cutting-edge), creating potential breaking-change exposure. Single-maintainer risk is low due to Tencent backing, but the repo itself lacks visible CI/test infrastructure in the file listing, making regression detection harder. GPU memory and compute demands are high for inference, restricting practical deployment.

Active areas of work

The codebase is actively maintained with recent releases: HunyuanVideo-1.5 (Nov 2025) focuses on efficiency improvements, HunyuanVideo-Avatar (May 2025) added audio-driven animation, HunyuanCustom (May 2025) introduced multimodal customization, and HunyuanVideo-I2V (Mar 2025) added image-to-video capability. Development is clearly moving beyond base text-to-video.

🚀Get running

git clone https://github.com/Tencent-Hunyuan/HunyuanVideo.git
cd HunyuanVideo
pip install -r requirements.txt
# For inference, download checkpoint from huggingface.co/tencent/HunyuanVideo
python sample_video.py --prompt "A penguin walking on ice" --save-path ./results  # --save-path per the repo README; verify flags in your checkout

Daily commands:

# Text-to-video inference
python sample_video.py --prompt "your text prompt" --save-path ./results

# Web UI via Gradio
python gradio_server.py

# Programmatic inference (method names as used by sample_video.py;
# verify against your checkout before relying on them)
from hyvideo.inference import HunyuanVideoSampler
sampler = HunyuanVideoSampler.from_pretrained(models_root_path, args=args)
outputs = sampler.predict(prompt="your text prompt", height=720, width=1280, video_length=129)

🗺️Map of the codebase

  • hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py — Core inference pipeline orchestrating the diffusion process; all video generation flows through this entry point.
  • hyvideo/modules/models.py — Defines the main HunyuanVideo transformer backbone and modular components; fundamental to understanding model architecture.
  • hyvideo/vae/autoencoder_kl_causal_3d.py — Implements the 3D VAE encoder/decoder for video tokenization; critical for latent space operations.
  • hyvideo/inference.py — High-level inference API and model loading logic; primary entry point for programmatic video generation.
  • sample_video.py — Main entry script demonstrating end-to-end video generation workflow; reference implementation for integration.
  • hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py — Flow matching scheduler controlling the diffusion timestep schedule; critical for generation quality.
  • hyvideo/config.py — Centralized configuration management for model parameters, paths, and generation settings.

🛠️How to make changes

Add a new text encoder variant

  1. Register new model ID in constants (hyvideo/constants.py)
  2. Implement encoder class or wrapper in text_encoder module (hyvideo/text_encoder/__init__.py)
  3. Add tokenization logic in preprocess utilities (hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py)
  4. Update inference.py to load new encoder variant (hyvideo/inference.py)
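A hypothetical shape for the wrapper in step 2. `MyEncoderWrapper` and `TextEncoderOutput` are illustrative names, not the repo's API; the stubbed `encode` only shows the interface a real tokenizer + model pair would fill in:

```python
# Sketch of a text-encoder wrapper: one encode() call hides the
# tokenizer + model pair. All names here are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class TextEncoderOutput:
    hidden_state: List[List[float]]  # [seq_len, hidden_dim] embeddings
    attention_mask: List[int]        # 1 for real tokens, 0 for padding

class MyEncoderWrapper:
    """Wraps a tokenizer and encoder behind a single encode() call."""
    def __init__(self, model_id: str, max_length: int = 77):
        self.model_id = model_id
        self.max_length = max_length

    def encode(self, prompt: str) -> TextEncoderOutput:
        # A real implementation would tokenize and run the model's
        # forward pass; this stub emits deterministic embeddings so the
        # interface is visible and testable.
        tokens = prompt.lower().split()[: self.max_length]
        hidden = [[float(len(t))] * 4 for t in tokens]
        return TextEncoderOutput(hidden_state=hidden,
                                 attention_mask=[1] * len(tokens))
```

Keeping the return type fixed means inference.py (step 4) can load any registered variant without caring which backbone produced the embeddings.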

Add a new attention mechanism

  1. Implement attention class in modules (hyvideo/modules/attenion.py)
  2. Reference new attention in HunyuanVideo model definition (hyvideo/modules/models.py)
  3. Add unit tests for attention logic (tests/test_attention.py)
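For orientation, the core computation such an attention class implements is scaled dot-product attention. A framework-agnostic numpy sketch follows; the real module in hyvideo/modules/attenion.py is torch code with multi-head plumbing:

```python
# Scaled dot-product attention over a single head, in numpy.
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q: [m, d], k/v: [n, d] arrays; returns [m, d]."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # [m, n] similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                              # weighted value mix
```

A new mechanism (linear attention, windowed attention, etc.) changes how `weights` is formed but keeps the same [m, d]-in / [m, d]-out contract, which is what lets models.py swap implementations.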

Add a new sampling strategy or scheduler

  1. Create new scheduler class in diffusion/schedulers (hyvideo/diffusion/schedulers/__init__.py)
  2. Integrate scheduler into pipeline (hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py)
  3. Add CLI flag or config option to select scheduler (hyvideo/config.py)
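A scheduler of the kind step 1 describes reduces to a timestep grid plus an update rule. Below is the textbook Euler integrator for a learned velocity field (generic flow matching, not the repo's FlowMatchDiscreteScheduler):

```python
# Generic discrete flow-matching sampling loop: integrate dx/dt = v(x, t)
# from t=0 (noise) to t=1 (data) with Euler steps.
import numpy as np

def flow_match_sample(velocity_fn, x, num_steps=10):
    """velocity_fn(x, t) -> dx/dt; x is the initial noise sample."""
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)  # Euler update
    return x
```

A new scheduler would typically change the timestep spacing (`ts`) or the update rule (e.g. a higher-order integrator) while keeping the same loop contract the pipeline calls into.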

Deploy with web UI improvements

  1. Add new Gradio component or tab in web server (gradio_server.py)
  2. Connect UI callbacks to inference.py functions (hyvideo/inference.py)

🔧Why these technologies

  • PyTorch 2.6.0 — Deep learning framework for model training and inference with built-in optimization and distributed support
  • Diffusers 0.31.0 — Provides standardized diffusion pipeline abstractions and scheduler implementations
  • Transformers 4.46.3 — Pre-trained text encoders (CLIP, etc.) for prompt embedding and semantic conditioning
  • OpenCV + ImageIO — Video I/O, frame processing, and codec handling for multi-format support
  • Gradio 5.0.0 — Rapid web UI development for demo and interactive deployment without frontend code
  • Accelerate 1.1.1 — Multi-GPU and distributed inference with minimal code changes

⚖️Trade-offs already made

  • 3D Causal VAE instead of frame-by-frame encoding

    • Why: Captures temporal coherence and reduces redundancy across frames
    • Consequence: Higher memory footprint during VAE operations but better video quality and compression
  • Flow matching discrete scheduler vs. continuous DDPM

    • Why: Flow matching provides better sample efficiency and quality with fewer steps
    • Consequence: Requires custom scheduler implementation but achieves comparable quality at fewer steps
  • Single monolithic pipeline vs. modular micro-services

    • Why: Simplifies deployment for end-users and researchers; easier reproducibility
    • Consequence: All inference stages (VAE, text encoder, diffusion model) run in single process; limits horizontal scaling
  • FP8 quantization as optional module

    • Why: Reduces memory and compute cost for inference while maintaining quality
    • Consequence: Optional complexity; not enabled by default, requires explicit flag
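The FP8 trade-off above can be made concrete with simulated symmetric 8-bit quantization (an int8 round-trip in numpy as a stand-in; the repo's fp8_optimization.py uses real float8 kernels and is not shown here):

```python
# Simulated 8-bit symmetric weight quantization: ~4x less memory than
# float32 at the cost of a bounded per-weight rounding error.
import numpy as np

def quantize_dequantize_int8(w):
    scale = float(np.abs(w).max()) / 127.0  # map max magnitude to 127
    if scale == 0.0:
        scale = 1.0                          # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale      # dequantized approximation
```

The reconstruction error is at most half a quantization step (scale/2), which is why quantization is offered as an opt-in flag rather than a default.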

🚫Non-goals (don't propose these)

  • Does not provide training code or data loading pipelines (inference-only codebase)
  • Does not include audio processing or synchronization
  • Does not support real-time streaming video generation (batch processing only)
  • Does not implement custom authentication or multi-tenant isolation
  • Not designed for mobile or edge deployment (GPU/VRAM intensive)

🪤Traps & gotchas

  • Model checkpoint downloads are not bundled — users must manually fetch weights from Hugging Face (tencent/HunyuanVideo).
  • GPU memory requirements are high (typically 20-40GB for full inference); inference is optimized for A100 or H100 GPUs.
  • The fp8_optimization.py module suggests FP8 quantization support but may require specific hardware/driver versions.
  • Text encoder weights must be downloaded separately (tencent/HunyuanVideo).
  • No visible environment variable setup (.env.example) for API keys or paths — users must infer the correct directory structure.
  • The Gradio server at gradio_server.py lacks documented port/auth configuration.

🏗️Architecture

💡Concepts to learn

  • Flow Matching (Continuous Normalizing Flows) — HunyuanVideo uses flow-matching instead of DDPM/DPM-Solver; understanding this scheduler in scheduling_flow_match_discrete.py is essential for tweaking generation quality and inference speed.
  • Causal 3D Convolutions — The VAE in vae/unet_causal_3d_blocks.py uses causal masking to prevent temporal information leakage; this is non-obvious and critical for consistent frame generation.
  • Cross-Attention (Transformer Conditioning) — Text prompts are injected via cross-attention layers in modules/attenion.py; understanding this mechanism is key to extending the model for other conditioning inputs (images, audio).
  • Latent Diffusion (Latent Space Generation) — HunyuanVideo performs diffusion in the VAE latent space, not pixel space; this dramatically reduces compute. The VAE maps video frames to latents in autoencoder_kl_causal_3d.py.
  • Modulation Layers (Adaptive Layer Norm) — The modules/modulate_layers.py implements FiLM-style modulation for condition injection; this is a lightweight alternative to cross-attention for timestep and class conditioning.
  • FP8 Quantization (Float8 Precision) — The modules/fp8_optimization.py hints at 8-bit floating-point inference for speed/memory gains; understanding quantization tradeoffs is relevant for production deployment.
  • Positional Embeddings (Rotary / RoPE) — The modules/posemb_layers.py encodes spatial and temporal position information; this affects how the model understands frame ordering and spatial layout.
  • openai/whisper — Audio-to-text encoder used alongside HunyuanVideo-Avatar for conditioning on speech in audio-driven animation tasks.
  • huggingface/diffusers — HunyuanVideo is integrated as an official pipeline in Diffusers; this is the upstream dependency providing scheduler and pipeline base classes.
  • Tencent-Hunyuan/HunyuanVideo-I2V — Official sibling repo extending HunyuanVideo for image-to-video generation; shares the same backbone and VAE architecture.
  • Tencent-Hunyuan/HunyuanVideo-Avatar — Official extension adding audio-driven human animation on top of HunyuanVideo; demonstrates how to condition the base model on speech.
  • replicate/cog-hunyuan-video — Community containerization of HunyuanVideo for serverless inference; useful reference for deployment patterns and dependency resolution.
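The causal-convolution idea in the list above (frame t never sees frames later than t) reduces, per pixel, to padding only the past side of the time axis before convolving. A numpy sketch, not the repo's torch implementation in hyvideo/vae/unet_causal_3d_blocks.py:

```python
# Causal 1-D convolution along time: y[t] = sum_k kernel[k] * x[t - k],
# with zero padding for negative indices so no future frame leaks in.
import numpy as np

def causal_conv1d_time(x, kernel):
    """x: [T] time series for one pixel; kernel: [K]; returns [T]."""
    K = len(kernel)
    padded = np.concatenate([np.zeros(K - 1), x])  # pad past frames only
    return np.array([padded[t : t + K] @ kernel[::-1] for t in range(len(x))])
```

The same padding rule, applied to the temporal axis of 3-D convolutions, is what keeps autoregressive frame generation consistent.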

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for VAE module (hyvideo/vae/)

The VAE module is critical for video encoding/decoding but lacks test coverage. Currently only hyvideo/modules/ has partial tests (test_attention.py). The VAE contains complex 3D causal operations (autoencoder_kl_causal_3d.py, unet_causal_3d_blocks.py) that need validation for tensor shape handling, latent space consistency, and frame causality constraints.

  • [ ] Create tests/test_vae.py with fixtures for sample input tensors
  • [ ] Add tests for AutoencoderKLCausal3D forward/backward passes with various input shapes
  • [ ] Test UNet3DBlocks for proper causal masking and feature map handling
  • [ ] Add integration test verifying encode-decode round-trip preserves video semantics
  • [ ] Test FP8 optimization compatibility (referenced in requirements but not validated in VAE context)
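The round-trip check in the checklist might take this shape, demonstrated with a toy stand-in autoencoder (temporal downsample by 2, nearest-neighbour upsample); tests/test_vae.py would swap in the real AutoencoderKLCausal3D:

```python
# Encode-decode round-trip test pattern for a lossy video autoencoder.
import numpy as np

class ToyVideoAutoencoder:
    """Stand-in for the real VAE: latent = every other frame."""
    def encode(self, video):                  # video: [T, H, W]
        return video[::2]
    def decode(self, latent):
        return np.repeat(latent, 2, axis=0)   # upsample back to T frames

def test_round_trip_preserves_shape_and_scale():
    video = np.random.default_rng(0).random((8, 4, 4))
    ae = ToyVideoAutoencoder()
    recon = ae.decode(ae.encode(video))
    assert recon.shape == video.shape
    # Lossy codec: check coarse agreement, not exact equality.
    assert np.abs(recon - video).mean() < 0.5
```

The key design choice is asserting shape exactly but reconstruction only approximately, since a VAE is lossy by construction.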

Add GitHub Actions CI workflow for multi-GPU and FP8 validation

The repo includes shell scripts for multi-GPU (run_sample_video_multigpu.sh) and FP8 inference (run_sample_video_fp8.sh) but has no automated testing. There's no CI directory or workflow files visible. This makes it easy to accidentally break these critical optimization paths during development.

  • [ ] Create .github/workflows/ci.yml with matrix testing for torch==2.6.0 compatibility
  • [ ] Add workflow step to validate hyvideo/modules/fp8_optimization.py against sample inference
  • [ ] Include test for scripts/run_sample_video_fp8.sh with mock checkpoint loading
  • [ ] Add linting for hyvideo/ modules using flake8 or ruff (common in transformers/diffusers ecosystems)
  • [ ] Test backward compatibility with requirements.txt versions (diffusers==0.31.0, transformers==4.46.3)

Add integration tests for diffusion pipeline and scheduler (hyvideo/diffusion/)

The pipeline_hunyuan_video.py and scheduling_flow_match_discrete.py are core inference logic but completely untested. The scheduler uses custom flow-matching discrete logic that differs from standard diffusers schedulers, creating high risk for regression. No test coverage exists for the end-to-end generation flow.

  • [ ] Create tests/test_diffusion_pipeline.py with mock models to test pipeline_hunyuan_video.py
  • [ ] Add tests for FlowMatchDiscreteScheduler timestep calculations and noise scaling
  • [ ] Test prompt encoding integration via hyvideo/text_encoder/ with actual tokenizer
  • [ ] Add test for gradient/inference-only modes with torch.no_grad() validation
  • [ ] Test edge cases: empty prompts, max sequence length, latent shape mismatches between VAE and pipeline

🌿Good first issues

  • Add unit tests for hyvideo/vae/unet_causal_3d_blocks.py—the 3D convolutional blocks lack test coverage, making refactoring risky. Tests should validate layer outputs match expected shapes for various input video resolutions.
  • Document the VAE latent space dimensions and compression ratio in a new ARCHITECTURE.md—currently only assets/3dvae.png visualizes the VAE, but no text explains how input video (e.g., 1280×720×121 frames) maps to latent tensors.
  • Implement missing error handling in sample_video.py—add validation for unsupported video formats, check GPU memory before loading models, and provide actionable error messages when checkpoints fail to download from Hugging Face.


📝Recent commits

  • e260ed4 — Merge pull request #301 from TianQi-777/patch-5 (ckczzj)
  • db34658 — Merge pull request #300 from TianQi-777/patch-4 (ckczzj)
  • c982fa0 — Update README_zh.md (TianQi-777)
  • 3afac52 — Update README.md (TianQi-777)
  • a17bca9 — Update README_zh.md (ckczzj)
  • e14a38e — Update README.md (ckczzj)
  • b47e10b — Update requirements.txt (ckczzj)
  • 4fdf87f — Merge pull request #276 from sunyanguomt/sdpa_fix (JacobKong)
  • 71fd817 — Resolving image blur issues with SDPA (sunyanguomt)
  • efabefb — Update README.md (JacobKong)

🔒Security observations

  • High · Outdated Dependency: torch==2.6.0 — requirements.txt. torch 2.6.0 is pinned to a specific version that may contain known security vulnerabilities. PyTorch releases security patches regularly, and using an outdated version exposes the system to potential exploits. Fix: Update to the latest stable version of torch. Use torch>=2.6.0 or specify a more recent version after security verification.
  • High · Outdated Dependency: transformers==4.46.3 — requirements.txt. The transformers library is pinned to version 4.46.3, which may be outdated. This library is critical for model loading and processing and could have unpatched security vulnerabilities. Fix: Update transformers to the latest stable version. Review release notes for security patches and breaking changes.
  • High · Outdated Dependency: gradio==5.0.0 — requirements.txt. Gradio 5.0.0 is used for the web interface (gradio_server.py). Outdated web framework versions can expose the application to XSS, CSRF, and other web vulnerabilities. Fix: Update Gradio to the latest version and review security advisories. Implement additional security headers and CSRF protection.
  • Medium · Outdated Dependency: numpy==1.24.4 — requirements.txt. NumPy 1.24.4 is an older version that may lack security patches and performance improvements available in newer releases. Fix: Update to numpy>=2.0.0 (if compatible) or the latest 1.x version after compatibility testing.
  • Medium · Outdated Dependency: pandas==2.0.3 — requirements.txt. Pandas 2.0.3 is older and may have known issues. Consider updating to the latest stable version. Fix: Update pandas to the latest stable version (e.g., 2.2.x or later) after compatibility verification.
  • Medium · Unrestricted File Upload in Gradio Server — gradio_server.py. The gradio_server.py file likely accepts video/image uploads via the web interface without explicit validation. Gradio's default configuration may not enforce strict file type and size restrictions, potentially allowing malicious uploads. Fix: Implement explicit file type validation, size limits, and scan uploads for malicious content. Use file magic numbers verification instead of relying on extensions.
  • Medium · Potential Unsafe Model Loading — hyvideo/inference.py, hyvideo/modules/models.py. The codebase loads pre-trained models from checkpoints (ckpts/ directory) without explicit signature verification. If checkpoint files are compromised, arbitrary code execution could occur through unsafe deserialization. Fix: Implement model signature verification using safetensors format. Validate model checksums before loading. Never use pickle format; prefer safetensors.
  • Medium · Hardcoded Paths and Configuration — hyvideo/config.py, assets/. The repository contains hardcoded paths and configuration values (ckpts/README.md, asset paths) that may need security review. External dependencies could be manipulated if paths resolve to user-writable locations. Fix: Use environment variables for sensitive paths. Implement path validation to prevent directory traversal attacks.
  • Low · Missing Security Headers in Gradio — gradio_server.py. The Gradio interface (gradio_server.py) does not appear to have explicit security header configuration (CSP, X-Frame-Options, etc.) defined in the visible code. Fix: Configure security headers through Gradio's custom_css and custom_js, or use middleware to set proper HTTP security headers (Content-Security-Policy, X-Content-Type-Options, etc.).
  • Low · Potential Data Leakage in Logging — h. The codebase uses loguru for logging. If verbose logging is enabled in production, sensitive model inputs (prompts, video data) could be logged and exposed. Fix: disable or redact verbose logging in production; avoid logging raw prompts or media at default levels.

LLM-derived; treat as a starting point, not a security audit.
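One stdlib-only way to implement the path-validation fix suggested in the hardcoded-paths item above (a sketch, not code from the repo): resolve any user-supplied path and refuse it unless it stays inside an allowed base directory.

```python
# Directory-traversal guard: resolve symlinks and ".." segments, then
# verify the result is still under the allowed base directory.
import os

def resolve_inside(base_dir, user_path):
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {user_path!r}")
    return target
```

Using `realpath` before the prefix check matters: a plain string prefix test can be defeated by `..` segments or symlinks that lexically stay inside the base.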


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Mixed signals · Tencent-Hunyuan/HunyuanVideo — RepoPilot