We’re excited to announce that HunyuanVideo, a groundbreaking 13-billion-parameter open-source video foundation model, is now natively supported in ComfyUI!
Highlights of HunyuanVideo
1. Unified Image & Video Generation
A "Dual-stream to Single-stream" Transformer efficiently fuses text and visual tokens, improving motion consistency, image quality, and text-video alignment (a conceptual sketch follows this list).
2. Superior Text-Video Alignment
The MLLM text encoder outperforms CLIP and T5, offering better instruction following, detail capture, and complex reasoning.
3. Efficient Video Compression
A custom 3D VAE compresses videos into a compact latent space, preserving resolution and frame rate while reducing the token count (a quick size calculation also follows this list).
4. Enhanced Prompt Control
The Prompt Rewrite model includes:
Normal Mode: Improves user intent interpretation.
Master Mode: Optimizes composition, lighting, and visual quality.
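If you are curious what "Dual-stream to Single-stream" means in practice, here is a minimal conceptual sketch in PyTorch. The block names, dimensions, and the plain multi-head attention below are illustrative assumptions rather than HunyuanVideo's actual implementation; the point is just the structure: text and video tokens first keep modality-specific weights while attending jointly, then are concatenated into a single sequence for shared blocks.

```python
# Conceptual sketch only -- names and dimensions are illustrative,
# not HunyuanVideo's real architecture.
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """Text and video tokens keep separate weights but attend jointly."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.video_norm = nn.LayerNorm(dim)
        self.text_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video, text):
        # Joint self-attention over the concatenated sequence ...
        joint = torch.cat([self.video_norm(video), self.text_norm(text)], dim=1)
        fused, _ = self.attn(joint, joint, joint)
        # ... while the residual paths stay modality-specific.
        v_len = video.shape[1]
        return video + fused[:, :v_len], text + fused[:, v_len:]

class SingleStreamBlock(nn.Module):
    """After fusion, one shared block processes all tokens together."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):
        h = self.norm(tokens)
        fused, _ = self.attn(h, h, h)
        return tokens + fused

video = torch.randn(1, 256, 128)  # (batch, video tokens, dim)
text = torch.randn(1, 77, 128)    # (batch, text tokens, dim)
video, text = DualStreamBlock(128, 8)(video, text)   # dual-stream stage
tokens = SingleStreamBlock(128, 8)(torch.cat([video, text], dim=1))  # single-stream stage
print(tokens.shape)  # torch.Size([1, 333, 128])
```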
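And to make the 3D VAE's savings concrete, here is a quick back-of-the-envelope calculation. The 4x temporal and 8x8 spatial compression with 16 latent channels matches the figures published for HunyuanVideo's VAE; the exact causal handling of the first frame is an assumption about the rounding.

```python
# Back-of-the-envelope latent size for the 3D VAE.
# Factors (4x time, 8x8 space, 16 channels) follow the published
# HunyuanVideo figures; the causal "+1" first-frame handling is assumed.
def latent_shape(frames: int, height: int, width: int):
    t = (frames - 1) // 4 + 1                # causal temporal compression
    return (16, t, height // 8, width // 8)  # (channels, T', H', W')

c, t, h, w = latent_shape(129, 720, 1280)    # a 129-frame 720p clip
print((c, t, h, w))                          # (16, 33, 90, 160)

pixels = 129 * 720 * 1280 * 3                # raw RGB values
latents = c * t * h * w
print(f"{latents / pixels:.2%}")             # 2.13% -- ~47x fewer values
```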
Getting Started with HunyuanVideo in ComfyUI
1. Text-to-video Example Workflow
You can seamlessly generate videos and still images with HunyuanVideo. Here’s how to get started:
Update to the latest version of ComfyUI or download the latest Desktop app.
Download the following model files:
hunyuan_video_t2v_720p_bf16.safetensors → Place in ComfyUI/models/diffusion_models
clip_l.safetensors and llava_llama3_fp8_scaled.safetensors → Place in ComfyUI/models/text_encoders
hunyuan_video_vae_bf16.safetensors → Place in ComfyUI/models/vae
(A short script for verifying these locations follows the steps below.)
Load the provided workflow JSON file into ComfyUI or drag and drop it into the interface.
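Before loading the workflow, it can save a reload cycle to confirm that the files landed where ComfyUI looks for them. The script below is a hypothetical helper, not part of ComfyUI; point COMFY_ROOT at your own install.

```python
# Sanity-check that the model files sit in the folders listed above.
# COMFY_ROOT is an assumption -- adjust it to your install location.
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")

EXPECTED = {
    "models/diffusion_models": ["hunyuan_video_t2v_720p_bf16.safetensors"],
    "models/text_encoders": ["clip_l.safetensors",
                             "llava_llama3_fp8_scaled.safetensors"],
    "models/vae": ["hunyuan_video_vae_bf16.safetensors"],
}

for folder, names in EXPECTED.items():
    for name in names:
        path = COMFY_ROOT / folder / name
        print(f"{'ok' if path.is_file() else 'MISSING':>7}  {path}")
```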
2. Generate an Image Using the Same Workflow
This model can generate still images by setting the video length to 1.
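Why does a length of 1 yield a still image? With the causal temporal compression sketched in the highlights above, a single input frame maps to exactly one latent frame, so the sampler effectively denoises one image. A small illustration, reusing the assumed compression factors from earlier:

```python
# length=1 -> one latent frame, i.e. a still image.
# Same assumed factors as the earlier sketch (4x time, 8x8 space).
def latent_shape(frames: int, height: int, width: int):
    t = (frames - 1) // 4 + 1
    return (16, t, height // 8, width // 8)

print(latent_shape(1, 720, 1280))   # (16, 1, 90, 160)  -> single image
print(latent_shape(73, 720, 1280))  # (16, 19, 90, 160) -> 73-frame video
```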
To keep track of updates to the Hunyuan series of models, subscribe to our blog and example workflow page.
Enjoy your creation!
A caveat with this workflow: only after a long sampling run does the VAE Decode node finally reveal whether the process succeeds or fails with an out-of-memory error. On an 8GB card you will hit out-of-memory errors constantly until you calibrate which settings your card can handle. The sample workflow will (supposedly) run on 12GB cards; with less VRAM, forget about reaching that frame count at that resolution. Realistically, don't even bother trying this with less than 24 or 40GB of VRAM.