We’re excited to announce that HunyuanVideo, a groundbreaking 13-billion-parameter open-source video foundation model, is now natively supported in ComfyUI!
Highlights of HunyuanVideo
1. Unified Image & Video Generation
A "Dual-stream to Single-stream" Transformer efficiently fuses text and visual tokens, improving motion consistency, image quality, and text-video alignment (a conceptual sketch follows this list).
2. Superior Text-Video Alignment
The MLLM text encoder outperforms CLIP and T5, offering better instruction following, detail capture, and complex reasoning.
3. Efficient Video Compression
A custom 3D VAE compresses videos into a compact latent space, preserving resolution and frame rate while reducing the token count (a quick size calculation also follows this list).
4. Enhanced Prompt Control
The Prompt Rewrite model includes:
Normal Mode: Improves user intent interpretation.
Master Mode: Optimizes composition, lighting, and visual quality.
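If you are curious what "Dual-stream to Single-stream" means in practice, here is a minimal conceptual sketch in PyTorch. The block names, dimensions, and the plain multi-head attention below are illustrative assumptions rather than HunyuanVideo's actual implementation; the point is just the structure: text and video tokens first keep modality-specific weights while attending jointly, then are concatenated into a single sequence for shared blocks.

```python
# Conceptual sketch only -- names and dimensions are illustrative,
# not HunyuanVideo's real architecture.
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """Text and video tokens keep separate weights but attend jointly."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.video_norm = nn.LayerNorm(dim)
        self.text_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video, text):
        # Joint self-attention over the concatenated sequence ...
        joint = torch.cat([self.video_norm(video), self.text_norm(text)], dim=1)
        fused, _ = self.attn(joint, joint, joint)
        # ... while the residual paths stay modality-specific.
        v_len = video.shape[1]
        return video + fused[:, :v_len], text + fused[:, v_len:]

class SingleStreamBlock(nn.Module):
    """After fusion, one shared block processes all tokens together."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):
        h = self.norm(tokens)
        fused, _ = self.attn(h, h, h)
        return tokens + fused

video = torch.randn(1, 256, 128)  # (batch, video tokens, dim)
text = torch.randn(1, 77, 128)    # (batch, text tokens, dim)
video, text = DualStreamBlock(128, 8)(video, text)   # dual-stream stage
tokens = SingleStreamBlock(128, 8)(torch.cat([video, text], dim=1))  # single-stream stage
print(tokens.shape)  # torch.Size([1, 333, 128])
```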
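And to make the 3D VAE's savings concrete, here is a quick back-of-the-envelope calculation. The 4x temporal and 8x8 spatial compression with 16 latent channels matches the figures published for HunyuanVideo's VAE; the exact causal handling of the first frame is an assumption about the rounding.

```python
# Back-of-the-envelope latent size for the 3D VAE.
# Factors (4x time, 8x8 space, 16 channels) follow the published
# HunyuanVideo figures; the causal "+1" first-frame handling is assumed.
def latent_shape(frames: int, height: int, width: int):
    t = (frames - 1) // 4 + 1                # causal temporal compression
    return (16, t, height // 8, width // 8)  # (channels, T', H', W')

c, t, h, w = latent_shape(129, 720, 1280)    # a 129-frame 720p clip
print((c, t, h, w))                          # (16, 33, 90, 160)

pixels = 129 * 720 * 1280 * 3                # raw RGB values
latents = c * t * h * w
print(f"{latents / pixels:.2%}")             # 2.13% -- ~47x fewer values
```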
Getting Started with HunyuanVideo in ComfyUI
1. Text-to-video Example Workflow
You can seamlessly generate videos and still images with HunyuanVideo. Here’s how to get started:
Update to the latest version of ComfyUI or download the latest Desktop app.
Download the following model files:
hunyuan_video_t2v_720p_bf16.safetensors → Place in ComfyUI/models/diffusion_models
clip_l.safetensors and llava_llama3_fp8_scaled.safetensors → Place in ComfyUI/models/text_encoders
hunyuan_video_vae_bf16.safetensors → Place in ComfyUI/models/vae
(A short script for verifying these locations follows the steps below.)
Load the provided workflow JSON file into ComfyUI or drag and drop it into the interface.
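Before loading the workflow, it can save a reload cycle to confirm that the files landed where ComfyUI looks for them. The script below is a hypothetical helper, not part of ComfyUI; point COMFY_ROOT at your own install.

```python
# Sanity-check that the model files sit in the folders listed above.
# COMFY_ROOT is an assumption -- adjust it to your install location.
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")

EXPECTED = {
    "models/diffusion_models": ["hunyuan_video_t2v_720p_bf16.safetensors"],
    "models/text_encoders": ["clip_l.safetensors",
                             "llava_llama3_fp8_scaled.safetensors"],
    "models/vae": ["hunyuan_video_vae_bf16.safetensors"],
}

for folder, names in EXPECTED.items():
    for name in names:
        path = COMFY_ROOT / folder / name
        print(f"{'ok' if path.is_file() else 'MISSING':>7}  {path}")
```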
2. Generate an Image Using the Same Workflow
This model can generate still images by setting the video length to 1.
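Why does a length of 1 yield a still image? With the causal temporal compression sketched in the highlights above, a single input frame maps to exactly one latent frame, so the sampler effectively denoises one image. A small illustration, reusing the assumed compression factors from earlier:

```python
# length=1 -> one latent frame, i.e. a still image.
# Same assumed factors as the earlier sketch (4x time, 8x8 space).
def latent_shape(frames: int, height: int, width: int):
    t = (frames - 1) // 4 + 1
    return (16, t, height // 8, width // 8)

print(latent_shape(1, 720, 1280))   # (16, 1, 90, 160)  -> single image
print(latent_shape(73, 720, 1280))  # (16, 19, 90, 160) -> 73-frame video
```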
To keep track of updates to the Hunyuan series of models, subscribe to our blog and example workflow page.
Enjoy your creation!
A caveat with this workflow: only after a long sampling run does the VAE Decode node finally reveal whether the process succeeds or fails with an out-of-memory error. On an 8GB card you will hit out-of-memory errors constantly until you calibrate which settings your card can handle. The sample workflow will (supposedly) run on 12GB cards; with less VRAM, forget about reaching that frame count at that resolution. Realistically, don't even bother trying this with less than 24 or 40GB of VRAM.