Get ready for a big new wave of open video model releases! Today we are excited to share ComfyUI’s native support for Wan2.1, the latest video generation model in town!
Wan2.1 is a family of four video generation models:
Text-to-video 14B: Supports both 480P and 720P
Image-to-video 14B 720P: Supports 720P
Image-to-video 14B 480P: Supports 480P
Text-to-video 1.3B: Supports 480P
Highlights of the Wan2.1 Family
Runs on consumer-grade GPUs: The T2V-1.3B model requires only 8.19 GB of VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without quantization).
Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
Visual Text Generation: Wan2.1 is the first video model capable of rendering legible text in both Chinese and English within its generated videos.
Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.
Get Started in ComfyUI
Update ComfyUI to the latest version, or download the Desktop app.
Download the following 4 files and place each one in the listed folder:
Choose one of the diffusion models → Place in ComfyUI/models/diffusion_models
umt5_xxl_fp8_e4m3fn_scaled.safetensors → Place in ComfyUI/models/text_encoders
clip_vision_h.safetensors → Place in ComfyUI/models/clip_vision
wan_2.1_vae.safetensors → Place in ComfyUI/models/vae
Use the example workflows or click the example videos below
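The download-and-place steps above boil down to a simple folder layout. Here is a small sketch that creates the expected model folders and prints where each file goes (run it from the directory containing your ComfyUI install; the "chosen diffusion model" placeholder stands in for whichever of the four models you picked):

```python
from pathlib import Path

# Folder layout from the steps above. Only the four filenames listed in the
# post are real; "<chosen Wan2.1 diffusion model>" is a placeholder.
MODEL_FILES = {
    "diffusion_models": "<chosen Wan2.1 diffusion model>",
    "text_encoders": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "clip_vision": "clip_vision_h.safetensors",
    "vae": "wan_2.1_vae.safetensors",
}

for folder, filename in MODEL_FILES.items():
    target = Path("ComfyUI/models") / folder
    target.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    print(f"place {filename} in {target}/")
```

If a folder already exists in your install, the script leaves it untouched; it only ensures the four destinations are present.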
Examples
Image-to-video 14B 720P Workflow
Image-to-video 14B 480P Workflow
Text-to-video 14B 720P Workflow
Text-to-video 1.3B 480P Workflow
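Beyond the drag-and-drop workflows above, ComfyUI can also queue a workflow programmatically: a graph exported via "Save (API Format)" can be POSTed as JSON to the server's /prompt endpoint. The sketch below only builds that request body; the workflow content shown is a placeholder, not a real Wan2.1 graph, and the default server address is an assumption about a local install:

```python
import json

def build_prompt_payload(workflow: dict, client_id: str = "demo") -> bytes:
    """Build the JSON body that ComfyUI's POST /prompt endpoint expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

# Placeholder graph; in practice, load the JSON exported from ComfyUI.
payload = build_prompt_payload({"1": {"class_type": "KSampler", "inputs": {}}})

# To submit against a locally running server (default http://127.0.0.1:8188),
# POST `payload` with Content-Type: application/json to /prompt.
```

This is handy for batch-generating videos from the example workflows without clicking through the UI each time.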
Enjoy creating!