Wan2.2 S2V in ComfyUI: Audio-Driven Video Generation from Static Images
Advanced Speech-to-Video with Text Control
We’re excited to announce that Wan2.2-S2V, the advanced audio-driven video generation model, is now natively supported in ComfyUI! This powerful AI model transforms a static image plus an audio input into dynamic video content, supporting dialogue, singing, performance, and a wide range of creative uses.
Model Highlights
Audio-Driven Video Generation: Transforms static images and audio into synchronized videos
Cinematic-Grade Quality: Generates film-quality videos with natural expressions and movements
Minute-Level Generation: Supports long-form video creation at minute-scale lengths
Multi-Format Support: Works with full-body and half-body characters
Enhanced Motion Control: Generates actions and environments from text instructions
Getting Started
Update ComfyUI to the latest version (Desktop support is coming soon)
Access Workflows: Go to Workflow → Browse Templates → Video → select the Wan2.2 S2V workflow
Download the models when prompted by the pop-up dialog
Follow the guide in the template, then run the workflow
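If you prefer to drive the workflow programmatically instead of clicking Run, ComfyUI's local server accepts workflows over HTTP. The sketch below is a minimal example, assuming a default local install on port 8188 and a workflow already exported in API (JSON) format from the editor; the helper names and file path here are illustrative, not part of ComfyUI itself.

```python
import json
import uuid
import urllib.request

# Default address of a locally running ComfyUI server (assumption:
# you haven't changed the --listen/--port launch options).
COMFY_URL = "http://127.0.0.1:8188"

def build_prompt_payload(workflow, client_id=None):
    """Wrap an API-format workflow dict in the body POST /prompt expects."""
    return {
        "prompt": workflow,
        "client_id": client_id or uuid.uuid4().hex,
    }

def queue_workflow(workflow):
    """Submit the workflow; the response includes the queued prompt_id."""
    body = json.dumps(build_prompt_payload(workflow)).encode("utf-8")
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Hypothetical path: export your S2V workflow via
    # "Export (API)" in the editor and point this at the file.
    with open("wan2_2_s2v_workflow_api.json") as f:
        workflow = json.load(f)
    print(queue_workflow(workflow))
```

You would still edit the image, audio, and text inputs in the workflow JSON (or in the editor before exporting) to point at your own assets; the server simply queues whatever graph it receives.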
Example Outputs
Check our documentation for more details: https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
As always, enjoy creating!