Wan2.2 S2V in ComfyUI: Audio-Driven Video Generation from Static Images
Advanced Speech-to-Video with Text Control
We’re excited to announce that Wan2.2-S2V, the advanced audio-driven video generation model, is now natively supported in ComfyUI! This powerful AI model transforms a static image plus an audio input into dynamic video content, supporting dialogue, singing, performance, and a wide range of creative uses.
Model Highlights
Audio-Driven Video Generation: Transforms static images and audio into synchronized videos
Cinematic-Grade Quality: Generates film-quality videos with natural expressions and movements
Minute-Level Generation: Supports long-form video creation at minute-scale lengths
Multi-Format Support: Works with full-body and half-body characters
Enhanced Motion Control: Generates actions and environments from text instructions
Getting Started
Update ComfyUI to the latest version (Desktop support is coming soon)
Access Workflows: Go to Workflow → Browse Templates → Video → select the Wan2.2 S2V workflow
Download the models when prompted by the pop-up dialog
Follow the guide in the template, then run the workflow
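If you prefer to drive the workflow programmatically instead of clicking Run, ComfyUI's local server accepts workflows over HTTP. The sketch below is a minimal example, assuming a default local install on port 8188 and a workflow already exported in API (JSON) format from the editor; the helper names and file path here are illustrative, not part of ComfyUI itself.

```python
import json
import uuid
import urllib.request

# Default address of a locally running ComfyUI server (assumption:
# you haven't changed the --listen/--port launch options).
COMFY_URL = "http://127.0.0.1:8188"

def build_prompt_payload(workflow, client_id=None):
    """Wrap an API-format workflow dict in the body POST /prompt expects."""
    return {
        "prompt": workflow,
        "client_id": client_id or uuid.uuid4().hex,
    }

def queue_workflow(workflow):
    """Submit the workflow; the response includes the queued prompt_id."""
    body = json.dumps(build_prompt_payload(workflow)).encode("utf-8")
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Hypothetical path: export your S2V workflow via
    # "Export (API)" in the editor and point this at the file.
    with open("wan2_2_s2v_workflow_api.json") as f:
        workflow = json.load(f)
    print(queue_workflow(workflow))
```

You would still edit the image, audio, and text inputs in the workflow JSON (or in the editor before exporting) to point at your own assets; the server simply queues whatever graph it receives.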
Example Outputs
Check our documentation for more details: https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
As always, enjoy creating!