The Stable Diffusion Moment for Audio? ACE-Step Audio Model Native Support in ComfyUI!
A ground-breaking audio generation model for the community!
Hi Comfy community! As many of you might know, a high-quality audio model just dropped this week. We would like to share that ComfyUI now supports ACE-Step natively!
ACE-Step is an open-source music generation model jointly developed by ACE Studio and StepFun. It can generate a wide range of music, including general songs, instrumentals, and experimental inputs, with support for multiple languages.
ACE-Step offers rich extensibility for the OSS community: through fine-tuning techniques such as LoRA and ControlNet, developers can adapt the model to their needs, whether for audio editing, vocal synthesis, accompaniment production, voice cloning, or style transfer. The model is a meaningful milestone for music and audio generation.
The model is released under the Apache-2.0 license and is free for commercial use. It also has good inference speed: the model synthesizes up to 4 minutes of music in just 20 seconds on an A100 GPU.
Get Started
Update ComfyUI or ComfyUI Desktop to the latest version.
Download the workflow below and drag it into ComfyUI.
Download the models as instructed and run! (A scripted download option is sketched after this list.)
Check our documentation for more detailed instructions!
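If you prefer to script the model download rather than fetching files by hand, here is a minimal sketch using the huggingface_hub client. The repo id, filename, and target folder are assumptions based on the usual Comfy-Org repackaged layout, so verify them against the workflow page and our documentation before running.

```python
# Minimal sketch of scripting the ACE-Step model download with huggingface_hub.
# The repo id and filename below are assumptions -- check the workflow page /
# documentation for the exact names and target folders before running.
from pathlib import Path

from huggingface_hub import hf_hub_download

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your ComfyUI install location
checkpoints_dir = COMFYUI_ROOT / "models" / "checkpoints"
checkpoints_dir.mkdir(parents=True, exist_ok=True)

# Download the (assumed) all-in-one ACE-Step checkpoint into the checkpoints folder.
local_path = hf_hub_download(
    repo_id="Comfy-Org/ACE-Step_ComfyUI_repackaged",    # assumed repo id
    filename="all_in_one/ace_step_v1_3.5b.safetensors",  # assumed filename
    local_dir=checkpoints_dir,
)
print(f"Checkpoint saved to: {local_path}")
```

Note that hf_hub_download preserves the repo's subfolder structure under local_dir, so check the printed path and move the checkpoint directly into models/checkpoints if ComfyUI does not pick it up.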
Recently, we also added native support for HiDream-E1 and Wan2.1 FLF2V FP8.
HiDream E1 Native Support
HiDream-E1 is an interactive image editing model officially open-sourced by HiDream-ai on April 28, 2025, built on the HiDream-I1 model. It is released under the MIT License, supporting personal projects, scientific research, and commercial use.
You can now try the template in ComfyUI: Workflow → Browse Template → Image → HiDream E1, click and run! Or download the workflow below!
Example Outputs:
For a more detailed tutorial, please visit our documentation.
Wan2.1 FLF2V FP8 Update
Following Wan2.1 FLF2V fp16 support, we recently uploaded the fp8 version here. Now you can run the model with less VRAM. Please note that this model works best at 720p resolution; smaller resolutions might not produce decent results.
See our documentation for more community resources and detailed instructions.
Enjoy generating!
"The model is released under the Apache-2.0 license and is free for commercial use. It also has good inference speed: the model synthesizes up to 4 minutes of music in just 20 seconds on an A100 GPU."
hmmm, wonder if it works on a 3090/4090, not sure many of us have a spare A100 lol
EDIT: works ok on a 3090, VERY fast. Low quality audio but it does do music and lyrics. VERY interesting! Will be keen to see where this goes, maybe next year I won't need my annual Suno subscription.