ACE-Step 1.5 XL: Commercial-Grade Music Generation in ComfyUI
A 4B-parameter open-source music model that generates full songs in seconds — locally on consumer hardware
Music generation just got a serious upgrade. ACE-Step 1.5 XL brings a 4B-parameter Diffusion Transformer decoder to the ACE-Step framework, delivering audio quality that competes with commercial music models — and it runs locally on your GPU.
It comes in three flavors: xl-base for maximum versatility, xl-sft for peak audio quality, and xl-turbo for speed. All are commercially licensed under the MIT license and use legally compliant training data.
Dark Synthwave (Instrumental)
Melodic Dubstep (Female Vocal)
Ambient Electronic (Female Vocal)
Key Highlights
Commercial-Grade Quality — Evaluation metrics place output between Suno v4.5 and v5, with 4B parameters delivering richer audio than the 2B predecessors
Ultra-Fast Generation — Under 2 seconds per full song on an A100, under 10 seconds on an RTX 3090. xl-turbo cuts inference to just 8 steps (~6x faster than base/sft)
Flexible Duration — Generate anything from 10-second loops to full 10-minute compositions
1000+ Instruments and Styles — Fine-grained timbre description across a massive range of musical genres
50+ Language Lyrics — Prompt with lyrics for structure and style control in over 50 languages
Commercially Licensed — MIT license, trained on licensed music, royalty-free/public domain, and synthetic MIDI-to-Audio data
Pick Your Variant
All three XL models share the same 4B-parameter architecture.
XL-Base — Most versatile, highest diversity. For maximum creative range.
XL-SFT — Peak audio quality, some loss in diversity. For clean final outputs.
XL-Turbo — 8 steps, ~6x faster, no CFG. For fast iteration.
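The trade-offs above can be written down as a small lookup, which is handy when scripting batch jobs. This is only a sketch: the priority keys ("versatility", "quality", "speed") and the helper name pick_variant are illustrative assumptions, not part of ACE-Step or ComfyUI. The variant names and descriptions come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str       # variant name as used in this post
    strength: str   # what the variant is best at
    best_for: str   # suggested use


# Keys are hypothetical labels chosen for this sketch.
VARIANTS = {
    "versatility": Variant("xl-base", "most versatile, highest diversity",
                           "maximum creative range"),
    "quality": Variant("xl-sft", "peak audio quality, some loss in diversity",
                       "clean final outputs"),
    "speed": Variant("xl-turbo", "8 steps, ~6x faster, no CFG",
                     "fast iteration"),
}


def pick_variant(priority: str) -> Variant:
    """Return the variant matching a priority, or raise ValueError."""
    try:
        return VARIANTS[priority]
    except KeyError:
        valid = ", ".join(sorted(VARIANTS))
        raise ValueError(
            f"unknown priority {priority!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    choice = pick_variant("speed")
    print(f"{choice.name}: {choice.strength} (for {choice.best_for})")
```

A typical loop would be to draft ideas with the "speed" choice, then re-render the keepers with "quality".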
Getting Started
Download or update ComfyUI to the latest version, or visit Comfy Cloud
Open the Template Library and search for “ACE Step”
Select a workflow
Follow the guide in the workflow to download the model
Update the prompt, then hit Run
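For anyone who would rather script the last step than click Run, a local ComfyUI server also accepts workflows over HTTP. The sketch below assumes a default local install listening on 127.0.0.1:8188 and a workflow exported in API format. The file name, node id "1", and input field "text" are placeholders, so check your own exported JSON for the real ones.

```python
import json
from urllib import request

# Assumption: default address of a locally running ComfyUI server.
DEFAULT_SERVER = "http://127.0.0.1:8188"


def set_input(workflow: dict, node_id: str, field: str, value) -> dict:
    """Return a copy of an API-format workflow with one node input replaced."""
    updated = json.loads(json.dumps(workflow))  # simple deep copy via JSON
    updated[node_id]["inputs"][field] = value
    return updated


def build_queue_request(workflow: dict,
                        server: str = DEFAULT_SERVER) -> request.Request:
    """Build (but do not send) a POST request that queues the workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "ace_step_workflow_api.json" is a hypothetical file name.
    with open("ace_step_workflow_api.json", encoding="utf-8") as fh:
        wf = json.load(fh)
    # Node id "1" and field "text" are placeholders for your prompt node.
    wf = set_input(wf, "1", "text", "dark synthwave, instrumental")
    with request.urlopen(build_queue_request(wf)) as resp:
        print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload before anything is queued.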
As always, enjoy creating!



I downloaded the XL-base model. No satisfaction. Singer sounds like she's in a tin shower. Instruments seem like they have no coherence, compared to the same song on Suno.
Even your provided samples don't sound very good to me.