ACE-Step 1.5 is Now Available in ComfyUI

Commercial-grade music generation on consumer hardware

Purz

Feb 03, 2026

We’re excited to share that ACE-Step 1.5 is now available in ComfyUI! This major update to the open-source music generation model brings commercial-grade quality to your local machine—generating full songs in under 10 seconds on consumer hardware.

Try on Comfy Cloud

What’s New in ACE-Step 1.5

ACE-Step 1.5 introduces a novel hybrid architecture that fundamentally changes how AI generates music. At its core, a Language Model acts as an omni-capable planner, transforming simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions.

Commercial-Grade Quality
On standard evaluation metrics, ACE-Step 1.5 achieves quality beyond most commercial music models, scoring 4.72 on musical coherence.
Blazing Fast Generation
Generate a full 4-minute song in ~1 second on an RTX 5090, or under 10 seconds on an RTX 3090.
Runs on Consumer Hardware
50+ Language Support
Strict adherence to prompts across 50+ languages, with particularly strong support for English, Chinese, Japanese, Korean, Spanish, German, French, Portuguese, Italian, and Russian.
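
To put the speed figures above in perspective, it can help to convert them into a real-time factor: seconds of audio produced per second of wall-clock time. Here is a minimal sketch that uses only the numbers quoted in this post; the function name is purely illustrative and not part of ACE-Step or ComfyUI.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


SONG_SECONDS = 4 * 60  # the 4-minute song quoted above

# "~1 second" on an RTX 5090 is an approximate figure from this post.
rtf_5090 = real_time_factor(SONG_SECONDS, 1.0)

# "Under 10 seconds" on an RTX 3090, so 10 s gives a lower bound on the factor.
rtf_3090_min = real_time_factor(SONG_SECONDS, 10.0)

print(f"RTX 5090: roughly {rtf_5090:.0f}x real time")
print(f"RTX 3090: at least {rtf_3090_min:.0f}x real time")
```

In other words, even the older card renders the song more than an order of magnitude faster than it takes to listen to it.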

Chain-of-Thought Planning

The model synthesizes metadata, lyrics, and captions via Chain-of-Thought reasoning to guide the diffusion process, resulting in more coherent long-form compositions.

LoRA Fine-Tuning

ACE-Step 1.5 supports lightweight personalization through LoRA training. With just a few songs—or a few dozen—you can train a LoRA that captures a specific style.

LoRAs let creators fine-tune toward a specific style using their own music. A LoRA learns from your songs and captures your sound. And because you run it locally, you own the LoRA and don’t have to worry about data leakage.
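
If you are wondering why a LoRA counts as "lightweight", the general LoRA technique freezes a d × k weight matrix and trains a low-rank update made of two small matrices of rank r, so the trainable parameter count drops from d·k to r·(d + k). The sketch below just does that arithmetic. The layer size and rank are arbitrary example values, not ACE-Step's actual dimensions.

```python
def full_params(d: int, k: int) -> int:
    """Parameters in a dense d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in a rank-r LoRA update: B is d x r, A is r x k."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 1024, 1024, 8

full = full_params(d, k)     # 1,048,576
lora = lora_params(d, k, r)  # 16,384

print(f"Full fine-tune: {full:,} trainable parameters")
print(f"LoRA (r={r}):   {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

That small footprint is what makes it practical to train on a handful of songs and to share or swap the resulting file.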

How It Works

ACE-Step 1.5 combines several architectural innovations:

Hybrid LM + DiT Architecture: A Language Model plans the song structure while a Diffusion Transformer (DiT) handles audio synthesis.
Distribution Matching Distillation: Uses Z-Image's DMD2 to achieve both fast generation (2 seconds on an A100) and better quality.
Intrinsic Reinforcement Learning: Alignment is achieved through the model’s internal mechanisms, eliminating biases from external reward models.
Self-Learning Tokenizer: The audio tokenizer is learned during DiT training, which closes the gap between generation and tokenization.
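
The plan-then-synthesize split described above can be pictured as a two-stage data flow. The toy sketch below is not ACE-Step's code or API: `SongBlueprint`, `plan_song`, and `synthesize` are hypothetical names, the planner returns canned text instead of calling a Language Model, and the "synthesizer" returns silence instead of running a DiT. It only shows the shape of the handoff: a blueprint of caption, lyrics, and metadata goes in, audio of the planned length comes out.

```python
from dataclasses import dataclass, field


@dataclass
class SongBlueprint:
    """Hypothetical container for what the planner hands to the synthesizer."""
    caption: str
    lyrics: str
    metadata: dict = field(default_factory=dict)


def plan_song(query: str, duration_s: float) -> SongBlueprint:
    """Stand-in for the Language Model planner (no real model is called)."""
    return SongBlueprint(
        caption=f"A song matching: {query}",
        lyrics="(placeholder lyrics)",
        metadata={"duration_s": duration_s},
    )


def synthesize(blueprint: SongBlueprint, sample_rate: int = 8_000) -> list[float]:
    """Stand-in for the DiT: returns silence of the planned length."""
    n_samples = int(blueprint.metadata["duration_s"] * sample_rate)
    return [0.0] * n_samples


blueprint = plan_song("upbeat synth-pop about summer", duration_s=2.0)
audio = synthesize(blueprint)
print(blueprint.caption)
print(f"{len(audio)} samples generated")
```

The point of the separation is that the planner settles structure and length up front, so the synthesis stage only has to render what was already decided.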

Try it on Comfy Cloud!

Coming Soon

ACE-Step 1.5 has a few more tricks up its sleeve. These aren’t yet supported in ComfyUI, but we have no doubt the community will figure them out.

Cover

Give the model any song as input along with a new prompt and lyrics, and it will reimagine the track in a completely different style.

Repaint

Sometimes a generated track is 90% perfect and 10% not quite right. Repaint fixes that. Select a segment, regenerate just that section, and the model stitches it back in while keeping everything else untouched.
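
Conceptually, Repaint promises that only the selected segment changes. The sketch below illustrates that contract on a plain list of samples. It is an assumption-laden toy, not how ACE-Step implements the feature: `repaint` and the `regenerate` callback are hypothetical names, and a real model would presumably condition the new audio on its surroundings rather than splice raw samples.

```python
from typing import Callable, Sequence


def repaint(
    samples: Sequence[float],
    start: int,
    end: int,
    regenerate: Callable[[int], Sequence[float]],
) -> list[float]:
    """Replace samples[start:end] with new audio, leaving the rest untouched."""
    if not 0 <= start < end <= len(samples):
        raise ValueError("segment must lie inside the track")
    new_segment = list(regenerate(end - start))
    if len(new_segment) != end - start:
        raise ValueError("regenerated segment has the wrong length")
    return list(samples[:start]) + new_segment + list(samples[end:])


track = [0.1, 0.2, 0.3, 0.4, 0.5]

# Hypothetical generator: fills the selected segment with a constant value.
fixed = repaint(track, 1, 3, lambda n: [9.0] * n)
print(fixed)  # [0.1, 9.0, 9.0, 0.4, 0.5]
```

Everything outside the chosen range comes back identical, which is the property that makes a "90% perfect" track worth fixing instead of regenerating.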