ElevenLabs is now available in ComfyUI

World-class voice cloning, text-to-speech, and sound effects generation

Mar 07, 2026

We’re excited to announce that ElevenLabs is now available inside ComfyUI via Partner Nodes! This brings world-class voice AI directly into your node graph — no external tools, no juggling browser tabs, just drag, connect, and run.

Whether you’re building a podcast pipeline, adding voiceover to AI-generated video, isolating dialogue from noisy footage, or cloning a voice for a character, it all happens right on your canvas now.

ElevenLabs Nodes

🗣️ Text to Speech

Type a prompt, get a voice. Generate natural-sounding speech from any text input — perfect for voiceovers, narration, and automated audio pipelines. Pair it with your video generation nodes for a full end-to-end content workflow.


🔄 Speech to Speech

Feed in one voice, get back another. Transform the style, tone, or identity of a voice recording while keeping the original pacing and emotion. Great for dubbing, voice acting, and creative remixing.


📝 Speech to Text

Transcribe audio to text directly in your workflow. Use it to create subtitles, feed dialogue into an LLM node for analysis, or build audio-to-text-to-image pipelines that react to spoken content.
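
To make the subtitle use case concrete, here is a minimal Python sketch that turns timed transcript segments into SubRip (SRT) text. It assumes the transcription gives you segments with start and end times in seconds; the Segment class and both function names are hypothetical and are not part of the ElevenLabs node or the ComfyUI API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed chunk of transcript (hypothetical shape, not the node's real output)."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues: index, time range, text, then a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(cues)
```

If your transcript arrives as plain text without timings, you would need to add timing information some other way before a conversion like this applies.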


🎧 Voice Isolation

Got a noisy recording? This node isolates the voice from background noise, music, or ambient sound. Perfect for cleaning up field recordings or pulling clean dialogue out of complex audio scenes before further processing.


💬 Text to Dialogue

Generate multi-speaker conversations from a single text input. Assign different speakers, control the back-and-forth, and produce realistic dialogue scenes — ideal for podcasts, audiobooks, explainer content, or game dialogue.
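
If you keep your dialogue as a plain script, a small helper can split it into speaker turns before you assign voices. This is only a sketch under an assumed "Speaker: line" convention; the node's actual input format is not described here, and parse_script and speakers_in_order are hypothetical names.

```python
import re

# Assumed script convention (hypothetical): one "Speaker: line" per turn.
TURN_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, line) turns, skipping blank lines."""
    turns = []
    for raw in script.splitlines():
        if not raw.strip():
            continue
        match = TURN_PATTERN.match(raw)
        if match is None:
            raise ValueError(f"Expected 'Speaker: line', got: {raw!r}")
        turns.append((match.group(1), match.group(2)))
    return turns


def speakers_in_order(turns: list[tuple[str, str]]) -> list[str]:
    """Return each speaker once, in order of first appearance."""
    return list(dict.fromkeys(speaker for speaker, _ in turns))
```

The list of unique speakers tells you how many voices the scene needs, which is handy when picking them from a voice library.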


🔊 Text to Sound Effects

Describe a sound and generate it. Explosions, footsteps, rain, sci-fi ambience — whatever your project needs. Perfect for adding atmospheric audio to video workflows, building soundscapes, or prototyping game audio without digging through sample libraries.


🎛️ Voice Selector

Browse and select from ElevenLabs’ library of premade voices. Pick the right tone, accent, and style for your project without any setup.

Why This Matters

Audio has been the missing piece in a lot of ComfyUI workflows. You could generate images, video, 3D assets, and text — but voice was always a separate step. With ElevenLabs as a Partner Node, you can now build truly multimodal pipelines:

Prompt → Image → Video → Voiceover — all in one graph
Audio cleanup → Transcription → LLM processing — no exports, no context switching
Generate dialogue → Lay it over generated video — end-to-end character pipelines

These nodes run in parallel with all your other Partner Nodes, so you can trigger multiple generations at once and iterate fast.

Text to Speech Examples

Children’s Story - The Lonely KSampler