ElevenLabs is now available in ComfyUI
World-class voice cloning, text-to-speech, and sound effects generation
We’re excited to announce that ElevenLabs is now available inside ComfyUI via Partner Nodes! This brings world-class voice AI directly into your node graph — no external tools, no juggling browser tabs, just drag, connect, and run.
Whether you’re building a podcast pipeline, adding voiceover to AI-generated video, isolating dialogue from noisy footage, or cloning a voice for a character, it all happens right on your canvas now.
ElevenLabs Nodes
🗣️ Text to Speech
Type a prompt, get a voice. Generate natural-sounding speech from any text input, which is perfect for voiceovers, narration, and automated audio pipelines. Pair it with your video generation nodes for an end-to-end content workflow.
🔄 Speech to Speech
Feed in one voice, get back another. Transform the style, tone, or identity of a voice recording while keeping the original pacing and emotion. Great for dubbing, voice acting, and creative remixing.
📝 Speech to Text
Transcribe audio to text directly in your workflow. Use it to create subtitles, feed dialogue into an LLM node for analysis, or build audio-to-text-to-image pipelines that react to spoken content.
🎧 Voice Isolation
Got a noisy recording? This node isolates the voice from background noise, music, or ambient sound. Perfect for cleaning up field recordings or pulling clean dialogue out of complex audio scenes before further processing.
💬 Text to Dialogue
Generate multi-speaker conversations from a single text input. Assign different speakers, control the back-and-forth, and produce realistic dialogue scenes — ideal for podcasts, audiobooks, explainer content, or game dialogue.
🔊 Text to Sound Effects
Describe a sound and generate it. Explosions, footsteps, rain, sci-fi ambience — whatever your project needs. Perfect for adding atmospheric audio to video workflows, building soundscapes, or prototyping game audio without digging through sample libraries.
🎛️ Voice Selector
Browse and select from ElevenLabs’ library of premade voices. Pick the right tone, accent, and style for your project without any setup.
Why This Matters
Audio has been the missing piece in a lot of ComfyUI workflows. You could generate images, video, 3D assets, and text — but voice was always a separate step. With ElevenLabs as a Partner Node, you can now build truly multimodal pipelines:
Prompt → Image → Video → Voiceover — all in one graph
Audio cleanup → Transcription → LLM processing — no exports, no context switching
Generate dialogue → Lay it over generated video — end-to-end character pipelines
These nodes run in parallel with all your other Partner Nodes, so you can trigger multiple generations at once and iterate fast.
Text to Speech Examples
Children’s Story: The Lonely KSampler
Tongue Twister
Sound Effects Examples
Babbling brook, gentle stream
Police siren going by
Train station ambience
Get Started
Update ComfyUI or ComfyUI Desktop to the latest version.
Find the ElevenLabs nodes under Node Library or Templates in the left sidebar.
Drop a node on your canvas and start creating.
As always, enjoy creating!

