Run Mochi in ComfyUI on a consumer GPU
We are excited to announce that ComfyUI now has optimized support for Genmo’s latest model, Mochi! This integration brings state-of-the-art video generation capabilities to the ComfyUI community, even if you're working with consumer-grade GPUs.
The weights and architecture for Mochi 1 (480P) are open and available, and Mochi 1 HD is coming later this year.
Unpacking Mochi: Highlights of the model
1. State-of-the-art performance
Mochi sets a new benchmark for open-source video generation, delivering high-fidelity motion. It also stands out for exceptional prompt adherence.
2. Apache 2.0 License
Mochi is released under the Apache 2.0 license, making it a fantastic choice for developers and creators. You can use, modify, and integrate Mochi into your workflows without restrictive licensing hurdles.
3. Runs on consumer GPUs, fast
Mochi now fits on consumer GPUs such as the 4090. The Mochi node in ComfyUI supports multiple attention backends, which lets the model fit in less than 24 GB of VRAM.
Try Mochi on ComfyUI
Follow these steps to run the Mochi model right away with a standard workflow:
Update to the latest version of ComfyUI
Download the Mochi weights (the diffusion models) into the models/diffusion_models folder
Make sure a text encoder is in your models/clip folder
Download the VAE to the ComfyUI/models/vae folder
Find the example workflow here and start your generation!
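Before loading the workflow, it can help to confirm that each folder named in the steps above actually contains a model file. The following is a minimal sketch, not part of ComfyUI: the helper name find_missing_models, the list of file extensions, and the "ComfyUI" install path are all assumptions made for illustration.

```python
from pathlib import Path

# Folder names come from the steps above.
EXPECTED_FOLDERS = ("models/diffusion_models", "models/clip", "models/vae")
# Assumption: model files use one of these common extensions.
MODEL_SUFFIXES = {".safetensors", ".ckpt", ".pt", ".bin"}


def find_missing_models(comfyui_root):
    """Return the expected folders that do not yet hold any model file."""
    root = Path(comfyui_root)
    missing = []
    for folder in EXPECTED_FOLDERS:
        path = root / folder
        has_model = path.is_dir() and any(
            f.suffix in MODEL_SUFFIXES for f in path.iterdir() if f.is_file()
        )
        if not has_model:
            missing.append(folder)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a hypothetical install location; point this at your own.
    for folder in find_missing_models("ComfyUI"):
        print(f"No model file found in {folder}")
```

The check only looks for the presence of a file with a matching extension; it does not verify that the file is the right model for Mochi.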
Low RAM Solution
If you run out of RAM with the workflow above, we recommend the following changes:
Switch the text encoder: use the fp8 scaled model instead of t5xxl_fp16.
Switch the diffusion model: use the fp8_scaled diffusion model instead of the bf16 model.
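To see why switching precision helps, here is a rough back-of-the-envelope sketch of weight storage alone. It assumes 2 bytes per parameter for bf16/fp16 and 1 byte for fp8. The parameter count in the example is a hypothetical placeholder, not a figure from this post, and the function name weight_size_gib is likewise made up for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp8": 1}


def weight_size_gib(num_params, dtype):
    """Approximate size of the raw weights in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Hypothetical parameter count, for illustration only.
    params = 10_000_000_000
    for dtype in ("bf16", "fp8"):
        print(f"{dtype}: {weight_size_gib(params, dtype):.1f} GiB")
```

Halving the bytes per weight roughly halves the weight footprint. The estimate ignores activations, any per-tensor scale factors stored with scaled fp8 weights, and other runtime overhead, so actual usage will be higher.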
A Simplified Way to Start
We have also prepared an all-in-one packaged checkpoint that lets you skip the text encoder and VAE configuration. Try these steps:
Download the packaged checkpoint to the models/checkpoints folder
Run a simplified video generation workflow, as above.
Note: This checkpoint packages the fp8_scaled diffusion model and the fp8 scaled text encoder by default.
Enjoy your creation!