This year starts off with a great open model release from Nvidia, which released its confusingly marketed Cosmos family of models a few days ago.
These models, which Nvidia calls "World Models", are actually extremely good state-of-the-art (SOTA) video models. ComfyUI currently supports the 7B and 14B text-to-video and image-to-video diffusion models.
For most users I recommend the 7B models. They should fit on a 24GB GPU at full 16-bit precision without offloading, and they will also work on a 12GB GPU thanks to ComfyUI's automatic weight offloading.
This release also comes with a new sampler, res_multistep, which is available now in your favorite sampler node. Nvidia used it in their Cosmos implementation, but it can be used with every model supported by ComfyUI, and I heard it also gives good results on Hunyuan Video.
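The GPU recommendation follows from simple arithmetic: at 16-bit precision every parameter takes 2 bytes. The sketch below assumes the nominal 7B and 14B parameter counts and counts only the diffusion model weights, ignoring activations, the text encoder and the VAE, so real memory use will be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumes nominal parameter counts; activations, text encoder and VAE are ignored.
BYTES_PER_PARAM_16BIT = 2  # fp16 / bf16


def weight_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Return the size of the model weights in GiB."""
    return num_params * bytes_per_param / 1024**3


for name, params in [("7B", 7e9), ("14B", 14e9)]:
    size = weight_gib(params)
    print(f"{name}: {size:.1f} GiB of weights, fits in 24 GiB: {size < 24}")
```

The 7B weights come to roughly 13 GiB, which fits in 24GB but not in 12GB, which is why the 12GB case relies on offloading. The 14B weights alone are already over 24 GiB.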
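Because res_multistep is just another sampler name, switching to it is a one-field change in any sampler node. As an illustration, here is a KSampler node in ComfyUI's API (JSON) workflow format; the node ids it links to and the seed, steps, cfg and scheduler values are hypothetical placeholders, not the settings from the official Cosmos workflow.

```python
import json


def ksampler_node(sampler_name: str = "res_multistep") -> dict:
    """Build a KSampler node in ComfyUI API format (upstream node ids are hypothetical)."""
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,             # placeholder
            "steps": 35,            # placeholder
            "cfg": 7.0,             # placeholder
            "sampler_name": sampler_name,
            "scheduler": "karras",  # placeholder
            "denoise": 1.0,
            "model": ["1", 0],         # output 0 of hypothetical node "1" (model loader)
            "positive": ["2", 0],      # hypothetical positive prompt node
            "negative": ["3", 0],      # hypothetical negative prompt node
            "latent_image": ["4", 0],  # hypothetical empty latent node
        },
    }


print(json.dumps({"5": ksampler_node()}, indent=2))
```

Pointing the "model" input at a Hunyuan Video loader instead is all it takes to try the same sampler on that model.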
What makes the Nvidia Cosmos models the best open video models so far:
Their VAE is by far the most compute- and memory-efficient video VAE yet. It is so efficient that you can encode/decode a 1280x704, 121-frame video on a 12GB VRAM GPU without any tiling tricks while keeping very high quality. This makes it roughly 50x more memory efficient than the Hunyuan Video VAE.
Non-distilled: negative prompts work, and the models should be easier to train than distilled models like Hunyuan Video.
Image to video works very well and can be controlled by a prompt. The image-to-video model behaves like an inpainting model, so you can do things like generate from the last frame instead of the first, or generate the video between two images.
This model will always make a video with movement if you generate the required 121 frames. I have never seen it generate a video without movement.
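To make the "inpainting" behaviour concrete, here is a conceptual sketch, not ComfyUI's actual code. It assumes a causal video VAE with 8x temporal compression (the first frame is encoded on its own, then one latent frame per 8 pixel frames), and it uses one scalar per latent frame as a stand-in for the real latent tensors. The mask marks which latent frames come from input images; the rest are generated.

```python
def latent_frames(pixel_frames: int, temporal_factor: int = 8) -> int:
    """Latent frame count for a causal VAE (assumed 8x temporal compression)."""
    return (pixel_frames - 1) // temporal_factor + 1


def keep_mask(num_latent: int, start: bool, end: bool) -> list[int]:
    """1 = frame supplied by an input image (kept), 0 = frame the model generates."""
    mask = [0] * num_latent
    if start:
        mask[0] = 1
    if end:
        mask[-1] = 1
    return mask


def apply_known(latents: list[float], known: list[float], mask: list[int]) -> list[float]:
    """Inpainting-style step: overwrite kept frames with the encoded input image(s)."""
    return [k if m else x for x, k, m in zip(latents, known, mask)]


n = latent_frames(121)
print(n)  # 16
print(keep_mask(n, start=True, end=False))  # animate forward from a first frame
print(keep_mask(n, start=False, end=True))  # generate toward a last frame
print(keep_mask(n, start=True, end=True))   # fill in the video between two images
```

The same model covers all three cases; only which frames are held fixed changes.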
Some downsides:
The model really likes 121 frames and starts breaking if you generate fewer or more frames.
The lowest resolution the model can handle is 704x704.
Long prompts (a few sentences) are required. The model will not follow the prompt if it is too short.
It’s slow. It takes over 10 minutes to generate a 1280x704, 121-frame video on a 4090 (perfect for heating your room in winter).
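If you script your generations, the limits above are easy to check before spending ten minutes on a bad run. This is a hypothetical helper, not part of ComfyUI: the 121-frame and 704-pixel values come from the list above, while the three-sentence threshold for "a few sentences" is my own assumption.

```python
import re

RECOMMENDED_FRAMES = 121  # the model "really likes" 121 frames
MIN_SIDE = 704            # lowest resolution it handles is 704x704
MIN_SENTENCES = 3         # hypothetical threshold for "a few sentences"


def check_cosmos_settings(width: int, height: int, frames: int, prompt: str) -> list[str]:
    """Return warnings for settings likely to give poor Cosmos results."""
    warnings = []
    if frames != RECOMMENDED_FRAMES:
        warnings.append(f"{frames} frames: expect artifacts, use {RECOMMENDED_FRAMES}")
    if min(width, height) < MIN_SIDE:
        warnings.append(f"{width}x{height}: below the {MIN_SIDE}x{MIN_SIDE} minimum")
    # Count non-empty sentences split on ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    if len(sentences) < MIN_SENTENCES:
        warnings.append(f"prompt has {len(sentences)} sentence(s): write a longer one")
    return warnings


print(check_cosmos_settings(512, 512, 60, "A cat."))
```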
For basic workflows and examples, see the Nvidia Cosmos examples page.
I’ll leave you with a few examples of what Cosmos can do:
As a reminder, you can check the Nvidia Cosmos examples page for workflows.
For another piece of confusing marketing, make sure to check out our 2-year anniversary post, where we compare ComfyUI to an operating system: