I'm using the system below:
pytorch version: 2.5.1+cu124
xformers version: 0.0.28.post3
Device: cuda:0 NVIDIA GeForce RTX 4080
Python version: 3.11.9
ComfyUI version: 0.3.60
ComfyUI frontend version: 1.26.13
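For anyone comparing environments, here is a minimal Python sketch (assuming torch and xformers are installed) that prints the same diagnostics:

```python
import platform
import torch
import xformers

# Print the same environment summary as above.
print(f"pytorch version: {torch.__version__}")
print(f"xformers version: {xformers.__version__}")
if torch.cuda.is_available():
    print(f"Device: cuda:0 {torch.cuda.get_device_name(0)}")
print(f"Python version: {platform.python_version()}")
```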
The errors below came up while running the WanAnimate workflow with the CLIP model umt5-xxl-enc-bf16 (umt5-xxl-fp8-e4m3fn_scaled worked, but the video quality was bad).
#### It seems that models and clips are mixed and interconnected between SDXL Base, SDXL Refiner, SD1.x, and SD2.x. Please verify. ####
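That warning likely means the loader found tensor names it didn't expect for the selected model type. Here is a minimal sketch (the file path is hypothetical) to list the tensor keys in a safetensors file and confirm it really is the UMT5 text encoder you expect:

```python
from safetensors import safe_open

# Hypothetical path to the text encoder in question.
path = "models/text_encoders/umt5-xxl-enc-bf16.safetensors"

with safe_open(path, framework="pt", device="cpu") as f:
    # Print the first few tensor names; a UMT5 encoder should have
    # encoder-block keys, not SDXL/SD1.x UNet or CLIP keys.
    for key in list(f.keys())[:10]:
        print(key, f.get_slice(key).get_shape())
```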
Hi,
Are you using Comfy and a 5090 on Windows or on Linux? I tried on Windows and it looks like there is no PyTorch build for Blackwell. Please advise if there is a chance to run Comfy and a 5090 on Windows.
Thanks,
Try the portable Windows version on the GitHub releases page!
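If you want to verify whether your installed PyTorch build can actually drive a Blackwell card, here is a quick sketch (that Blackwell reports as sm_120 and that support shipped in the nightly cu128 wheels are assumptions on my part):

```python
import torch

# List the GPU architectures this torch build was compiled for.
# Blackwell cards (e.g. the 5090) report compute capability 12.0,
# so the build needs sm_120 in its arch list.
print("torch:", torch.__version__)
print("arch list:", torch.cuda.get_arch_list())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"device capability: sm_{major}{minor}")
# If sm_120 is missing, a nightly cu128 wheel was the usual workaround, e.g.:
#   pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```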
I'm using an NVIDIA 5070 Ti, but I used to run SDXL models on a 3060 Ti. I focused on speed improvements, using multiple LoRAs together and generating images with image-to-image (I2I).
Now, I'd love to use Flux with the same setup, or even better performance. I've been checking here every day, hoping for the official PyTorch release built against CUDA 12.8. Any idea when it might be available?
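For reference, outside ComfyUI the multi-LoRA I2I setup described above looks roughly like this in diffusers (the model ID and LoRA paths are placeholders, not a tested recipe):

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Placeholder model ID and LoRA files; substitute your own.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Stack two LoRAs with individual weights.
pipe.load_lora_weights("loras/style.safetensors", adapter_name="style")
pipe.load_lora_weights("loras/detail.safetensors", adapter_name="detail")
pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.5])

init_image = load_image("input.png")
image = pipe(prompt="a photo", image=init_image, strength=0.6).images[0]
image.save("output.png")
```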
Any speed improvements for 40-series card users?
It would be great to see the end result and the speed, to be honest, but as I understand it, the fp16 Hunyuan model fits in VRAM.
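As a rough sanity check (assuming a ~13B-parameter Hunyuan Video transformer; the exact count is an assumption here), the weight footprint alone is parameters times bytes per value:

```python
# Back-of-the-envelope VRAM estimate for the model weights alone
# (activations, text encoder, and VAE add more on top).
params = 13e9            # assumed ~13B parameters for Hunyuan Video
bytes_fp16 = 2           # fp16/bf16: 2 bytes per parameter
bytes_fp8 = 1            # fp8: 1 byte per parameter

print(f"fp16 weights: {params * bytes_fp16 / 1024**3:.1f} GiB")  # ~24.2 GiB
print(f"fp8 weights:  {params * bytes_fp8 / 1024**3:.1f} GiB")   # ~12.1 GiB
```

By that estimate the fp16 weights alone would be tight on a 24 GB card but fit comfortably in the 5090's 32 GB.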
Good for those with deep pockets who can buy a 5090, which costs a fortune.
I ran a speed test with the CUDA 12.8 torch build; here are the results I got (a timing sketch follows the links):
for Hunyuan: https://www.reddit.com/r/comfyui/comments/1ig53if/comment/malrmyx/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
for Flux Dev: https://www.reddit.com/r/comfyui/comments/1ifzblp/rtx_5090_fe_performance_on_comfyui_cuda_128_torch/
for SDXL: https://www.reddit.com/r/StableDiffusion/comments/1ifzial/comment/mal0bc3/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
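For anyone reproducing numbers like these, here is a minimal sketch of how per-step GPU time can be measured with CUDA events (the `run_step` callable stands in for one denoising step and is hypothetical):

```python
import torch

def time_steps(run_step, n_steps=20, warmup=3):
    """Average per-step GPU time in milliseconds using CUDA events."""
    for _ in range(warmup):          # warm up kernels / autotuning
        run_step()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_steps):
        run_step()
    end.record()
    torch.cuda.synchronize()         # wait for all queued GPU work
    return start.elapsed_time(end) / n_steps
```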