I'm using an NVIDIA 5070 Ti, but I used to run SDXL models on a 3060 Ti. I focused on speed improvements, using multiple LoRAs together and generating images with image-to-image (I2I).
Now, I’d love to use Flux with the same setup—or even better performance. I’ve been checking here every day, hoping for the official release of CUDA 12.8. Any idea when it might be available?
Any speed improvements for 40-series card users?
It would be great to see the end result and the speed, to be honest, but as I understand it, the fp16 Hunyuan model fits in VRAM.
Good for those with deep pockets who can buy a 5090, which costs a fortune.
I ran a speed test with the CUDA 12.8 torch build; here are the results I got:
for Hunyuan : https://www.reddit.com/r/comfyui/comments/1ig53if/comment/malrmyx/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
for Flux Dev : https://www.reddit.com/r/comfyui/comments/1ifzblp/rtx_5090_fe_performance_on_comfyui_cuda_128_torch/
for SDXL : https://www.reddit.com/r/StableDiffusion/comments/1ifzial/comment/mal0bc3/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button