The RX 7800 XT is overwhelmingly underutilized when using the UNET loader node with a GGUF file, in any workflow, regardless of the quantized model and ComfyUI version used.
The latest version of ComfyUI works flawlessly with .safetensors models, and the GPU power draw is a healthy 272 watts. This is not the case with GGUF files: when a workflow uses GGUF, the GPU power draw does not go beyond 120 watts and the speed is 500 s/iteration, which is completely unusable. I tried Adrenalin v26.1.1 and v25.20.01.17 as well, but to no avail. Please help me fix this.
The following details might help:
ComfyUI version v0.11
Workflow includes:
Wan 2.1 VACE 14B GGUF Q5_0
Steps: 15
Length: 81
FPS: 15
Sampler: UniPC
This problem exists regardless of the quantized model, workflow, and ComfyUI version used.
I was happy to see this, but after a clean install of the AMD ROCm drivers, every workflow crashes in both ComfyUI Desktop and Portable. I'm going back to ROCm 6.4 on WSL :/
Which GPU are you using?
I'm running a 7900 XTX with the ROCm 7.1 preview driver on Windows, and it's a lot better than WSL2. I finally got rid of the 800 GB WSL2 ext4 image.
Some notes, in case they help: I don't use the portable build, I go with uv/git:
https://github.com/OrsoEric/HOWTO-ComfyUI
Thanks for sharing. Perhaps I'll try this again with ROCm 7, but so far my WSL instance runs smoothly.
I have a 9070 XT and will use the same driver as you for the Windows test.
I'm still trying to get Comfy to work at all with my AMD card on Windows 11. Portable starts and looks fine, but in the middle of the process it simply quits; Comfy drops to "pause, press any key to continue". No warning, no error, nothing. I'm currently installing the desktop version, in the hope that it behaves differently.
What I completely miss is a detailed set of instructions for setting Comfy up with an AMD card. ROCm, for example: do I need to install it myself, or does it come with Comfy Portable or Desktop? The advice for Ubuntu is to set up a venv, install ROCm, install Comfy ...
If you're using AMD, these are the important lines from Comfy's startup output to check that it is running correctly:
....
Total VRAM 16368 MB, total RAM 65142 MB
pytorch version: 2.9.0+rocmsdk20251116
Set: torch.backends.cudnn.enabled = False for better AMD performance.
AMD arch: gfx1101
ROCm version: (7, 1)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 7800 XT : native
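If you'd rather check from Python directly, a minimal sketch (run it inside the same environment/venv that ComfyUI uses; these are standard torch attributes):

import torch

print(torch.__version__)              # should carry a rocm tag, e.g. 2.9.0+rocmsdk...
print(torch.version.hip)              # HIP/ROCm version string; None on CUDA/CPU-only builds
print(torch.cuda.is_available())      # True; ROCm devices are exposed through torch's CUDA API
print(torch.cuda.get_device_name(0))  # e.g. "AMD Radeon RX 7800 XT"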
Yeah, got it working, thanks. The trick is that under Windows you need the special AMD PyTorch driver; it will not work with the standard AMD driver.
You can find it here: https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-7-1-1.html
Huge win for AMD users who've been stuck with suboptimal performance! The automatic ROCm selection during installation is a nice touch, since manually configuring GPU backends can be a pain. I'm curious how this compares to DirectML performance on the same hardware, since that was the previous workaround. The cross-attention flag support in 7.1.1 should help a lot with memory-intensive workflows.
Should be a night-and-day performance improvement with ROCm vs DirectML. With DirectML, I had a W7800 go toe-to-toe with a GTX 1060 (not a typo) on Windows. You should get close to a 10x performance boost, at least.
A huge thank you for providing such a great tool!
I’ve been running some initial tests on my system, and it’s working very smoothly and perfectly stable with Python 3.12.11.
As a side note: I noticed a significant performance boost when running it on my manual ComfyUI setup with Python 3.13.9. I don't know if it's just the Python version.
Run times in seconds (lower is better); a sketch of how such timings can be collected follows the test-workflow link below:

Run   Py3.13.9   Py3.12.11 #1   Py3.12.11 #2
1       89.43      317.63         176.56
2       87.9       211.21         223.56
3       88.53      201.74         318.05
4       84.86      189.76         219.25
AVG     87.6975    230.085        239.39
%      100.00      262.36         272.97

The % row is each column's average relative to the Py3.13.9 average, i.e. the Python 3.12.11 runs took roughly 2.6-2.7x as long.
System:
Win 11
AMD Ryzen 5 7500F 6-Core Processor
AMD Radeon RX 7800 XT
64 GB RAM
Test workflow:
https://openart.ai/workflows/aura_111_111/fantasy-elf-sdxl-cinematic-nsfwsfw-portrait-workflow-hires-fix-vae-ksampler-workflow/JiDA2LC9sKQnvBUbNFRQ
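For anyone reproducing the comparison above, a minimal timing sketch; run_workflow is a hypothetical placeholder for whatever kicks off the job, and the numbers in the table presumably came from ComfyUI's own console timings:

import time
import torch

def time_runs(run_workflow, runs=4, warmup=1):
    # Warm-up pass so one-time compilation/caching doesn't pollute run 1
    for _ in range(warmup):
        run_workflow()
        torch.cuda.synchronize()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_workflow()
        torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
        times.append(time.perf_counter() - t0)
    return times, sum(times) / len(times)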
I vaguely remember the ComfyUI README saying somewhere that some dependencies don't support Python 3.13 yet, so 3.12 was recommended. Is this still the case, or does it already work with no issues?
I guess there's no "no issues" option for AMD users yet. For example, I haven't tested the above version for training (Flux FP16/BF16), but I have a lot of trouble with my Python 3.13 setup.
Please support NVFP4 models.
That's not happening; NVFP4 is an Nvidia-proprietary format, and RDNA shaders can't do anything with it. You really should use types that are supported by the GPU.
On the 7900 XTX I can't even run FP8. BF16, FP16, and INT8 work, though.
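If you want to probe what your own card's build will accept, a rough sketch; note this only checks that tensors can be created and cast in each float format (INT8 would need torch.randint instead of randn), not that fast kernels exist for it:

import torch

dev = "cuda"  # ROCm GPUs are addressed through torch's CUDA device API
for dtype in (torch.float16, torch.bfloat16, torch.float8_e4m3fn):
    try:
        x = torch.randn(256, 256, device=dev).to(dtype)
        # Cast back to fp32 for the matmul so only storage and casts are tested
        y = (x.float() @ x.float()).sum().item()
        print(f"{dtype}: OK")
    except Exception as exc:
        print(f"{dtype}: failed ({exc})")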