[TrueNAS SCALE 22.12.3.2] GPU Passthrough to VM Hangs

Hello all. I'm trying one final time to get TrueNAS working on this box before I throw in the towel and try unRAID, or wait for the stealth NAS software company Linus keeps hinting at. I'm trying to set up a Windows 11 VM for use over Tailscale. My VM settings are attached, and I believe I have everything right in the BIOS, VT-d and such.

 

Without GPU passthrough the VM works fine, but when I try to pass through the dedicated GPU (a GTX 1080), it hangs indefinitely at the "please wait" spinner until I refresh the UI page. I have tried isolating the GPU, pinning vCPUs, enabling/disabling the Hyper-V enlightenments, even completely reinstalling TrueNAS SCALE; nothing makes a difference.
I recently got the attached error message after creating a new VM and trying to add the GPU before installing the OS, but every other time there has been no error message, just hangs or crashes. Attached is the log file for the VM that got this error.
 

Sometimes it crashes the whole machine. Attached is a picture of the console on the box from when I tried to start a VM that had previously attempted GPU passthrough but no longer had a GPU enabled in its settings. I've found only one other thread with a similar issue, and it had no responses. Anyhow, I have absolutely no idea why this is happening, so any help would be greatly appreciated.

hung task.jpg

VM GPU Passthrough error (1).png

VM settings.png

VM_Win11_Log_File.log.txt


On 7/8/2023 at 9:42 PM, UraniumEagle said:

Without GPU passthrough the VM functions fine, but when I try to passthrough the dedicated GPU (GTX 1080) it hangs indefinitely at the "please wait" spinning wheel until I refresh the UI page. [...]

Have you checked which IOMMU group the card is in?

I had similar problems when I tried this with my 9900K system. I had to put the card in the bottom slot to get it to work.


1 hour ago, m9x3mos said:

Have you checked which IOMMU group the card is in?

I had similar problems when I tried this with my 9900K system. I had to put the card in the bottom slot to get it to work.

I tried moving it to the PCIe x8 slot and got the exact same error. How would I check the IOMMU group?

image.png


20 hours ago, UraniumEagle said:

I tried moving it to the PCIe x8 slot and got the exact same error. How would I check the IOMMU group?

image.png

You can use lspci via the command line to list everything out and then check the output.

The devices will have addresses showing the grouping, like xx:yy.z.

The GPU cannot be grouped with anything else that has the same yy, if I remember correctly.
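For what it's worth, lspci addresses are actually of the form bus:device.function rather than xx.yy.zz. A quick shell sketch of pulling one apart (the address itself is made up for illustration):

```shell
# lspci addresses look like bus:device.function (e.g. 01:00.0). A GPU and
# its HDMI audio function normally sit on the same bus as .0 and .1.
# The address below is a hypothetical example, not from this system.
addr="01:00.0"
bus="${addr%%:*}"      # text before ':'  -> the bus
devfn="${addr#*:}"     # text after ':'   -> device.function
echo "bus=$bus device.function=$devfn"
```

Note that sharing a bus by itself is not what decides passthrough; the IOMMU group is, which is what gets sorted out below.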


3 hours ago, m9x3mos said:

You can use lspci via the command line to list everything out and then check the output.

The devices will have addresses showing the grouping, like xx:yy.z.

The GPU cannot be grouped with anything else that has the same yy, if I remember correctly.

Hmm. If I'm reading this right, everything below the Serial Attached SCSI controller is on the same yy. How would I change that?

image.png


22 hours ago, UraniumEagle said:

Hmm. If I'm reading this right, everything below the Serial Attached SCSI controller is on the same yy. How would I change that?

image.png

I think I got the grouping check wrong. I found this script, which is worth a try.

#!/bin/bash
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;
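For reference, a cleanly isolated GPU would show up with only its own functions in the group. A fabricated sample of what that looks like for a GTX 1080 (group number and PCI addresses are made up; yours will differ):

```shell
# Fabricated sample of the IOMMU-group script's output for a cleanly
# isolated GTX 1080; real group numbers and addresses vary per board.
sample='IOMMU Group 13:
	01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80]
	01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0]'
printf '%s\n' "$sample"
```

If the group containing the GPU also lists other endpoint devices (SATA/SAS controllers, NICs, USB controllers), passthrough of just the GPU will generally fail.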

 


On 7/11/2023 at 9:12 PM, m9x3mos said:

I think I got the grouping check wrong. I found this script, which is worth a try.

#!/bin/bash
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;

 

Hmm... It seems there's more than the VGA controller and audio device in the IOMMU group. Not sure if that's a problem, though, as they all seem to be PCI-related. I can't find anything on how I'd go about changing that...

image.png


If you pass through your VGA card, the host system is no longer able to communicate with the rest of the devices in that IOMMU group.

You can check your motherboard manual for the IOMMU mappings. Sometimes they are also shown in the block diagrams for the motherboard, and they may vary depending on the installed hardware.

 

The last time I had these issues, it was caused by not all RAM slots being populated, so groups got combined; adding more RAM solved my problem. Another time I was able to move the device to a PCIe slot that was not in any other group.


On 7/19/2023 at 5:49 AM, ISharkI said:

If you pass through your VGA card, the host system is no longer able to communicate with the rest of the devices in that IOMMU group.

You can check your motherboard manual for the IOMMU mappings. Sometimes they are also shown in the block diagrams for the motherboard, and they may vary depending on the installed hardware.

 

The last time I had these issues, it was caused by not all RAM slots being populated, so groups got combined; adding more RAM solved my problem. Another time I was able to move the device to a PCIe slot that was not in any other group.

Yeah, there isn't anything in the manual regarding IOMMU.

I ended up installing unRAID to see if it would work, but I'm unhappy that it requires an array rather than just a ZFS pool, so I want to switch back. I did get GPU passthrough working by enabling ACS override (which is a convenient dropdown in unRAID... come on, iX Systems...), but I cannot find any good way of doing this on TrueNAS SCALE. I found this thread on the TrueNAS forums but cannot for the life of me figure out how to get the ACS override set to downstream and the IOMMU groups to reflect it.
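In case it helps anyone who lands here: the override unRAID exposes is just a kernel parameter, pcie_acs_override=downstream,multifunction, so the generic Linux route on SCALE would be adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub. This is not an officially supported TrueNAS workflow: it only takes effect if the SCALE kernel actually carries the ACS override patch (worth verifying for your release), and an OS update may revert the file. A sketch of the edit, demonstrated against a scratch copy rather than the live file:

```shell
# Sketch only: demonstrate the GRUB edit on a scratch copy, not the live
# /etc/default/grub. NOTE: pcie_acs_override only does anything on kernels
# built with the ACS override patch; verify your SCALE release includes it,
# and expect OS updates to revert manual changes to this file.
demo=/tmp/grub.demo
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > "$demo"

# Prepend the override inside the existing default command line:
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&pcie_acs_override=downstream,multifunction /' "$demo"

cat "$demo"
# On the real file you would then run: update-grub && reboot
```

Also keep in mind that ACS override deliberately weakens device isolation between the groups it splits, which is the reason it is off by default.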

