
F@H GTX 1650 BAD WORK UNIT on Fedora

Anyone else running a GTX 1650 on Linux and having issues?

 

It's been running fine for months, but suddenly I'm getting a lot of BAD WORK UNIT jobs, and the slot basically gets disabled after running for a while.

 

The strange thing is that if I reboot, it will trigger a few and then work fine for hours. I had it running for 6 hours last night and paused it when I got up; when I tried to resume, it immediately triggered a BAD WORK UNIT. Something seems very off with the GPU itself or the drivers.

 

What's worrying is that at first it only happened after leaving the PC up for a few days, but now it's happening the same day, with the last reboot being the worst yet (the first time I saw BAD WUs immediately after a clean boot).

Is there a particular benchmark or something that might identify if the card is going bad? I get no graphics corruption, freezes, or anything obvious; it's just F@H that is failing, though I DO get Firefox crashing randomly. Desktop compositing is disabled, as I've always had stability problems with it on KDE Plasma no matter what GPU is used.

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


5 hours ago, Alex Atkin UK said:

Anyone else running a GTX 1650 on Linux and having issues?

 


Alex,

 

I've had this happen on some of my rigs (Ubuntu 18.04 LTS). I can't confirm it, but I suspect the Nvidia support packages got updated and were out of sync with the driver package (kernel module). I'm running driver 510.54 currently.

 

I eventually figured this out after attempting to run nvidia-smi and getting an error.
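For anyone wanting to check for the same thing, here is a minimal Python sketch. It assumes the error in question was the usual "Driver/library version mismatch" message from nvidia-smi (the post does not say which error it was), and it reads the loaded kernel module version from /proc/driver/nvidia/version. The function names are made up for illustration.

```python
import re
import subprocess
from pathlib import Path

# The loaded NVIDIA kernel module reports its version here.
PROC_VERSION = Path("/proc/driver/nvidia/version")


def kernel_module_version(proc_text):
    """Return the first dotted version on the 'NVRM version:' line, or None."""
    match = re.search(r"NVRM version:.*?(\d+\.\d+(?:\.\d+)?)", proc_text)
    return match.group(1) if match else None


def looks_like_mismatch(smi_output):
    """True if nvidia-smi output contains the version-mismatch message."""
    return "driver/library version mismatch" in smi_output.lower()


def check():
    """Summarise whether the kernel module and user-space tools are in sync."""
    if not PROC_VERSION.exists():
        return "NVIDIA kernel module does not appear to be loaded"
    loaded = kernel_module_version(PROC_VERSION.read_text())
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    except FileNotFoundError:
        return f"kernel module {loaded} is loaded, but nvidia-smi is not installed"
    if looks_like_mismatch(result.stdout + result.stderr):
        return (f"kernel module {loaded} is out of sync with the user-space "
                "libraries; a reboot or driver reinstall is probably needed")
    return (f"kernel module {loaded}: no version mismatch reported "
            f"(nvidia-smi exit code {result.returncode})")


if __name__ == "__main__":
    print(check())
```

A non-zero exit code without the mismatch message would point to some other driver problem, so it is worth reading the raw nvidia-smi output in that case.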

 

I have noticed that when I got the "Failed" slot message I often had to delete the slot, reboot, then add it again to clear it.

 

I'd suggest deleting the GPU slot, updating your drivers, rebooting, and then adding the slot back.

FaH BOINC HfM

Bifrost - 6 GPU Folding Rig  Linux Folding HOWTO Folding Remote Access Folding GPU Profiling ToU Scheduling UPS

Systems:

desktop: Lian-Li O11 Air Mini; Asus ProArt x670 WiFi; Ryzen 9 7950x; EVGA 240 CLC; 4 x 32GB DDR5-5600; 2 x Samsung 980 Pro 500GB PCIe3 NVMe; 2 x 8TB NAS; AMD FirePro W4100; MSI 4070 Ti Super Ventus 2; Corsair SF750

nas1: Fractal Node 804; SuperMicro X10sl7-f; Xeon e3-1231v3; 4 x 8GB DDR3-1666 ECC; 2 x 250GB Samsung EVO Pro SSD; 7 x 4TB Seagate NAS; Corsair HX650i

nas2: Synology DS-123j; 2 x 6TB WD Red Plus NAS

nas3: Synology DS-224+; 2 x 12TB Seagate NAS

dcn01: Fractal Meshify S2; Gigabyte Aorus ax570 Master; Ryzen 9 5900x; Noctua NH-D15; 4 x 16GB DDR4-3200; 512GB NVMe; 2 x Zotac AMP 4070ti; Corsair RM750Mx

dcn02: Fractal Meshify S2; Gigabyte ax570 Pro WiFi; Ryzen 9 3950x; Noctua NH-D15; 2 x 16GB DDR4-3200; 128GB NVMe; 2 x Zotac AMP 4070ti; Corsair RM750x

dcn03: Fractal Meshify C; Gigabyte Aorus z370 Gaming 5; i9-9900k; BeQuiet! PureRock 2 Black; 2 x 8GB DDR4-2400; 128GB SATA m.2; MSI 4070 Ti Super Gaming X; MSI 4070 Ti Super Ventus 2; Corsair TX650m

dcn05: Fractal Define S; Gigabyte Aorus b450m; Ryzen 7 2700; AMD Wraith; 2 x 8GB DDR 4-3200; 128GB SATA NVMe; Gigabyte Gaming RTX 4080 Super; Corsair TX750m

dcn06: Fractal Focus G Mini; Gigabyte Aorus b450m; Ryzen 7 2700; AMD Wraith; 2 x 8GB DDR 4-3200; 128GB SSD; Gigabyte Gaming RTX 4080 Super; Corsair CX650m


So after the reboot:

Quote

23:09:22:WU00:FS01:0x22:  Configuring platform CUDA
23:09:22:WU00:FS01:0x22:Failed to create CUDA context:
23:09:22:WU00:FS01:0x22:The requested CUDA device could not be loaded
23:09:22:WU00:FS01:0x22:Attempting to create OpenCL context:
23:09:22:WU00:FS01:0x22:  Configuring platform OpenCL
23:09:24:WU00:FS01:0x22:Failed to create OpenCL context:
23:09:24:WU00:FS01:0x22:Error uploading array atomIndexDevice: clEnqueueWriteBuffer (-4)
23:09:24:WU00:FS01:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.

Quote

23:10:42:WU00:FS01:0x22:Attempting to create CUDA context:
23:10:42:WU00:FS01:0x22:  Configuring platform CUDA
23:10:43:WU00:FS01:0x22:Failed to create CUDA context:
23:10:43:WU00:FS01:0x22:The requested CUDA device could not be loaded
23:10:43:WU00:FS01:0x22:Attempting to create OpenCL context:
23:10:43:WU00:FS01:0x22:  Configuring platform OpenCL
23:10:43:WU00:FS01:0x22:Failed to create OpenCL context:
23:10:43:WU00:FS01:0x22:Error initializing context: clCreateContext (-6)
23:10:43:WU00:FS01:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
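The numbers in brackets in the two logs above are standard OpenCL status codes, and decoding them helps narrow things down. The sketch below maps the codes to their names from the OpenCL headers; the regular expression and the `decode` helper are illustrative only and assume the log format shown here.

```python
import re

# Status code names as defined in the OpenCL headers (CL/cl.h).
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches a trailing "clSomeCall (-N)" as it appears in the log lines.
CL_CALL = re.compile(r"(cl[A-Z]\w+) \((-?\d+)\)\s*$")


def decode(line):
    """Return (call, code, name) for a log line, or None if no code is found."""
    match = CL_CALL.search(line)
    if not match:
        return None
    call, code = match.group(1), int(match.group(2))
    return call, code, OPENCL_ERRORS.get(code, "unknown")


if __name__ == "__main__":
    lines = [
        "23:09:24:WU00:FS01:0x22:Error uploading array atomIndexDevice: clEnqueueWriteBuffer (-4)",
        "23:10:43:WU00:FS01:0x22:Error initializing context: clCreateContext (-6)",
    ]
    for line in lines:
        print(decode(line))
```

Both codes are memory allocation failures (-4 for a device buffer, -6 for host memory), which may point towards a memory or driver state problem rather than a bad work unit.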

Then suddenly, one of the WUs worked and it continued to work until I paused.

 

Resuming later:

Quote

11:50:20:WU01:FS01:0x22:Failed to create CUDA context:
11:50:20:WU01:FS01:0x22:Error creating array posq: CUDA_ERROR_OUT_OF_MEMORY (2)

It's a bit strange, as the log says 5GB of system RAM was available and I'm fairly sure at least 2GB of VRAM would have been free.
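Rather than guessing at free VRAM, it can be measured at the moment the slot is resumed. CUDA_ERROR_OUT_OF_MEMORY usually refers to memory on the GPU, so system RAM is probably not the relevant figure. A rough Python sketch using nvidia-smi's CSV query mode follows; the helper names are hypothetical, and it assumes nvidia-smi reports plain integers (MiB) for both fields.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, without headers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Return one (used_mib, total_mib) tuple per GPU line of CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def free_vram_mib():
    """Return free VRAM in MiB for each GPU, computed as total minus used."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [total - used for used, total in parse_memory(out)]


if __name__ == "__main__":
    try:
        print("Free VRAM per GPU (MiB):", free_vram_mib())
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as exc:
        print("Could not query nvidia-smi:", exc)
```

If this shows plenty of free VRAM while F@H still reports out of memory, the driver state itself becomes the more likely suspect.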

 


8 hours ago, Alex Atkin UK said:

23:09:22:WU00:FS01:0x22:The requested CUDA device could not be loaded

This suggests you might have a permissions issue with the device.
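A quick way to test that theory is to check whether the account running the client can read and write the /dev/nvidia* device nodes. This is only a sketch; the `inaccessible` helper is made up for illustration, and it should be run as whichever user actually runs FAHClient, since that may not be your login account.

```python
import glob
import os


def inaccessible(paths):
    """Return the paths the current user cannot open for reading and writing."""
    return [p for p in paths if not os.access(p, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    # Skip directories such as /dev/nvidia-caps; only device nodes matter here.
    nodes = [p for p in sorted(glob.glob("/dev/nvidia*")) if not os.path.isdir(p)]
    if not nodes:
        print("No /dev/nvidia* device nodes found; is the kernel module loaded?")
    blocked = inaccessible(nodes)
    for path in blocked:
        print("No read/write access:", path)
    if nodes and not blocked:
        print("All NVIDIA device nodes are readable and writable by this user")
```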
