
Think my GPU has messed the bed (hopefully solved)

Marbo

I have a Gigabyte RTX 3090 Vision. I took a gamble on a used one (probably ex-mining) about 10 months ago, and it now crashes when using more than around 6GB of VRAM. I haven't overclocked it, and I use it for image rendering.

It doesn't crash my PC; it just stops the render after a short time and shows VRAM errors in the render log.

 

Apart from rendering scenes with large VRAM usage, it seems to work fine. It will run Time Spy, Heaven and the hairy doughnut (FurMark) without problems.

Is there a diagnostic tool I can use to test it?

Any ideas? Is it salvageable, or am I listing it on eBay for spares?

 

Thanks


50 minutes ago, Marbo said:

it now crashes when using more than around 6GB of VRAM.

That's a very weird way for it to die. There are four power phases for the VRAM, which would split it into 6GB "chunks" that way, but it seems odd that 3 of the 4 would die at the same time.

I'm not actually trying to be as grumpy as it seems.

I will find your mentions of Ikea or Gnome and I will /s post. 

Project Hot Box

CPU 13900k, Motherboard Gigabyte Aorus Elite AX, RAM CORSAIR Vengeance 4x16gb 5200 MHZ, GPU Zotac RTX 4090 Trinity OC, Case Fractal Pop Air XL, Storage Sabrent Rocket Q4 2TB, CORSAIR Force Series MP510 1920GB NVMe, CORSAIR FORCE Series MP510 960GB NVMe, PSU CORSAIR HX1000i, Cooling Corsair XC8 CPU block, Bykski GPU block, 360mm and 280mm radiator, Displays Odyssey G9, LG 34UC98-W 34-Inch, Keyboard Mountain Everest Max, Mouse Mountain Makalu 67, Sound AT2035, Massdrop 6xx headphones, Go XLR

Oppbevaring

CPU i9-9900k, Motherboard, ASUS Rog Maximus Code XI, RAM, 48GB Corsair Vengeance LPX 32GB 3200 mhz (2x16)+(2x8) GPUs Asus ROG Strix 2070 8gb, PNY 1080, Nvidia 1080, Case Mining Frame, 2x Storage Samsung 860 Evo 500 GB, PSU Corsair RM1000x and RM850x, Cooling Asus Rog Ryuo 240 with Noctua NF-12 fans

 

Why is the 5800x so hot?

 

 


8 minutes ago, IkeaGnome said:

That's a very weird way for it to die. There are four power phases for the VRAM, which would split it into 6GB "chunks" that way, but it seems odd that 3 of the 4 would die at the same time.

Do you think this could be a PSU issue?

I have a 650W PSU. I know this isn't recommended for a 3090, but the rest of the system sips power and the GPU shows around 300W when running at 100%. The whole system draws around 450W max when rendering, and it's been running fine until recently.


2 minutes ago, Marbo said:

Do you think this could be a PSU issue?

I have a 650W PSU. I know this isn't recommended for a 3090, but the rest of the system sips power and the GPU shows around 300W when running at 100%. The whole system draws around 450W max when rendering, and it's been running fine until recently.

What exact PSU and what are the rest of your system specs? There are ways to diagnose with software, but I'm not completely sure it would work here. Could be worth a shot if everything else seems okay. 

Does the GPU artifact or anything like that before it throws errors?


12 minutes ago, IkeaGnome said:

What exact PSU and what are the rest of your system specs? There are ways to diagnose with software, but I'm not completely sure it would work here. Could be worth a shot if everything else seems okay. 

Does the GPU artifact or anything like that before it throws errors?

The PSU is an NZXT 650W 80+ Gold from the H1 case.

I have a 5700X and an ASRock A520M-ITX/ac.

Arctic Liquid Freezer 280.

It has one M.2 SSD and one HDD.

The CPU is pretty much at idle when rendering.

 

Edit: No artifacts

 

This is the part of the log that shows the errors:

 

2023-08-03 15:08:54.764 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 00625 iterations after 124.291s.
2023-08-03 15:09:02.922 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.11  IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): Kernel [6] (LightSamplingSr   ) failed after 2.561s
2023-08-03 15:09:02.922 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.11  IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while launching CUDA renderer in <internal>:951)
2023-08-03 15:09:02.922 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.11  IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): Failed to launch renderer
2023-08-03 15:09:02.923 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): Device failed while rendering
2023-08-03 15:09:02.923 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [WARNING] - IRAY:RENDER ::   1.6   IRAY   rend warn : CUDA device 0 (NVIDIA GeForce RTX 3090) is no longer available for rendering.
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [WARNING] - IRAY:RENDER ::   1.6   IRAY   rend warn : All available GPUs failed.
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: Fallback to CPU not allowed.
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while initializing memory buffer)
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: All workers failed: aborting render
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): [Guidance sync] Failed slave device (remaining 0, done 0).
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.924 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
2023-08-03 15:09:02.925 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [ERROR] - IRAY:RENDER ::   1.6   IRAY   rend error: CUDA device 0 (NVIDIA GeForce RTX 3090): the launch timed out and was terminated (while de-allocating memory)
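
Most of the lines above repeat the same message, so it can help to collapse a log like this into a count per distinct error. A minimal sketch, assuming the log lines keep the "rend error:" / "rend warn :" layout shown above; the function name `summarize_iray_errors` is made up for illustration and is not part of Daz Studio or Iray:

```python
import re
from collections import Counter

# Captures the severity ("error" or "warn") and the message after it.
# Progress lines ("rend progr:") are deliberately not matched.
_REND = re.compile(r"rend (error|warn)\s*:\s*(.*)$")

# Optional "CUDA device N (name): " prefix in front of the message.
_DEVICE = re.compile(r"^CUDA device \d+ \([^)]*\)\s*:?\s*")


def summarize_iray_errors(lines):
    """Count distinct render errors/warnings, ignoring the device prefix."""
    counts = Counter()
    for line in lines:
        match = _REND.search(line)
        if match is None:
            continue  # not an error or warning line
        severity = match.group(1)
        message = _DEVICE.sub("", match.group(2).strip())
        counts[(severity, message)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py path/to/log.txt (the path is up to you)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (severity, message), n in summarize_iray_errors(handle).most_common():
            print(f"{n:4d}  {severity:5s}  {message}")
```

Run against the excerpt above, this should put the repeated "launch timed out and was terminated" entries at the top, which makes it easier to see which failure came first and which are just follow-on noise.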

 


4 minutes ago, Marbo said:

The PSU is an NZXT 650W 80+ Gold from the H1 case.

Is it in the H1 case? If so, go into your BIOS and force PCIe to Gen 3 instead of Auto. If you're using GPU-Z, what is it showing for the link speed when you're having rendering errors?
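
If GPU-Z isn't handy, the same link information can be read from `nvidia-smi`, which ships with the NVIDIA driver. A rough sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_query_line` and `read_gpu_status` are made up for illustration, while the query fields are standard `--query-gpu` properties:

```python
import subprocess

# Properties requested from nvidia-smi, in this order.
QUERY_FIELDS = [
    "pcie.link.gen.current",
    "pcie.link.width.current",
    "memory.used",
    "memory.total",
]


def parse_query_line(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by field name."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} fields, got {len(values)}")
    return dict(zip(QUERY_FIELDS, values))


def read_gpu_status(gpu_index=0):
    """Run nvidia-smi once and return the parsed status of one GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            f"--query-gpu={','.join(QUERY_FIELDS)}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_query_line(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    print(read_gpu_status())
```

The link generation can read lower while the card is idle, so it's worth calling this while the render is actually running, and noting the memory figures at the moment the errors appear.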


5 minutes ago, IkeaGnome said:

Is it in the H1 case? If so, go into your BIOS and force PCIe to Gen 3 instead of Auto. If you're using GPU-Z, what is it showing for the link speed when you're having rendering errors?

It's not in the H1 case.

I'll have to get back to you on the link speed. I've chucked a 2060 Super in for now, so I'll have to change it over.


@IkeaGnome I was going to say you're not going to believe this, but you probably will.

I've reinstalled the 3090 and, as hard as I try, I can't recreate the issue; it seems to be working fine. I'd previously removed and reinstalled the card and updated the drivers (after DDU), and it still crashed. Now there are no issues: I've loaded the VRAM up to 17GB and it just works.

 

I'll stress test it for a few hours and see what happens.

 

Thank you for your time helping.

 
