RTX 3080 crashing, possibly due to capacitor choice

spartaman64
1 minute ago, RejZoR said:

I wonder if crashing to desktop is part of GPU recovery catching a GPU crash, or if it's something else. It should be easy to check in Event Viewer. I'm wondering whether such a hardware-level fault would still be caught by GPU recovery (dropping you to the desktop rather than a full-on BSOD), or whether it would fail unconditionally to the point where the whole system goes down. What I'm essentially wondering is whether this might be a driver-level borkup, where the GPU shoots up its clock but forgets to properly follow the voltage curve at that point, and isn't actually related to the capacitors on the back. Either that, or someone sticks an oscilloscope on the back of those capacitors...

This intrigues me.  Does tech Jesus have an oscilloscope?  It seems like the kind of thing he’d have.  He just bought a big fancy aluminum mirror.
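
Side note on the Event Viewer idea from the quote: if the driver's TDR mechanism is what's catching these crashes, Windows logs a warning in the System log every time a display driver gets reset, so it's checkable without guesswork. A rough sketch (assumes Windows with wevtutil on the PATH, and that your driver writes the usual Event ID 4101 from the "Display" source on a TDR recovery, which can vary by driver):

```python
# Sketch: list recent "display driver stopped responding and has recovered"
# (TDR) warnings from the Windows System log via wevtutil.
# Assumptions: Windows, wevtutil on PATH, Event ID 4101 from the "Display"
# source is what your driver writes on a TDR recovery.
import subprocess

query = "*[System[Provider[@Name='Display'] and (EventID=4101)]]"
cmd = ["wevtutil", "qe", "System", f"/q:{query}", "/f:text", "/c:10", "/rd:true"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout or "No TDR recovery events found in the System log.")
```

If the crash-to-desktop shows up there as a recovered TDR, the driver is catching it; if the log stays empty and the whole machine goes down instead, that points at something the recovery path can't handle.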

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


1 hour ago, BiG StroOnZ said:

[image attachment]

 

Do you realize that it's not Nvidia's fault that their AIBs are cheap-asses?

It doesn't happen on Founders Edition cards, nor on EVGA and ASUS cards.

A PC Enthusiast since 2011
AMD Ryzen 7 5700X@4.65GHz | GIGABYTE GTX 1660 GAMING OC @ Core 2085MHz Memory 5000MHz
Cinebench R23: 15669cb | Unigine Superposition 1080p Extreme: 3566

@Vishera

Yes, it is. It's NVIDIA that specifies the minimum power-delivery requirements at the circuitry level, and it's NVIDIA that then approves each custom PCB design before it goes into production. Is it the AIB's fault that NVIDIA under-specced power delivery and then approved designs built to that minimum? Technically speaking, no. That's like blaming only the worker even though his supervisor approved his badly done work. No, it's NVIDIA's fault, because they set the requirements and they approve the designs in the end. AIBs are just trying to save a buck, because the cost of NVIDIA's GPUs apparently leaves them very little room to make a profit, so of course they either cut costs somewhere or go the other direction and charge a hefty premium for totally overbuilding it.

 

I always thought NVIDIA just said "this is what you need to run our GPU properly" and then it was entirely down to the AIBs. I never thought the approval process went so far that NVIDIA actually checks every single design in the end. That was brought up by people like Buildzoid, who are much closer to the production side and understand the whole PCB design and manufacturing process much better.


14 hours ago, Suika said:

Oh, see, I'm on the opposite side of the spectrum and I firmly believe a majority of this is NVIDIA's fault.

 

1) All board partner designs have to be approved by NVIDIA, so while Zotac, EVGA, or Palit could submit a shit design, if it gets through, then surely NVIDIA thought it was fine.

2) NVIDIA gave board partners very little time to test boards.

3) For most of the time board partners did have to test, they had no drivers to run games or other real loads with, so partners could have been binning chips improperly too.

4) Not making the reference design a bit stricter: a 1+5 config should have been the bare minimum, with 2+4 recommended. ASUS just went overkill by the looks of it.

 

Some partners managed to do better than others, but it definitely sounds like NVIDIA holds most of the fault; board partners just won't admit it because they don't want NVIDIA to whack them in the face with their massive dingly.

@Vishera

Main Rig :

Ryzen 7 2700X | Powercolor Red Devil RX 580 8 GB | Gigabyte AB350M Gaming 3 | 16 GB TeamGroup Elite 2400MHz | Samsung 750 EVO 240 GB | HGST 7200 RPM 1 TB | Seasonic M12II EVO | CoolerMaster Q300L | Dell U2518D | Dell P2217H | 

 

Laptop :

Thinkpad X230 | i5 3320M | 8 GB DDR3 | V-Gen 128 GB SSD |


And I've canceled the order. If it were just availability I'd still wait, but now that a hardware-level issue might exist, it's just not worth it. While I'm at it, I might as well wait for AMD to release an RDNA2-based GPU. If it's lackluster, I can still grab an RTX 3080, and if it's better, chances are RTX 3080 prices will drop in case I still want NVIDIA after all. Which might happen, since I like some of their features, like Fast V-Sync, which AMD cards don't have and which I need to eliminate tearing since I don't have a FreeSync or G-Sync monitor... But I may just as well go AMD this time around if they turn out to release a kick-ass card. We'll see.


Have there been widespread reports of the EVGA cards crashing? EVGA has made it impossible to google thanks to everyone reporting on EVGA's official response.

OBSIDIAN: CPU AMD Ryzen 9 3900X | MB ASUS ROG Crosshair VIII Hero Wifi | RAM Corsair Dominator RGB 32gb 3600 | GPU ASUS ROG Strix RTX 2080 Ti OC |

Cooler Corsair Hydro X | Storage Samsung 970 Evo 1tb | Samsung 860 QVO 2tb x2 | Seagate Barracuda 4tb x2 | Case Corsair Obsidian 500D RGB SE |

PSU Corsair HX750 | Cablemod Cables | Monitor Asus PG35VQ | Asus PG279Q | HID Corsair K70 Rapidfire RGB low profile | Corsair Dark Core Pro RGB SE | Xbox One Elite Controller Series 2


15 hours ago, Energycore said:

I'm still annoyed that everyone calls them POSCAPs

 

It's like looking at a parking lot full of cars and saying "Dammit, the lot is full of Chevrolets"

Or like saying "I have a Toyota compact Chevrolet"

Do you see how little sense this makes?

 

POSCAP is one of Panasonic's SMD Cap brands, please stop calling every SMD Cap that :(

Igor's Lab, who I think started the POSCAP thing, just posted the following:

 

Quote

The fact that engineers like to refer to all the polymer capacitors (regardless of their exact design) as POS-CAPs (and not just those from Panasonic) is simply due to the way these components are distributed and also because developers like to call them Piece-Of-Shit CAPs. What exactly was installed on the circuit boards as a polymer capacitor does not play a primary role in the mode of operation, because the principle is always the same for each variant.

https://www.igorslab.de/en/nvidia-geforce-rtx-3080-und-rtx-3090-and-the-crash-why-the-capacitors-are-so-important-and-what-are-the-object-behind/2/

 

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


@porina

It's probably just what they're called because of popularity, or how people in the field commonly refer to them. Same as how in Europe (or at least in my country) a lot of people call ALL pressure washers a "Wap" (Wap was a brand of pressure washers): "I washed my car with the wap", even though the pressure washer they actually own is made by Kärcher... I don't know what this is called in English, but in my language it's acceptable to use such terms when something is so widely used or so popular that it replaces the general term, even if it's a brand name.

 

So, POS-CAP might be Panasonic's brand name or a specific product name, but it was so widely used in the industry that the name stuck with everyone, making it fine to use even though it might technically be wrong. Just like that Kärcher being called a Wap...

 


4 hours ago, Vishera said:

Do you realize that it's not Nvidia's fault that their AIBs are cheap-asses?

It doesn't happen on Founders Edition cards, nor on EVGA and ASUS cards.

There are reports that it does indeed happen on FE too.

 

And yes, I do believe it's mostly Nvidia's fault for sending drivers and everything else to their AIB partners way too late...

 

 

The direction tells you... the direction

-Scott Manley, 2021

 

Software used:

Corsair Link (Anime Edition) 

MSI Afterburner 

OpenRGB

Lively Wallpaper 

OBS Studio

Shutter Encoder

Avidemux

FSResizer

Audacity 

VLC

WMP

GIMP

HWiNFO64

Paint

3D Paint

GitHub Desktop 

Superposition 

Prime95

Aida64

GPUZ

CPUZ

Generic Logviewer

 

 

 


1 minute ago, RejZoR said:

So, POS-CAP might be Panasonic's brand name or a specific product name, but it was so widely used in the industry that the name stuck with everyone, making it fine to use even though it might technically be wrong. Just like that Kärcher being called a Wap...

This has been a thing for a long time: a trademark gets used as a generic term for the thing if it becomes popular enough. It's a problem for whoever owns the trademark, though, because it can invalidate the trademark. At one point I recall Google issued guidance that people should say "Google Search" for something, as they didn't want "google it" to become the generic term for a web search.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


6 minutes ago, porina said:

This has been a thing for a long time: a trademark gets used as a generic term for the thing if it becomes popular enough. It's a problem for whoever owns the trademark, though, because it can invalidate the trademark. At one point I recall Google issued guidance that people should say "Google Search" for something, as they didn't want "google it" to become the generic term for a web search.

Which becomes a problem when you want to keep it brand neutral. Everyone just says "google it". Since I use DuckDuckGo I can't say "duck it", so I've resorted to "search online", and I sometimes get the sense people look at me funny, like they don't quite understand what I mean.


Hopefully nVidia will learn from this shitty launch and all the bullshit, and also from the scalpers and botters, so none of this happens again next time.

The RTX3080 looked so promising! I even wanted to upgrade but after all these fuckups, nah, fuck that. I'll keep my 1080Ti for the next 2-3 years.

 

And btw Jensen Huang, it's NOT safe for us Pascal gamers to upgrade!

DAC/AMPs:

Klipsch Heritage Headphone Amplifier

Headphones: Klipsch Heritage HP-3 Walnut, Meze 109 Pro, Beyerdynamic Amiron Home, Amiron Wireless Copper, Tygr 300R, DT880 600ohm Manufaktur, T90, Fidelio X2HR

CPU: Intel 4770, GPU: Asus RTX3080 TUF Gaming OC, Mobo: MSI Z87-G45, RAM: DDR3 16GB G.Skill, PC Case: Fractal Design R4 Black non-iglass, Monitor: BenQ GW2280


15 hours ago, RejZoR said:

I wonder if crashing to desktop is part of GPU recovery catching a GPU crash, or if it's something else. It should be easy to check in Event Viewer. I'm wondering whether such a hardware-level fault would still be caught by GPU recovery (dropping you to the desktop rather than a full-on BSOD), or whether it would fail unconditionally to the point where the whole system goes down. What I'm essentially wondering is whether this might be a driver-level borkup, where the GPU shoots up its clock but forgets to properly follow the voltage curve at that point, and isn't actually related to the capacitors on the back. Either that, or someone sticks an oscilloscope on the back of those capacitors...

From what I've read, it's not a power issue; there's enough capacitance for boosting. The problem is that POSCAPs can't filter out the higher frequencies that MLCCs can, so once the GPU boosts its clock past a certain point, the added noise causes the GPU to malfunction.

 

So the long-term "fix" is to limit the GPU boost frequency through a vBIOS update. What I suspect will follow is a PCB rev 2.0 that uses MLCCs, with vendors staying silent about the phase-out of rev 1.0.
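
To put rough numbers on the filtering point above: a real capacitor behaves like a series R-L-C, and an MLCC bank wins at high frequency mostly because of its much lower parasitic inductance. A back-of-the-envelope sketch with illustrative part values I made up, not measurements from any actual RTX 3080 board:

```python
# Impedance vs. frequency for a series R-L-C model of a real capacitor.
# Part values below are illustrative guesses, not measured figures.
import math

def impedance(f_hz, c_farad, esr_ohm, esl_henry):
    """|Z| of a capacitor modelled as ESR + ESL + C in series."""
    x = 2 * math.pi * f_hz * esl_henry - 1 / (2 * math.pi * f_hz * c_farad)
    return math.sqrt(esr_ohm ** 2 + x ** 2)

poly = dict(c_farad=330e-6, esr_ohm=6e-3, esl_henry=4e-9)    # one bulk polymer cap (assumed values)
mlcc = dict(c_farad=47e-6,  esr_ohm=3e-3, esl_henry=0.5e-9)  # one 47 uF MLCC (assumed values)

for f in (1e5, 1e6, 1e7, 1e8):          # 100 kHz to 100 MHz
    z_poly = impedance(f, **poly)
    z_bank = impedance(f, **mlcc) / 10  # ten identical MLCCs in parallel ~ |Z| / 10
    print(f"{f/1e6:8.1f} MHz   polymer {z_poly*1e3:8.2f} mOhm   10x MLCC bank {z_bank*1e3:8.2f} mOhm")
```

Around 100 MHz the single polymer cap is up in the ohms while the paralleled MLCCs stay in the tens of milliohms, which is the gist of why swapping even one or two of the six groups for MLCC stacks helps with high-frequency noise.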


29 minutes ago, StDragon said:

From what I've read, it's not a power issue; there's enough capacitance for boosting. The problem is that POSCAPs can't filter out the higher frequencies that MLCCs can, so once the GPU boosts its clock past a certain point, the added noise causes the GPU to malfunction.

 

So the long-term "fix" is to limit the GPU boost frequency through a vBIOS update. What I suspect will follow is a PCB rev 2.0 that uses MLCCs, with vendors staying silent about the phase-out of rev 1.0.

Well, if I were a vendor or a seller, I'd want to make users abundantly aware of what they're buying and that whatever is being sold doesn't have that problem. Graphics card images are useless because they're generic and, for the most part, not even photos of the real product but a render. Just for the Inno3D RTX 3080 iChill X4 I've seen 3 different variations: one with all 6 black chips, one with a 4+2 configuration with yellow ceramics, and one with 5+1. Then go figure which one you'll get when the store selling them only shows a 3D render with all blacks...


It looks like a couple of vendors may have already updated their PCBs to account for the capacitor issues. As others in this thread have already speculated, it appears the hardware revisions are intended to be snuck in under the radar.

F#$k timezone programming. Use UTC! (See XKCD #1883)

PC Specs:

Ryzen 5900x, MSI 3070Ti, 2 x 1 TiB SSDs, 32 GB 3400 DDR4, Cooler Master NR200P

 

 


7 minutes ago, Qub3d said:

It looks like a couple of vendors may have already updated their PCBs to account for the capacitor issues. As others in this thread have already speculated, it appears the hardware revisions are intended to be snuck in under the radar.

 

(I thought it was a big enough development to warrant a separate post but the mods disagreed.)

I think the ASUS update happened prior to production, since every retail card I've seen reported has 6 MLCC groups. Reddit also reported 1 MLCC group on the Ventus, but the Gaming X Trio got updated to 2 MLCC groups, I guess.


On 9/27/2020 at 7:32 AM, RejZoR said:

Yes it is. It's NVIDIA that specifies minimum power delivery requirements on a circuitry level 

Yes, but those are designed to provide the minimum standard required to hit base clocks, and that alone. NVIDIA has no control over the vBIOS that AIBs use or how hard they push boost clocks on their cards, which seems to account for the overwhelming majority of the issues here.

 

AIBs are chasing boost clocks equal to or higher than the FE cards because that's what sells. They're simultaneously trying to undercut or match those cards in price, which entails cheaping out on components. They reasonably should have foreseen stability issues as a potential outcome of this, but frankly it smacks of trying desperately to get stock out of the door and onto the market.

[ P R O J E C T _ M E L L I F E R A ]

[ 5900X @4.7GHz PBO2 | X570S Aorus Pro | 32GB GSkill Trident Z 3600MHz CL16 | EK-Quantum Reflection ]
[ ASUS RTX4080 TUF OC @3000MHz | O11D-XL | HardwareLabs GTS and GTX 360mm | XSPC D5 SATA ]

[ TechN / Phanteks G40 Blocks | Corsair AX750 | ROG Swift PG279Q | Q-Acoustics 2010i | Sabaj A4 ]

 

P R O J E C T | S A N D W A S P

6900K | RTX2080 | 32GB DDR4-3000 | Custom Loop 


2 hours ago, HM-2 said:

Yes, but those are designed to provide the minimum standard required to hit base clocks, and that alone. NVIDIA has no control over the vBIOS that AIBs use or how hard they push boost clocks on their cards, which seems to account for the overwhelming majority of the issues here.

 

AIBs are chasing boost clocks equal to or higher than the FE cards because that's what sells. They're simultaneously trying to undercut or match those cards in price, which entails cheaping out on components. They reasonably should have foreseen stability issues as a potential outcome of this, but frankly it smacks of trying desperately to get stock out of the door and onto the market.

Rubbish. The base clock is 1.44 GHz; you damn well know not a single RTX 3080 will run at that. The advertised boost clock is 1.71 GHz; you also damn well know not a single RTX 3080 will run at that clock either. They ALL boost FAR beyond the advertised boost clock, and ALL reviewers test FE cards at those boost clocks. We're basing our buying decisions on these over-the-top boost clocks, not on the advertised 1.71 GHz, and we're being sold on the scores and framerates those clocks produce. If what you say were true, every RTX 3080 would be tested at 1.71 GHz and that would be advertised as "this is the performance you're promised", with anything beyond it a bonus that may vary. Instead we're sold on the promise of varying final performance, with the excuse that the base clock is some number 200 kilometres back that nobody measures anything at.


*cries in no GPU.

 

I would buy any AMD GPU if their stack didn’t suck; Intel is a monopoly but at least they’ll take your money and give you the products; one of these companies needs to give us a CUDA alternative.


8 minutes ago, Jet_ski said:

*cries in no GPU.

 

I would buy any AMD GPU if their stack didn’t suck; Intel is a monopoly but at least they’ll take your money and give you the products; one of these companies needs to give us a CUDA alternative.

AMD has had a pretty good CUDA alternative for many years.  

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


The reason NVIDIA's CUDA is more popular is that NVIDIA put more effort into the software side of things, which is why more software is optimized for compute via CUDA than for AMD's compute. All the SDKs and tooling that let devs get their thing running on CUDA are just better with NVIDIA. I think that's Jet_ski's real problem. I haven't worked with either, but that's the perception I have as a regular consumer, or bystander if I can call it that. In general, when NVIDIA makes a new feature they go all out on software support and quickly throw all the goodies at devs to speed up adoption, whereas AMD seems quite passive. They do maintain their open-source pages for the important features, along with some SDKs, but it feels like they just offer things and then you're on your own, whereas NVIDIA feels a lot more involved with devs. But that's just my observation, not actual experience as a developer.


4 hours ago, Bombastinator said:

AMD has had a pretty good CUDA alternative for many years.  

Like @RejZoR said, AMD's alternative to CUDA, which is called ROCm, is out there, but it's up to the users to make it work. I've asked experts, all of whom told me not to bother and that "AMD's stack sucks." Basically, the software needed to communicate with the hardware either doesn't exist or is incomplete because AMD never built it. For instance, if you send work to an AMD GPU, some of it may still be in C++ source form and has to be compiled live while the user is trying to run it. That's extremely inefficient, and nobody wants to give their code away for free either.
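
For anyone curious what "compiled live while the user is trying to run it" looks like in practice, here's a minimal sketch of runtime kernel compilation, using PyOpenCL purely as a stand-in (my example, not AMD's actual ROCm/HIP toolchain; assumes pyopencl and a working OpenCL runtime are installed):

```python
# Minimal runtime-compilation example: the kernel ships as C-like source text
# and is only compiled on the user's machine when .build() is called.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

kernel_src = """
__kernel void square(__global const float *a, __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] * a[gid];
}
"""
prg = cl.Program(ctx, kernel_src).build()   # compiled at run time, on the user's machine

prg.square(queue, a.shape, None, a_buf, out_buf)
result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
print(result)
```

CUDA's usual flow bakes kernels into the shipped binary ahead of time (with PTX JIT only as a fallback), which is a big part of why the developer experience feels so different.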

 

This talk was given a while back but it seems to still be the case.


But mummy I don’t want to use CUDA - Dave Airlie at the Linux conference

 


 

Long story short, der8auer replaced two "POSCAPs" with MLCC arrays and managed to fix the crashing issue. I still suspect we're going to see GPUs binned too high and crashing anyway, but that's a different story, a result of NVIDIA rushing out a product and keeping its board partners in the dark to prevent the leaks that happened anyway.

if you have to insist you think for yourself, i'm not going to believe you.


I don't even care if a manual OC crashes. But I want to be assured it NEVER happens when running "stock", with the GPU doing the boosting on its own. Out of the factory, you expect it to be unconditionally stable; it should NEVER crash. If you're expected to do a small underclock, that's already stupid.

 

The thing is, a manual overclock really isn't that useful these days. You might gain a few MHz or stabilize the auto boost at a slightly higher point, but it always gets limited by something: the voltage limit, the power limit or the thermal limit. Trying it on my GTX 1080 Ti, you're always hitting something. Even if you run the fan at 100% and raise the power and temperature limits as far as they'll go, you'll start hitting the voltage limit; and if you raise the voltage, you'll probably soon hit the power limit or something else. It's always something, so your headroom for doing anything is just silly tiny.
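
For what it's worth, if a stock card only misbehaves at the very top of its boost range, there's a software-side stopgap that doesn't need a vBIOS: recent drivers expose a clock lock through nvidia-smi. A hedged sketch (assumes admin rights, a recent driver, and that the card supports locked clocks; the 1950 MHz cap is an arbitrary example, not a known-good value):

```python
# Sketch: clamp the GPU clock range from software as a stopgap instead of
# waiting for a vBIOS update. Assumes nvidia-smi on PATH, admin rights, and
# driver/GPU support for locked clocks; 1950 MHz is an arbitrary example cap.
import subprocess

subprocess.run(["nvidia-smi", "-lgc", "210,1950"], check=True)   # lock clocks to 210-1950 MHz
# To undo the clamp and return to default boost behaviour:
# subprocess.run(["nvidia-smi", "-rgc"], check=True)
```

It's not a fix, just a way to test whether backing off the top of the boost curve actually makes the crashes go away.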

 

I've canceled the preorder now, and I think it was a good decision. Since I didn't get a 3080 in the first wave, I might just as well wait for the RX 6000 series and for vendors to get on top of this caps thing, even if I then still decide on GeForce over Radeon. No one wants a ticking bomb in their PC, especially one with no manual OC done to it.

