RTX 3080 crashing, possibly due to capacitor choice

spartaman64
1 minute ago, RejZoR said:

I wonder if crashing to desktop is part of GPU recovery catching a GPU crash, or if it's something else. It should be easy to check in Event Viewer. I'm wondering whether such a hardware-level fault would still be caught by GPU recovery (dropping you to the desktop rather than a full-on BSOD), or whether it would fail unconditionally to the point where the whole system goes down. What I'm essentially wondering is whether this might be a driver-level borkup, where the GPU shoots up its clock but forgets to properly follow the voltage curve at that point, and isn't actually related to the capacitors on the back. Either that, or someone sticks an oscilloscope on the back of those capacitors...

This intrigues me.  Does tech Jesus have an oscilloscope?  It seems like the kind of thing he’d have.  He just bought a big fancy aluminum mirror.
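
Side note on the Event Viewer idea from the quote: if the driver's TDR mechanism is what's catching these crashes, Windows logs a warning in the System log every time a display driver gets reset, so it's checkable without guesswork. A rough sketch (assumes Windows with wevtutil on the PATH, and that your driver writes the usual Event ID 4101 from the "Display" source on a TDR recovery, which can vary by driver):

```python
# Sketch: list recent "display driver stopped responding and has recovered"
# (TDR) warnings from the Windows System log via wevtutil.
# Assumptions: Windows, wevtutil on PATH, Event ID 4101 from the "Display"
# source is what your driver writes on a TDR recovery.
import subprocess

query = "*[System[Provider[@Name='Display'] and (EventID=4101)]]"
cmd = ["wevtutil", "qe", "System", f"/q:{query}", "/f:text", "/c:10", "/rd:true"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout or "No TDR recovery events found in the System log.")
```

If the crash-to-desktop shows up there as a recovered TDR, the driver is catching it; if the log stays empty and the whole machine goes down instead, that points at something the recovery path can't handle.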

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


1 hour ago, BiG StroOnZ said:

[image attachment]

 

Do you realize that it's not Nvidia's fault that their AIBs are cheap-asses?

It doesn't happen on Founders Edition cards, nor on EVGA and ASUS cards.

A PC Enthusiast since 2011
AMD Ryzen 7 5700X@4.65GHz | GIGABYTE GTX 1660 GAMING OC @ Core 2085MHz Memory 5000MHz
Cinebench R23: 15669cb | Unigine Superposition 1080p Extreme: 3566

@Vishera

Yes, it is. It's NVIDIA that specifies the minimum power-delivery requirements at the circuitry level, and it's NVIDIA that then approves each custom PCB design before it goes into production. Is it the AIB's fault that NVIDIA under-specced power delivery and then approved designs built to that minimum? Technically speaking, no. That's like blaming only the worker even though his supervisor approved his badly done work. No, it's NVIDIA's fault, because they set the requirements and they approve the designs in the end. AIBs are just trying to save a buck, because the cost of NVIDIA's GPUs apparently leaves them very little room to make a profit, so of course they either cut costs somewhere or go the other direction and charge a hefty premium for totally overbuilding it.

 

I always thought NVIDIA just said "this is what you need to run our GPU properly" and then it was entirely down to the AIBs. I never thought the approval process went so far that NVIDIA actually checks every single design in the end. That was brought up by people like Buildzoid, who are much closer to the production side and understand the whole PCB design and manufacturing process much better.


14 hours ago, Suika said:

Oh, see, I'm on the opposite side of the spectrum and I firmly believe a majority of this is NVIDIA's fault.

 

1) All board partner designs have to be approved by NVIDIA, so while Zotac, EVGA, or Palit could submit a shit design, if it gets through, then surely NVIDIA thought it was fine.

2) NVIDIA gave board partners very little time to test boards.

3) For most of the time board partners did have to test, they had no drivers to run games or other real loads with, so partners could have been binning chips improperly too.

4) Not making the reference design a bit stricter: a 1+5 config should have been the bare minimum, with 2+4 recommended. ASUS just went overkill by the looks of it.

 

Some partners managed to do better than others, but it definitely sounds like NVIDIA holds most of the fault; board partners just won't admit it because they don't want NVIDIA to whack them in the face with their massive dingly.

@Vishera

Main Rig :

Ryzen 7 2700X | Powercolor Red Devil RX 580 8 GB | Gigabyte AB350M Gaming 3 | 16 GB TeamGroup Elite 2400MHz | Samsung 750 EVO 240 GB | HGST 7200 RPM 1 TB | Seasonic M12II EVO | CoolerMaster Q300L | Dell U2518D | Dell P2217H | 

 

Laptop :

Thinkpad X230 | i5 3320M | 8 GB DDR3 | V-Gen 128 GB SSD |


And I've canceled the order. If it were just availability I'd still wait, but now that a hardware-level issue might exist, it's just not worth it. While I'm at it, I might as well wait for AMD to release an RDNA2-based GPU. If it's lackluster, I can still grab an RTX 3080, and if it's better, chances are RTX 3080 prices will drop in case I still want NVIDIA after all. Which might happen, since I like some of their features, like Fast V-Sync, which AMD cards don't have and which I need to eliminate tearing since I don't have a FreeSync or G-Sync monitor... But I may just as well go AMD this time around if they turn out to release a kick-ass card. We'll see.


Have there been widespread reports of the EVGA cards crashing? EVGA has made it impossible to google thanks to everyone reporting on EVGA's official response.

OBSIDIAN: CPU AMD Ryzen 9 3900X | MB ASUS ROG Crosshair VIII Hero Wifi | RAM Corsair Dominator RGB 32gb 3600 | GPU ASUS ROG Strix RTX 2080 Ti OC |

Cooler Corsair Hydro X | Storage Samsung 970 Evo 1tb | Samsung 860 QVO 2tb x2 | Seagate Barracuda 4tb x2 | Case Corsair Obsidian 500D RGB SE |

PSU Corsair HX750 | Cablemod Cables | Monitor Asus PG35VQ | Asus PG279Q | HID Corsair K70 Rapidfire RGB low profile | Corsair Dark Core Pro RGB SE | Xbox One Elite Controller Series 2


15 hours ago, Energycore said:

I'm still annoyed that everyone calls them POSCAPs

 

It's like looking at a parking lot full of cars and saying "Dammit, the lot is full of Chevrolets"

Or like saying "I have a Toyota compact Chevrolet"

Do you see how little sense this makes?

 

POSCAP is one of Panasonic's SMD Cap brands, please stop calling every SMD Cap that :(

Igor's Lab, who I think started the POSCAP thing, just posted the following:

 

Quote

The fact that engineers like to refer to all the polymer capacitors (regardless of their exact design) as POS-CAPs (and not just those from Panasonic) is simply due to the way these components are distributed and also because developers like to call them Piece-Of-Shit CAPs. What exactly was installed on the circuit boards as a polymer capacitor does not play a primary role in the mode of operation, because the principle is always the same for each variant.

https://www.igorslab.de/en/nvidia-geforce-rtx-3080-und-rtx-3090-and-the-crash-why-the-capacitors-are-so-important-and-what-are-the-object-behind/2/

 

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


@porina

It's probably just what they're called because of popularity, or how people in the field commonly refer to them. Same as how in Europe (or at least in my country) a lot of people call ALL pressure washers a "Wap" (Wap was a brand of pressure washers): "I washed my car with the wap", even though the pressure washer they actually own is made by Kärcher... I don't know what this is called in English, but in my language it's acceptable to use such terms when something is so widely used or so popular that it replaces the general term, even if it's a brand name.

 

So, POS-CAP might be Panasonic's brand name or a specific product name, but it was so widely used in the industry that the name stuck with everyone, making it fine to use even though it might technically be wrong. Just like that Kärcher being called a Wap...

 


4 hours ago, Vishera said:

Do you realize that it's not Nvidia's fault that their AIBs are cheap-asses?

It doesn't happen on Founders Edition cards, nor on EVGA and ASUS cards.

There are reports that it does indeed happen on FE too.

 

And yes, I do believe it's mostly Nvidia's fault for sending drivers and everything else to their AIB partners way too late...

 

 

The direction tells you... the direction

-Scott Manley, 2021

 

Software used:

Corsair Link (Anime Edition) 

MSI Afterburner 

OpenRGB

Lively Wallpaper 

OBS Studio

Shutter Encoder

Avidemux

FSResizer

Audacity 

VLC

WMP

GIMP

HWiNFO64

Paint

3D Paint

GitHub Desktop 

Superposition 

Prime95

Aida64

GPUZ

CPUZ

Generic Logviewer

 

 

 


1 minute ago, RejZoR said:

So, POS-CAP might be Panasonic's brand name or a specific product name, but it was so widely used in the industry that the name stuck with everyone, making it fine to use even though it might technically be wrong. Just like that Kärcher being called a Wap...

This has been a thing for a long time: a trademark gets used as a generic term for the thing if it becomes popular enough. It's a problem for whoever owns the trademark, though, because it can invalidate the trademark. At one point I recall Google issued guidance that people should say "Google Search" for something, as they didn't want "google it" to become the generic term for a web search.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


6 minutes ago, porina said:

This has been a thing for a long time: a trademark gets used as a generic term for the thing if it becomes popular enough. It's a problem for whoever owns the trademark, though, because it can invalidate the trademark. At one point I recall Google issued guidance that people should say "Google Search" for something, as they didn't want "google it" to become the generic term for a web search.

Which becomes a problem when you want to keep it brand neutral. Everyone just says "google it". Since I use DuckDuckGo I can't say "duck it", so I've resorted to "search online", and I sometimes get the sense people look at me funny, like they don't quite understand what I mean.


Hopefully nVidia will learn from this shitty launch and all the bullshit, and also from the scalpers and botters, so none of this happens again next time.

The RTX3080 looked so promising! I even wanted to upgrade but after all these fuckups, nah, fuck that. I'll keep my 1080Ti for the next 2-3 years.

 

And btw Jensen Huang, it's NOT safe for us Pascal gamers to upgrade!

DAC/AMPs:

Klipsch Heritage Headphone Amplifier

Headphones: Klipsch Heritage HP-3 Walnut, Meze 109 Pro, Beyerdynamic Amiron Home, Amiron Wireless Copper, Tygr 300R, DT880 600ohm Manufaktur, T90, Fidelio X2HR

CPU: Intel 4770, GPU: Asus RTX3080 TUF Gaming OC, Mobo: MSI Z87-G45, RAM: DDR3 16GB G.Skill, PC Case: Fractal Design R4 Black non-iglass, Monitor: BenQ GW2280


15 hours ago, RejZoR said:

I wonder if crashing to desktop is part of GPU recovery catching a GPU crash, or if it's something else. It should be easy to check in Event Viewer. I'm wondering whether such a hardware-level fault would still be caught by GPU recovery (dropping you to the desktop rather than a full-on BSOD), or whether it would fail unconditionally to the point where the whole system goes down. What I'm essentially wondering is whether this might be a driver-level borkup, where the GPU shoots up its clock but forgets to properly follow the voltage curve at that point, and isn't actually related to the capacitors on the back. Either that, or someone sticks an oscilloscope on the back of those capacitors...

From what I've read, it's not a power issue; there's enough capacitance for boosting. The problem is that POSCAPs can't filter out the higher frequencies that MLCCs can, so once the GPU boosts its clock past a certain point, the added noise causes the GPU to malfunction.

 

So the long-term "fix" is to limit the GPU boost frequency through a vBIOS update. What I suspect will follow is a PCB rev 2.0 that uses MLCCs, with vendors staying silent about the phase-out of rev 1.0.
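
To put rough numbers on the filtering point above: a real capacitor behaves like a series R-L-C, and an MLCC bank wins at high frequency mostly because of its much lower parasitic inductance. A back-of-the-envelope sketch with illustrative part values I made up, not measurements from any actual RTX 3080 board:

```python
# Impedance vs. frequency for a series R-L-C model of a real capacitor.
# Part values below are illustrative guesses, not measured figures.
import math

def impedance(f_hz, c_farad, esr_ohm, esl_henry):
    """|Z| of a capacitor modelled as ESR + ESL + C in series."""
    x = 2 * math.pi * f_hz * esl_henry - 1 / (2 * math.pi * f_hz * c_farad)
    return math.sqrt(esr_ohm ** 2 + x ** 2)

poly = dict(c_farad=330e-6, esr_ohm=6e-3, esl_henry=4e-9)    # one bulk polymer cap (assumed values)
mlcc = dict(c_farad=47e-6,  esr_ohm=3e-3, esl_henry=0.5e-9)  # one 47 uF MLCC (assumed values)

for f in (1e5, 1e6, 1e7, 1e8):          # 100 kHz to 100 MHz
    z_poly = impedance(f, **poly)
    z_bank = impedance(f, **mlcc) / 10  # ten identical MLCCs in parallel ~ |Z| / 10
    print(f"{f/1e6:8.1f} MHz   polymer {z_poly*1e3:8.2f} mOhm   10x MLCC bank {z_bank*1e3:8.2f} mOhm")
```

Around 100 MHz the single polymer cap is up in the ohms while the paralleled MLCCs stay in the tens of milliohms, which is the gist of why swapping even one or two of the six groups for MLCC stacks helps with high-frequency noise.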


29 minutes ago, StDragon said:

From what I've read, it's not a power issue; there's enough capacitance for boosting. The problem is that POSCAPs can't filter out the higher frequencies that MLCCs can, so once the GPU boosts its clock past a certain point, the added noise causes the GPU to malfunction.

 

So the long-term "fix" is to limit the GPU boost frequency through a vBIOS update. What I suspect will follow is a PCB rev 2.0 that uses MLCCs, with vendors staying silent about the phase-out of rev 1.0.

Well, if I were a vendor or a seller, I'd want to make users abundantly aware of what they're buying and that whatever is being sold doesn't have that problem. Graphics card images are useless because they're generic and, for the most part, not even photos of the real product but a render. Just for the Inno3D RTX 3080 iChill X4 I've seen 3 different variations: one with all 6 black chips, one with a 4+2 configuration with yellow ceramics, and one with 5+1. Then go figure which one you'll get when the store selling them only shows a 3D render with all blacks...


It looks like a couple of vendors may have already updated their PCBs to account for the capacitor issues. As others in this thread have already speculated, it appears the hardware revisions are intended to be snuck in under the radar.

F#$k timezone programming. Use UTC! (See XKCD #1883)

PC Specs:

Ryzen 5900x, MSI 3070Ti, 2 x 1 TiB SSDs, 32 GB 3400 DDR4, Cooler Master NR200P

 

 


7 minutes ago, Qub3d said:

It looks like a couple of vendors may have already updated their PCBs to account for the capacitor issues. As others in this thread have already speculated, it appears the hardware revisions are intended to be snuck in under the radar.

 

(I thought it was a big enough development to warrant a separate post but the mods disagreed.)

I think the ASUS update happened prior to production, since every retail card I've seen reported has 6 MLCC groups. Reddit also reported 1 MLCC group on the Ventus, but the Gaming X Trio got updated to 2 MLCC groups, I guess.


On 9/27/2020 at 7:32 AM, RejZoR said:

Yes it is. It's NVIDIA that specifies minimum power delivery requirements on a circuitry level 

Yes, but those are designed to provide the minimum standard required to hit base clocks, and that alone. NVIDIA has no control over the vBIOS that AIBs use or how hard they push boost clocks on their cards, which seems to account for the overwhelming majority of the issues here.

 

AIBs are chasing boost clocks equal to or higher than the FE cards because that's what sells. They're simultaneously trying to undercut or match those cards in price, which entails cheaping out on components. They reasonably should have foreseen stability issues as a potential outcome of this, but frankly it smacks of trying desperately to get stock out of the door and onto the market.

[ P R O J E C T _ M E L L I F E R A ]

[ 5900X @4.7GHz PBO2 | X570S Aorus Pro | 32GB GSkill Trident Z 3600MHz CL16 | EK-Quantum Reflection ]
[ ASUS RTX4080 TUF OC @3000MHz | O11D-XL | HardwareLabs GTS and GTX 360mm | XSPC D5 SATA ]

[ TechN / Phanteks G40 Blocks | Corsair AX750 | ROG Swift PG279Q | Q-Acoustics 2010i | Sabaj A4 ]

 

P R O J E C T | S A N D W A S P

6900K | RTX2080 | 32GB DDR4-3000 | Custom Loop 


2 hours ago, HM-2 said:

Yes, but those are designed to provide the minimum standard required to hit base clocks, and that alone. NVIDIA has no control over the vBIOS that AIBs use or how hard they push boost clocks on their cards, which seems to account for the overwhelming majority of the issues here.

 

AIBs are chasing boost clocks equal to or higher than the FE cards because that's what sells. They're simultaneously trying to undercut or match those cards in price, which entails cheaping out on components. They reasonably should have foreseen stability issues as a potential outcome of this, but frankly it smacks of trying desperately to get stock out of the door and onto the market.

Rubbish. The base clock is 1.44 GHz; you damn well know not a single RTX 3080 will run at that. The advertised boost clock is 1.71 GHz; you also damn well know not a single RTX 3080 will run at that clock either. They ALL boost FAR beyond the advertised boost clock, and ALL reviewers test FE cards at those boost clocks. We're basing our buying decisions on these over-the-top boost clocks, not on the advertised 1.71 GHz, and we're being sold on the scores and framerates those clocks produce. If what you say were true, every RTX 3080 would be tested at 1.71 GHz and that would be advertised as "this is the performance you're promised", with anything beyond it a bonus that may vary. Instead we're sold on the promise of varying final performance, with the excuse that the base clock is some number 200 kilometres back that nobody measures anything at.


*cries in no GPU.

 

I would buy any AMD GPU if their stack didn’t suck; Intel is a monopoly but at least they’ll take your money and give you the products; one of these companies needs to give us a CUDA alternative.


8 minutes ago, Jet_ski said:

*cries in no GPU.

 

I would buy any AMD GPU if their stack didn’t suck; Intel is a monopoly but at least they’ll take your money and give you the products; one of these companies needs to give us a CUDA alternative.

AMD has had a pretty good CUDA alternative for many years.  

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


The reason NVIDIA's CUDA is more popular is that NVIDIA put more effort into the software side of things, which is why more software is optimized for compute via CUDA than for AMD's compute. All the SDKs and tooling that let devs get their thing running on CUDA are just better with NVIDIA. I think that's Jet_ski's real problem. I haven't worked with either, but that's the perception I have as a regular consumer, or bystander if I can call it that. In general, when NVIDIA makes a new feature they go all out on software support and quickly throw all the goodies at devs to speed up adoption, whereas AMD seems quite passive. They do maintain their open-source pages for the important features, along with some SDKs, but it feels like they just offer things and then you're on your own, whereas NVIDIA feels a lot more involved with devs. But that's just my observation, not actual experience as a developer.


4 hours ago, Bombastinator said:

AMD has had a pretty good CUDA alternative for many years.  

Like @RejZoR said, AMD's alternative to CUDA, which is called ROCm, is out there, but it's up to the users to make it work. I've asked experts, all of whom told me not to bother and that "AMD's stack sucks." Basically, the software needed to communicate with the hardware either doesn't exist or is incomplete because AMD never built it. For instance, if you send work to an AMD GPU, some of it may still be in C++ source form and has to be compiled live while the user is trying to run it. That's extremely inefficient, and nobody wants to give their code away for free either.
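
For anyone curious what "compiled live while the user is trying to run it" looks like in practice, here's a minimal sketch of runtime kernel compilation, using PyOpenCL purely as a stand-in (my example, not AMD's actual ROCm/HIP toolchain; assumes pyopencl and a working OpenCL runtime are installed):

```python
# Minimal runtime-compilation example: the kernel ships as C-like source text
# and is only compiled on the user's machine when .build() is called.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

kernel_src = """
__kernel void square(__global const float *a, __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] * a[gid];
}
"""
prg = cl.Program(ctx, kernel_src).build()   # compiled at run time, on the user's machine

prg.square(queue, a.shape, None, a_buf, out_buf)
result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
print(result)
```

CUDA's usual flow bakes kernels into the shipped binary ahead of time (with PTX JIT only as a fallback), which is a big part of why the developer experience feels so different.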

 

This talk was given a while back but it seems to still be the case.


But mummy I don’t want to use CUDA - Dave Airlie at the Linux conference

 


 

Long story short, der8auer replaced two "POSCAPs" with MLCC arrays and managed to fix the crashing issue. I still suspect we're going to see GPUs binned too high and crashing anyway, but that's a different story, a result of NVIDIA rushing out a product and keeping its board partners in the dark to prevent the leaks that happened anyway.

if you have to insist you think for yourself, i'm not going to believe you.


I don't even care if a manual OC crashes. But I want to be assured it NEVER happens when running "stock", with the GPU doing the boosting on its own. Out of the factory, you expect it to be unconditionally stable; it should NEVER crash. If you're expected to do a small underclock, that's already stupid.

 

The thing is, a manual overclock really isn't that useful these days. You might gain a few MHz or stabilize the auto boost at a slightly higher point, but it always gets limited by something: the voltage limit, the power limit or the thermal limit. Trying it on my GTX 1080 Ti, you're always hitting something. Even if you run the fan at 100% and raise the power and temperature limits as far as they'll go, you'll start hitting the voltage limit; and if you raise the voltage, you'll probably soon hit the power limit or something else. It's always something, so your headroom for doing anything is just silly tiny.
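
For what it's worth, if a stock card only misbehaves at the very top of its boost range, there's a software-side stopgap that doesn't need a vBIOS: recent drivers expose a clock lock through nvidia-smi. A hedged sketch (assumes admin rights, a recent driver, and that the card supports locked clocks; the 1950 MHz cap is an arbitrary example, not a known-good value):

```python
# Sketch: clamp the GPU clock range from software as a stopgap instead of
# waiting for a vBIOS update. Assumes nvidia-smi on PATH, admin rights, and
# driver/GPU support for locked clocks; 1950 MHz is an arbitrary example cap.
import subprocess

subprocess.run(["nvidia-smi", "-lgc", "210,1950"], check=True)   # lock clocks to 210-1950 MHz
# To undo the clamp and return to default boost behaviour:
# subprocess.run(["nvidia-smi", "-rgc"], check=True)
```

It's not a fix, just a way to test whether backing off the top of the boost curve actually makes the crashes go away.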

 

I've canceled the preorder now, and I think it was a good decision. Since I didn't get a 3080 in the first wave, I might just as well wait for the RX 6000 series and for vendors to get on top of this caps thing, even if I then still decide on GeForce over Radeon. No one wants a ticking bomb in their PC, especially one with no manual OC done to it.

