Hot News? More Like... HWInfo64 Now Shows Hot VRAM for RTX 3080/3090!

Original source - https://www.tomshardware.com/news/hwinfo64-adds-gddr6x-temp-monitoring-rtx30series

 

Summary

With a very recent update to HWInfo64, users are now able to see directly via the popular hardware monitoring tool the temperature of their GDDR6X memory modules for any RTX 3080/3090 cards which they own, and boy will they (likely) see a surprise waiting for them.  While gaming, even with DLSS enabled, GDDR6X temps can reach as high as 100 C.  It's even worse if someone is using their RTX 3080/3090 for memory-intensive workloads like cryptocurrency mining: in those cases the VRAM shoots up to 110 C and severely down-clocks itself to keep from being literally roasted to death, which makes sense, as 110 C is already (sort of) beyond the "95+C" operating temperature listed on Micron's official website for GDDR6X.  Igor's Lab (the first, afaik, to bring up this "hot VRAM" issue back in September 2020) estimates that the Micron chips themselves would likely have to hit 120 C before sustaining immediate permanent damage, but the general consensus among anyone who knows at least a little bit about computer hardware is that triple-digit Celsius temperatures (positive, or negative too in many cases) are typically never good for the lifespan of the hardware itself.

 

Quotes
Tom's Hardware on GDDR6X temperatures while gaming -

Quote

The RTX 3080 with Metro Exodus at [4K Ultra settings] hit a peak temperature of 94C. We also tested an RTX 3090 Founders Edition in Cyberpunk 2077 with DLSS and Ray Tracing enabled. GDDR6X temperatures for that card peaked at 100C.

Same article detailing how this issue also affects various AIB-partner cards -

Quote

But when it comes to Ethereum mining, temperatures go to a whole other level: When mining on both the RTX 3080 and RTX 3090, we found that the GDDR6X modules would peak at a much higher 110C, and the GPU would downclock itself severely to compensate for the ridiculously high VRAM temperature. This occurred on multiple different boards, from various vendors. And that's before applying any overclocking settings, which some miners like to do in order to chase every last bit of hashing performance.

And finally, a quote from Igor's Lab regarding the absolute max temps before immediate permanent damage:

Quote

In response to questions among colleagues, for example from the R&D departments, it was unanimously agreed that the maximum temperature Ttot before the start of a possible destruction of the chip should be 120 °C and that Tjunction should probably be set at 105 °C or for the GDDR6X even at 110 °C is specified as the maximum value.

 

My thoughts


1. Since this additional info isn't really part of the article summary, I'll include a link to a video posted within the past 24 hours by Classical Technology about this exact issue. It demonstrates that this hot-VRAM behavior is essentially replicable with any VRAM-intensive workload, and it also nicely includes some tips on how to mitigate the issue (and the pinned comment about whether or not it's a "GDDR6X-only" issue may or may not have been me lol):

2.  I daily-drive an HP ZBook 15 (1st Gen) with Ubuntu installed, and before I re-pasted the i7-4800MQ the (bloody) thing would thermal throttle at just 40% load, so yes, I can definitely feel the pain of whoever's facing these thermal issues.  It was definitely (at least mainly) a paste issue: thermal pumping had pushed most of the paste off to the sides of the CPU.  (I only learned about thermal pumping after re-pasting, so at the time I assumed it was just crappy thermal paste, which is also why I didn't bother taking a pic.)  I'm fairly confident the heat-sink was still getting (at least somewhat) adequate airflow, and I did take a picture (see attached) of how clogged the heat-sink was before I cleaned it and re-pasted.  I couldn't even get that blob of dust (on the right) out despite multiple attempts with canned air; heck, I didn't even know it existed until I cracked everything open.  So either way, a re-pasting was necessary regardless of how much of it was actually a dust issue to begin with.

 

3. Sounds like something that the new and improved chiller should be able to handle with a bit of custom engineering. Video idea anyone?

4. I honestly think those of us who weren't able to get our hands on the (at least as of right now) unicorns that are the RTX 3080 and 3090 might've actually lucked out, given that these are essentially hardware issues that can't be fixed without putting severe power limits on the cards themselves, unless Nvidia comes out with a driver that allows VRAM-specific power-limit tuning, which doesn't seem to exist as of right now.

4(b?). You know, the fact that this issue was already known several months ago, back in September, kinda reminds me of a little pandemic which also had signs that it could have begun as early as a September of a different year...  Let's just hope I didn't just jinx 2021 for computer hardware too lol.

5. Looks like @FaxedForward was onto something back in September 2020...

6.  Oh I almost forgot... Fermi
 

Sources


(Original source at top)
Micron's official GDDR6X website - https://www.micron.com/products/ultra-bandwidth-solutions/gddr6x
Igor's Lab article - https://www.igorslab.de/en/gddr6x-am-limit-ueber-100-grad-bei-der-geforce-rtx-3080-fe-im-chip-gemessen-2/

before_dust_removal.jpeg


5 minutes ago, linuxChips2600 said:

users are now able to see directly via the popular hardware monitoring tool the temperature of their GDDR6X memory modules for any RTX 3080/3090 cards which they own, and boy will they (likely) see a surprise waiting for them. 

my card kept throttling and the backplate has been burning me while i was playing with my 3090 ytd

image.png

 

this is some next lvl BS, tbh

it's hitting 110C, throttling and dropping to 108C, and continuing

 

the G6X modules produce too much heat, especially the ones on the backside of the PCB of the 3090 that aren't adequately cooled

-sigh- feeling like I'm being too negative lately


Just now, Moonzy said:

this is some next lvl BS, tbh

You mean something about my post or the temps? just curious.


Just now, linuxChips2600 said:

You mean something about my post or the temps? just curious.

the temps

 

i have a fan blowing directly on my backplate in an open-air environment and it still overheats



Must be why EKWB released these babies:

 

https://www.ekwb.com/shop/ek-quantum-vector-fe-rtx-3090-d-rgb-silver-special-edition

 

Quite an engineering marvel, really. Active backplate cooling to chill those memory modules on the back of the PCB. Though the $330 will set you back even more, considering you just spent a hefty $1800-2000 on that RTX 3090 🤣.

 

 


Just now, Moonzy said:

i have a fan blowing directly on my backplate in an open-air environment and it still overheats

If you hadn't taken a screenshot of your temp monitoring tool, I would've asked you if it was a GTX 480, just to be sure...

No really though, depending on how comfortable you are with hardware modding, the best long-term solution is probably custom closed-loop liquid cooling with active refrigeration.  Or just power limit it to like 250 watts until Nvidia releases specific VRAM-power-limiting drivers lol.
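For the power-limit route, here's a minimal sketch using nvidia-smi (the 250 W figure is just my example number from above; check your card's supported range first, and note this caps the whole board's power draw, not the VRAM specifically):

```shell
# Show current, default, and min/max enforced power limits for the card:
nvidia-smi -q -d POWER

# Enable persistence mode so the limit sticks until reboot,
# then cap total board power at 250 W (needs root/admin):
sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 250
```

Again, this is the blunt instrument: until Nvidia exposes VRAM-specific tuning, limiting the whole board is about the only software knob available.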


I have nothing to say about anything else, but I like that Hwinfo64 is now able to display even more information. Good addition, IMHO, regardless of how useful it is in practice or not.

Hand, n. A singular instrument worn at the end of the human arm and commonly thrust into somebody’s pocket.


4 minutes ago, leeyz717 said:

Must be why EKWB released these babies:

 

https://www.ekwb.com/shop/ek-quantum-vector-fe-rtx-3090-d-rgb-silver-special-edition

 

Quite an engineering marvel, really. Active backplate cooling to chill those memory modules on the back of the PCB. Though the $330 will set you back even more, considering you just spent a hefty $1800-2000 on that RTX 3090 🤣.

 

 

But what if I wanted my graphics card to double as a food warmer?


5 minutes ago, linuxChips2600 said:

No really though, depending on how comfortable you are with hardware modding, the best long-term solution is probably custom closed-loop liquid cooling with active refrigeration.  Or just power limit it to like 250 watts until Nvidia releases specific VRAM-power-limiting drivers lol.

From my very basic fiddling, it'll run at 106C at stock; any OC above +400-600 on the VRAM seems to make it jump to 108-110C (it jumps in 2C steps)

 

Kinda a moot point to let us know when we can't control (not that I know of) the voltage (and thus the temperature) of these G6X modules



2 minutes ago, WereCatf said:

Good addition, IMHO, regardless of how useful it is in practice or not.

I think showing that your graphics card might literally fry itself to death could be considered "useful info"...


Crazy that we're going to need more advanced cooling for the VRAM on these cards. I'm starting to wonder if Nvidia just got too aggressive, or if it was some oversight in thermal design that left the VRAM without adequate cooling...

Given how quickly manufacturers had to get their AIB cards out, it's not a real surprise so to speak, but somewhere along the way someone would have known about this, right?

My profile picture is real. That's what I look like in real life. I'm actually a blue and white African Wild Dog.

Ryzen 9 5900X - MSI Ventus 2x OC 3060 Ti - 2x8GB Corsair Vengeance LPX 3200MHz CL16 - ASRock B550 Phantom Gaming ITX/ax

EVGA CLC 280 + 2x140mm NF-A14 - Samsung 850 EVO 500GB + WD Black SN750 1TB - Windows 11/10 - EVGA Supernova G3 1000W


5 minutes ago, DaJakerBoss said:

but somewhere along the way someone would have known about this, right?

If HW Unboxed has taught us anything... It's that Nvidia is basically willing to do (almost) anything to remain "the o̶n̶l̶y̶ way y̶o̶u̶'̶r̶e̶ it's meant to be played."

i.e. it's very likely "the bean-counters" and almost certainly not "the engineers"


Just now, DaJakerBoss said:

somewhere along the way someone would have known about this, right?

Nvidia used the 19Gbps G6X from Micron when Micron has 21Gbps modules, from what I read.

I speculate that Nvidia knew about the heat problem and chose the lower-performing one

 

https://www.techpowerup.com/forums/threads/the-reason-why-nvidias-geforce-rtx-3080-gpu-uses-19-gbps-gddr6x-memory-and-not-faster-variants.272420/



I mean, the memory alone eats 140 watts of power, so this is understandable.

Specs: Motherboard: Asus X470-PLUS TUF gaming (Yes I know it's poor but I wasn't informed) RAM: Corsair VENGEANCE® LPX DDR4 3200Mhz CL16-18-18-36 2x8GB

            CPU: Ryzen 9 5900X          Case: Antec P8     PSU: Corsair RM850x                        Cooler: Antec K240 with two Noctura Industrial PPC 3000 PWM

            Drives: Samsung 970 EVO plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168 GPU: EVGA RTX 2080 ti Black edition


Just now, williamcll said:

I mean, the memory alone eats 140 watts of power, so this is understandable.

Any sources, or just from experience?


1 minute ago, linuxChips2600 said:

If HW Unboxed has taught us anything... It's that Nvidia is basically willing to do (almost) anything to remain "the o̶n̶l̶y̶ way y̶o̶u̶'̶r̶e̶ it's meant to be played."

I mean yeah, but it's one thing to shut down a reviewer for not giving light to a feature of your card, and another to drive your products to extremely aggressive performance targets without cooling them adequately, to the point where the card could quite literally kill itself in just normal fucking operation, let alone at this price point. If cards fail, Nvidia and their board partners could probably face class-action lawsuits for this kind of major design oversight/flaw



2 minutes ago, linuxChips2600 said:

Any sources, or just from experience?

Sorry I meant 120W
image.thumb.png.c20b45df1074cf998ebd9a287f567466.png

source: https://www.youtube.com/watch?v=et9FRdWPnvM

 



2 minutes ago, Moonzy said:

Nvidia used the 19Gbps G6X from Micron when Micron has 21Gbps modules, from what I read.

I speculate that Nvidia knew about the heat problem and chose the lower-performing one

 

https://www.techpowerup.com/forums/threads/the-reason-why-nvidias-geforce-rtx-3080-gpu-uses-19-gbps-gddr6x-memory-and-not-faster-variants.272420/

If they knew about the thermal issues, and they were enough of a problem, then presumably they would have:

  1. at the very least provided more active cooling on their in-house FE and reference cards, since they were the ones who designed them; and
  2. provided AIBs with the information needed to account for those temperatures in their cooling designs.

Which is why I feel this muddies the waters: if Nvidia knew about this, why wouldn't their own FE designs, which had previously been criticized for providing unfair competition to their AIBs, include a more active cooling solution for a thermal problem they would have been aware of?



3 minutes ago, DaJakerBoss said:

If cards fail, Nvidia and their board partners could probably face class-action lawsuits for this kind of major design oversight/flaw

This is not quite related, but hmmmmmm... https://www.theverge.com/circuitbreaker/2019/8/28/20837336/amd-12-million-false-advertising-class-action-lawsuit-bulldozer-chips

Also if you thought that was bad - https://www.autosafety.org/ford-knew-focus-fiesta-models-had-flawed-transmission-sold-them-anyway/

 

I'm pretty sure even Linus himself said that, ultimately, corporations are "not your friend" no matter whose "fanboy/fangirl/fan???" you are.


3 minutes ago, Nacht said:

This is why you watercool it

Does anyone have water-cooled VRAM (note: VRAM, not VRM) temps to post for reference? (Google mixes the two terms up with its "fuzzy search" functionality, and I had no luck using the term "memory" instead, nor with searching this forum directly.)  And yes, you must post the temps under sustained 100% GDDR6X load once thermal equilibrium has (basically) been reached; otherwise it doesn't count.
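Since "thermal equilibrium" is doing a lot of work in that request, here's a minimal sketch of how you could check an exported temperature log for it; the function name, window size, and drift threshold are my own assumptions, not from any monitoring tool's API:

```python
# Decide whether a logged temperature series has (basically) reached
# thermal equilibrium, i.e. the trailing readings have stopped climbing.

def at_equilibrium(temps_c, window=10, max_drift_c=1.0):
    """True if the last `window` samples stay within `max_drift_c`
    of each other (the warm-up curve has flattened out)."""
    if len(temps_c) < window:
        return False  # not enough data to judge
    tail = temps_c[-window:]
    return max(tail) - min(tail) <= max_drift_c

# A hypothetical warm-up curve that levels off around 70 C:
log = [40, 52, 60, 65, 68, 69, 70, 70, 71, 70, 70, 71, 70, 70, 70, 71]
print(at_equilibrium(log))  # → True
```

A stricter check could also require the plateau to hold for several consecutive polls, but even a simple spread test rules out "I ran the load for 30 seconds" screenshots.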


Is the GPU junction temp not like the AMD junction temp, where it's the hottest part of the chip but not all of the chip is at that temp?

Folding Stats

 

SYSTEM SPEC

AMD Ryzen 5 5600X | Motherboard Asus Strix B550i | RAM 32gb 3200 Crucial Ballistix | GPU Nvidia RTX 3070 Founder Edition | Cooling Barrow CPU/PUMP Block, EKWB Vector GPU Block, Corsair 280mm Radiator | Case NZXT H1 | Storage Sabrent Rocket 2tb, Samsung SM951 1tb

PSU NZXT S650 SFX Gold | Display Acer Predator XB271HU | Keyboard Corsair K70 Lux | Mouse Corsair M65 Pro  

Sound Logitech Z560 THX | Operating System Windows 10 Pro


15 hours ago, shaz2sxy said:

Is the GPU junction temp not like the AMD junction temp, where it's the hottest part of the chip but not all of the chip is at that temp?

If you're trying to say that we're not seeing up to 110 C as the average temperature of the VRAM, then yes, you'd be correct.  But remember: once we're talking triple-digit Celsius temperatures, we're already venturing into rather dangerous territory.  I think Wikipedia describes the concept of junction temperature (Tjunction) best:

Quote

Junction temperature, short for transistor junction temperature, is the highest operating temperature of the actual semiconductor in an electronic device

This means that operating above Tjunction will at least greatly reduce the lifespan of the transistors, if not outright kill them.  It also means that if the temperature reported (and used to throttle the VRAM) were the average instead, there would very likely be individual transistors operating beyond that maximum rated temperature, i.e. transistors literally roasting themselves to death (think of the infamous video of an ancient AMD CPU releasing puffs of "magic smoke").
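To make the max-vs-average point concrete, here's a toy illustration (the per-module numbers are made up for the example, not real sensor data) of why a card should throttle on the hottest junction reading rather than the average:

```python
# Toy example: average VRAM temp can look "safe" while one module
# is already sitting at its maximum rated junction temperature.

TJ_MAX_C = 110  # GDDR6X max junction temp per the Igor's Lab figure above

per_module = [96, 101, 104, 99, 110, 107, 95, 102]  # hypothetical per-chip temps

avg = sum(per_module) / len(per_module)
hottest = max(per_module)

print(f"average = {avg:.1f} C")  # well under the limit...
print(f"hottest = {hottest} C")  # ...but one module is already at Tj max

# Throttling on the average would let the hottest module sit past its limit:
assert avg < TJ_MAX_C and hottest >= TJ_MAX_C
```

Which is exactly why reporting (and throttling on) the hottest sensor, as the card apparently does, is the right call.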

 
Maximum or average temperatures aside, I think most have agreed so far that the bigger issue is not just that these VRAM chips are sure toasty, but that this happens at stock with a completely redesigned reference cooler.  AMD's main issue with the RX 5700 XT was that they were still sticking to relatively inefficient blower-style coolers, which simply weren't removing heat nearly as quickly as most other existing thermal solutions.  That card was notorious for hitting high maximum junction temperatures, with the reference model peaking at 110 C; attached is Gamers Nexus' thermal testing of a Sapphire Pulse model of that card with the "Performance" (i.e. regular) VBIOS and the "Quiet" VBIOS.  As you can see, the only time any of the temperatures reached 90+ C was when the fans were basically never allowed to ramp up, and that was with just a fairly average two-fan heat-sink.

In addition, GDDR6X, and especially its performance, is supposed to be a main selling feature of both the 3080 and 3090.  Furthermore, Nvidia has always (afaik) taken pride in also being able to cater its high-end GeForce GPUs to content creators, who are much more likely to be running VRAM-intensive workloads than your average gamer.  So it's not simply that the VRAM runs hot; it's that under basically any meaningful VRAM-intensive workload at stock, the VRAM has to thermally throttle itself constantly (and thus greatly hinder not only its actual performance but also its longevity) to, again, keep itself from being completely toast.

Hope this wasn't too long, and that the point of my OP is clearer now.

 

sapphire-rx-5700-xt-pulse.png


I can confirm as I just ran tests with my TUF 3080 that it does not reach close to this under max load. My memory peaked at 74 degrees and leveled out at 70 degrees.

GPU: XFX RX 7900 XTX

CPU: Ryzen 7 7800X3D


1 hour ago, Orangeator said:

I can confirm as I just ran tests with my TUF 3080 that it does not reach close to this under max load. My memory peaked at 74 degrees and leveled out at 70 degrees.

Could you post some screenshots? And what were the tests that you ran (i.e. was it memory-intensive workloads)? Also it seems that your TUF card is air-cooled and not liquid-cooled?

