
Amazon's MMO New World is bricking 3090 GPUs

spartaman64
6 hours ago, joaopt said:

Not sure how that proves that cards other than the FTW3 died because of the New World bug.

 

Again, I've supplied links showing people had the same issue but didn't have a FTW3. I never said there wasn't an issue with EVGA GPUs specifically. All I said is that calling people (that mention there are other GPUs that have the issue) liars is rather disingenuous when there's proof it does affect other GPUs than the FTW3.

 

AGAIN: the fact that other GPUs are affected by that bug doesn't mean that EVGA is clear of any wrongdoing with their designs. I've never argued that, and I even mentioned that the GPUs other than the FTW3 that have died might have died anyway because of other issues.

If you need help with your forum account, please use the Forum Support form !


4 hours ago, CarlBar said:

Whilst we have no idea yet if this is the failure mode for the majority of affected GPUs, if it is, it raises all kinds of issues.

 

1. Why did the board get a hole blown in it? The voltage shouldn't go much above or below 12 V since this is the PSU feed, and arcing is voltage based. The only thing I can think of is a really odd fuse failure mode of some kind that made it blow up or only partially fail. But that then raises the question of what it was and what triggered it.

 

2. How did the cards pull that much current? These fuses are generally rated with OC'ing in mind. I doubt everyone involved was raising the voltage on their cards, and without that the card shouldn't have been able to pull enough to blow the fuse. So WTH is happening here exactly?

That hole creeped the hell out of me too. It appears to be where the fuse was supposed to be, which implies the thing exploded. That’s exactly what fuses AREN’T supposed to do, and it’s why they exist. Takes a lotta voltage to do that if it was the fuse itself that exploded. Maybe it was something underneath it. Either way: “Bad engineer! Bad! No cookie!” A fuse that explodes like that doesn’t save anything. Sure, the downstream stuff might still be viable, but with a crater like that it doesn’t matter. The whole board is toast. You might be able to desolder the parts and use them on something else, but the board is still dead. It’s not like that fuse can just be replaced now.

Edited by Bombastinator

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


1 hour ago, wkdpaul said:

Not sure how that proves that cards other than the FTW3 died because of the New World bug.

 

Again, I've supplied links showing people had the same issue but didn't have a FTW3. I never said there wasn't an issue with EVGA GPUs specifically. All I said is that calling people (that mention there are other GPUs that have the issue) liars is rather disingenuous when there's proof it does affect other GPUs than the FTW3.

 

AGAIN: the fact that other GPUs are affected by that bug doesn't mean that EVGA is clear of any wrongdoing with their designs. I've never argued that, and I even mentioned that the GPUs other than the FTW3 that have died might have died anyway because of other issues.

If you are going to comment on this issue you have to do some background reading on it. I spent a lot of time on the EVGA forum because of the queues, and these cards have been dying since long before NW. People were on their 2nd and 3rd RMA, and one was on the 6th. There was clearly a problem with the cards that NW's runaway fps only made more apparent. The other cards are just cards that joined the band: cards die every day, more visibly when a game behaves like this and everyone is playing it at the same time. They were cards that were marked to die anyway, here or anywhere else.

I feel bad for people with EVGA cards, and whitewashing this doesn't help their RMA loop in any way. It's sad that no one pressures EVGA like they do with problems at other brands. Anyway, that's just how the world works; EVGA has most of the influencers under control. Sorry for the rambling.


6 hours ago, Forbidden Wafer said:

Buildzoid said they at least protected the rest of the components by doing that. Other OEMs don't include the fuse and blow up the VRM/GPU die.

Yeah, but in that case the PCB is toast.

A PC Enthusiast since 2011
AMD Ryzen 7 5700X@4.65GHz | GIGABYTE GTX 1660 GAMING OC @ Core 2085MHz Memory 5000MHz
Cinebench R23: 15669cb | Unigine Superposition 1080p Extreme: 3566

5 hours ago, Forbidden Wafer said:

I'm not really sure, but I do think something went wrong other than the fuse. Maybe it didn't break the connection completely and created an arc that melted everything, or it got hot enough to melt the resin letting traces short (e.g. fuse with wrong rating). 

It's not supposed to blow up like that, and that adds another thing for manufacturers to check: will the fuse protect the card or turn the PCB into charcoal?


52 minutes ago, Vishera said:

It's not supposed to blow up like that, and that adds another thing for manufacturers to check: will the fuse protect the card or turn the PCB into charcoal?

I don’t know what was with that crater. It needs to be found out though. That this code knocked over a 590, even if it didn’t knock it out, is very worrisome. Knocking stuff out is worse of course, but even knocking it over is bad. A 590 does NOT run a newly designed chip. That’s some not-so-new architecture. It is pretty hot-rodded though: the 590 is that GPU pushed to the limit. I suspect someone already knows exactly what is going on here and just hasn’t said so, but independent research happens too. Keeping it secret may delay it but it won’t stop it.

 

Things I’m wondering about now: what DOESN’T it knock over, and why?


5 hours ago, CarlBar said:

1. Why did the board get a hole blown in it? The voltage shouldn't go much above or below 12 V since this is the PSU feed, and arcing is voltage based. [...]

This is not necessarily true about the voltage. Inductors in the circuit (the power stages) running at high current can produce insanely high voltages if a popping fuse suddenly tries to open the circuit on them. Thinking about the exploding fuse, this would be my explanation: the fuse was unable to withstand cutting the current of a card drawing 100% power, because some of the energy stored in the inductors also gets dissipated in the popping fuse. But note that this is not the main problem here.

 

The fuse starts to pop because there is already a short circuit somewhere deeper, probably a power stage about to blow the same way. The power stage fail-safes (likely overcurrent protection) are the ones to be fixed first.
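
A rough back-of-the-envelope version of the inductive-kick argument above. The numbers are made up for illustration (an assumed 1 µH of inductance in the current path, 30 A interrupted in 100 ns); none of them are measurements from any card in this thread:

```latex
|V_L| = L \left|\frac{dI}{dt}\right|
\approx 1\,\mu\mathrm{H} \times \frac{30\,\mathrm{A}}{100\,\mathrm{ns}}
= 300\,\mathrm{V}
```

So even on a 12 V rail, interrupting tens of amps very quickly through a small inductance could in principle produce a spike far above 12 V. How large it gets on a real card depends on the actual inductance, on how fast the fuse opens, and on whatever capacitance and clamping sits around it.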

         \   ^__^ 
          \  (oo)\_______
             (__)\       )\/\

9 hours ago, grg994 said:

Here is a picture of a fried EVGA 3090 (https://www.guru3d.com/news-story/mmo-new-world-(closed-beta)-is-killing-geforce-rtx-3090-graphics-cards):

https://tweakers.net/i/s57I50unQz11JSYsMEZcA66zh9c=/1280x/filters:strip_icc():strip_exif()/i/2004500540.jpeg

 

The visible damage is in the area I marked on this PCB layout. That component should be a fuse on one of the 12V lines, connected directly after the rightmost 8-pin.

image.thumb.png.a9151b6597f7f7f371c9252f44306c1c.png

 

Looks like Nvidia/EVGA is redefining the term "a blown fuse"...

Please note that on the website, that image appears where they mention the board was modified to bypass the current protection.

 

image.thumb.png.9dfca93aad8ddeec35e83163e5983095.png

https://www.igorslab.de/en/evga-geforce-rtx-3080-rtx-3090-and-not-only-new-world-when-the-graphics-card-goes-amok-because-of-design-failures/2/

 

Please be careful, as the way this image was mentioned takes it completely out of context.


3 minutes ago, Kisai said:

Please note that on the website, that image appears where they mention the board was modified to bypass the current protection.

Where exactly?

In the whole article I didn't see them mentioning that OCP was disabled.


4 minutes ago, Vishera said:

Where exactly?

In the whole article I didn't see them mentioning that OCP was disabled.

image.thumb.png.dc107b7234b0a7902103070b6988ef38.png

On the first page. 

 

I'm pointing out that the example image is in the context of "this is what the damage looks like when you bypass the protection circuit".


2 hours ago, grg994 said:

This is not necessarily true about the voltage. Inductors in the circuit (the power stages) running at high current can produce insanely high voltages if a popping fuse suddenly tries to open the circuit on them. Thinking about the exploding fuse, this would be my explanation: the fuse was unable to withstand cutting the current of a card drawing 100% power, because some of the energy stored in the inductors also gets dissipated in the popping fuse. But note that this is not the main problem here.

 

The fuse starts to pop because there is already a short circuit somewhere deeper, probably a power stage about to blow the same way. The power stage fail-safes (likely overcurrent protection) are the ones to be fixed first.

Another theory about this one: electricity CAN cause explosions, but it isn’t the only thing that does. This could be a weird one instigated by, say, heat. You’re more likely to be right because it’s a PCB, but the balls roll funny for everybody. *wants Smarter Every Day on the case* This very well could be a specifically electrical engineering thing, but it could also be a weird general physics thing about rapidly expanding gas or something.


Please post the original picture, not the troll one overcirculating on "tech" sites and fanboy sites.

The seller of that card even said it was shunt modded and the buyer incorrectly applied the original heatsink, when that card was sold as watercooling compatible ONLY.

The original heatsink SHORTED the shunts.  You can look at the original picture and see for yourself.

 

 

image0 (1).jpg


30 minutes ago, Falkentyne said:

Please post the original picture, not the troll one overcirculating on "tech" sites and fanboy sites.

The seller of that card even said it was shunt modded and the buyer incorrectly applied the original heatsink, when that card was sold as watercooling compatible ONLY.

The original heatsink SHORTED the shunts.  You can look at the original picture and see for yourself.

 

 

image0 (1).jpg

So not so much a bad engineer as bad purchasing people? Is this just one single card? What constitutes a “seller” and a “buyer” is a bit vague here. Is the seller or the buyer EVGA? If so, which one? Might take care of the particular model of 3090 blue smoke issue. How bad is the “other cards going down but not bricking” thing? Seems the numbers are higher than one. Not sure of what, even.


4 minutes ago, Bombastinator said:

So no bad engineer, so much as bad purchasing people?  Might take care of the particular model of 3090 blue smoke issue.  How bad is the “other cards” going down but not bricking thing? 

This person's failure, as in the picture, was not directly related to the ICX controller. This guy actually burned the VRMs/shunts from a short circuit between the stacked shunts and the heatsink. The original seller of that card was on Elmor's Discord and explained exactly what happened.

 

Most of the cards that are dying are not burning up in smoke and fire like this one.

 

The UP9511 problem is real. It's the ICX controller that's burning out the cards/VRMs/fuses. It's causing the fan to report an improper RPM and forcing the fan to run at maximum speed (beyond what you can set in software) when the super high reading occurs. The exact same flaw is causing the card to trip OCP. Without an oscilloscope hooked up to the VRM and logging everything, it's sort of difficult to determine what exact trigger is killing the card. But it's clearly either overcurrent or overvoltage of something.

 

There was a user over on the EVGA forums who tested this specifically in Anno 1800 or whatever that game is called. When he clicked a building tooltip, he randomly saw the fan report 2 million RPM in GPU-Z. If he did not INSTANTLY close the tooltip when this happened, the card black-screened (with max fan speed) very shortly after. He saw the same thing happen in New World.
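
For anyone who wants to check their own sensor logs for the kind of impossible fan reading described above, here is a minimal sketch. It assumes you already have the readings as (timestamp, RPM) pairs; the function name, the 10,000 RPM ceiling and the sample log are all made up for illustration and are not taken from GPU-Z or from any real card.

```python
from typing import Iterable, List, Tuple

# Assumed ceiling: GPU fans run at a few thousand RPM, so anything far above
# that is treated as a sensor/controller glitch. The exact value is a guess.
MAX_PLAUSIBLE_RPM = 10_000


def find_rpm_glitches(samples: Iterable[Tuple[float, int]],
                      limit: int = MAX_PLAUSIBLE_RPM) -> List[Tuple[float, int]]:
    """Return the (timestamp_seconds, rpm) samples whose RPM exceeds limit."""
    return [(t, rpm) for t, rpm in samples if rpm > limit]


# Hypothetical log, not real data: one reading jumps to 2 million RPM.
log = [(0.0, 1450), (1.0, 1480), (2.0, 2_000_000), (3.0, 3100)]
print(find_rpm_glitches(log))  # prints [(2.0, 2000000)]
```

This only flags the bogus reading after the fact. It says nothing about why the controller reported it or whether the card is about to trip OCP.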


1 hour ago, Falkentyne said:

This person's failure, as in the picture, was not directly related to the ICX controller. This guy actually burned the VRMs/shunts from a short circuit between the stacked shunts and the heatsink. The original seller of that card was on Elmor's Discord and explained exactly what happened.

 

Most of the cards that are dying are not burning up in smoke and fire like this one.

 

The UP9511 problem is real. It's the ICX controller that's burning out the cards/VRMs/fuses. It's causing the fan to report an improper RPM and forcing the fan to run at maximum speed (beyond what you can set in software) when the super high reading occurs. The exact same flaw is causing the card to trip OCP. Without an oscilloscope hooked up to the VRM and logging everything, it's sort of difficult to determine what exact trigger is killing the card. But it's clearly either overcurrent or overvoltage of something.

 

There was a user over on the EVGA forums who tested this specifically in Anno 1800 or whatever that game is called. When he clicked a building tooltip, he randomly saw the fan report 2 million RPM in GPU-Z. If he did not INSTANTLY close the tooltip when this happened, the card black-screened (with max fan speed) very shortly after. He saw the same thing happen in New World.

Here’s hoping I don’t come off like that whole injecting bleach thing with this one.

I’m having trouble finding out what UP9511 refers to exactly. It sounds a bit like a model number for the specific EVGA 3090 model, but it might be only a part on that card?

I can’t seem to determine it with casual internetting.

It sounds to me like the claim is that the problem shown in the photo applies to only one specific, possibly not even retail, card, is not an example of what is commonly going on, and furthermore is unrelated. (No fault to the poster here, though possibly to the original one.)
However, that does not mean that a permanent problem is not occurring, just not that one.

That’s separate from the explanation of what is happening though, which is a bit difficult to unravel. (Specifically for me. It could just be me not understanding.) Perhaps it’s just the pronoun, or that there is more than one overcurrent protection circuit in the machine but only one is being tripped, or the word “tripped” itself, which implies a reset. Fuses in my lexicon blow, not trip. Sometimes they melt so fast they explode, but it’s a non-resetting thing. You have to replace the fuse. My understanding is that when overcurrent protection in a machine’s power supply box “trips”, it at least often resets automatically when power is removed. No fuse has to be replaced. There very well might not be room or capacity for such things inside a video card though.

Is this ICX ACTUALLY burning things out, or just causing behavior that makes people think it is burning things out, because it’s tripping an overcurrent protection every time the card is used (or just once, if the overcurrent protection is not a breaker but a fuse)? In that case a fuse would still need replacing, so a fuse blowing, but not catastrophically as implied in the photo. Blown fuse, dead card though.

OCP could refer to the overcurrent protection of the computer’s power supply or to one onboard the video card.

Might explain why some cards are failing but not permanently. It depends which OCP gets tripped first. It could then apply to multiple cards, which would mean the non-UP9511 cards(?) having issues are having near-death experiences and only being saved because their PSU OCP trips. So bad for everyone? A piece of code that can make a computer reset to save itself is potentially a problem.


15 minutes ago, Bombastinator said:

Here’s hoping I don’t come off like that whole injecting bleach thing with this one.

I’m having trouble finding out what UP9511 refers to exactly. It sounds a bit like a model number for the specific EVGA 3090 model, but it might be only a part on that card?

I can’t seem to determine it with casual internetting.

It sounds to me like the claim is that the problem shown in the photo applies to only one specific, possibly not even retail, card, is not an example of what is commonly going on, and furthermore is unrelated. (No fault to the poster here, though possibly to the original one.)
However, that does not mean that a permanent problem is not occurring, just not that one.

That’s separate from the explanation of what is happening though, which is a bit difficult to unravel. (Specifically for me. It could just be me not understanding.) Perhaps it’s just the pronoun, or that there is more than one overcurrent protection circuit in the machine but only one is being tripped, or the word “tripped” itself, which implies a reset. Fuses in my lexicon blow, not trip. Sometimes they melt so fast they explode, but it’s a non-resetting thing. You have to replace the fuse. My understanding is that when overcurrent protection in a machine’s power supply box “trips”, it at least often resets automatically when power is removed. No fuse has to be replaced. There very well might not be room or capacity for such things inside a video card though.

Is this ICX ACTUALLY burning things out, or just causing behavior that makes people think it is burning things out, because it’s tripping an overcurrent protection every time the card is used (or just once, if the overcurrent protection is not a breaker but a fuse)? In that case a fuse would still need replacing, so a fuse blowing, but not catastrophically as implied in the photo. Blown fuse, dead card though.

OCP could refer to the overcurrent protection of the computer’s power supply or to one onboard the video card.

Might explain why some cards are failing but not permanently. It depends which OCP gets tripped first. It could then apply to multiple cards, which would mean the non-UP9511 cards(?) having issues are having near-death experiences and only being saved because their PSU OCP trips. So bad for everyone? A piece of code that can make a computer reset to save itself is potentially a problem.

No one knows. Until someone mans up and sends a card to Elmor or Buildzoid, no one will ever know.

All I know is cause and effect.  I know observation.   If someone gets 2 million RPM fan speed (reported) then black screens a few seconds later, it's VERY clear that BOTH problems are related.  It doesn't take a high IQ to figure that out.

And what controls the fans and the black screen (OCP)?  The VRM controller / phases....


11 minutes ago, Falkentyne said:

No one knows. Until someone mans up and sends a card to Elmor or Buildzoid, no one will ever know.

All I know is cause and effect.  I know observation.   If someone gets 2 million RPM fan speed (reported) then black screens a few seconds later, it's VERY clear that BOTH problems are related.  It doesn't take a high IQ to figure that out.

And what controls the fans and the black screen (OCP)?  The VRM controller / phases....

Or no one who is saying so on the internet does, at least. Amazon and EVGA are taking actions which imply they think they do. They seem to be opposed though, at least partially. I do not enjoy being a mushroom person. I doubt many do.


  • 1 month later...

Small update to this, apparently the problem was caused by bad soldering.

 

https://www.tomshardware.com/news/poor-soldering-killed-24-evga-geforce-rtx-3090-gpus

Quote

After analyzing the 24 deceased GeForce RTX 3090 graphics cards, the company discovered that real issue was due to "poor workmanship." Apparently, the soldering around the graphics card's MOSFET circuits leaves much to desire.

 

EVGA claimed that the soldering problem only affects a handful of GeForce RTX 3090 graphics cards that were part of the early production run in 2020. Although EVGA didn't reveal concrete numbers, the company affirmed that the affected batch is less than 1% of all the graphics cards that it has sold.

 

CPU: Intel i7 6700k  | Motherboard: Gigabyte Z170x Gaming 5 | RAM: 2x16GB 3000MHz Corsair Vengeance LPX | GPU: Gigabyte Aorus GTX 1080ti | PSU: Corsair RM750x (2018) | Case: BeQuiet SilentBase 800 | Cooler: Arctic Freezer 34 eSports | SSD: Samsung 970 Evo 500GB + Samsung 840 500GB + Crucial MX500 2TB | Monitor: Acer Predator XB271HU + Samsung BX2450


3 minutes ago, Spotty said:

Small update to this, apparently the problem was caused by bad soldering

nice work EVGA. 3090s are a mess for being the top end card

Good luck, Have fun, Build PC, and have a last gen console for use once a year. I should answer most of the time between 9 to 3 PST

NightHawk 3.0: R7 5700x @, B550A vision D, H105, 2x32gb Oloy 3600, Sapphire RX 6700XT  Nitro+, Corsair RM750X, 500 gb 850 evo, 2tb rocket and 5tb Toshiba x300, 2x 6TB WD Black W10 all in a 750D airflow.
GF PC: (nighthawk 2.0): R7 2700x, B450m vision D, 4x8gb Geli 2933, Strix GTX970, CX650M RGB, Obsidian 350D

Skunkworks: R5 3500U, 16gb, 500gb Adata XPG 6000 lite, Vega 8. HP probook G455R G6 Ubuntu 20. LTS

Condor (MC server): 6600K, z170m plus, 16gb corsair vengeance LPX, samsung 750 evo, EVGA BR 450.

Spirt  (NAS) ASUS Z9PR-D12, 2x E5 2620V2, 8x4gb, 24 3tb HDD. F80 800gb cache, trueNAS, 2x12disk raid Z3 stripped

PSU Tier List      Motherboard Tier List     SSD Tier List     How to get PC parts cheap    HP probook 445R G6 review

 

"Stupidity is like trying to find a limit of a constant. You are never truly smart in something, just less stupid."

Camera Gear: X-S10, 16-80 F4, 60D, 24-105 F4, 50mm F1.4, Helios44-m, 2 Cos-11D lavs


10 minutes ago, GDRRiley said:

nice work EVGA. 3090s are a mess for being the top end card

pay more, get less.

Want to pay extra for some bad joints or bad components?


1 minute ago, Quackers101 said:

Want to pay extra for some bad joints or bad components?

No, that's why if you really were a baller you'd buy Quadros.


1 hour ago, Spotty said:

Small update to this, apparently the problem was caused by bad soldering.

 

https://www.tomshardware.com/news/poor-soldering-killed-24-evga-geforce-rtx-3090-gpus

 

 

Aren't these things mostly done by automated machines? Sounds more likely to be bad solder or a poorly maintained machine than actual human screwup.


6 minutes ago, CarlBar said:

 

Aren't these things mostly done by automated machines? Sounds more likely to be bad solder or a poorly maintained machine than actual human screwup.

Humans set up and maintain the machines though.

