
System Integrator Alleges Unusually High Failure Rate Among Ryzen 5000 CPUs and 500 Series Motherboards

SPARTAN VI

Summary

System integrator PowerGPU reported that it had purchased an estimated 320 Ryzen 5000 processors for assembly in custom builds. Of those 320 CPUs, 19 were dead on arrival (DOA), a failure rate of nearly 6% (19/320 ≈ 5.9%) among the units shipped to them. They classify a processor as "DOA" if it is cosmetically perfect (e.g. no bent pins) and did not POST during the initial build, but the system did POST after swapping in a different but identical processor.
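As a rough sketch of how precise a 19-out-of-320 figure is, the snippet below computes the observed DOA rate and a 95% Wilson score interval around it. The choice of interval, and the assumption that each CPU is an independent trial, are mine and not PowerGPU's; the 320 is itself only an estimate.

```python
import math


def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = failures / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Figures reported by PowerGPU: 19 DOA out of an estimated 320 CPUs.
doa, units = 19, 320
rate = doa / units
low, high = wilson_interval(doa, units)
print(f"observed DOA rate: {rate:.2%}")
print(f"95% Wilson interval: {low:.2%} to {high:.2%}")
```

With these inputs the interval runs from roughly 4% to 9%, so even after allowing for sampling noise the observed rate sits well above a sub-1% baseline. It says nothing about the cause.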

 

Quotes

Quote

 AMD Ryzen 5000 Series processors and 500 Series motherboards reportedly have a higher than average failure rate. That’s according to custom gaming PC shop PowerGPU, which shared a tweet yesterday explaining how it had encountered a relatively high amount of Zen 3 processors that were dead on arrival. Contrastingly, PowerGPU noted that it only ran into one dead Intel CPU (an i9-9700K) over the course of its business.

“Before the 5000 series it was 80% intel and 20% AMD and we only had 1 Intel CPU die in the past 2 years,” Power GPU tweeted. “Also the boards from AMD have the highest failure rate. Every week it’s at least 3-5 boards DOA from B550 to X570’s.”

 

My thoughts

Granted, a single data point (one system integrator's experience) does not remotely establish an overall trend, but I hope this opens a conversation with other system integrators and DIY builders who can corroborate or challenge PowerGPU's figures relative to their own volume. We simply need more data for the law of large numbers to work. For now, it seems more likely that something specific to PowerGPU's environment is at play than that they were simply "extremely unlucky." PowerGPU did state that before Ryzen 5000, they typically experienced a failure rate of less than 1%.
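To put a number on "extremely unlucky", here is a minimal sketch that assumes each CPU is independently DOA with probability 1% (rounding PowerGPU's stated "less than 1%" up) and asks how likely 19 or more DOA units out of 320 would be by chance alone. The independence assumption and the helper name prob_at_least are illustrative, not anything PowerGPU published.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing the upper tail directly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: every CPU is an independent trial that is DOA with probability
# equal to PowerGPU's stated historical rate (under 1%, modelled here as 1%).
n_units, observed_doa, baseline = 320, 19, 0.01
expected = n_units * baseline
tail = prob_at_least(observed_doa, n_units, baseline)
print(f"expected DOA at {baseline:.0%}: {expected:.1f}")
print(f"P(>= {observed_doa} DOA by chance): {tail:.1e}")
```

The expected count at 1% is about 3 units, and the chance of seeing 19 or more comes out far below one in a million. That is consistent with the point that something other than bad luck is going on (a bad batch, handling, or a real defect problem), but it cannot tell those apart. Passing a different baseline to prob_at_least shows how sensitive the conclusion is to that assumption.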

 

Sources

FPS Review article

WCCFTech article


5 minutes ago, SPARTAN VI said:

Of those 320 CPUs, 19 of them were dead on arrival (DOA),

ouch

🌲🌲🌲


◒ ◒ 


22 minutes ago, SPARTAN VI said:

19 of them were dead on arrival (DOA), which represents a roughly 6% failure rate

Having talked with people in IT, I've come to understand that you can expect to see (or should plan to see) around a 10% DOA/defect rate for bulk electronics purchases. 

Laptop: 2019 16" MacBook Pro i7, 512GB, 5300M 4GB, 16GB DDR4 | Phone: iPhone 13 Pro Max 128GB | Wearables: Apple Watch SE | Car: 2007 Ford Taurus SE | CPU: R7 5700X | Mobo: ASRock B450M Pro4 | RAM: 32GB 3200 | GPU: ASRock RX 5700 8GB | Case: Apple PowerMac G5 | OS: Win 11 | Storage: 1TB Crucial P3 NVME SSD, 1TB PNY CS900, & 4TB WD Blue HDD | PSU: Be Quiet! Pure Power 11 600W | Display: LG 27GL83A-B 1440p @ 144Hz, Dell S2719DGF 1440p @144Hz | Cooling: Wraith Prism | Keyboard: G610 Orion Cherry MX Brown | Mouse: G305 | Audio: Audio Technica ATH-M50X & Blue Snowball | Server: 2018 Core i3 Mac mini, 128GB SSD, Intel UHD 630, 16GB DDR4 | Storage: OWC Mercury Elite Pro Quad (6TB WD Blue HDD, 12TB Seagate Barracuda, 1TB Crucial SSD, 2TB Seagate Barracuda HDD)

1 minute ago, DrMacintosh said:

Having talked with people in IT, I've come to understand that you can expect to see (or should plan to see) around a 10% DOA rate for bulk electronics purchases. 

10% is a bit high, but that depends on what it is. It should be around 3% for CPUs, HDDs, SSDs, RAM, etc., and a little higher for motherboards and graphics cards.


6% could be very high.  I don’t know what more common failure rates are.  That kind of thing varies a lot by sector.  One would expect testing to be done beforehand.  I don’t see how a failure rate that high could get through a testing system. I would also like to see rates of chips that booted but ran hot.  There is such a thing as “barely works” with CPUs. 

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


4 minutes ago, DrMacintosh said:

Having talked with people in IT, I've come to understand that you can expect to see (or should plan to see) around a 10% DOA/defect rate for bulk electronics purchases. 

So do they not go through testing or something?

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


5 minutes ago, leadeater said:

10% is a bit high but that depends on what it is.

That's true, I believe they were talking about completed systems. 

Laptop: 2019 16" MacBook Pro i7, 512GB, 5300M 4GB, 16GB DDR4 | Phone: iPhone 13 Pro Max 128GB | Wearables: Apple Watch SE | Car: 2007 Ford Taurus SE | CPU: R7 5700X | Mobo: ASRock B450M Pro4 | RAM: 32GB 3200 | GPU: ASRock RX 5700 8GB | Case: Apple PowerMac G5 | OS: Win 11 | Storage: 1TB Crucial P3 NVME SSD, 1TB PNY CS900, & 4TB WD Blue HDD | PSU: Be Quiet! Pure Power 11 600W | Display: LG 27GL83A-B 1440p @ 144Hz, Dell S2719DGF 1440p @144Hz | Cooling: Wraith Prism | Keyboard: G610 Orion Cherry MX Brown | Mouse: G305 | Audio: Audio Technica ATH-M50X & Blue Snowball | Server: 2018 Core i3 Mac mini, 128GB SSD, Intel UHD 630, 16GB DDR4 | Storage: OWC Mercury Elite Pro Quad (6TB WD Blue HDD, 12TB Seagate Barracuda, 1TB Crucial SSD, 2TB Seagate Barracuda HDD)

The impression I got from the description was that it was just CPUs, not whole finished rigs. I might be wrong, though. 

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


Doesn't Ryzen have a problem where the CPU will not always make proper contact and needs a reseat? Or is that just for threadripper? If it's the former, that could explain the high rate of "DOA" since the CPUs might have worked if they were reseated. If it's the latter, well ignore what I said.

 

6 minutes ago, SPARTAN VI said:

Our friends at HardwareUnboxed checked with their local integrator/s and came back with a <2% failure rate for their Ryzen 5000 CPUs:

 

https://twitter.com/HardwareUnboxed/status/1361135468767715329?s=20

In the tweet directly below that one he said, "It's far more likely most of those CPUs were okay and just needed to be re-seated or they haven't been doing enough BIOS cross testing. For example there can be issues with certain RAM/ MB & CPU combos."

 

I'm not really sure why he said that last part. If the OP is correct, the only part PowerGPU replaced was the CPU, swapped for another of the exact same model, and the systems started working, so I don't see what different RAM/MB & CPU combos have to do with this. Anyone want to enlighten me?


14 minutes ago, The_russian said:

Doesn't Ryzen have a problem where the CPU will not always make proper contact and needs a reseat? Or is that just for threadripper?

Mostly Threadripper. PGA actually has far less of that type of issue compared to LGA. With PGA the CPU sits on a solid base, whereas with LGA the CPU floats on top of the pins, and uneven or excessive pressure will lift the CPU away from the pins or push the pins off the contact pads. The larger the CPU, the bigger this problem gets, because of leverage.


5 minutes ago, leadeater said:

Mostly Threadripper. PGA actually has far less of that type of issue compared to LGA. With PGA the CPU sits on a solid base, whereas with LGA the CPU floats on top of the pins, and uneven or excessive pressure will lift the CPU away from the pins or push the pins off the contact pads. The larger the CPU, the bigger this problem gets, because of leverage.

That's what I was thinking, but for some reason I thought I remembered reseating the CPU being a fairly common troubleshooting step for Ryzen problems. Guess I was just thinking of Threadripper then. 


13 minutes ago, leadeater said:

Mostly Threadripper. PGA actually has far less of that type of issue compared to LGA. With PGA the CPU sits on a solid base, whereas with LGA the CPU floats on top of the pins, and uneven or excessive pressure will lift the CPU away from the pins or push the pins off the contact pads. The larger the CPU, the bigger this problem gets, because of leverage.

Intel keeps upping their pin count. IIRC the newest one is 1700 pins. One could see that turning into a serious problem.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


Aside from receiving something DOA, I do have to wonder how much of this failure rate is caused by poor handling and poor installation by this one system integrator, not to mention how poorly complete systems or components are handled in shipping.


42 minutes ago, leadeater said:

10% is a bit high but that depends on what it is. Should be around 3% for CPUs, HDDs, SSDs, RAM etc and a little higher on motherboards and graphics cards.

Puget's numbers are <1% for Intel CPUs.  https://www.pugetsystems.com/labs/articles/What-is-the-most-reliable-hardware-in-our-Puget-Systems-workstations-1550/

 

Makes sense, though, considering every single one is tested immediately before shipment. Failures would almost always come down to bad handling (either directly by the builder or somewhere in the shipping chain).

Workstation:  14700nonk || Asus Z790 ProArt Creator || MSI Gaming Trio 4090 Shunt || Crucial Pro Overclocking 32GB @ 5600 || Corsair AX1600i@240V || whole-house loop.

LANRig/GuestGamingBox: 9900nonK || Gigabyte Z390 Master || ASUS TUF 3090 650W shunt || Corsair SF600 || CPU+GPU watercooled 280 rad pull only || whole-house loop.

Server Router (Untangle): 13600k @ Stock || ASRock Z690 ITX || All 10Gbe || 2x8GB 3200 || PicoPSU 150W 24pin + AX1200i on CPU|| whole-house loop

Server Compute/Storage: 10850K @ 5.1Ghz || Gigabyte Z490 Ultra || EVGA FTW3 3090 1000W || LSI 9280i-24 port || 4TB Samsung 860 Evo, 5x10TB Seagate Enterprise Raid 6, 4x8TB Seagate Archive Backup ||  whole-house loop.

Laptop: HP Elitebook 840 G8 (Intel 1185G7) + 3080Ti Thunderbolt Dock, Razer Blade Stealth 13" 2017 (Intel 8550U)


8 minutes ago, The_russian said:

That's what I was thinking, but for some reason I thought I remembered reseating the CPU being a fairly common troubleshooting step for Ryzen problems. Guess I was just thinking of Threadripper then. 

It is. It's fairly common for all PGA sockets, not something new to AM4...


2 hours ago, ShrimpBrime said:

It is. Fairly common for all PGA, not something new to AM4....

I don't think I've ever had a bad CPU mount on PGA, not on LGA either for that matter 🤷‍♂️

 

But it should be less common on PGA, not more.


2 hours ago, TempestCatto said:

Aside from receiving something DOA, I do have to wonder how much of this failure rate is caused by poor handling and poor installation by this sole system integrator. Not to mention how poorly a complete system, or components are handled in shipping.

There is a reason, but what that reason is is unknown.  

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


2 hours ago, Bombastinator said:

Intel keeps upping their pin count.  Iirc the newest one is 1700 pins.  One could see that turning into a serious problem.

It's not so bad. I've been using 1366 for a long-ass time, starting years ago and even today, 2011 more recently, and the current stuff has even more pins than that. That's HEDT and server, mind you, but it's still not much of a problem. I think the retention and cooler mounting designs have the most to do with it: get a good mechanism that places even and correct pressure over the CPU and you will rarely have problems.

 

What worries me more about larger sockets is the bigger opening for something to drop into and ruin the pins (LGA); that's a death sentence for the motherboard. That I actually do worry about.


1 minute ago, leadeater said:

It's not so bad. I've been using 1366 for a long-ass time, starting years ago and even today, 2011 more recently, and the current stuff has even more pins than that. That's HEDT and server, mind you, but it's still not much of a problem. I think the retention and cooler mounting designs have the most to do with it: get a good mechanism that places even and correct pressure over the CPU and you will rarely have problems.

 

What worries me more about larger sockets is the bigger opening for something to drop into and ruin the pins (LGA); that's a death sentence for the motherboard. That I actually do worry about.

I was thinking in the future rather than the present.  Both LGA and PGA have pins.  It’s just a question of what side they are attached to.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


1 minute ago, Bombastinator said:

I was thinking in the future rather than the present.  Both LGA and PGA have pins.  It’s just a question of what side they are attached to.

Yeah, but we have a fair way to go before that becomes a problem on consumer platforms. Xeon is at 3647 and EPYC at 4094, and neither has particularly bad mounting problems. They do, however, make a nice large target for old Linus Drop Tips over there to drop the CPU into while building the system 🤣


I don't think that's too far-fetched from reality. There have been reports of some new CPUs being rather flaky. Not quite DOA in a lot of them but defective enough to warrant a replacement.

 

It's probably tied to the supply issues and such at TSMC. There's a chance that some defective parts were pushed out the door unnoticed.

The Workhorse (AMD-powered custom desktop)

CPU: AMD Ryzen 7 3700X | GPU: MSI X Trio GeForce RTX 2070S | RAM: XPG Spectrix D60G 32GB DDR4-3200 | Storage: 512GB XPG SX8200P + 2TB 7200RPM Seagate Barracuda Compute | OS: Microsoft Windows 10 Pro

 

The Portable Workstation (Apple MacBook Pro 16" 2021)

SoC: Apple M1 Max (8+2 core CPU w/ 32-core GPU) | RAM: 32GB unified LPDDR5 | Storage: 1TB PCIe Gen4 SSD | OS: macOS Monterey

 

The Communicator (Apple iPhone 13 Pro)

SoC: Apple A15 Bionic | RAM: 6GB LPDDR4X | Storage: 128GB internal w/ NVMe controller | Display: 6.1" 2532x1170 "Super Retina XDR" OLED with VRR at up to 120Hz | OS: iOS 15.1


3 hours ago, leadeater said:

10% is a bit high but that depends on what it is. Should be around 3% for CPUs, HDDs, SSDs, RAM etc and a little higher on motherboards and graphics cards.

Where did you get those numbers from?

They seem very high compared to what I have seen. I base that on both the statistics released by the largest French PC component retailer and my own experience working for a company that supplies laptops to the students of all the municipality's schools (although that's not my area, so I have quite limited insight into it).

10% would be extremely high in my eyes, not just "a bit high".

I don't think this news really tells us much. There are several reasons why this system integrator could have high failure rates.

1) They might have gotten a bad batch from AMD with abnormally high failure rates.

2) They are doing something wrong when they are handling or building with AMD's processors. I kind of doubt that since they specifically say it is the 5000 and 500 series that are having issues. So if they know how to handle the previous generation then I don't see why they would fuck this up. Also, it's their job to do this. I think they know what they are doing.

3) AMD's new products might just have high failure rates, which would be a problem.

 

Might be other reasons I can't think of.


Just now, LAwLz said:

Where did you get those numbers from?

Well, I have to admit this figure is actually the expected failure rate over the lifespan, not the manufacturing defect (DOA) failure rate. These are the kinds of numbers thrown around when I talk to vendor technicians who come in to replace parts, as the rate that vendor would consider abnormally high and worth starting an investigation into. It was around the failure rate HP required before they would investigate why our HP EliteDesk 8200 SFF motherboards were failing, which then led to them issuing a global advisory and recall on the affected batch; all of ours had to be replaced.

 

CPUs are definitely lower than this, but 3% is the threshold used by many, myself included, to define whether something has an abnormally high failure rate. That said, in my entire life I've only had a single CPU fail on me, an 8890v4 (RIP, you were too good to die).

 

The bathtub curve model of failure rates applies here; DOA failures would fall on the left-hand side (the early, infant-mortality region).

 

[Image: bathtub curve of failure rate over a product's lifetime]

 

A more detailed breakdown:

[Image: detailed breakdown of the bathtub curve]
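One common way to model the bathtub curve is as the sum of three hazard terms: a decreasing Weibull hazard for early (infant-mortality) failures, a constant rate for random failures during useful life, and an increasing Weibull hazard for wear-out. The sketch below assumes that form; every parameter value is made up for illustration and is not fitted to any real CPU or motherboard data.

```python
def bathtub_hazard(t: float) -> float:
    """Hypothetical bathtub-shaped hazard rate (failures per hour) at age t > 0 hours.

    All parameters below are illustrative assumptions, not measured values.
    """
    beta_infant, eta_infant = 0.5, 1_000.0  # shape < 1: hazard decreases with age
    random_rate = 1e-5  # constant random-failure rate during useful life
    beta_wear, eta_wear = 4.0, 50_000.0  # shape > 1: hazard increases with age

    infant = (beta_infant / eta_infant) * (t / eta_infant) ** (beta_infant - 1)
    wear = (beta_wear / eta_wear) * (t / eta_wear) ** (beta_wear - 1)
    return infant + random_rate + wear


for hours in (10, 1_000, 10_000, 100_000):
    print(f"{hours:>7} h: {bathtub_hazard(hours):.2e} failures/hour")
```

The hazard falls over the first few thousand hours and then climbs again once the wear-out term takes over, which produces the bathtub shape; DOA units correspond to the very start of the curve.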

 

44 minutes ago, LAwLz said:

I base that on both the statistics released by the largest French PC component retailer

There are some very high ones in there too, but I'd guess that's likely down to small sample sizes. The R9 290, ooff; the Sapphire R9 290 OC, mega ooooff.

 

50 minutes ago, LAwLz said:

10% would be extremely high in my eyes, not just "a bit high".

Well I was purposefully being lenient with my wording.


14 minutes ago, leadeater said:

-snip-

Ah, that makes sense.

I thought you were saying that it is typical for, say, 3% of laptops bought to be DOA because of the CPU, another 3% because of the HDD, and so on, and thought "wait, so like 20% of the laptops you order are DOA? What vendor do you buy from!?".
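The "like 20%" figure follows from compounding: if a system has k components and each is independently DOA with probability p, the chance that at least one of them is DOA is 1 - (1 - p)^k. A minimal sketch, assuming independent failures and a hypothetical 7-component laptop at 3% per component:

```python
def system_doa_probability(per_component_rates: list[float]) -> float:
    """Probability that at least one component is DOA, assuming independent failures."""
    all_ok = 1.0
    for rate in per_component_rates:
        all_ok *= 1.0 - rate  # probability that this and every earlier component works
    return 1.0 - all_ok


# Hypothetical: 7 components (CPU, storage, RAM, motherboard, etc.), each 3% DOA.
laptop = [0.03] * 7
print(f"{system_doa_probability(laptop):.1%}")  # prints 19.2%
```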


17 minutes ago, leadeater said:

There are some very high ones in there too, but I'd guess that's likely down to small sample sizes. The R9 290, ooff; the Sapphire R9 290 OC, mega ooooff.

 

Well I was purposefully being lenient with my wording.

Yeah, it is not a perfect sample, and I do believe it is also based on "returns by customer" and not actual "we verified this and it is actually broken for sure". So there might be some false positives in there.

All parts on the list have sold at least 100 units though, and the ones in italics are those that sold less than 200 units so their stats are more unreliable.
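To illustrate why the low-volume (italicized) entries are less reliable, here is a sketch of the approximate 95% margin of error on an observed return rate at different sales volumes. It uses the simple normal approximation, which is itself rough when only a handful of returns are involved, and the 5% rate is a hypothetical example rather than a figure from the HardWare.fr list.

```python
import math


def margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed rate (normal approximation)."""
    return z * math.sqrt(rate * (1 - rate) / n)


# Hypothetical 5% return rate observed on products with different sales volumes.
for units_sold in (100, 200, 1_000):
    moe = margin_of_error(0.05, units_sold)
    print(f"{units_sold:>5} units: 5.0% ± {moe:.1%}")
```

Going from 100 to 1,000 units shrinks the margin by a factor of about three (the square root of ten), so a product with only 100 sales can show a return rate several points away from its true value purely by chance.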


Just noticed that I didn't post the most up to date article. Here is the last one published. It's for the second half of 2016.

Les taux de retour des composants (S2 2016) [Component return rates, second half of 2016] - HardWare.fr
