AMD Ryzen's FMA3 system freeze issue detailed, a confirmed fix by AMD is in the works

Morgan MLGman

So, as @zMeul reported over a week ago in this thread:

TL;DR: There is an FMA3-related bug in Ryzen CPUs that causes a hard system lock when running certain FMA3 workloads. The problem has been replicated across all three R7 processors released so far and on a variety of motherboards.

 


It all started here: http://forum.hwbot.org/showthread.php?t=167605

 

A short description of what FMA is: The FMA instruction set is an extension to the 128 and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations.
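To make "fused" concrete: a fused multiply-add computes a*b + c with a single rounding at the end, while a separate multiply followed by an add rounds twice. The sketch below is only an illustration and does not execute the FMA3 hardware instructions; it emulates the single rounding with exact rational arithmetic, and `fused_multiply_add` is a made-up helper name, not a library function.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with one rounding step (illustrative helper, not a real API)."""
    # Fractions hold the product and the sum exactly; float() rounds once at the end.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = a * b + c                  # the product is rounded to 1.0 first, so this is 0.0
fused = fused_multiply_add(a, b, c)  # the exact product is 1 - 2**-60, so this is -2**-60

print(unfused, fused)
```

The two results differ because the unfused version loses the low bits of the product before the addition happens.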

As for AMD's support for FMA3 so far: AMD introduced FMA3 support in processors starting with the Piledriver architecture for compatibility reasons. The 2nd generation APU processors based on "Trinity" (32nm) supporting FMA3 instructions were launched May 15, 2012. The 2nd generation Bulldozer processors with Piledriver cores supporting FMA3 instructions were launched October 23, 2012. Source: wikipedia.org

 

-------------------------------------------------------------

 

To the point: a BIOS fix for the issue is confirmed and AMD is well aware of the problem. The tech site digitaltrends.com reported speaking with AMD about it first-hand. Full link: http://www.digitaltrends.com/computing/ryzen-amd-bios-fix-fma3-crash/

 

A quote from the article with AMD's own statement:

Quote

AMD's upcoming fix should alleviate any possible system hang caused by Ryzen processors and the FMA3 microprocessor instruction set.


AMD confirmed with Digital Trends on Monday that the company discovered why FMA3 code is causing system hangs on PCs using a new Ryzen desktop processor. Although AMD didn’t provide a detailed report on the problem’s root cause, the company said that BIOS changes will be distributed to motherboard manufacturers to resolve the issue. Customers are encouraged to keep an eye on their motherboard vendor’s website for an update.

“We are aware of select instances where FMA code can result in a system hang,” the company said. “We have identified the root cause.”

 

More detailed information about the issue:

Quote

Ryzen’s issue with FMA3 isn’t locked to the Flops benchmark. Simple apps with basic user privileges can crash a Ryzen-based machine. Even more, code using FMA3 could be executed on virtual machines running on AMD’s upcoming Zen-based “Naples” processors for the enterprise. Thus, finding the FMA3 issue in Flops now saved AMD and corporations from a lot of headache stemming from the security implications alone at the launch of Naples.

 

The first signs of the issue were found using an open-source processor benchmark called Flops (v2). Here's what its creator, Alexander “Mysticial” Yee, had to say about it:

Quote

“Don’t be fooled by the Haswell binary,” Yee said on HWBOT. “The benchmark is five years old and I’ve largely neglected it for the last three. So I haven’t updated it for Zen yet. Any processor will be able to run any of the binaries if it supports the underlying instruction sets. If it doesn’t, the program merely crashes with an ‘illegal instruction.’ Under no circumstances should a user-mode application be able to bring down an entire system.”
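As a side note on "if it supports the underlying instruction sets": on Linux you can check whether the CPU advertises FMA3 before running such a binary. This is a minimal sketch, assuming a Linux system where `/proc/cpuinfo` lists FMA3 under the flag name `fma`; the function names are made up for illustration, and other operating systems need a different check.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports_fma3(cpuinfo_text: str) -> bool:
    # Linux reports FMA3 as the flag "fma"; "fma4" is a different extension.
    return "fma" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-only; absent on other systems
    if path.exists():
        print("FMA3 supported:", supports_fma3(path.read_text()))
    else:
        print("No /proc/cpuinfo here; use a platform-specific check instead.")
```

Note that this only tells you the instructions are advertised. It says nothing about whether a given chip is affected by the hang described in this thread.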


The author of the benchmark tested Ryzen in several ways, with this result (according to the article):

Quote

The multiple tests conducted to confirm the FMA3 problem relied on Ryzen CPUs running at their stock speeds. Yee also benchmarked each thread (ordered instruction sequence), and managed to freeze the PC each time no matter what processor core he used.
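For anyone wondering what "testing each core" looks like in practice, here is a rough sketch of pinning a workload to one core at a time. It is not Yee's benchmark: `busy_sum` is a harmless stand-in loop in plain Python (it does not issue FMA3 instructions), both function names are made up, and the pinning relies on `os.sched_setaffinity`, which Python only exposes on some platforms such as Linux.

```python
import os


def busy_sum(n: int = 100_000) -> float:
    """Harmless stand-in workload: a multiply-add loop in plain Python."""
    acc = 0.0
    for i in range(n):
        acc = acc * 0.5 + i
    return acc


def run_on_each_core(workload=busy_sum) -> dict[int, float]:
    """Run `workload` once per available core, pinning the process to that core."""
    if not hasattr(os, "sched_setaffinity"):
        # Affinity control is not exposed here (e.g. Windows, macOS); run once unpinned.
        return {-1: workload()}
    original = os.sched_getaffinity(0)
    results = {}
    try:
        for core in sorted(original):
            os.sched_setaffinity(0, {core})  # restrict this process to a single core
            results[core] = workload()
    finally:
        os.sched_setaffinity(0, original)  # always restore the original core set
    return results


if __name__ == "__main__":
    for core, value in run_on_each_core().items():
        print(f"core {core}: finished, result {value:.1f}")
```

Running the same workload on every core in turn is what rules out a single faulty core as the explanation.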

 

That last part shows the importance of the coming fix as this problem could be exploited in many harmful ways. Let's hope it comes ASAP.


Any thoughts on the topic? I hope the bug won't be used in a harmful way before the fix comes.

It's a good thing AMD was informed about the issue before Naples launched. This could have gone horribly wrong for AMD if a company using Naples-based servers had its entire server room crashed because of it.

 

 

-------

If this should be in another subforum and is not "good enough" for the News section, please move it, mods ^_^ Thanks in advance.

Edited by Morgan MLGman
Fixed formatting!

CPU: AMD Ryzen 7 5800X3D GPU: AMD Radeon RX 6900 XT 16GB GDDR6 Motherboard: MSI PRESTIGE X570 CREATION
AIO: Corsair H150i Pro RAM: Corsair Dominator Platinum RGB 32GB 3600MHz DDR4 Case: Lian Li PC-O11 Dynamic PSU: Corsair RM850x White


Formatting fixed.


AMD really has a couple of early adopter issues, don't they?

Really annoying for them, as the timing couldn't be any worse (apart from launch day ofc).

 

On 11/19/2014 at 2:14 PM, Syntaxvgm said:
You would think Ubisoft would support the Bulldozer based architectures more given their digging themed names like bulldozer, Piledriver, Steamroller and Excavator.

5 hours ago, Arcanekitten said:

AMD really has a couple of early adopter issues, don't they?

Really annoying for them, as the timing couldn't be any worse (apart from launch day ofc).

This isn't even a big issue; it's quite small. The memory issues are getting rectified too, and certified memory kits are being released as well. Everything new, be it a platform or something built completely from scratch, has issues at the start.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver)Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |


5 hours ago, Arcanekitten said:

-snip-

These are relatively minor issues compared to the shit storm that was X99's launch. At least the motherboard VRMs don't blow up on AMD's platform at launch. xD

Pixelbook Go i5 Pixel 4 XL

3 hours ago, Citadelen said:

These are relatively minor issues compared to the shit storm that was X99's launch. At least the motherboard VRMs don't blow up on AMD's platform at launch. xD

Why did you feel the need to say this in two different threads? I get posting a snarky comment, but this seems like you are crusading to defend AMD by trying to down Intel. Dial it back some. 

My (incomplete) memory overclocking guide: 

 

Does memory speed impact gaming performance? Click here to find out!

On 1/2/2017 at 9:32 PM, MageTank said:

Sometimes, we all need a little inspiration.

2 hours ago, MageTank said:

Why did you feel the need to say this in two different threads? I get posting a snarky comment, but this seems like you are crusading to defend AMD by trying to down Intel. Dial it back some. 

Perhaps they suffered the wrath of an exploding X99 board, and are still upset about it. It took 8 weeks to sort the RMA, so I am going to bitch about it for that long! :P


4 hours ago, MageTank said:

Why did you feel the need to say this in two different threads? I get posting a snarky comment, but this seems like you are crusading to defend AMD by trying to down Intel. Dial it back some. 

Because it was an early adopter issue with Intel and people seem to be forgetting that??


5 hours ago, DrMikeNZ said:

Perhaps they suffered the wrath of an exploding X99 board, and are still upset about it. It took 8 weeks to sort the RMA, so I am going to bitch about it for that long! :P

 

4 hours ago, TechGod said:

Because it was an early adopter issue with Intel and people seem to be forgetting that??

Well, yeah, but that's not the point of this thread ^_^ IMO a new platform launch might have issues, and it's nothing special if it does. Ryzen is a completely new architecture, after all.

 

I created this thread to explain the issue better, as not everyone understands how instruction sets work, and to report that AMD already knows about the issue, has found the cause and will be releasing a fix soon.

 

Apparently it's related to the CPU not getting enough voltage at stock when running those instructions: at stock settings the R7 freezes the system, but when it was overclocked and overvolted, the issue did not occur in some instances (at least in the tests that I read about).


Just now, Morgan MLGman said:

Well, yeah, but that's not the point of this thread ^_^ IMO a new platform launch might have issues, and it's nothing special if it does. Ryzen is a completely new architecture, after all.

To be fair though, second generation X99 motherboards still had VRM failures during the Broadwell-E launch as well. I wouldn't call that a new platform launch. (And this is even more off-topic.)

6 minutes ago, Morgan MLGman said:

Apparently it's related to the CPU not getting enough voltage at stock when running those instructions: at stock settings the R7 freezes the system, but when it was overclocked and overvolted, the issue did not occur in some instances (at least in the tests that I read about).

I can believe that; my R7 1700 does seem much more unstable at stock than overclocked. Overclocking from 3.2GHz to 3.8GHz has only yielded me a -2% to 16% (median 4%) increase in actual performance. The only benefit of overclocking that I see is stability.


inb4 zMeul turns this into a huge fucking shit storm for no good reason.

Good to see AMD fixing their problems as they come and in a timely fashion... Hope most of the problems are fixed by the time R5 launches, and I hope they unlock @MageTank's beloved RAM timings so he can work that magic and see if tightening the timings yields the same kind of gainz as on Intel (for his specific workload ofc, or in general).

CPU: Intel i7 7700K | GPU: ROG Strix GTX 1080Ti | PSU: Seasonic X-1250 (faulty) | Memory: Corsair Vengeance RGB 3200Mhz 16GB | OS Drive: Western Digital Black NVMe 250GB | Game Drive(s): Samsung 970 Evo 500GB, Hitachi 7K3000 3TB 3.5" | Motherboard: Gigabyte Z270x Gaming 7 | Case: Fractal Design Define S (No Window and modded front Panel) | Monitor(s): Dell S2716DG G-Sync 144Hz, Acer R240HY 60Hz (Dead) | Keyboard: G.SKILL RIPJAWS KM780R MX | Mouse: Steelseries Sensei 310 (Striked out parts are sold or dead, awaiting zen2 parts)


6 hours ago, DrMikeNZ said:

Perhaps they suffered the wrath of an exploding X99 board, and are still upset about it. It took 8 weeks to sort the RMA, so I am going to bitch about it for that long! :P

 

5 hours ago, TechGod said:

Because it was an early adopter issue with Intel and people seem to be forgetting that??

I completely understand that. What I don't understand is the need to make almost the exact same post in two different threads, especially once you factor in the other thread being nearly 2 weeks old. At that point, it goes beyond educating about Intel's platform launch issues, and becomes more about defending AMD by attempting to bring Intel down.

 

I don't think I can name a single platform from either side that launched perfectly smoothly. Even to this very day, TSX support is hit or miss (with the errata list always showing TSX issues). A buddy of mine blew up 2 different boards with the same Thuban (both top of the line boards from ASUS and ASRock at the time). Skylake has a similar AVX issue (albeit pretty uncommon) that caused crashes, and nobody needs to be reminded of Bulldozer's launch at all, lol.

 

My point is, every launch from every side can be shaky, and I doubt we all expect perfectly smooth product launches. People just need to tone down all of the mud slinging, especially if you have to go to two different threads to do it.

 

4 minutes ago, XenosTech said:

inb4 zMeul turns this into a huge fucking shit storm for no good reason.

Good to see AMD fixing their problems as they come and in a timely fashion... Hope most of the problems are fixed by the time R5 launches, and I hope they unlock @MageTank's beloved RAM timings so he can work that magic and see if tightening the timings yields the same kind of gainz as on Intel (for his specific workload ofc, or in general).

I just spoke with @done12many2 yesterday about RAM timings, and showed him my 3600 C14-14-14-28-2 beating a DDR4 4000 kit in latency, while matching its read/copy. The only thing the faster frequency wins at is write speed (writes are always the most efficient, easiest to work with). I'd take my lower latency over that 10% write speed boost. He also showed me a 4133MHz XMP that was literally half the speed of my kit (dual channel, but running single channel speeds due to extremely poor RTL/IO-L training, and likely bad auto-tertiary timings all around). This further proves my point that all of the gains are hidden in tertiary timings, and frequency/primary timings matter less.
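For reference, the part of the latency that comes from the CAS setting alone can be estimated from the data rate. A rough sketch: only the 3600 C14 figures come from the post above. The CL19 value for the DDR4-4000 kit is an assumed example, since the post does not give that kit's timings, and `cas_latency_ns` is a made-up helper name. It also ignores the tertiary timings that the post is actually about.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds for DDR memory (illustrative helper)."""
    # DDR transfers twice per clock, so one clock cycle lasts 2000 / data_rate ns.
    return cas_cycles * 2000.0 / data_rate_mts


tuned = cas_latency_ns(3600, 14)    # kit from the post: about 7.78 ns
assumed = cas_latency_ns(4000, 19)  # CL19 is an assumption, not from the post: 9.5 ns

print(f"DDR4-3600 CL14: {tuned:.2f} ns")
print(f"DDR4-4000 CL19 (assumed): {assumed:.2f} ns")
```

Under that assumption the slower-clocked kit still has the shorter CAS delay, which is consistent with a lower-frequency kit winning on latency.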


11 minutes ago, MageTank said:

I just spoke with @done12many2 yesterday about RAM timings, and showed him my 3600 C14-14-14-28-2 beating a DDR4 4000 kit in latency, while matching its read/copy. The only thing the faster frequency wins at is write speed (writes are always the most efficient, easiest to work with). I'd take my lower latency over that 10% write speed boost. He also showed me a 4133MHz XMP that was literally half the speed of my kit (dual channel, but running single channel speeds due to extremely poor RTL/IO-L training, and likely bad auto-tertiary timings all around). This further proves my point that all of the gains are hidden in tertiary timings, and frequency/primary timings matter less.

I know, you've been spreading it like it was a religion. I really hope they give access to those timings though; it would be very interesting to see what kind of latency exists between the CCXs after that tweak.


20 minutes ago, Sakkura said:

So it's going to be like AVX on Intel chips? Too bad they didn't catch it before release, but at least it won't affect their enterprise launch, where this could have been a big problem. 

I work in server environments as a technician at a company that specializes in storage, backups, data archiving, data protection, etc., and I can tell you first-hand that such an issue, if exploited, could be disastrous to some companies...

I mean, it's more serious than people realize. Some companies have VERY short RTOs (Recovery Time Objectives) in case something fails and very complex DR (Disaster Recovery) plans to deal with potential data loss, as they cannot afford to lose anything. But I don't think anyone expects all of their servers to just randomly freeze because someone exploited an instruction-set bug on all of them...


Good to hear that they are working on a fix, but their list of "things to fix in future updates" is growing worryingly long. It's better to say "we will fix it later" than to stay silent, but I am getting worried that:

1) AMD is in over its head. They are working on solving like 3-4 issues at once, when they really should have polished the product a bit more before launching it.

2) The hype train will once again pick up steam and we will never hear the end of "just wait for X and Y updates, then AMD will destroy Intel!" which is a really stupid mentality to have (as we saw with Bulldozer).

 

 

 

6 minutes ago, MageTank said:

At that point, it goes beyond educating about Intel's platform launching issues, and more about defending AMD by attempting to bring Intel down.

Welcome to fanboy-ism.

If you can't beat someone, try to bring the competitor down instead.

 

It's like in Windows 10 threads where a handful of users always jumps in and goes "Google is bad too!" as soon as someone says anything negative about Microsoft.


Nice to see they admit there is an issue and are working on a proper fix.

Some companies can learn from that :)

If you want my attention, quote meh! D: or just stick an @samcool55 in your post :3

Spying on everyone to fight against terrorism is like shooting a mosquito with a cannon


2 hours ago, MageTank said:

This further proves my point that all of the gains are hidden in tertiary timings, and frequency/primary timings matter less. 

Still looking forward to a tuning guide :) 

 

1 hour ago, LAwLz said:

Good to hear that they are working on a fix, but their list of "things to fix in future updates" is growing worryingly long.

The list isn't that long, is it? As far as I'm aware, there is the FMA3 thing of this thread, and a general RAM performance/compatibility improvement that's supposed to be in the works. Both of these will (eventually) be resolved in BIOS updates. While it can be serious in some scenarios, I don't think the FMA3 bug is that big a deal, as the performance in that area significantly lags Intel, so if you wanted to do a lot of FMA3-intensive tasks, Ryzen wouldn't be my first choice. A lot of the other problems are around software optimisation and support, which I view separately.

I'm feeling a little burnout from all the Ryzen hype and my own testing, which I should repost here some time... Taking a bit of a breather to let the BIOS mature before I push harder again...

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


13 hours ago, MageTank said:

-snip-

Doesn't make it any less true or less relevant to the points I was responding to.


To quote Squidward: "Snowball fights are for immature children."

I'm not picking a side, but at least AMD caught the bug now and not later, when it really could've fucked up their enterprise launch and potentially the laptop rollout, if it would theoretically cover all CPUs under Zen.

Check out my guide on how to scan cover art here!

Local asshole and 6th generation console enthusiast.


18 hours ago, Doobeedoo said:

This isn't even a big issue; it's quite small. The memory issues are getting rectified too, and certified memory kits are being released as well. Everything new, be it a platform or something built completely from scratch, has issues at the start.

It's an issue nonetheless. Same with the RAM speed: it might not affect a whole lot, but it's an early adopter issue, which is fine since, as you also mentioned, it is to be expected.

 


26 minutes ago, Arcanekitten said:

It's an issue nonetheless. Same with the RAM speed: it might not affect a whole lot, but it's an early adopter issue, which is fine since, as you also mentioned, it is to be expected.

As far as RAM goes, there are 3200MHz kits that work and also a certified 3466MHz kit, which is great.


7 hours ago, Citadelen said:

Doesn't make it any less true or less relevant to the points I was responding to.

It does though. It changes the intent of your post from pointing out similar flaws with Intel, to crusading blindly as a loyal AMD fanboy. Your own signature makes it very clear. I wouldn't have cared, but you went back to a thread that had no replies in almost 2 weeks to repeat exactly what you said here. That's going out of your way for no reason other than to defend AMD by bringing Intel down.

 

Also, if your intent is to educate people with truth, provide sources as you do it. Others will simply blame board VRM and not attribute the blame to Intel themselves. 


1 hour ago, MageTank said:

It does though. It changes the intent of your post from pointing out similar flaws with Intel, to crusading blindly as a loyal AMD fanboy. Your own signature makes it very clear. I wouldn't have cared, but you went back to a thread that had no replies in almost 2 weeks to repeat exactly what you said here. That's going out of your way for no reason other than to defend AMD by bringing Intel down.

 

Also, if your intent is to educate people with truth, provide sources as you do it. Others will simply blame board VRM and not attribute the blame to Intel themselves. 

Where's the fire emoji when I need it…

There it is. 🔥

Both sides have issues that shouldn't have happened, and both (at least tried to?) take care of it. 


1 hour ago, MageTank said:

-snip-

Part of my response was to people picking faults with the AM4 platform, so I picked at X99's faults. Also, I didn't mean to necro a two-week-old post; I don't actually know how that happened, as I didn't actively search for it. Not only that, but I hadn't intended to make clone posts. I saw two posts that it was relevant to and replied accordingly. I also pride myself on not blindly crusading for AMD while being a fanboy for them. If they mess up, I'll criticise them, but it does annoy me when AMD gets so much undeserved flak for launching a less than polished platform. There's only so much you can delay a product before it starts to eat at revenue and shareholders get antsy.
