ASUS Z690 and multiple NVMe drives. Drives become unresponsive under load.

A little over a year ago, I had this thread going, where I was trying to find a solution to a very weird issue: 

 

My ASUS Z690 mobo hates Gen 4 NVMe drives:/ - CPUs, Motherboards, and Memory - Linus Tech Tips

 

I was all over the place in that thread, but I finally found a solution: replace my motherboard, keep it on the last known good BIOS version, and never update it.

 

I bring this up because I recently upgraded from a 12700K to a 14700K, and in order to do so, I had to flash a newer BIOS version with 14th gen support. And you guessed it: the problem is back. On a whole new motherboard (same model, but a new unit).

 

Specs: ASUS TUF Gaming Z690 D4, 64GB DDR4 RAM 3200MHz, 14700K, 4090FE, 850W Corsair PSU... Four NVMe drives: 1TB WD SN850X Gen4, 1TB WD SN750 Gen3, 1TB WD SN750 Gen3, 500GB Samsung 980 (non-Pro, DRAMless).

 

I've used my computer with these 4 drives and the 12700K, first with a 3070 and for the last 5 months with the 4090FE, with no issues. It was on a BIOS from August 2022, and everything worked perfectly fine.

 

A couple of weeks ago, I decided to get a 14700K to better match my recently purchased 4090FE. To do so, I had to update the BIOS. Since then, I've been experiencing the same issue I had over a year ago, which I was only able to resolve back then by replacing my motherboard.

 

The problem:

 

When transferring a large amount of data between SSDs, or installing a game, one of the drives involved in the process can randomly become unresponsive. It simply suspends all activity, the transfer fails, and while the drive still shows in Explorer, I can't copy or delete any data on it. The WD Dashboard doesn't see it anymore either. The only way to "restart" it is to reboot the system, and even then it doesn't always come back on the first try; sometimes I reboot and the drive no longer shows in Explorer, but it reappears after a second reboot. If this happens to the system drive (SN850X Gen4), it means an instant BSOD, eventually system file corruption and no more boot, but thankfully this time I haven't had any of that and have only experienced the issue with the other 3 drives. So it can happen to any one of the other 3 drives in my system; it's not a drive issue. To save time, I'd just say the only piece of hardware I can suspect of causing this is my motherboard. If you want more details, feel free to check out my previous thread.

 

Kicker:

 

Removing some of the M.2 NVMe drives from the system appears to reduce the frequency of the issue, or eliminates it altogether. If I remove one of the 2 WD SN750 Gen3 drives and keep the other 2 drives plus the system drive (Gen4), it appears to work fine. If I remove the Samsung DRAMless drive instead, then one of the 2 750s will randomly stop responding during large file transfers, and it can be easily replicated. So with an 850, a 750 and a 980 it works fine, but if I transfer data from the 750 to the 980, and at the same time transfer data from my Gen4 SN850X to a network drive (which is a slow transfer), one of those Gen3s will stop working again. It can happen to either drive, the one data is being copied from or the one it's being copied to. It has never happened to both drives simultaneously.
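
For what it's worth, the load pattern that triggers it is easy to script. Here's a rough sketch of what I do, written as a Python script; the drive letters and file paths are just placeholders, not my actual setup:

```python
# Rough reproduction sketch: kick off two large transfers at the same time,
# which is the load pattern that makes one of the drives drop out.
# Drive letters and paths below are placeholders, not my actual setup.
import shutil
import threading

def copy_job(src, dst):
    # A hung source or destination drive makes this stall or raise an OSError,
    # which is exactly the failure mode described above.
    try:
        shutil.copyfile(src, dst)
        print(f"finished: {src} -> {dst}")
    except OSError as exc:
        print(f"failed:   {src} -> {dst} ({exc})")

jobs = [
    ("D:\\test\\big_file_1.bin", "E:\\big_file_1.bin"),   # e.g. SN750 -> Samsung 980
    ("C:\\test\\big_file_2.bin", "Z:\\big_file_2.bin"),   # e.g. SN850X -> network share
]

threads = [threading.Thread(target=copy_job, args=job) for job in jobs]
for t in threads:
    t.start()
for t in threads:
    t.join()
```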

 

Guess:

 

My best guess is that it has something to do with overwhelming my PCIe lanes, or a lack of bandwidth. Maybe the chipset? I have the latest drivers and the latest Intel ME driver. It's not the 4090, as I had the same issue with the 3070. It's not the 14700K, as I had the same issue with the 12700K. It's possible that it's a combination of the 4090 and 14700K plus 4 NVMe drives, but again, I had the same problem when I used the 12700K with the 3070. I also had the same issue with Samsung 980 Pro and 990 Pro drives; in fact, with a 990 Pro as the system drive it was the worst, an almost instant BSOD without even transferring anything, just seconds after booting into Windows 11. It feels like the faster the drives are, the more likely it is to trigger whatever is causing these drive shutdowns. And it wasn't happening on any BIOS before September 2022; any BIOS after that one seems to bring the issue right back. It could be a simple BIOS setting, but I haven't been able to figure it out yet.

 

Question:

 

What the hell can be causing this? I've reinstalled Windows several times and played with any and all driver combinations. I'm out of ideas. I recall other people mentioning the same thing with Gigabyte Z690 boards. Again, is it a chipset issue? No other posts on the internet with similar claims got any replies. For example, this looks like it might be the same issue, and it was posted around the same time I was asking about it at the end of 2022: https://rog.asus.com/forum/showthread.php?131587-z690-Nvme-gen-4-crashes-system-old-gen-3-works. It never got any replies.

 

Right now, my system works fine with 3 drives: the SN850X, one SN750 and the Samsung 980. I have to keep it with this combination, and I have to remember to only do one data transfer at a time. If I try it with the 3 Western Digital drives, it will not be stable. I'm actually able to use more fast drives now than I did the last time I was dealing with this, and that's with a much faster CPU and GPU (which may actually be the reason why). The key is to find the perfect balance of drives that will not all be too fast all together, as dumb as it sounds, I know. And it's just too effin' weird.

 

Help me out here, guys. 

 

P.S. Some people say you can't use all 4 M.2 slots at the same time. I am skeptical of this. I did use 4 NVMe drives for over a year, just fine, but a BIOS update reintroduced the issue. It's got to be something else.


9 minutes ago, drumn_bass said:

The key is to find the perfect balance of drives that will not all be too fast all together. And it's just too effin' weird.

Yep. Mainstream just does not have enough PCIe lanes. It sounds like you're simply bouncing off the bandwidth limits of the lanes you have available. You have an x8 DMI 4.0 uplink from the chipset to the CPU. That's equivalent in bandwidth to PCIe 4.0 x8 IIRC, and all drives in chipset slots plus a lot of your other I/O (USB, NIC, etc.) also go through it. It only takes 2 fast PCIe 4.0 drives running full bore to hit that limit or get very close to it. That's why it's OK with 2 of them, but once you add a 3rd drive of any kind and load them all, it goes over the bandwidth limit and chokes.
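
To put rough numbers on that (theoretical per-lane throughput and ballpark spec-sheet SSD figures, not measurements):

```python
# Back-of-the-envelope math for the DMI 4.0 x8 bottleneck described above.
# All numbers are rough theoretical/spec-sheet figures, not measurements.
PCIE4_PER_LANE_GBPS = 1.97                    # ~16 GT/s with 128b/130b encoding
dmi_budget = PCIE4_PER_LANE_GBPS * 8          # DMI 4.0 x8 -> ~15.8 GB/s, shared by
                                              # chipset M.2 slots, USB, SATA, NIC, etc.

gen4_ssd = 7.0    # ballpark sequential throughput of an SN850X-class Gen4 drive
gen3_ssd = 3.4    # ballpark sequential throughput of an SN750/980-class Gen3 drive

print(f"DMI 4.0 x8 budget        : ~{dmi_budget:.1f} GB/s")
print(f"2x Gen4 drives full bore : ~{2 * gen4_ssd:.1f} GB/s")
print(f"1x Gen4 + 2x Gen3 drives : ~{gen4_ssd + 2 * gen3_ssd:.1f} GB/s")
print("Either load gets close to the cap before USB, NIC, etc. are even counted.")
```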

Intel HEDT and Server platform enthusiasts: Intel HEDT Xeon/i7 Megathread 

 

Main PC 

CPU: i9 7980XE @4.5GHz/1.22v/-2 AVX offset 

Cooler: EKWB Supremacy Block - custom loop w/360mm +280mm rads 

Motherboard: EVGA X299 Dark 

RAM:4x8GB HyperX Predator DDR4 @3200Mhz CL16 

GPU: Nvidia FE 2060 Super/Corsair HydroX 2070 FE block 

Storage:  1TB MP34 + 1TB 970 Evo + 500GB Atom30 + 250GB 960 Evo 

Optical Drives: LG WH14NS40 

PSU: EVGA 1600W T2 

Case & Fans: Corsair 750D Airflow - 3x Noctua iPPC NF-F12 + 4x Noctua iPPC NF-A14 PWM 

OS: Windows 11

 

Display: LG 27UK650-W (4K 60Hz IPS panel)

Mouse: EVGA X17

Keyboard: Corsair K55 RGB

 

Mobile/Work Devices: 2020 M1 MacBook Air (work computer) - iPhone 13 Pro Max - Apple Watch S3

 

Other Misc Devices: iPod Video (Gen 5.5E, 128GB SD card swap, running Rockbox), Nintendo Switch


@Zando_

 

Wow. Thanks for the quick reply. So I guess I was pretty close with my assumption. Do you have any idea why a BIOS update would cause it to start hitting these bandwidth limitations? The one from September 2022 introduced 13th gen support, so I think there might be a clue right there. Did they change how it handles PCIe lanes for the newer gen CPUs? Maybe allocating more to them, thus creating a bottleneck in the M.2 slots? If that makes no sense, it's because I'm probably sounding really dumb right here. The whole PCIe thing... I admit I never fully grasped it.

 

Also, you mentioned USB. One of my USB hubs started randomly getting kicked out, as in it would say "USB device not recognized" until I unplug it and plug it back in. It's starting to paint a picture.

 

Bonus question, if I may: would using something like the ASUS Hyper M.2 X16 PCIe 3.0 X4 Expansion Card to host the Gen3 drives help, instead of having them in the motherboard's M.2 slots?


18 minutes ago, drumn_bass said:

Do you have any idea why a BIOS update would cause it to start hitting these bandwidth limitations? The one from September 2022 introduced 13th gen support, so I think there might be a clue right there. Did they change how it handles PCIe lanes for the newer gen CPUs?

No idea. 12th, 13th, and 14th gen seem to all have the same 20 CPU PCIe 5.0/4.0 lanes + 8 DMI 4.0 lanes downlink to the chipset. So I don't see how they could change something in the BIOS that would cause issues, or why.

18 minutes ago, drumn_bass said:

The whole PCIe thing... I admit I never fully grasped it.

Lanes on a highway. Add more of em and you get more bandwidth (more cars across the same distance in the same amount of time). Your CPU has 20 lanes, 16 go to your CPU PCIe slots (one x16, or both at x8 if you install something in both slots), 4 go to the CPU M.2 slot. Your chipset M.2 slots, PCIe slots, and most of the board I/O share the 8 DMI lanes. So same as a highway, they get congested when you have more "cars" (things using PCIe lanes) than you have actual lanes. Each of your SSDs is using 4 lanes, so if they're running at 4.0 speeds then 2 of them fill that highway and everything else backs up behind them.
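
As a quick picture of that layout (the slot-to-lane mapping here is my general understanding of these boards, so treat it as a sketch rather than gospel):

```python
# Tiny model of the lane split described above (my understanding, not verified
# against your exact board): 20 CPU lanes, everything else behind DMI 4.0 x8.
cpu_lanes = {
    "PCIe x16 slot (GPU)": 16,   # or x8/x8 if both CPU PCIe slots are populated
    "M.2_1 (CPU-attached)": 4,
}
behind_dmi = ["M.2_2", "M.2_3", "M.2_4", "USB controllers", "SATA ports", "NIC"]

print(f"Lanes straight off the CPU : {sum(cpu_lanes.values())} of 20")
print(f"Sharing the DMI 4.0 x8 link: {', '.join(behind_dmi)}")
print("Any two x4 drives running at Gen4 speed can saturate that shared link.")
```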

 

Intel/AMD assume most people won't be loading a bunch of high-speed drives, so they can get away with this. But I think it's rather stupid given the popularity and low cost of very fast NVMe drives, and the fact that Intel/AMD market their mainstream CPUs to "prosumers", who will be doing exactly this workload regularly. HEDT is unfortunately pretty dead due to the prohibitive cost of current kit, plus a lot of it being focused on workstation use rather than people who just need a beefier alternative to mainstream. For reference, my i9 7980XE has 44 CPU PCIe lanes, so I can run an x16 GPU and 4 M.2 drives, all at full bandwidth, through the CPU itself. That's 32 lanes, leaving me with 12 more, though I believe on this platform 4 are stolen by the chipset, leaving me with 8 spare. And modern HEDT has even more PCIe lanes, so it gets rather silly (the current Intel stuff has 64 or 112 PCIe lanes depending on the chip).

 

Back on topic: from your motherboard manual, M.2_1 is the CPU M.2 slot, while M.2_2, M.2_3, and M.2_4 all run through the Z690 chipset. If there are 2 drives you consistently do large file transfers between, I would put one in the CPU slot and the other in one of the chipset ones. Realistically, other stuff is going through your chipset too, so if you're running a transfer between two drives that are both in chipset slots, that's likely why it's choking.
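
If it helps to visualize that placement advice, here's a little sketch; the slot assignments are what I'm reading out of the manual, so double-check them against your actual board:

```python
# Sketch of the placement advice above: flag transfers where both drives sit
# behind the chipset, since those cross the shared DMI link in both directions.
# Slot mapping is from the TUF Gaming Z690 D4 manual as I read it -- verify it.
SLOT_PATH = {
    "M.2_1": "CPU",       # x4 straight to the CPU
    "M.2_2": "chipset",   # these three share the DMI 4.0 x8 uplink
    "M.2_3": "chipset",
    "M.2_4": "chipset",
}

def transfer_risk(src_slot, dst_slot):
    # Both ends behind the chipset = the data goes over DMI into the CPU and
    # back out again, on top of whatever USB/NIC/SATA traffic is already there.
    if SLOT_PATH[src_slot] == SLOT_PATH[dst_slot] == "chipset":
        return "both drives behind DMI -- most likely to choke"
    return "one side is CPU-attached -- much more headroom"

print("M.2_2 -> M.2_3:", transfer_risk("M.2_2", "M.2_3"))
print("M.2_1 -> M.2_2:", transfer_risk("M.2_1", "M.2_2"))
```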
