RTX I/O Considerations
24 minutes ago, Zenith_X1 said: That's reassuring, surprising as well that the 2.4GB/s of WD's SN550 beats the Series X but I guess I need to learn more about compression.
Do we know if DirectStorage will be able to take advantage of SATA? If not...would a PCI-E 3.0/4.0 RAID controller allow, say, 4 SATA SSDs in RAID 0 to bypass such a restriction? And do you think that higher end motherboards will start moving toward more NVMe slots as the protocol becomes more prevalent?
The SN550 is rated for 2.4 GB/s reads; with a compression factor of 2, that works out to 4.8 GB/s. That's the optimal bandwidth after compression on the Series X using BCPack, which only applies to textures. Other assets would most likely fall back to Zlib, which is slower. I don't think a direct comparison is necessarily worthwhile as there are architectural differences, but in terms of numbers, there you go.
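If it helps, here is that arithmetic as a rough Python sketch. The 2.4 GB/s and the factor of 2 are the numbers above; the Zlib ratio and the texture share are made-up values purely for illustration, not measurements from any console or game.

```python
def effective_gbps(raw_gbps, ratio):
    """Decompressed output rate for a drive that reads compressed data."""
    return raw_gbps * ratio

def blended_ratio(texture_share, texture_ratio, other_ratio):
    """Average ratio when texture_share of the on-disk bytes are textures."""
    return texture_share * texture_ratio + (1.0 - texture_share) * other_ratio

RAW_GBPS = 2.4        # SN550 rated sequential read, from the post
BCPACK_RATIO = 2.0    # the "factor of 2" from the post
ZLIB_RATIO = 1.5      # hypothetical, for illustration only
TEXTURE_SHARE = 0.7   # hypothetical share of on-disk bytes that are textures

best_case = effective_gbps(RAW_GBPS, BCPACK_RATIO)
mixed = effective_gbps(
    RAW_GBPS, blended_ratio(TEXTURE_SHARE, BCPACK_RATIO, ZLIB_RATIO)
)
print(f"best case: {best_case:.2f} GB/s, mixed assets: {mixed:.2f} GB/s")
```

The point is only that the headline figure is a best case: once some of the data uses a weaker compressor, the effective rate lands somewhere between the raw and the optimal number.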
DirectStorage is for NVMe, not AHCI. AHCI over PCIe is possible and you can get bandwidth gains from it, but the protocol is inherently limited when dealing with solid state storage. Read my Reddit post on the subject, which has links to further details on it and on the RTX IO technologies. I believe AMD will have their own implementation for RDNA cards.
To give you an idea of the chasm between AHCI and NVMe just in terms of 4K IOPS: upcoming Gen 4 NVMe drives will be around 1 million, while AHCI generally can't even hit 100K. Now, I just said that streaming, sequentials, and bandwidth are interlinked, so let's also talk about why DirectStorage sticks to NVMe for queuing purposes. Quote:
"It does this in several ways: by reducing per-request NVMe overhead, enabling batched many-at-a-time parallel IO requests which can be efficiently fed to the GPU, and giving games finer grain control over when they get notified of IO request completion instead of having to react to every tiny IO completion ... NVMe devices are not only extremely high bandwidth SSD based devices, but they also have hardware data access pipes called NVMe queues which are particularly suited to gaming workloads. To get data off the drive, an OS submits a request to the drive and data is delivered to the app via these queues. An NVMe device can have multiple queues and each queue can contain many requests at a time. This is a perfect match to the parallel and batched nature of modern gaming workloads."
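To put rough numbers on the IOPS gap and the queuing point in that quote, here is a small Python sketch. The queue limits are the protocol maximums as I understand them (AHCI: one queue of 32 commands; NVMe: up to 65,535 I/O queues of up to 65,536 entries each), and real drives expose far fewer queues than the spec allows. The batch size of 256 is an arbitrary example, and none of this is the DirectStorage API itself.

```python
from math import ceil

def iops_to_gbps(iops, block_bytes=4096):
    """Throughput implied by a 4K random IOPS figure, in GB/s."""
    return iops * block_bytes / 1e9

def max_outstanding(queues, depth):
    """Upper bound on commands in flight for a given queue layout."""
    return queues * depth

def completion_notifications(num_requests, batch_size):
    """How many times the app is notified if completions arrive per batch."""
    return ceil(num_requests / batch_size)

# IOPS figures from the post
print(f"NVMe Gen 4, 1M IOPS: {iops_to_gbps(1_000_000):.2f} GB/s")
print(f"AHCI, 100K IOPS:     {iops_to_gbps(100_000):.2f} GB/s")

# Protocol-level limits (spec maximums, not what a given drive exposes)
print("AHCI in flight:", max_outstanding(1, 32))
print("NVMe in flight:", max_outstanding(65_535, 65_536))

# 10,000 small reads: one notification each vs. batches of 256 (arbitrary)
print("per-request:", completion_notifications(10_000, 1))
print("batched:    ", completion_notifications(10_000, 256))
```

Even at 4K block sizes the IOPS difference translates directly into bandwidth, and batching cuts the number of completions the game has to react to by orders of magnitude.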
The NVMe protocol is simply superior for use with the DirectStorage API. Sony, for their part, expands on NVMe with more priority states; again, check the AnandTech article for a more in-depth discussion of this, but it underlines why AHCI is limiting.
Worth understanding as a side concept is that random and sequential workloads are related. For example, your system/OS may combine random writes and then write them sequentially, the SRAM/DRAM in your SSD will combine small (subpage) writes before writing to flash, and even your SLC cache will only write out to TLC/QLC sequentially ("folding"), even if the original writes were random in nature. Moreover, SSDs work on the principle of parallelization, which the AHCI protocol inherently limits in many different ways. If you study Sony's storage patent, it actually discusses how they place the data contiguously so that it is sequential, as that greatly reduces the mapping overhead (e.g. the amount of SRAM/DRAM required for metadata). The accesses may seem random, but they act in parallel.
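To illustrate the mapping overhead point, here is a back-of-the-envelope Python sketch. It assumes a flat mapping table with 4-byte entries, which matches the usual rule of thumb of about 1 GB of DRAM per 1 TB of flash at 4 KiB granularity. The 128 MiB chunk size is a hypothetical value I picked for contrast; it is not taken from Sony's patent.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB

def mapping_table_bytes(capacity_bytes, granularity_bytes, entry_bytes=4):
    """Size of a flat logical-to-physical table at a given granularity."""
    entries = capacity_bytes // granularity_bytes
    return entries * entry_bytes

# Conventional page-level mapping: one entry per 4 KiB
fine = mapping_table_bytes(1 * TIB, 4 * KIB)

# Data placed contiguously and tracked in large chunks (hypothetical size)
coarse = mapping_table_bytes(1 * TIB, 128 * MIB)

print(f"4 KiB granularity:   {fine / MIB:.0f} MiB of metadata")
print(f"128 MiB granularity: {coarse / KIB:.0f} KiB of metadata")
```

The coarser the unit you track, the smaller the table, which is why contiguous placement lets a controller get by with a small amount of SRAM instead of a large DRAM cache.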
