RTX I/O Considerations

Solved by NewMaxx

Hey all, stream of consciousness incoming.

 

I was thinking about what would be best for my system when RTX I/O and DirectStorage release next year.  Pulling from Nvidia's website:

 

"There is no SSD speed requirement for RTX IO, but obviously, faster SSD’s such as the latest generation of Gen4 NVMe SSD’s will produce better results, meaning faster load times, and the ability for games to stream more data into the world dynamically. Some games may have minimum requirements for SSD performance in the future, but those would be determined by the game developers."  -https://www.nvidia.com/en-us/geforce/news/rtx-30-series-community-qa/

 

A few things I took away from this: 1) while we don't know what the performance benefit will be, we know from the PS5 demo that Sony is targeting 5.5 to 7 GB/sec, which they say results in effectively "zero load screens".  2) "Minimum requirements for SSD performance" means next-gen games that leverage zero-load-time tech may require higher read speeds than current ~500 MB/s SATA SSDs can offer, or even than the 6 Gb/s SATA interface (~600 MB/s after 8b/10b encoding) can provide.  "Okay cool" I thought, "HDDs can finally be put to rest."

 

High speed SSD requirements could make 500 MB/s SSDs obsolete, or require a 14 drive RAID 0 array to meet 7 GB/s (incoming HOLY SH*T: "WE SATURATED PCI-E 4.0 USING ONLY SPINNING DINOSAUR HARD DRIVES" LTT video).  Then I realized that even if I ran RAID 0 with FULL saturation of all six of my motherboard's SATA interfaces (not all of which are 6 Gb/s but for the sake of argument let's say they were) I would STILL fall short of the lower-end 5.5 GB/sec target that Sony looks to achieve.  "Well crap" I thought, "that might mean that HDDs and SATA SSDs will become obsolete for gaming."
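For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch. It assumes ideal RAID 0 scaling (no controller or protocol overhead), which real arrays will not reach, and the helper names `sata_usable_mb_s` and `drives_needed` are made up for illustration.

```python
import math


def sata_usable_mb_s(line_rate_gbit: float = 6.0) -> float:
    """Usable payload rate of a SATA link in MB/s.

    SATA uses 8b/10b encoding, so only 8 of every 10 line bits carry data.
    """
    payload_bits_per_s = line_rate_gbit * 1e9 * 8 / 10
    return payload_bits_per_s / 8 / 1e6


def drives_needed(target_mb_s: float, per_drive_mb_s: float) -> int:
    """Smallest RAID 0 member count whose ideal striped throughput meets the target."""
    return math.ceil(target_mb_s / per_drive_mb_s)


if __name__ == "__main__":
    # Payload ceiling of one SATA 6 Gb/s port.
    print(f"SATA 6 Gb/s payload: {sata_usable_mb_s():.0f} MB/s")
    # 500 MB/s drives needed to reach a 7 GB/s target.
    print(f"Drives for 7 GB/s: {drives_needed(7000, 500)}")
    # Six fully saturated SATA ports versus the 5.5 GB/s target.
    six_ports_gb_s = 6 * sata_usable_mb_s() / 1000
    print(f"Six ports: {six_ports_gb_s:.1f} GB/s (target 5.5 GB/s)")
```

Even in this best case, six ports land well under the 5.5 GB/s figure, which is the same conclusion as above.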

 

If storage speed affects performance enough for game creators to use storage speed in their "minimum requirements" then slow storage could seriously hamper framerate, which would be far worse than long load times.  I'm starting to wonder if storage speed requirements will drive a race toward "gamer-oriented" high-speed storage.  And that's when I realized that even most NVMe drives don't meet Sony's target.  "Uh oh" I thought, "even my Samsung 960 Pro only has a 3.5 GB/s read speed."

 

So now it's possible that my HDDs, SSDs, and my NVMe drive might ALL be obsolete sooner than expected.

 

Damn.

 

/End Rant

 

 

https://linustechtips.com/topic/1244539-rtx-io-considerations/

IMO a couple of seconds gained from RTX I/O don't matter enough to be worth the price of a high-end SSD.

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14002502

Those drives won't be obsolete; we will just have better options available. Most people will keep using HDDs and cheap SATA SSDs due to cost, and devs for sure won't want to miss out on that whole market, which means that games will still work on those.

 

Even when talking about the PS5, my bet is that only first-party games will actually leverage such bandwidth, while the rest of the games will just use it for faster loading times.

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14002522

7 minutes ago, Jurrunio said:

IMO a couple of seconds gained from RTX I/O don't matter enough to be worth the price of a high-end SSD.

 

1 minute ago, igormp said:

Those drives won't be obsolete; we will just have better options available. Most people will keep using HDDs and cheap SATA SSDs due to cost, and devs for sure won't want to miss out on that whole market, which means that games will still work on those.

 

Even when talking about the PS5, my bet is that only first-party games will actually leverage such bandwidth, while the rest of the games will just use it for faster loading times.

The thing is, when a developer puts storage speed in a future game's requirements section, I stop thinking that this is really about load times anymore.  Developers have forced consumers to move up the tech stack many times in the past and SSDs have been around for about a decade, so mandating an SSD seems reasonable.

 

But the key point I'm trying to get at is that if your storage speed is high enough, you don't need your game to have a loading screen at all (remember load times for PS1 vs N64?).  Games can load dynamically as you play, and anyone who is running older storage hardware will have to wait for their system to catch up as assets are loaded in.  Lower settings and changes to render distance would help players with slower storage speeds, but there is absolutely a situation where a player with the latest top-tier CPU and GPU could have far lower FPS at max settings due to slow storage.

 

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14002558

I think you are locking onto FPS being tied to how fast a game can load assets. Framerates could dip while loading from slower storage, but I'm pretty sure the overall experience would not be too bad. Let's remember that a game has to sell copies to make money for its developers; if it is written in a way that excludes all but the top 1% or even the top 10% of systems, sales will tank. Most developers tend to build for the lowest common denominator to maximise sales, so yes, fast storage and monstrous GPUs can be catered for, but not exclusively while the user base is still on last-gen or even older tech.

Ways around the difference could be, for example, how Elite Dangerous and Eve Online handle asset loading: the warp tunnel. It is a feature of those and other space games, but it is just a loading screen. So someone with slower tech gets a 30-second warp tunnel, someone with monster tech gets a 10-second warp tunnel, and both players just see a warp tunnel.

Similar methods can be employed in open-world games. Mass Effect used lifts; you could have teleport graphics or climbing up/down a rocky tunnel. There are many ways to add a loading screen without breaking immersion.

Just my humble ramblings. :)

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14002782

2 hours ago, Zenith_X1 said:

Games can load dynamically as you play

That has been around for so long that... idk, how could open-world games work otherwise?

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14002905

Couple random comments on your post...

 

The PS5 actually targets 9 GB/s after compression/decompression with a base speed around 5.5 GB/s. You can see their targets in the related storage patent, actually, which sets thresholds for performance on the accelerator/block. This is general compression using Kraken with a Zlib fallback. The Series X instead uses BCPack for textures, achieving about 4.8 GB/s from a 2.4 GB/s base, also with Zlib fallback. Check my Reddit post on the subject for more.

 

Nvidia's RTX IO has a target compression ratio of 2 which means a x4 PCIe 4.0 drive or striped x4 PCIe 3.0 drives would exceed anything the consoles can do. Further, as it is based on the DirectStorage API, any decent x4 PCIe 3.0 drive will match or exceed the Series X's best case (texture) bandwidth, for example even a 4-channel SN550. So there's absolutely nothing to be worried about there, although it will take some time for it to be implemented anyway.
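The arithmetic behind those comparisons is just raw read speed times compression ratio. A minimal sketch, assuming the ratio of 2 holds across the whole workload (it will not for every asset type) and using an illustrative 7 GB/s figure for a PCIe 4.0 drive; the function name `effective_gb_s` is made up here:

```python
def effective_gb_s(raw_gb_s: float, compression_ratio: float) -> float:
    """Bandwidth seen after decompression: raw drive reads times compression ratio."""
    return raw_gb_s * compression_ratio


# (raw read speed in GB/s, assumed compression ratio)
CASES = {
    "Series X, BCPack textures": (2.4, 2.0),
    "SN550 (2.4 GB/s) at ratio 2": (2.4, 2.0),
    "PCIe 4.0 drive (assumed 7 GB/s) at ratio 2": (7.0, 2.0),
}

if __name__ == "__main__":
    for name, (raw, ratio) in CASES.items():
        print(f"{name}: {effective_gb_s(raw, ratio):.1f} GB/s effective")
```

Treat the outputs as best-case ceilings, not benchmarks.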

 

While DirectStorage is focused on the NVMe protocol - and most computers sold starting in 2019 came with PCIe drives, by the way - there are already games that require a SATA SSD. That alone is plenty fast for loading, as random 4K is the major limitation/bottleneck; the advantages you see from faster drives are sequential and intended for streaming. This is why the PS5's solution is DRAM-less, and if you read the patent you see how they chunk the storage (again, read my Reddit post, the patent, and the AnandTech article).

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14003114

16 minutes ago, NewMaxx said:

Couple random comments on your post...

That's reassuring, and surprising as well that the 2.4 GB/s of WD's SN550 beats the Series X, but I guess I need to learn more about compression.

 

Do we know if DirectStorage will be able to take advantage of SATA?  If not...would a PCI-E 3.0/4.0 RAID controller allow, say, 4 SATA SSDs in RAID 0 to bypass such a restriction?  And do you think that higher end motherboards will start moving toward more NVMe slots as the protocol becomes more prevalent?

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14003155

24 minutes ago, Zenith_X1 said:

That's reassuring, and surprising as well that the 2.4 GB/s of WD's SN550 beats the Series X, but I guess I need to learn more about compression.

 

Do we know if DirectStorage will be able to take advantage of SATA?  If not...would a PCI-E 3.0/4.0 RAID controller allow, say, 4 SATA SSDs in RAID 0 to bypass such a restriction?  And do you think that higher end motherboards will start moving toward more NVMe slots as the protocol becomes more prevalent?

The SN550 is rated for 2.4 GB/s reads, factor of 2 = 4.8 GB/s. That's the optimal bandwidth after compression with the Series X using BCPack, which only applies to textures. Other assets would most likely fall back to Zlib, which is slower. I don't think a direct comparison is necessarily worthwhile as there are architectural differences, but in terms of numbers - there you go.

 

DirectStorage is for NVMe, not AHCI. AHCI over PCIe is possible and you can get bandwidth gains from that but the protocol is inherently limited when dealing with solid state storage. Read my Reddit post on the subject with links to further details on it and the RTX IO technologies. I believe AMD will have their own implementation for RDNA cards.

 

To give you an idea of the chasm between AHCI and NVMe just in terms of 4K IOPS - upcoming Gen 4 NVMe drives will be around 1 million, while AHCI generally can't even hit 100K. I just said that streaming, sequentials, and bandwidth are interlinked, so let's talk about why DirectStorage sticks to NVMe for queuing purposes as well, quote:

 

"It does this in several ways: by reducing per-request NVMe overhead, enabling batched many-at-a-time parallel IO requests which can be efficiently fed to the GPU, and giving games finer grain control over when they get notified of IO request completion instead of having to react to every tiny IO completion ... NVMe devices are not only extremely high bandwidth SSD based devices, but they also have hardware data access pipes called NVMe queues which are particularly suited to gaming workloads. To get data off the drive, an OS submits a request to the drive and data is delivered to the app via these queues. An NVMe device can have multiple queues and each queue can contain many requests at a time. This is a perfect match to the parallel and batched nature of modern gaming workloads."

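To put rough numbers on the IOPS gap and the queue description above, here is a small sketch. The queue limits are the protocol maximums as I understand them (AHCI: one queue of 32 commands; NVMe: up to 65,535 I/O queues of up to 65,536 entries each); real drives expose far fewer queues, and the helper names are made up for illustration.

```python
def iops_to_mb_s(iops: int, block_bytes: int = 4096) -> float:
    """Throughput in MB/s implied by an IOPS figure at a given I/O size."""
    return iops * block_bytes / 1e6


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on commands in flight across all queues."""
    return queues * depth


# Protocol-level limits (assumed from the AHCI and NVMe specs, not from any drive).
AHCI_QUEUES, AHCI_DEPTH = 1, 32
NVME_MAX_IO_QUEUES, NVME_MAX_DEPTH = 65_535, 65_536

if __name__ == "__main__":
    # 4 KiB random reads: ~1M IOPS (Gen 4 NVMe) versus ~100K IOPS (AHCI).
    print(f"NVMe at 1M IOPS:   {iops_to_mb_s(1_000_000):.1f} MB/s")
    print(f"AHCI at 100K IOPS: {iops_to_mb_s(100_000):.1f} MB/s")
    print(f"AHCI max in flight: {max_outstanding(AHCI_QUEUES, AHCI_DEPTH)}")
    print(f"NVMe max in flight: {max_outstanding(NVME_MAX_IO_QUEUES, NVME_MAX_DEPTH)}")
```

The point is less the exact ceiling than the shape: many deep queues give a game room to batch requests, which a single 32-entry queue cannot.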
 

The NVMe protocol is simply superior for use with the DirectStorage API. Sony for their part expands on NVMe with more priority states (again, check the AnandTech article for more in-depth discussion of this), which underlines why AHCI is limiting. Worth understanding as a side concept is that random and sequential workloads are related. For example, your system/OS may combine random writes and then write them sequentially; the SRAM/DRAM in your SSD will combine small (subpage) writes before writing to flash; and even your SLC cache will only write out to TLC/QLC sequentially ("folding"), even if the original writes were random in nature. Moreover, SSDs work on the principle of parallelization, which is inherently limited by the AHCI protocol in many different ways. If you study Sony's storage patent, it actually discusses how they place the data contiguously so that it is sequential, as that greatly reduces the mapping overhead (e.g. the amount of SRAM/DRAM required for metadata); the accesses may seem random, but they act in parallel.
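As a rough illustration of why contiguous placement shrinks the mapping overhead, here is a sketch that sizes a flat logical-to-physical table at two granularities. The 4-byte entry size, the 4 KiB baseline, and the 128 MiB chunk size are all illustrative assumptions on my part, not figures quoted from the patent.

```python
KIB = 2**10
MIB = 2**20
TIB = 2**40


def mapping_table_bytes(capacity_bytes: int, granularity_bytes: int,
                        entry_bytes: int = 4) -> int:
    """Size of a flat mapping table with one entry per mapped unit."""
    entries = -(-capacity_bytes // granularity_bytes)  # ceiling division
    return entries * entry_bytes


if __name__ == "__main__":
    fine = mapping_table_bytes(1 * TIB, 4 * KIB)      # conventional page-level map
    coarse = mapping_table_bytes(1 * TIB, 128 * MIB)  # hypothetical large chunks
    print(f"4 KiB granularity:   {fine / MIB:.0f} MiB of metadata")
    print(f"128 MiB granularity: {coarse / KIB:.0f} KiB of metadata")
```

With a table that small, the metadata can plausibly sit in on-controller SRAM instead of needing external DRAM, which lines up with the DRAM-less design discussed earlier in the thread.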

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14003188

7 minutes ago, NewMaxx said:

The SN550 is rated for 2.4 GB/s reads, factor of 2 = 4.8 GB/s. That's the optimal bandwidth after compression with the Series X using BCPack, which only applies to textures. Other assets would most likely fall back to Zlib, which is slower. I don't think a direct comparison is necessarily worthwhile as there are architectural differences, but in terms of numbers - there you go.

Thank you NewMaxx, that was very informative!  I look forward to learning more from your posts.

https://linustechtips.com/topic/1244539-rtx-io-considerations/#findComment-14003207
