Why is sequential/random R/W stats important when dealing with flash?

TiskTisk512

Hi all!

I really don't understand why M.2 NVMe drives, and SSDs/flash storage in general, are measured with these statistics when there's no platter, actuator arm, etc. For example, I see the 'random' R/W speeds listed really low on the M.2 Samsung 960 Pro... I don't understand...

Thanks!

Because a storage drive still needs to find the data when it's in random places?

Or do you think it just magically knows where all your data is and instantly goes to the correct place?

 

That's why random reads and writes are so much slower than sequential even though it's still an SSD.

It still needs to find the data on the drive, which might take a little bit depending on the size.

7 minutes ago, Enderman said:

Because a storage drive still needs to find the data when it's in random places?

Or do you think it just magically knows where all your data is and instantly goes to the correct place?

 

That's why random reads and writes are so much slower than sequential even though it's still an SSD.

OK, so I sort of understand the 'random' part of this, I guess (it has block addresses that it should be able to access instantaneously, it doesn't have to wait for a platter to spin, and it doesn't so much 'find' them as travel to them), but why is sequential read even relevant then?

Just now, TiskTisk512 said:

OK, so I sort of understand the 'random' part of this, I guess (it has block addresses that it should be able to access instantaneously, it doesn't have to wait for a platter to spin, and it doesn't so much 'find' them as travel to them), but why is sequential read even relevant then?

Because when you transfer a single large file, the drive isn't searching randomly for the data; it knows exactly where the data is, so it reads and writes at higher speeds.

Basically, sequential = large files (e.g. videos) and random = small files.

1 minute ago, Enderman said:

Because when you transfer a single large file, the drive isn't searching randomly for the data; it knows exactly where the data is, so it reads and writes at higher speeds.

Basically, sequential = large files (e.g. videos) and random = small files.

So why even call it sequential/random? Seems to be a bit disingenuous. More like "large file/small file" speeds... 

Just now, TiskTisk512 said:

So why even call it sequential/random? Seems to be a bit disingenuous. More like "large file/small file" speeds... 

Because there is no definition of "small file" or "large file".

If you look up the definitions of sequential and random, you will understand why this is exactly how the tests are done and why those words are used to describe the results.

2 minutes ago, Enderman said:

Because there is no definition of "small file" or "large file".

If you look up the definitions of sequential and random, you will understand why this is exactly how the tests are done and why those words are used to describe the results.

OK, so this article actually contradicts you, and it's what brought me to ask this question in the first place. If sequential/random I/O are disk concepts, why do I still see those measurements?

Sequential = the file is taken in series, starting at 0 and going to 10, regardless of size. When the order of the data is important, sequential access is used.

Random = the file can be taken in any order from 0 to 10 and can be stored in any medium until all the data is present.

 

That's the basic gist of it.
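
To make that concrete, here is a minimal sketch of the two access patterns; the file name, the 4 KiB chunk size, and the chunk count are made-up values for illustration only:

```python
import os
import random

CHUNK = 4096   # 4 KiB per piece (made-up value)
CHUNKS = 10    # pieces "0 through 10"-ish, matching the description above

# Create a small scratch file to read back.
with open("testfile.bin", "wb") as f:
    f.write(os.urandom(CHUNK * CHUNKS))

with open("testfile.bin", "rb") as f:
    # Sequential: take the pieces in series, 0, 1, 2, ... with no seeking between reads.
    sequential = [f.read(CHUNK) for _ in range(CHUNKS)]

    # Random: take the same pieces in any order, seeking to each one first.
    order = list(range(CHUNKS))
    random.shuffle(order)
    random_pieces = []
    for i in order:
        f.seek(i * CHUNK)                  # jump to an arbitrary offset
        random_pieces.append(f.read(CHUNK))
```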

2 minutes ago, TiskTisk512 said:

OK, so this article actually contradicts you, and it's what brought me to ask this question in the first place. If sequential/random I/O are disk concepts, why do I still see those measurements?

Take a bunch of small files and copy them from one SSD to another.

Then take a single large video file and do the same.

You will see the video file transfer at around 500 MB/s, while the small files will be significantly slower, depending on how small they are and where they are stored.

 

That article is wrong.

It is not instant at all.
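
A rough way to try that comparison yourself is sketched below; the drive paths, file counts, and sizes are placeholders picked for illustration, not numbers from any real benchmark:

```python
import os
import shutil
import time

# Placeholder source/destination directories on two different SSDs.
SRC = "D:/copytest_src"
DST = "E:/copytest_dst"

def make_files(path, count, size):
    """Fill a directory with `count` files of `size` bytes each."""
    os.makedirs(path, exist_ok=True)
    for i in range(count):
        with open(os.path.join(path, f"file_{i}.bin"), "wb") as f:
            remaining = size
            while remaining > 0:
                piece = min(remaining, 1 << 20)   # write 1 MiB at a time
                f.write(os.urandom(piece))
                remaining -= piece

def timed_copy(src, dst):
    """Copy every file in src to dst and return the elapsed seconds."""
    os.makedirs(dst, exist_ok=True)
    start = time.perf_counter()
    for name in os.listdir(src):
        shutil.copy(os.path.join(src, name), os.path.join(dst, name))
    return time.perf_counter() - start

# Same total amount of data both ways: one 1 GB file vs. 10,000 files of 100 KB.
make_files(os.path.join(SRC, "big"), 1, 1_000_000_000)
make_files(os.path.join(SRC, "small"), 10_000, 100_000)

print("one large file  :", timed_copy(os.path.join(SRC, "big"), os.path.join(DST, "big")), "s")
print("many small files:", timed_copy(os.path.join(SRC, "small"), os.path.join(DST, "small")), "s")
```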

1 minute ago, ARikozuM said:

Sequential = the file is taken in series, starting at 0 and going to 10, regardless of size. When the order of the data is important, sequential access is used.

Random = the file can be taken in any order from 0 to 10 and can be stored in any medium until all the data is present.

 

That's the basic gist of it.

Can you give me a real-world example/use case here?

10 minutes ago, Enderman said:

Take a bunch of small files and copy them from one SSD to another.

Then take a single large video file and do the same.

You will see the video file transfer at around 500 MB/s, while the small files will be significantly slower, depending on how small they are and where they are stored.

 

That article is wrong.

It is not instant at all.

OK, so I found a reply on a Dell support site that supports what you're saying, but man, it took a lot of searching to get there. You're right, and it goes into detail about why:

 

When people talk about sequential vs. random writes to a file, they're generally drawing a distinction between writing without intermediate seeks ("sequential") and a pattern of seek-write-seek-write-seek-write, etc. ("random").

The distinction is very important in traditional disk-based systems, where each disk seek takes around 10 ms. Sequentially writing data to that same disk takes about 30 ms per MB. So if you sequentially write 100 MB of data to the disk, it will take around 3 seconds. But if you do 100 random writes of 1 MB each, that will take a total of about 4 seconds (3 seconds for the actual writing, plus 100 × 10 ms = 1 second for all the seeking).

As each random write gets smaller, you pay more and more of a penalty for the disk seeks. In the extreme case where you perform 100 million random 1-byte writes, you'll still need only 3 seconds for all the actual writing, but you'd now have 11.57 days' worth of seeking to do! So clearly the degree to which your writes are sequential vs. random can really affect the time it takes to accomplish your task.
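
A quick back-of-the-envelope check of those numbers, using only the ~10 ms per seek and ~30 ms per MB figures quoted above:

```python
SEEK_S = 0.010          # seconds per seek (figure from the quote above)
WRITE_S_PER_MB = 0.030  # seconds per MB of actual writing (also from the quote)

def hdd_write_time(total_mb, seeks):
    """Simple model: total time = pure write time + one seek per random write."""
    return total_mb * WRITE_S_PER_MB + seeks * SEEK_S

print(hdd_write_time(100, 0))                     # one sequential 100 MB write -> 3.0 s
print(hdd_write_time(100, 100))                   # 100 random 1 MB writes      -> 4.0 s
print(hdd_write_time(100, 100_000_000) / 86_400)  # 100 million 1-byte writes   -> ~11.57 days
```

The third case is dominated almost entirely by the seek term, which is exactly the point being made.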

The situation is a bit different when it comes to flash. With flash, there is no physical disk head that must move around (which is where the 10 ms seek cost comes from on a traditional disk). However, flash devices tend to have large page sizes (the smallest "typical" page size is around 512 bytes according to Wikipedia, and 4K page sizes appear to be common as well). So if you're writing a small number of bytes, flash still has overhead: you must read out an entire page, modify the bytes you're writing, and then write back the entire page. I don't know the characteristic numbers for flash off the top of my head. But the rule of thumb is that on flash, if each of your writes is comparable in size to the device's page size, you won't see much performance difference between random and sequential writes. If each of your writes is small compared to the device page size, then you'll see some overhead when doing random writes.
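
As a rough illustration of that read-modify-write overhead, here is a toy model that simply counts how many whole pages a write touches; the 4 KiB page size is the "common" value mentioned above, not a figure for any specific drive:

```python
PAGE = 4096   # assumed 4 KiB page size; real drives vary

def pages_touched(offset, length):
    """How many whole pages the device must read, modify, and write back."""
    first = offset // PAGE
    last = (offset + length - 1) // PAGE
    return last - first + 1

print(pages_touched(0, 64))       # a 64-byte write still costs a full 4 KiB page program
print(pages_touched(0, 4096))     # a page-aligned 4 KiB write: 1 page, all of it new data
print(pages_touched(2048, 4096))  # a 4 KiB write straddling a page boundary touches 2 pages
```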

Now, for all of the above, it's true that at the application layer much is hidden from you. There are layers in the kernel, the disk/flash controller, etc. that could, for example, inject non-obvious seeks in the middle of your "sequential" writing. But in most cases, writing that "looks" sequential at the application layer (no seeks, lots of continuous I/O) will get sequential-write performance, while writing that "looks" random at the application layer will get the (generally worse) random-write performance.
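
For what it's worth, here is a minimal sketch of what "looks sequential" versus "looks random" at the application layer; the file name, block size, and block count are arbitrary choices for illustration:

```python
import os
import random

BLOCK = 4096    # arbitrary block size for illustration
BLOCKS = 1024   # 4 MiB scratch file in total
buf = os.urandom(BLOCK)

# "Looks sequential": one long run of writes, no seeks in between.
with open("scratch.bin", "wb") as f:
    for _ in range(BLOCKS):
        f.write(buf)

# "Looks random": seek-write-seek-write over the same region in shuffled order.
offsets = [i * BLOCK for i in range(BLOCKS)]
random.shuffle(offsets)
with open("scratch.bin", "r+b") as f:
    for off in offsets:
        f.seek(off)
        f.write(buf)
```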
