M.Yurizaki

[GUIDE] Selecting the appropriate storage


Just as with selecting the appropriate CPU or GPU for your usage, it's important to consider what kind of storage to get in order to maximize value. It might be tempting to grab the highest performing drive, but what if, for your use case, that's like buying a high-end GTX 1080 Ti to drive a 1600x900 screen in a game like Team Fortress 2? You could certainly get something with better value. This guide runs through the storage types that are widely available and the use cases each one is best suited for.

 

Before I begin though, I would like to go over a few things about storage:

 

Storage Specs and what they mean

IOPS and seek time

IOPS is short for input/output operations per second. This is how many requests per second a storage drive can service, usually measured under worst-case conditions, i.e., 4K random tests. Seek time is the average time it takes to service each request. IOPS and seek time are reciprocals of each other, much like hertz and cycle time. So if you have an IOPS rating of 300, it's reasonable to assume the average seek time is about 3.3 ms (1/300 of a second). IOPS impacts two things:

  • How quickly data can be read or written
  • How the number of requests impacts the time it takes to complete an operation.
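As a quick check on that arithmetic, here is a minimal Python sketch of the conversion. It assumes, as the guide does, that IOPS and average seek time are simple reciprocals (closest to true at a queue depth of 1); the function names are hypothetical.

```python
def seek_time_ms(iops: float) -> float:
    """Average time per request in milliseconds, assuming one request at a time."""
    return 1000.0 / iops


def iops_from_seek_ms(seek_ms: float) -> float:
    """Inverse conversion: requests per second from the average time per request."""
    return 1000.0 / seek_ms


print(f"{seek_time_ms(300):.2f} ms")     # -> 3.33 ms
print(f"{seek_time_ms(10_000):.2f} ms")  # -> 0.10 ms
```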

Bandwidth

This is how much data the storage can transfer per second. It's usually the most visible spec, or at least the one most people see and give weight to.

Queue Depth (QD)

Storage drives have queues that hold commands from the system, and the queue depth is how many commands are outstanding at once. Performance tests are usually run at a queue depth of 1 and at a queue depth of 32. The former shows worst-case performance, while the latter creates a fair comparison between AHCI and NVMe, since AHCI supports a maximum of 32 commands while NVMe supports 65,536 commands per queue with up to 65,536 queues. Another reason may simply be that software hasn't really found a use for that many commands, and SSD firmware is tuned for lower QD values.
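One way to see why QD32 results look so much better than QD1 results is Little's law, which is not part of the original guide: in the ideal case, completions per second equal the number of outstanding commands divided by the time each command takes, until the drive saturates. The sketch below assumes a hypothetical drive that takes 100 microseconds per command and tops out at 300,000 IOPS. Real drives see latency rise as the queue fills, so treat the numbers as illustrative only.

```python
def ideal_iops(queue_depth: int, latency_us: float, drive_max_iops: float) -> float:
    """Little's law: completions per second = outstanding commands / time per command,
    capped at what the (hypothetical) drive can actually sustain."""
    return min(queue_depth * 1_000_000 / latency_us, float(drive_max_iops))


# Hypothetical drive: 100 microseconds per command, saturates at 300,000 IOPS.
for qd in (1, 4, 32):
    print(f"QD{qd}: {ideal_iops(qd, latency_us=100, drive_max_iops=300_000):,.0f} IOPS")
```

With these assumptions it prints 10,000 IOPS at QD1, 40,000 at QD4, and hits the 300,000 cap at QD32.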

 

Sequential vs. Random read/write

Sequential access represents the best-case scenario for getting data off a storage device. In the absolute best case, the request is simply "I want x number of bytes from location y," and the drive can stream them. Random access represents the worst-case scenarios, the worst being "I want 1 byte from [really large number] locations." You can think of sequential access as reading a book from start to finish, while random access means flipping back and forth between pages.

 

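Random performance is much more of a problem for hard drives, since every jump means physically moving the head and waiting for the platter to rotate. However, AHCI supports Native Command Queuing (NCQ), which lets the drive's controller reorder the queued requests so the drive services them as close to sequentially as possible, even when the data itself is scattered.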
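To illustrate why reordering helps a hard drive, here is a simplified Python model. It uses a basic elevator (SCAN-style) ordering: sweep up from the head's current track, then service the remaining requests on the way back down. This is a stand-in for the idea, not NCQ's actual algorithm, which lives in the drive's firmware and also accounts for rotational position; the track numbers are made up.

```python
def head_travel(start: int, order: list[int]) -> int:
    """Total distance (in tracks) the head moves servicing requests in the given order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel


def elevator_order(start: int, requests: list[int]) -> list[int]:
    """Sweep upward from the current position, then handle the rest on the way back."""
    up = sorted(r for r in requests if r >= start)
    down = sorted((r for r in requests if r < start), reverse=True)
    return up + down


requests = [90, 10, 80, 20, 70]  # hypothetical track numbers, in arrival order
print(head_travel(50, requests))                      # arrival order -> 300
print(head_travel(50, elevator_order(50, requests)))  # reordered     -> 120
```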

 

4K performance

The smallest addressable unit of storage is the sector. That is, modern drives cannot access a single byte. If you want a single byte, the drive has to read the whole sector and transfer it to RAM, and then the application picks out the byte it wants. Sectors today are 4 kilobytes, or 4096 bytes (though many drives still present 512-byte logical sectors for compatibility), hence the "4K." 4K performance therefore represents the worst-case scenario: addressing the smallest amount of data across random locations.
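The cost of rounding every request up to whole sectors can be sketched in a few lines of Python. This assumes 4096-byte sectors and ignores any caching; `bytes_transferred` is a hypothetical helper.

```python
SECTOR_SIZE = 4096  # bytes; assumes 4K sectors as described above


def bytes_transferred(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> int:
    """Bytes the drive actually has to move to return `length` bytes starting at `offset`."""
    if length < 1:
        raise ValueError("length must be at least 1 byte")
    first_sector = offset // sector_size
    last_sector = (offset + length - 1) // sector_size
    return (last_sector - first_sector + 1) * sector_size


print(bytes_transferred(5000, 1))    # 1 byte inside one sector -> 4096
print(bytes_transferred(4090, 100))  # straddles a sector boundary -> 8192
```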
 

How all of this plays together

Imagine you have two scenarios:

  1. Transferring 1,000 files of 1 MB each, totaling 1GB
  2. Transferring 10 files of 100 MB each, totaling 1 GB

And you have two drives with the following performance:

  1. 100 MB/s with an IOPS rating of 10,000
  2. 500 MB/s with an IOPS rating of 100

You might think that in either case, the 500 MB/s drive should beat the 100 MB/s drive. However, if each file costs one seek, here is how the two actually compare:

  • Transferring 1,000 files of 1MB each
    • The 100 MB/s drive takes a total of (1GB/100MB/s + 1000 * seek time) = 10 + (1000 * 0.0001) = 10.1 seconds.
    • The 500 MB/s drive takes a total of (1GB/500MB/s + 1000 * seek time) = 2 + (1000 * 0.01) = 12 seconds
  • Transferring 10 files of 100MB each
    • The 100 MB/s drive takes a total of (1GB/100MB/s + 10 * seek time) = 10 + (10 * 0.0001) = 10.001 seconds.
    • The 500 MB/s drive takes a total of (1GB/500MB/s + 10 * seek time) = 2 + (10 * 0.01) = 2.1 seconds

Knowing how these two interact is key to figuring out whether your use case will benefit from a given storage drive. Of course, no real drive has bandwidth and seek time this lopsided; the example exaggerates them to show how each one affects the total.
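The same model as a small Python function, assuming (as the example does) that 1 GB = 1,000 MB and that each file costs exactly one request at the drive's average seek time (1 / IOPS). `transfer_time_s` is a hypothetical name, not a real tool.

```python
def transfer_time_s(total_mb: float, num_files: int, bandwidth_mb_s: float, iops: float) -> float:
    """Time to stream the data plus one seek per file."""
    seek_time_s = 1.0 / iops
    return total_mb / bandwidth_mb_s + num_files * seek_time_s


drives = {"100 MB/s, 10,000 IOPS": (100, 10_000), "500 MB/s, 100 IOPS": (500, 100)}
for label, (bw, iops) in drives.items():
    many_small = transfer_time_s(1000, 1000, bw, iops)  # 1,000 files x 1 MB
    few_large = transfer_time_s(1000, 10, bw, iops)     # 10 files x 100 MB
    print(f"{label}: {many_small:.3f} s vs {few_large:.3f} s")
# 100 MB/s, 10,000 IOPS: 10.100 s vs 10.001 s
# 500 MB/s, 100 IOPS: 12.000 s vs 2.100 s
```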

 

What "Loading" Really Means

Loading an application isn't just moving data from storage to RAM. The computer still has to process the data to get the application into a "usable" state. Imagine you're cooking a meal. Moving data is like buying ingredients at the store and bringing them home, or taking them out of the pantry or fridge and putting them on the counter. At that point, you still don't have a meal you can eat, or "use." You have to cook the food to turn it into a meal.

 

This also plays a big role in choosing which storage you need. For example, loading an application typically involves many small file requests, so IOPS matter more than bandwidth. And if the application has to process data as it comes in, the CPU can become the bottleneck when processing takes too long. In some cases, a faster CPU with an HDD can match, if not beat, a slower CPU with an SSD. This is why, for some use cases, faster storage does nothing more to improve application load times, or even response times when the application needs to fetch something from storage. On top of that, applications have been developed for years with hard drives in mind, so many probably already use as few requests as possible or organize their data carefully to avoid performance hiccups.
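Here is a rough Python model of that CPU-versus-storage trade-off. It assumes processing overlaps perfectly with I/O, so load time is whichever of the two takes longer; if they didn't overlap, the times would add instead. The request count, IOPS figures, and CPU times are all hypothetical.

```python
def load_time_s(requests: int, iops: float, cpu_work_s: float) -> float:
    """Load time when processing overlaps with I/O: the slower of the two dominates."""
    io_time_s = requests / iops
    return max(io_time_s, cpu_work_s)


# Hypothetical numbers: 500 small reads; the slower CPU needs twice the processing time.
fast_cpu_hdd = load_time_s(500, iops=150, cpu_work_s=3.0)
slow_cpu_ssd = load_time_s(500, iops=10_000, cpu_work_s=6.0)
print(f"fast CPU + HDD: {fast_cpu_hdd:.2f} s")  # -> 3.33 s
print(f"slow CPU + SSD: {slow_cpu_ssd:.2f} s")  # -> 6.00 s
```

Under these assumptions the SSD finishes its reads in 0.05 s but then sits waiting on the slower CPU.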

 

I want to point out that modern operating systems keep data used by programs in RAM, even after the application has been terminated. This data is marked as "if an application needs more space and there's not enough actual free space, you can remove me." This is called caching. So unless you're constantly running out of memory, the applications you run frequently, along with their data, typically stay resident in RAM most of the time.
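A toy least-recently-used (LRU) cache in Python shows the idea: data stays cached after a program closes and is only evicted when something else needs the space. Real operating systems use more elaborate policies than plain LRU, and `StandbyCache` is a hypothetical name chosen for illustration.

```python
from collections import OrderedDict


class StandbyCache:
    """Toy model of OS file caching: data stays in RAM after a program exits
    and is evicted, least recently used first, only when space runs out."""

    def __init__(self, capacity_mb: int):
        self.capacity_mb = capacity_mb
        self.entries = OrderedDict()  # file name -> size in MB, least recently used first

    def used_mb(self) -> int:
        return sum(self.entries.values())

    def read(self, name: str, size_mb: int) -> str:
        """Return where the data came from: "RAM" (cache hit) or "storage" (miss)."""
        if name in self.entries:
            self.entries.move_to_end(name)  # mark as most recently used
            return "RAM"
        # Make room by dropping the least recently used entries.
        while self.entries and self.used_mb() + size_mb > self.capacity_mb:
            self.entries.popitem(last=False)
        self.entries[name] = size_mb
        return "storage"


cache = StandbyCache(capacity_mb=1000)
print(cache.read("browser", 300))  # first launch -> storage
print(cache.read("browser", 300))  # relaunch after closing -> RAM
print(cache.read("game", 800))     # needs room, evicts browser -> storage
print(cache.read("browser", 300))  # was evicted -> storage
```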

 

A Note about Lifespan

One of the biggest concerns with SSDs is that they have a finite number of write cycles, whereas HDDs can run until they croak. However, this isn't something you should worry about. A modern, high-capacity SSD using TLC NAND is rated for about 1,000 write cycles before croaking. For a 256 GB drive, that is 256 TB of data. Taking myself as a typical user, I've written at most about 2-2.5 TB per year, which gives the drive a theoretical lifespan of roughly 100 to 130 years. The drive is definitely going to outlive me!
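The lifespan arithmetic as a Python sketch. It assumes 1 TB = 1,000 GB and ignores write amplification (the drive usually writes more to the NAND than the host sends it), so real-world endurance will be somewhat lower than this estimate. `lifespan_years` is a hypothetical helper.

```python
def lifespan_years(capacity_gb: float, pe_cycles: int, tb_written_per_year: float) -> float:
    """Years until the rated write cycles are used up, ignoring write amplification."""
    total_writes_tb = capacity_gb * pe_cycles / 1000  # 1 TB = 1,000 GB
    return total_writes_tb / tb_written_per_year


for yearly_tb in (2.0, 2.5):
    print(f"{yearly_tb} TB/year -> {lifespan_years(256, 1000, yearly_tb):.0f} years")
# 2.0 TB/year -> 128 years
# 2.5 TB/year -> 102 years
```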

 

Storage Drive Recommendations

And now for the meat and potatoes of this post.

 

NVMe/PCIe SSDs

These are the fastest storage devices around, with bandwidth well over a gigabyte per second and IOPS ratings up into the 300,000 range.

 

Best used in: High-bandwidth applications or simultaneous multi-user applications. While I haven't verified it myself, video editing programs that store working data on the drive rather than in RAM should benefit from this. On the IOPS side, it may benefit multiple users running a number of VMs.

 

Not useful for: Just about anything else. This includes single user or cooperative multi-user applications and systems.

 

Recommendation: Don't buy an NVMe/PCIe SSD unless you have an application that can actually use the bandwidth.

 

SATA SSDs

SATA SSDs differ from NVMe ones by having much lower bandwidth and IOPS ratings, typically about 550 MB/s of bandwidth and up to about 100,000 IOPS. One thing to point out: not all "solid state drives" are created equal. Budget systems with 32 GB or 64 GB of "solid state storage" use eMMC flash memory, which performs about as well as a hard drive, with maybe somewhat better IOPS.

 

Best used in: Application and OS storage. For these, performance essentially flatlines beyond SATA SSD speeds, though other high-bandwidth applications will still benefit. This is the best storage for single-user systems.

 

Not useful for: Anything low bandwidth, such as documents, pictures, music, and movies (though this may change with 4K video formats and the need to seek around).

 

Recommendation: If you want the most value and performance, buy a SATA SSD.

 

Hard Drives (HDD)

The good ol' spinning disk. While the best case bandwidth is approaching 300MB/s (yes, hard drives have gotten this good), it has very low IOPS, typically in the hundreds. Hard drives have their own specs to worry about as well.

  • Rotation speed: How fast the platters spin. Faster rotation improves both seek times (the target sector comes around under the head sooner) and bandwidth.
  • Capacity: There's a hidden aspect to capacity. Typically, the higher the capacity, the better the drive performs, because higher capacities usually mean higher data density. Higher data density means that, at a given rotation speed, more data passes under the head per revolution. If you find a drive of the same capacity that is smaller physically (for example, a 3.5" drive with fewer platters), it will perform better than the larger drive.
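Both effects can be put into rough numbers. On average, the sector you want is half a revolution away, so average rotational latency is half of one revolution's time; and sequential bandwidth is roughly the data on one track times revolutions per second, which is why higher density at the same RPM means more bandwidth. The 1.5 MB-per-track figure below is hypothetical, and the model ignores the time the head spends moving between tracks.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the target sector is half a revolution away from the head."""
    return 0.5 * 60_000 / rpm


def sequential_mb_s(rpm: int, mb_per_track: float) -> float:
    """Data passing under the head per second: track capacity x revolutions per second."""
    return mb_per_track * rpm / 60


for rpm in (5400, 7200):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms latency, "
          f"{sequential_mb_s(rpm, mb_per_track=1.5):.0f} MB/s")
# 5400 RPM: 5.56 ms latency, 135 MB/s
# 7200 RPM: 4.17 ms latency, 180 MB/s
```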

Best used in: Storage for documents and multimedia files like pictures, music, and movies. These are low bandwidth files that are also typically read sequentially. Recording videos is also a good use case. Archival storage is a classic use case for most people since you can buy large capacities at a much more affordable rate.

 

Not useful for: Application and OS storage. That isn't to say running them from an HDD is gut-wrenching, but if you have both an SSD and an HDD, don't use the HDD for this.

 

Recommendation: High capacity drives are still good as companion storage drives for systems that have smaller SSDs, or for use in archival.

 

On a side note, most hard drive manufacturers offer various models of drives. Western Digital is famous for the "WD Rainbow," where model lines are given colors. I recommend buying NAS-grade drives, as they offer decent performance, run quieter and cooler, and have more reliability built in than budget models like WD Blue or Green (though WD has since merged the Blue and Green lines). You can also buy high-performance drives like WD Black, but they are noisy and can run hot.

 

Hybrid Drives/SSHDs

Hybrid drives, or SSHDs, are hard drives with a few gigabytes of flash memory used as a cache. This supplements the cache normally present in hard drives. The idea is that the more often a piece of data is accessed, the more likely it is to be kept in flash, so accessing it approaches flash memory speed and thus SSD speed. The other part is that the flash keeps its contents without power, so if you power off or reset the machine, the cached data remains and the system can use it later.
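A toy Python model of the caching idea: a block is copied into flash after it has been read a few times, and from then on it is served at flash speed. The promotion threshold, the cache size, and the class name are all hypothetical; real SSHD firmware uses its own, generally undisclosed, policies and also evicts old data, which this sketch does not.

```python
from collections import Counter


class HybridDriveModel:
    """Toy SSHD: a block is copied to flash once it has been read `promote_after`
    times from the platters; flash holds at most `flash_blocks` blocks."""

    def __init__(self, flash_blocks: int, promote_after: int = 3):
        self.flash_blocks = flash_blocks
        self.promote_after = promote_after
        self.hits = Counter()  # reads served from the platters, per block
        self.flash = set()     # cached blocks; on a real SSHD these survive a reboot

    def read(self, block: int) -> str:
        """Return which medium served this read: "flash" or "disk"."""
        if block in self.flash:
            return "flash"
        self.hits[block] += 1
        if self.hits[block] >= self.promote_after and len(self.flash) < self.flash_blocks:
            self.flash.add(block)  # promoted; the *next* read comes from flash
        return "disk"


drive = HybridDriveModel(flash_blocks=2)
print([drive.read(7) for _ in range(5)])  # -> ['disk', 'disk', 'disk', 'flash', 'flash']
```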

 

Best used in: Applications and OS storage. Keep in mind that SSHD caching algorithms may need to learn what should be on the flash portion and what shouldn't. Chances are the OS and your daily-use programs will end up occupying the cache, making them feel like they're running on an SSD, while everything else runs at HDD speeds.

 

Not useful for: Documents and multimedia storage, or anything you don't access often. Data has to be accessed frequently before it reaches cache speed.

 

Recommendation: I generally don't recommend SSHDs. But if you're severely limited in either budget or drive mounts and want high capacity without spending a fortune on an SSD, an SSHD may give you the best of both worlds, provided you understand that not everything will get a performance boost.

 

Edited by M.Yurizaki
Updated the SSHD recommendation.
4 minutes ago, M.Yurizaki said:

with the worst being "I want 1 byte from [really large number] of locations".

First error in your guide found. (and if I am incorrect, please explain why I am incorrect)


Wait, did I mess up here? Think I did... I thought 1 byte was only a single 1/0, thus only being able to be stored at a single location. But I think I messed up with bits and bytes.

4 minutes ago, Dutch-stoner said:

Wait, did I mess up here? Think I did... I thought 1 byte was only a single 1/0, thus only being able to be stored at a single location. But I think I messed up with bits and bytes.

Pretty sure a 0 or 1 is a bit and a byte is the set of eight such bits that can then start to represent hexadecimal stuff and move up from there :D



Good guide. However, unless someone is trying to go for a "best bang for the buck" or "best budget" build... I think most PC enthusiasts would enjoy getting top-notch equipment. Not because they need to, but because they can. And such is the joy of building your own PC!

 

Still, I like how you mentioned smaller drives at the same RPM technically read faster than larger drives at the same RPM. (Physical size, not total data size)

38 minutes ago, M.Yurizaki said:

If you find a drive of the same capacity but smaller physically, that will perform better than the larger drive.

 

8 minutes ago, NinJake said:

at the same RPM

Didn't mention the RPM. And I also wonder... Data density is higher, yes. But the disk is also smaller. (never opened a craptop HDD) So I get that the smaller size disk at the same speeds and capacity are faster... On the outer edge. But is the lowest speed on a small size HDD still faster than the lowest speed on a larger disk?

Posted by M.Yurizaki (Original Poster):
1 minute ago, Dutch-stoner said:

 

Didn't mention the RPM. And I also wonder... Data density is higher, yes. But the disk is also smaller. (never opened a craptop HDD) So I get that the smaller size disk at the same speeds and capacity are faster... On the outer edge. But is the lowest speed on a small size HDD still faster than the lowest speed on a larger disk?

I literally meant "smaller physically". You can find 3.5" 1 TB drives with 2 platters. These should perform better than a 3.5" 1 TB drive with the full complement of 5 platters.
