Just as you would pick a CPU or GPU that suits your usage, it's important to consider what kind of storage to get in order to maximize your value. It might be tempting to grab the highest-performing drive, but what if your use case makes that the equivalent of buying a high-end GTX 1080 Ti to drive a 1600x900 screen in a game like Team Fortress 2? You could certainly get something that offers better value. This guide runs through the storage types that are widely available and the use cases they're best suited for.
Before I begin though, I would like to go over a few things about storage:
Storage Specs and what they mean
IOPS and seek time
IOPS is short for input/output operations per second. This is how many requests per second a storage drive can service, usually quoted for the worst case; that is, IOPS are typically measured with 4K random tests. Seek time is the average time it takes to service each request. IOPS and seek time are reciprocals of each other, much like hertz and cycle time. So if a drive has an IOPS rating of 300, it's reasonable to assume the average seek time is about 3.3 ms (1/300 of a second). IOPS impacts two things:
How quickly data can be read or written
How the number of requests impacts the time it takes to complete an operation.
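The reciprocal relationship between IOPS and average seek time can be sketched in a couple of lines. This is a minimal illustration, assuming the drive services one request at a time; the function names are made up for this example.

```python
def iops_to_seek_time_ms(iops: float) -> float:
    """Average time per request in milliseconds, assuming one request at a time."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return 1000.0 / iops


def seek_time_ms_to_iops(seek_time_ms: float) -> float:
    """Inverse of the function above: requests per second for a given seek time."""
    if seek_time_ms <= 0:
        raise ValueError("seek_time_ms must be positive")
    return 1000.0 / seek_time_ms


print(round(iops_to_seek_time_ms(300), 2))  # 3.33 (milliseconds)
print(seek_time_ms_to_iops(10))             # 100.0 (requests per second)
```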
Bandwidth
This is how much data the storage can transfer per second. It's usually the most visible spec, or at least the one most people see and give weight to.
Queue Depth (QD)
Storage drives have queues to hold commands from the system. Performance tests are usually run at a queue depth of 1 (one outstanding command) and a queue depth of 32 (32 outstanding commands). The former shows worst-case performance, and the latter creates a fair comparison between AHCI and NVMe, as AHCI supports a single queue of at most 32 commands while NVMe supports up to about 64K commands per queue across up to about 64K queues. Another reason may simply be that software hasn't really found a use for that many commands, and SSD firmware is tuned for lower QD values.
Sequential vs. Random read/write
Sequential represents the best-case scenario when getting data off a storage device. In the absolute best case, the request is simply something like "I want x number of bytes from location y" and the drive can deliver them in one go. Random represents the worst-case scenario, with the extreme being "I want 1 byte from [really large number] locations". You can think of sequential access as reading a book from start to finish, and random access as having to flip back and forth between pages.
Random performance is more of a problem on hard drives, since the head has to physically move between locations. However, AHCI supports Native Command Queuing (NCQ), which lets the drive's controller reorder queued requests so that they are serviced in as close to sequential order as possible, even if the requested data is scattered.
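To see why reordering queued requests helps a hard drive, here is a toy model. It assumes the only cost is how far the head travels between track numbers, and that the controller simply sorts the queue. Real firmware also accounts for rotational position and uses its own scheduling logic, so treat this as an illustration only; the function name and track numbers are invented.

```python
from typing import List


def head_travel(start: int, tracks: List[int]) -> int:
    """Total distance the head moves when visiting tracks in the given order."""
    total = 0
    current = start
    for track in tracks:
        total += abs(track - current)
        current = track
    return total


queued = [90, 10, 80, 20, 70]  # requests in the order they arrived

print(head_travel(0, queued))          # 350 (serviced in arrival order)
print(head_travel(0, sorted(queued)))  # 90  (serviced in one sweep)
```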
The smallest unit of storage for the purposes of addressing is the sector. That is, in modern drives, you cannot access a single byte. If you want a single byte, the drive has to read the whole sector and transfer it to RAM, and only then can the application access that one byte. Sectors on most drives today are 4 kilobytes, or 4096 bytes, hence the "4K." 4K performance represents the worst-case scenario: addressing the smallest amount of data across random locations.
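The sector rule can be made concrete with a little arithmetic. This sketch assumes 4096-byte sectors and ignores any caching or read-ahead the OS or drive may do; the helper names are hypothetical.

```python
from typing import Tuple

SECTOR_SIZE = 4096  # bytes, assuming 4K sectors


def locate_byte(offset: int, sector_size: int = SECTOR_SIZE) -> Tuple[int, int]:
    """Return (sector index, position inside that sector) for a byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, sector_size)


def bytes_transferred(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> int:
    """Bytes the drive must move to satisfy a read of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first_sector = offset // sector_size
    last_sector = (offset + length - 1) // sector_size
    return (last_sector - first_sector + 1) * sector_size


print(locate_byte(10000))           # (2, 1808)
print(bytes_transferred(10000, 1))  # 4096: one byte still costs a full sector
print(bytes_transferred(4095, 2))   # 8192: two bytes straddling a sector boundary
```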
How all of this plays together
Imagine you have two scenarios:
Transferring 1,000 files of 1 MB each, totaling 1 GB
Transferring 10 files of 100 MB each, totaling 1 GB
And you have two drives with the following performance:
100 MB/s with an IOPS rating of 10,000
500 MB/s with an IOPS rating of 100
You might think that in either case, the 500 MB/s drive should beat the 100 MB/s drive. However, here is how the two actually compare:
Transferring 1,000 files of 1 MB each
The 100 MB/s drive takes a total of (1 GB / 100 MB/s + 1000 * seek time) = 10 + (1000 * 0.0001) = 10.1 seconds.
The 500 MB/s drive takes a total of (1 GB / 500 MB/s + 1000 * seek time) = 2 + (1000 * 0.01) = 12 seconds.
Transferring 10 files of 100 MB each
The 100 MB/s drive takes a total of (1 GB / 100 MB/s + 10 * seek time) = 10 + (10 * 0.0001) = 10.001 seconds.
The 500 MB/s drive takes a total of (1 GB / 500 MB/s + 10 * seek time) = 2 + (10 * 0.01) = 2.1 seconds.
Knowing how these two interact is key to figuring out whether your use case will benefit from a faster storage drive. Of course, no real drive has bandwidth and seek time as disproportionate as this, but the example shows how each aspect affects the total.
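The arithmetic above follows one simple model: total time is the data divided by bandwidth, plus one seek per file. Here is a minimal sketch of that model, assuming 1 GB = 1000 MB and a seek time of 1/IOPS as in the example; the function name is made up for illustration.

```python
def transfer_time_s(total_mb: float, n_files: int,
                    bandwidth_mb_s: float, iops: float) -> float:
    """Estimated seconds to move n_files totaling total_mb megabytes."""
    seek_time_s = 1.0 / iops  # seek time is the reciprocal of IOPS
    return total_mb / bandwidth_mb_s + n_files * seek_time_s


# (label, bandwidth in MB/s, IOPS) for the two drives in the example
drives = [("100 MB/s drive", 100, 10_000), ("500 MB/s drive", 500, 100)]

for n_files in (1000, 10):
    for label, bandwidth, iops in drives:
        seconds = transfer_time_s(1000, n_files, bandwidth, iops)
        print(f"{n_files} files, {label}: {seconds:.3f} s")
```

With many small files, the per-file seek cost dominates and the high-IOPS drive wins (10.1 s against 12 s). With a few large files, bandwidth dominates (10.001 s against 2.1 s).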
What "Loading" Really Means
Loading an application isn't just moving data from storage to RAM. The computer still has to process the data in order to get the application into a "usable" state. Imagine you were cooking a meal. Moving data is like buying ingredients from the store and bringing them home, or taking them out of the pantry or fridge and putting them on the counter. At that point you still don't have a meal you can eat or "use". You have to process the ingredients to turn them into a meal.
This can play a big role in choosing which storage you need. For example, loading applications involves making a lot of small file requests. Here, bandwidth isn't really as important as IOPS. Also, if the application needs to process the data as it comes in, the bottleneck could be the CPU if it takes too long processing the data. In some cases, a faster CPU with an HDD can actually match, if not beat, a slower CPU with an SSD. This is why, for some use cases, going with faster storage doesn't do anything more to improve application loading times, or even response times when the application needs to get something from storage. Plus, applications have probably been optimized for years with the expectation of running on a hard drive, using as few requests as possible or smartly organizing the data to avoid performance hiccups.
I want to point out that modern operating systems leave data used by programs in RAM, even if the application has been terminated. This data is marked as "if an application needs more space and there's not enough actual free space, you can remove me." This is also called caching. So unless you're running out of memory constantly, applications that you constantly run and their data are typically resident in RAM most of the time.
A Note about Lifespan
One of the biggest concerns with SSDs is that they have a finite number of write cycles, whereas HDDs can run until they croak. However, this is something you shouldn't be concerned about. A modern, high-capacity SSD using TLC NAND is good for about 1,000 write cycles before croaking. For a 256 GB drive, that is 256 TB of data written. If I am a typical use case, I've clocked in at worst about 2-2.5 TB per year, which means the drive has a theoretical lifespan of somewhere between 100 and 130 years. The drive is definitely going to outlive me!
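The lifespan estimate is easy to rerun with your own numbers. This is a back-of-the-envelope sketch using the figures from the paragraph above; it assumes every write cycle covers the full capacity and ignores real-world factors such as write amplification. The function names are hypothetical.

```python
def endurance_tb(capacity_gb: float, write_cycles: int) -> float:
    """Total terabytes that can be written before the rated cycles run out."""
    return capacity_gb * write_cycles / 1000.0  # 1 TB = 1000 GB


def lifespan_years(capacity_gb: float, write_cycles: int,
                   tb_written_per_year: float) -> float:
    """Theoretical years until the write endurance is used up."""
    return endurance_tb(capacity_gb, write_cycles) / tb_written_per_year


print(endurance_tb(256, 1000))                     # 256.0 (TB)
print(round(lifespan_years(256, 1000, 2.5), 1))    # 102.4 (years)
print(round(lifespan_years(256, 1000, 2.0), 1))    # 128.0 (years)
```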
Storage Drive Recommendations
And now for the meat and potatoes of this post.
NVMe/PCIe SSDs
These represent the fastest storage devices around, with multiple gigabytes per second of bandwidth and IOPS up to the 300,000 range.
Best used in: High bandwidth applications or simultaneous multi-user applications. While I haven't verified it myself, video editing programs that store working data on the drive rather than in RAM should benefit from this. On the IOPS side, it may benefit multiple users running a number of VMs.
Not useful for: Just about anything else. This includes single user or cooperative multi-user applications and systems.
Recommendation: Don't buy an NVMe/PCIe SSD unless you have an application that can actually use the bandwidth.
SATA SSDs
SATA SSDs differ from NVMe ones by having much lower bandwidth and IOPS ratings, typically 550 MB/s of bandwidth and up to about 100,000 IOPS. One thing to point out: not all "solid state drives" are created equal. Budget systems that advertise 32 GB or 64 GB of "solid state storage" use eMMC flash memory, which performs about as well as a hard drive, with maybe somewhat better IOPS.
Best used in: Application and OS storage. Performance essentially flatlines beyond SATA SSD speeds. However, any other high bandwidth applications will still benefit. This is the best storage for single user systems.
Not useful for: Anything low bandwidth, such as documents, pictures, music, and movies (though this may change with 4K formats and the need to seek around).
Recommendation: If you want the most value and performance, buy a SATA SSD.
Hard Drives (HDD)
The good ol' spinning disk. While best-case bandwidth is approaching 300 MB/s (yes, hard drives have gotten this good), hard drives have very low IOPS, typically in the hundreds. They also have their own specs to worry about.
Rotation speed: How fast the disk spins. This improves seek times and bandwidth.
Capacity: There's a hidden aspect to capacity. Typically, the higher the capacity, the better the drive performs. This is because higher capacities usually mean higher data density, and higher data density means that for a given rotation speed, more data passes under the head per revolution. Likewise, if you find a drive of the same capacity in a physically smaller size, it will perform better than the larger drive.
Best used in: Storage for documents and multimedia files like pictures, music, and movies. These are low bandwidth files that are also typically read sequentially. Recording videos is also a good use case. Archival storage is a classic use case for most people since you can buy large capacities at a much more affordable rate.
Not useful for: Application and OS storage. This isn't to say that using an HDD for this is gut-wrenching, but if you have both an SSD and an HDD, don't put your applications and OS on the HDD.
Recommendation: High capacity drives are still good as companion storage drives for systems that have smaller SSDs, or for use in archival.
On a side note, most hard drive manufacturers have various models of drives. Western Digital is famous for the "WD Rainbow", where models are given colors. I recommend buying NAS grade storage drives, as they offer decent performance, run quieter and cooler, and have more reliability built in than budget models like WD Blue or Green (though WD has since merged the Blue and Green lines). You can also buy high performance drives like WD Black, but they are noisy and can run hot.
Hybrid Drives (SSHD)
Hybrid drives, or SSHDs, are hard drives that have a few gigabytes of flash memory that's used as cache. This supplements the cache normally present in hard drives. The idea is that the more often a piece of data is accessed, the more likely it is to be served from flash, and thus at close to SSD speeds. The other benefit is that if you power off or reset the machine, the data remains in the flash cache so the system can use it later.
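The caching idea can be illustrated with a toy model. The policy below (promote a block to flash once it has been read a set number of times, and never evict) is purely an assumption for illustration. Real SSHD firmware uses its own caching algorithms, and every name here is hypothetical.

```python
from collections import Counter


class ToyHybridDrive:
    """Toy SSHD: blocks read often enough get copied to a small flash cache."""

    def __init__(self, flash_slots: int, promote_after: int = 2) -> None:
        self.flash_slots = flash_slots      # how many blocks fit in flash
        self.promote_after = promote_after  # reads needed before promotion
        self.read_counts = Counter()
        self.flash = set()

    def read(self, block: int) -> str:
        """Return where this read was served from: 'flash' or 'disk'."""
        self.read_counts[block] += 1
        if block in self.flash:
            return "flash"
        hot = self.read_counts[block] >= self.promote_after
        if hot and len(self.flash) < self.flash_slots:
            self.flash.add(block)  # later reads of this block come from flash
        return "disk"


drive = ToyHybridDrive(flash_slots=2)
print([drive.read(block) for block in [1, 1, 1, 2, 3, 3, 3]])
# ['disk', 'disk', 'flash', 'disk', 'disk', 'disk', 'flash']
```

Block 2 is read only once, so it never leaves the disk, which mirrors how rarely used data on an SSHD stays at HDD speeds.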
Best used in: Applications and OS storage. Keep in mind that SSHD caching algorithms may need to learn what should be on the flash portion and what shouldn't. Chances are the OS and your daily-use programs will take over cache, letting them feel like they're running on an SSD, while everything else runs at HDD speeds.
Not useful for: Documents and multimedia storage. Or anything you don't really access a lot. The data has to be accessed frequently before it reaches cache speed.
Recommendation: I generally don't recommend SSHDs, but if you're severely limited in either budget or drive mounts and you want high capacity without spending a fortune on an SSD, an SSHD may give you the best of both worlds, provided you understand that not everything will get a performance boost.