What I hate about SSDs and HDDs
Bottom line of this thread: the capacity advertised on drives is correct, the number you see in Windows is correct, the unit you see in Windows is incorrect. This has been known for years and years, but if software started showing GiB, MiB, etc., we would see numerous people flipping and asking what the hell those things are.
EDIT: since this has been marked as best answer, I figure I should put the actual explanation here for people for whom the first paragraph isn't enough. Here goes...
Where lies the root of the discrepancy?
The discrepancy between the drive capacity numbers you see in different places arises from the incorrect use of size units. In general, the following applies:
- Drive manufacturers list drive capacity in kB, MB, GB, TB, ... These units are multiples of a B (byte). As is standard in metric, k stands for kilo (1,000), M stands for mega (1,000 k = 1,000,000), G stands for giga (1,000 M = 1,000,000 k = 1,000,000,000), and so on.
The base multiplier is 1,000 (one thousand), which is a power of 10, which is the base of decimal counting.
- Most computer software calculates capacities with a base multiplier of 1,024, which is a power of two, which is the base of binary counting.
This means that one "kB" in such software is actually 1,024 bytes. Obviously, one unit can't have two different definitions in a scientific field, so the "kiB" (kibibyte) was born, together with its brothers Mi (mebi, 1,024 ki), Gi (gibi, 1,024 Mi), etc...
The reason they do this is twofold:
- it gives a more accurate representation of how much will actually fit on the drive, as hard drive sectors are always a power of two in size. Actual file sizes ON DISK will thus be a multiple of a power of two (unless you have a file system with very intelligent padding and combining of small data blobs within one sector, but that's another talk altogether);
- it's easier for a computer to calculate stuff with powers of two, because a computer is built upon binary logic and multiplying or dividing by powers of two is a matter of bit shifting, whereas multiplying or dividing by powers of 10 requires a whole lot of complex operations.
- Windows lists drive capacities with a base multiplier of 1024 (so in kiB, MiB, GiB, TiB, etc...) but places the decimal units (kB, MB, GB, TB, etc...) behind the number.
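To make the two conventions concrete, here is a minimal Python sketch that renders the same byte count both ways. The function names (format_decimal, format_binary) are made up for this example and are not any operating system's API. Note that the official spelling of the binary kilo unit is "KiB" with a capital K, which is what the code prints.

```python
DECIMAL_UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def _format(num_bytes, base, units):
    """Divide by `base` until the value fits the largest suitable unit."""
    value = float(num_bytes)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


def format_decimal(num_bytes):
    """Base multiplier 1,000: what is printed on the box."""
    return _format(num_bytes, 1000, DECIMAL_UNITS)


def format_binary(num_bytes):
    """Base multiplier 1,024: what most software calculates with."""
    return _format(num_bytes, 1024, BINARY_UNITS)


size = 500_000_000_000  # a drive advertised as "500 GB"
print(format_decimal(size))  # 500.00 GB
print(format_binary(size))   # 465.66 GiB
```

Same drive, same number of bytes, two different numbers depending on the multiplier. The confusion only starts when the second number gets the first unit slapped behind it.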
But what does it all mean?
In the end it boils down to this:
- Manufacturers are right: a 2TB drive can hold at least 2,000,000,000,000 bytes (two trillion bytes). And, in effect, it's actually more!
- The number you see in Windows is right: a 2TB drive is 2,000,000,000,000 / (1,024 * 1,024 * 1,024 * 1,024) = 1.818989404 TiB = 1.82 TiB (tebibytes)
- The UNIT listed in Windows is wrong. Windows calculates with binary units but displays the decimal units.
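The same arithmetic for a few common advertised sizes, as a quick Python sketch. The helper name windows_style is hypothetical, and the two-decimal rounding is a choice made for this example; Windows itself may round differently depending on where you look.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def windows_style(advertised_bytes):
    """Roughly the number Windows shows, labelled with the unit it really is."""
    if advertised_bytes >= TIB:
        return f"{advertised_bytes / TIB:.2f} TiB"
    return f"{advertised_bytes / GIB:.2f} GiB"


for label, size in [("250 GB", 250 * 10**9),
                    ("1 TB", 10**12),
                    ("2 TB", 2 * 10**12)]:
    print(label, "->", windows_style(size))

# 250 GB -> 232.83 GiB
# 1 TB -> 931.32 GiB
# 2 TB -> 1.82 TiB
```

Note that a 1 TB drive does not even reach 1 TiB, which is why it shows up as roughly 931 "GB".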
Why can't they just all do one standard unit thing?
For this, you have to look at history. Drive manufacturers in the good old days actually did the same thing as Windows is doing. Everyone listed capacities in "binary" (so base multiplier 1,024) but slapped the decimal unit behind it. Everyone and their dog knew that 1kB actually meant 1,024 bytes. RAM still does this: an 8 GB stick actually contains 8 GiB.
This was all fine and dandy until some clever marketeers came along and figured out that drive manufacturers could actually list the "decimal numbers" with the "decimal units" and get a bigger number on the box. So drive manufacturers started doing that, whilst the software kept calculating in binary.
So drive manufacturers are the enemy!
Well, that depends on your point of view:
- On the one hand, they only started listing the correct number for the unit they were using.
- On the other hand, if they changed anything, they should've just changed the unit they listed on the box to be actually correct (kiB, MiB, GiB, etc...)
I'm usually all for the "marketeers are a bunch of idiots" standpoint, but then again, they just came up with a clever way of making more with less. Plus, I'm glad they are at least using the correct number with the correct unit.
So, Windows is dumb?
Yes. Windows is dumb. It should list the correct unit.
Then again, imagine what would happen if they all of a sudden started using the correct unit: tech forums all over the world would just explode with questions about that weird i in their capacity listings.
On the other hand, most people probably just wouldn't care, if they even noticed it at all.
I'm still not sure how the whole thing works...
Well, uhm, I believe Linus made a video about it at some point. Can't seem to find it though.
TL;DR
Read first paragraph, way at the top, above the horizontal line.
EDIT EDIT: I might as well address the RAM situation in a bit more detail...
Why doesn't RAM have this discrepancy?
As stated a bit earlier: RAM manufacturers still put the "binary number" on the box whilst listing the "decimal unit". One could argue their marketing team could do the same as the hard drive manufacturers did and make even more millions of billions of thousands of dollars $$$$! The difference is in the way both mediums operate and address the available space. Hold on to your panties!
RAM is binary
RAM uses digital chips which have a certain number of addressable registers, each register containing one (or possibly multiple) bytes. Since the address decoder on the DIMMs is binary, the number of addresses it can decode will always be a power of two. So it is practically impossible to end up with a DIMM size which is a power of ten.
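A tiny Python sketch of that reasoning, using a deliberately simplified model: one flat binary address decoder with one byte per address. Real DIMMs are organised in ranks, banks, rows and columns, so treat the function names and the model as assumptions made for illustration.

```python
def addressable_bytes(address_bits, bytes_per_location=1):
    """Number of bytes reachable with `address_bits` binary address lines."""
    return (1 << address_bits) * bytes_per_location


def is_power_of_two(n):
    """True if n is exactly 2**k for some k >= 0."""
    return n > 0 and (n & (n - 1)) == 0


print(addressable_bytes(33))           # 8589934592 bytes, i.e. 8 GiB
print(is_power_of_two(8 * 1024**3))    # True:  an "8 GB" stick is really 8 GiB
print(is_power_of_two(8 * 10**9))      # False: 8,000,000,000 bytes can't come out of a binary decoder
```

Adding one address line doubles the capacity, so the sizes you can build this way go 4 GiB, 8 GiB, 16 GiB and so on, never a round decimal number.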
HDDs are analog-ish
HDDs store their data on physical platters (hence "hard DISK"). The data is organized in circular tracks around the center of the platter. Depending on the technology (materials) used, the size required to store one bit of data varies and thus, the amount of bits you can fit in one track will vary.
Now, the tracks on hard disks are split into sectors. These sectors are the smallest unit of data a drive will read or write. They consist of a header, a data section, and a footer, each with its own purpose:
- Headers contain meta-data about the sector. This data is used in internal drive accounting.
- Footers contain ECC data to protect the data in the sector from faults, or to know when data is faulty.
- The data section is where data is actually stored. The size is always a power of two.
Because of these physical constraints, the actual number of bits one can store on the disk is not necessarily a power of two. The manufacturers make sure that the disk has enough room to fit at least the rated capacity in data sections, together with the necessary headers and footers.
There's also a portion of the drive that remains unused during normal operation. This is reserved for possible damaged sectors to be moved to. Damaged sectors occur during the life of every drive and are not a problem as long as there is still room to create a new sector which replaces it.
OK, all fine and dandy but what about SSDs?
Disclaimer: don't quote me on this chapter. I'm fairly sure this is quite accurate, but still...
Yes, it's true that the construction of SSDs is very similar to that of RAM. They also store data in chips which have address decoders which can address a number of registers that is a power of two. This means that a 256 GB SSD will actually have room for at least 256 GiB!
The thing with SSDs is that they perform very complicated garbage collection and wear leveling algorithms. The more space they reserve for wear leveling, the longer the SSD will last. Thus, they reserve a portion of the flash chips for internal accounting and other mumbo-jumbo.
Even more TL;DR
See the other TL;DR.
EDIT EDIT EDIT: and while we're at it: file systems!
But wait! There's more! (This thing called formatting)
The attentive reader might notice there still is a discrepancy between the actual usable number of bytes and the number of bytes that should be available even if you take all of the above into account. That's because we didn't talk about formatting.
Formatting?
Yes, formatting. All of the above is talking about the physical block device. Now, to keep things manageable, operating systems use file systems (stuff like FAT, NTFS, ext2/3/4, etc...) to organize files on the block device. These file systems are created by "formatting a drive". You do it when you install Windows, and Windows will ask to format brand new USB thumb drives if they come unformatted out of the factory.
File systems use all sorts of data structures to contain both the data you want to store and the accounting data about that data (called meta-data). Because of this, the actual usable formatted space will be less yet again. How much less depends on the file system used, and the settings used when creating the file system. The bigger the file system, the bigger the discrepancy... Usually.
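If you want to see what your own formatted file system reports, Python's standard library has shutil.disk_usage, which returns the total, used and free bytes of the file system containing a given path. This is only a rough sketch: the path "." is a placeholder for whatever drive you care about, the describe helper is made up for this example, and the total is what the file system reports, not the raw size of the block device.

```python
import shutil


def describe(num_bytes):
    """Show one byte count in decimal (GB) and binary (GiB) units."""
    return (f"{num_bytes:,} bytes = "
            f"{num_bytes / 10**9:.2f} GB = "
            f"{num_bytes / 2**30:.2f} GiB")


usage = shutil.disk_usage(".")  # "." is a placeholder path
print("total:", describe(usage.total))
print("used: ", describe(usage.used))
print("free: ", describe(usage.free))
```

Compare the "total" line with the number on the box: whatever is missing went to the things described in this post.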
While we are at it: size on disk vs actual size
The even more attentive reader might also have noticed that Windows lists both the "size" and the "size on disk", and might wonder what the hell is up with that. Simple...
Sectors
As stated before: disks will always read and write at least one full sector of data. Say you have a plain text file containing "hello world", stored in regular plain old ASCII (1 byte per character). The resulting file would be 11 bytes long. Since you always write at least one sector, that file would consume one sector and has a minimum "size on disk" of one sector, usually 512 B or 4 kiB.
File systems
File systems also come into play here. File systems will separate your data in blocks of a specific size. Often, these blocks will actually be bigger than the physical sectors, in order to improve throughput and decrease processing overhead. A fine example would be a 2 kiB block size on a drive with 512 B sectors. This would mean that said 11 B file would take up 2 kiB on disk! That's nuts!
File system developers realize that's nuts and have invented clever ways of getting around this issue. The most popular these days is stuffing the files inside the meta-data structures, if the files are small enough.
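Ignoring such small-file tricks, "size on disk" is simply the file size rounded up to a whole number of blocks. A minimal Python sketch (the function name size_on_disk is made up for this example):

```python
def size_on_disk(file_size, block_size):
    """Round file_size up to the next multiple of block_size."""
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size


print(size_on_disk(11, 512))     # 512  -> "hello world" on bare 512 B sectors
print(size_on_disk(11, 2048))    # 2048 -> same file with 2 kiB file system blocks
print(size_on_disk(5000, 4096))  # 8192 -> just over one 4 kiB block costs two
```

The wasted part is the gap between the two numbers, and on average it is about half a block per file, which is why lots of tiny files hurt more than a few big ones.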
Oh wait, I forgot about partitioning!
Clever readers might know that before you format a drive, you usually divide it up into partitions. These are separate spaces on your device, used to keep certain types of data away from each other. Creating partitions also costs you some overhead space in accounting- and meta-data.
Same goes for the ever more popular logical volumes that are created by things like LVM, or clever data redundancy stuff like RAID.
So I'm wasting precious space all the time?!?!
Yes. You are! Fortunately, the overhead data involved in partitioning and similar stuff is pretty much negligible. File systems are pretty optimized these days as well. All of these things also provide too many benefits to simply discard them.
One last thing
There's also space lost in data formats. For example, storing "hello world" in a Word document will result in a file way bigger than 11 bytes. This is because a .docx format has a lot of meta-data about things like colours, fonts, paper sizes, etc... This is so diverse and format specific, though, that I won't go into it any further.
Dude, stop typing already, you are TL;DR times three already!
Sowie