Thursday, December 13, 2012

Why hard drives are smaller than expected

People often ask why their Terabyte hard drive isn't a terabyte and time and again the simple, not necessarily false, answer given is that it is a marketing ploy by the evil manufacturers. But there is another answer.

In the good old days there was only the SI units for prefixes - A thousand meters in a kilometer, a thousand grams in a kilogram, and that is how we like it. Engineers and Scientists insist that it be so, well mostly they do. The SI standards organisation defines the prefixes this way.

The binary nature of digital computers lends itself to working with powers of two for units. The problem comes in with how close 1000 happens to be to the value 210 - the difference was considered negligible while designing computers and writing early computer system manuals. The habit stuck and the prefix "Kilo" in computer terms became interchangeable for the value 1024. It would be wasteful to use 1000 as the demarcation for many computer allocation units because powers of two allign well and make for more effecient and cost-effective designs.

This limitation is however specific to situations where bits are processed, stored or transfered in parallel. This includes processors, memory banks and system busses. Serial media, such as communication lines, networks, and hard drives do not suffer from this limitation. (It must be noted that while it is convenient to think of data stored on a hard drives as parallel bits, natively hard drives, just like tape devices, read and write bits in serial.)

A "kilo"-byte turns out to be a convenient measure for quantity of data. The difference also appears negligible at first glance, and using it this way feels comfortable to humans. Note however that the margin of error increases as we move to higher order numbers.

Prefix Order Binary prefix value Decimal prefix Value Deviation
Kilo 1 1,024   1,000   2.40%
Mega 2 1,048,576  1,000,000  4.86%
Giga 3 1,073,741,824  1,000,000,000  7.37%
Tera 4 1,099,511,627,776 1,000,000,000,000 9.95%
Peta 5 1,125,899,906,842,620 1,000,000,000,000,000 12.59%

The net effect is that a Terabyte hard drive is nearly 10% less than what you would expect its size to be!

As mentioned earlier, not all devices on a computer operate in parallel: networks are mostly serial lines. The phone and Digital lines that connects our homes to the Internet communicate in serial. The venerable computer mouse is a serial device. These days the USB protocol is used for just about anything and the "S" in USB in fact stands for "Serial".

Because hard drives in actual fact store data in serial (even "parallel" drives like ATA and SCSI drives eventually convert the data to a serial stream of ones and zeros), they follow the SI prefix specification for number of Bytes in a Gigabyte, while memory modules, which must maximize the investment in bus width and capacity, incorrectly follows a binary interpretation of the decimal prefixes!

The SI system only recognizes the powers-of-ten meaning of the prefixes. A new set of binary prefixes have been defined, though it is not part of the SI standard!

Kilo 1,000 = KB  1,024 = KiB (Kibi Byte)
Mega 1,000,000 = MB  1,048,576 = MiB (Mibi Byte)
Giga 1,000,000,000 = GB 1,073,741,824 = GiB (Gibi Byte)

Hard Drives, Modems, network cards, and airoplanes are designed by engineers following the SI standard and their size specification conforms to the traditional SI meanings. Memory modules follow the size specifications of the Binary prefix system, but marketing brands these with SI decimal prefixes. We as consumers are therefore spoiled since we get more than what we pay for with RAM!

There are however two other items worth mentioning.

The first is Solid State storage devices, such as flash drives. Like RAM these are absed on a natively paralle media and bits needs to be counted and maximized for optimal capacity and effectiveness. Yet these are marketed the same way traditional hard drives are - with the SI meaning of GB or Gigabyte. You would think that (ignoring file system overheads) you should be able to store a GB of data from ram into a 1-GB solid state drive! Blame this one on marketing and exploitation of the people who have come to expect a "1 GB hard drive to be less than 1 GB"

The second is the size of files stored on a hard drive. These are commonly shown with KB having the binary system meaning in stead of the SI meaning. This is despite common storage media used to be natively formatted as serially accessible streams - hard drives and tapes. I assume this may be in part because the writers of the early general purpose operating systems were so deeply ingrained in thinking about a Kilo-byte as 1024 bytes that they never considered doing it the other way, and possibly because those files had to be loaded into core memory which is allocated in chunks which have sizes that are powers-of-two.


So there you have it - don't blame the marketing guys for the missing space on your hard drive, thank them for the extra space on your memory modules. Blame the engineers though. :-)