What's the difference between "size" and "size on disk"?

Jragon · February 18, 2009, 10:39pm

I was about to ask a question about how files got so big until I looked carefully at the properties of my experimental text file. I was, to illustrate my question, making little .txt files with things like “a” in them and wondering how we got to 4KB files for the most basic things. Then I realized the actual SIZE was 1 byte, which makes perfect sense… but WHY is the size on disk 4.00 kilobytes? Sure, I could understand it hitting 1 because of formatting, and pointers to font files and whatnot, but 4KB? Isn’t that a waste? I mean sure, 4KB isn’t skin off ANYBODY’S nose nowadays, but it still makes me wonder, given that it is almost half the memory (maybe I’m a little off on value here) of a lower-end super computer not 25 years ago, in the most basic file. Is this just padding? Is it reserved memory? If so, what for? Why bother handling it in chunks that size? Is the number of pointers needed for the file really that immense? I mean, it’s not like Notepad didn’t work fine and pretty much the same when a 20MB drive was a Big Deal, and with drives that size 4KB adds up pretty fast. I can’t prove a Windows 3.1 file isn’t the same size, but I can only imagine the files were much smaller. Did we just forget how to compress things effectively?

This may be a dumb/weird question, but it really boggles my mind how a text file is taking up a whole(!) 4KB on my computer.

Harmonious_Discord · February 18, 2009, 11:02pm

Your hard drive has been formatted to a 4k sector size. The smallest size a file can be is one sector of a disk. Every time your file is saved it will take up one sector until after you exceed the size of one sector. At that point your file will need two sectors or 8k of disk space allocated.

Floppies use sectors of 512 bytes. Larger sectors have been used to allow for using larger drives.

Fubaya · February 18, 2009, 11:04pm

Size is the number of bytes in a file; size on disk refers to the size of the clusters being used by the file. Files are stored in clusters, which can vary in size depending on the file system. In this case, I’d assume your clusters are 4KB, so any file will take up at least 4KB on disk, but I’ve forgotten what little I ever knew about clusters, so hopefully someone will correct me if needed.
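
For anyone who wants to see the two numbers side by side outside the Properties dialog, here is a minimal Python sketch. It assumes a POSIX-style system where `os.stat` reports `st_blocks` (counted in 512-byte units); on Windows that field is not available, so the helper returns `None` for the allocated size. The helper name `size_and_size_on_disk` is made up for illustration.

```python
import os
import tempfile


def size_and_size_on_disk(path):
    """Return (logical size in bytes, allocated bytes or None if unknown)."""
    st = os.stat(path)
    # st_blocks is a count of 512-byte units on POSIX systems.
    # It does not exist on Windows, hence the getattr fallback.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


# Recreate the experiment from the first post: a file containing just "a".
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a")
    tmp_path = f.name

logical, allocated = size_and_size_on_disk(tmp_path)
os.remove(tmp_path)

print("size:", logical, "bytes")
print("size on disk:", allocated, "bytes")
```

The allocated figure depends on the file system the temporary directory lives on, so it may not be exactly 4096 on every machine.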

beowulff · February 18, 2009, 11:06pm

Block size.
Disks can’t just store a single byte - they are block-oriented devices, so the file size gets rounded up to the next full block. In other words, if your disk has 512-byte blocks, even a 1-byte file will take 512 bytes. A 513-byte file will take 1,024 bytes. Back in the good old days, the Mac had an upper limit of 65,535 files per disk, regardless of the disk size. This was no big deal when floppies only stored 400K, but as multi-megabyte hard drives became available, it was an obvious problem when a 1-byte file occupied several hundred K on the disk. Apple soon updated their disk format to increase the number of files to billions, and the corresponding block size went down to something reasonable.
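
The rounding described here is just a ceiling division. A small Python sketch, assuming the only rule is "round up to a whole number of blocks" (real file systems have special cases this ignores); the function name `size_on_disk` is made up for illustration:

```python
def size_on_disk(size_bytes, block_size):
    """Round a file size up to a whole number of blocks."""
    # Ceiling division without floating point: -(-a // b)
    blocks = -(-size_bytes // block_size)
    return blocks * block_size


print(size_on_disk(1, 512))      # 1-byte file, 512-byte blocks
print(size_on_disk(513, 512))    # one byte past the first block
print(size_on_disk(1, 4096))     # the 1-byte text file on 4K clusters
```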

mhendo · February 18, 2009, 11:08pm

Presumably, your hard drive is formatted NTFS, which is the standard file system for modern Windows computers. As others have suggested, the cluster size is 4kb, so any file, no matter how small, must take up one 4kb cluster on the disc.

If your drive were formatted with an older file system, like FAT32, your 1 byte file would actually take up 16kb, because the clusters are larger on FAT32.

Stathol · February 18, 2009, 11:41pm

It’s kind of like going to the grocery store and trying to buy exactly 1 oz of apple sauce. They just don’t sell it in that quantity. Furthermore, if they sell apple sauce in, say, 4 oz containers and you want to buy 6 oz, you’ll just have to buy 8 oz. In other words, if you create a file that’s exactly 4097 bytes long, the allocated size on disk will be 8k, because 4097 bytes just barely exceeds the 4k boundary.

As for why the system works this way, describing the starting position and length of every file (or every file fragment) in byte precision requires more storage space than describing these things in some larger unit of drive space. It’s an efficiency tradeoff, really. Using large cluster sizes reduces the amount of space that the filesystem structures themselves take up on disk, but increases the potential for wasted space because of files not ending on an even cluster boundary. Similarly, using small cluster sizes means less wasted space from over-allocation, but larger file system structures.
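
To put rough numbers on that tradeoff, here is a toy Python sketch. It assumes the file system spends a fixed number of bytes of bookkeeping per cluster (the `bytes_per_entry=4` default is a made-up figure, not how any particular file system works) and compares wasted space against bookkeeping for the same set of files.

```python
def allocation_cost(file_sizes, cluster_size, bytes_per_entry=4):
    """Return (wasted bytes, bookkeeping bytes) for a set of files.

    bytes_per_entry is a hypothetical per-cluster bookkeeping cost.
    """
    # Each file occupies a whole number of clusters (ceiling division).
    clusters = sum(-(-size // cluster_size) for size in file_sizes)
    wasted = clusters * cluster_size - sum(file_sizes)
    bookkeeping = clusters * bytes_per_entry
    return wasted, bookkeeping


files = [1, 4097, 100_000]  # sizes in bytes
for cluster_size in (512, 4096):
    wasted, bookkeeping = allocation_cost(files, cluster_size)
    print(cluster_size, "byte clusters:", wasted, "wasted,", bookkeeping, "bookkeeping")
```

With these three files, the smaller clusters waste far less space but need several times as many entries to keep track of.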

More importantly, cluster size has various implications for file system performance, but that’s usually only seriously considered in applications like database servers and whatnot.

Finally, there’s another way that “size” and “size on disk” can vary: using NTFS filesystem-level compression. Files compressed in this manner will usually appear in blue instead of black. In this case, the “size on disk” will (hopefully) be considerably smaller than the “size”.

Edit: Also, filesystem-level encryption can cause this effect, too.

ChrisBooth12 · February 18, 2009, 11:45pm

Why can’t a sector be 1 bit?

Stathol · February 18, 2009, 11:59pm

Primarily because the hard drive itself can’t fetch a single bit of data from the drive platter. You’re going to be getting your data from the hardware in byte-sized (hah!) chunks anyway, so there’s no point being more precise than that.

KneadToKnow · February 19, 2009, 12:08am

Because the drive’s table of contents has to be able to keep up with each sector, potentially individually. If each sector were 1 bit, the entry in the table of contents would be bigger than the sector it referred to. Even a drive which had sectors of one byte would essentially fill itself with its own table of contents, assuming each TOC address could be expressed as a one-byte word.

ticker · February 19, 2009, 12:10am

Also the file system has to keep track of all the sectors. A 1 bit sector means most of the disk is used up keeping track of the tiny amount of space left for data, and the file system must also keep track of those sectors used to keep track of data sectors. OMG what a nightmare.
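
A back-of-the-envelope Python sketch of that overhead, assuming a simplified model in which every sector needs exactly one fixed-size table entry and nothing else (the function name and the entry sizes are illustrative, not taken from any real file system):

```python
def bookkeeping_fraction(sector_bytes, entry_bytes):
    """Fraction of the disk consumed by the table, one entry per sector."""
    return entry_bytes / (sector_bytes + entry_bytes)


# One-byte sectors tracked by one-byte entries: half the disk is table.
print(bookkeeping_fraction(1, 1))

# 4096-byte sectors tracked by 4-byte entries: the table is a rounding error.
print(bookkeeping_fraction(4096, 4))
```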

Bytegeist · February 19, 2009, 12:30am

Somewhat related to this . . .

Many file systems implement something called “sparse files”, in which blocks containing nothing but zero-bytes are not stored explicitly on the disk.
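
A rough way to try this yourself in Python, assuming a POSIX-style system whose file system supports sparse files (not all do, and `st_blocks` is not available on Windows). The helper name `make_sparse_file` is made up for illustration. The trick is to seek far past the start of an empty file and write a single byte, leaving a "hole" of zeros that may never be stored.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes):
    """Write one byte after a hole of hole_bytes unwritten bytes."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # skip ahead without writing anything
        f.write(b"x")


fd, sparse_path = tempfile.mkstemp()
os.close(fd)
make_sparse_file(sparse_path, 10 * 1024 * 1024)

st = os.stat(sparse_path)
sparse_logical = st.st_size
# st_blocks counts 512-byte units; missing on Windows.
sparse_blocks = getattr(st, "st_blocks", None)
sparse_allocated = sparse_blocks * 512 if sparse_blocks is not None else None
os.remove(sparse_path)

print("size:", sparse_logical, "bytes")
print("size on disk:", sparse_allocated, "bytes")
```

On a file system that supports sparse files, the size on disk should come out far smaller than the roughly 10MB size; elsewhere the two numbers will be close.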

Markxxx · February 20, 2009, 6:04am

The easiest way to think of it is as a shoe box.

Let’s say you collect little glass animals and each type of glass animal has its own shoebox. All the glass animals are basically the same size.

So you have

Shoebox with 1 elephant
Shoebox with 3 rhinos
Shoebox with 1 giraffe
Shoebox with 10 Foxes
Shoebox with 1 Ostrich

OK, since each shoebox will hold up to 10 glass animals, and you only store the same type of animals in the same shoebox, if you add another elephant, rhino, giraffe or ostrich you won’t need another shoebox, because you’ve already allotted space for those glass animals. But if you add even one more glass fox, you will need another shoebox.

Of course, since you have a total of 16 glass animals, you would only need 2 shoeboxes to be efficient instead of five. But five organizes the data better.
