What's the difference between "size" and "size on disk"?

I was about to ask a question about how files got so big until I looked carefully at the properties of my experimental text file. I was, to illustrate my question, making little .txt files with things like “a” in them and wondering how we got to 4KB files for the most basic things. Then I realized the actual SIZE was 1 byte, which makes perfect sense… but WHY is the size on disk 4.00 kilobytes? Sure, I could understand it hitting 1KB because of formatting, and pointers to font files and whatnot, but 4KB? Isn’t that a waste? I mean sure, 4KB isn’t skin off ANYBODY’S nose nowadays, but it still makes me wonder, given that it is almost half the memory (maybe I’m a little off on the value here) of a lower-end supercomputer not 25 years ago, why the most basic file needs it. Is this just padding? Is it reserved memory? If so, what for? Why bother handling it in chunks of that size? Is the number of pointers needed for the file really that immense? I mean, it’s not like Notepad didn’t work fine and pretty much the same when a 20MB drive was a Big Deal, and with drives that size 4KB adds up pretty fast. I can’t prove a Windows 3.1 file isn’t the same size, but I can only imagine the files were much smaller. Did we just forget how to compress things effectively?

This may be a dumb/weird question, but it really boggles my mind how a text file is taking up a whole(!) 4KB on my computer.

Your hard drive has been formatted to a 4k sector size. The smallest size a file can be is one sector of a disk. Every time your file is saved it will take up one sector until after you exceed the size of one sector. At that point your file will need two sectors or 8k of disk space allocated.

Floppies use sectors of 512 bytes. Larger sectors have been used to allow for using larger drives.

Size is the number of bytes in a file; size on disk refers to the size of the clusters being used by the file. Files are stored in clusters, which can vary in size depending on the file system. In this case, I’d assume your clusters are 4KB, so any file will take up at least 4KB on disk, but I’ve forgotten what little I ever knew about clusters, so hopefully someone will correct me if needed.
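If you want to see both numbers without the Properties dialog, here is a minimal Python sketch. It assumes a POSIX-style system, where `os.stat` reports `st_blocks` in 512-byte units; on Windows that field is not available, so the allocated size comes back as None there. The function name `size_and_allocated` is made up for illustration.

```python
import os
import tempfile


def size_and_allocated(path):
    """Return (logical size, allocated bytes or None) for a file."""
    info = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (POSIX). It is absent on Windows.
    blocks = getattr(info, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return info.st_size, allocated


if __name__ == "__main__":
    # Write a one-byte file, like the one in the question, and inspect it.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"a")
        name = handle.name
    try:
        print(size_and_allocated(name))
    finally:
        os.remove(name)
```

The first number should always be 1 for that file. The second depends on the file system, so treat whatever it prints as specific to your machine.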

Block size.
Disks can’t just store a single byte - they are block-oriented devices, so the file size gets rounded up to the next full block. In other words, if your disk has 512 byte blocks, even a 1-byte file will take 512 bytes. A 513 byte file will take 1,024 bytes. Back in the good old days, the Mac had an upper limit of 65,535 files per disk, regardless of the disk size. This was no big deal when floppies only stored 400K, but as multi-megabyte hard drives became available, it was an obvious problem when a 1-byte file occupied several hundred K on the disk. Apple soon updated their disk format to increase the number of files to billions, and the corresponding block size went down to something reasonable.

Presumably, your hard drive is formatted NTFS, which is the standard file system for modern Windows computers. As others have suggested, the cluster size is 4kb, so any file, no matter how small, must take up one 4kb cluster on the disc.

If your drive were formatted with an older file system, like FAT32, your 1 byte file could take up 16KB or more on a large volume, because FAT32 cluster sizes grow with the size of the volume.

It’s kind of like going to the grocery store and trying to buy exactly 1 oz of apple sauce. They just don’t sell it in that quantity. Furthermore, if they sell apple sauce in, say, 4 oz containers and you want to buy 6 oz, you’ll just have to buy 8 oz. In other words, if you create a file that’s exactly 4097 bytes long, the allocated size on disk will be 8K, because 4097 bytes just barely exceeds the 4K boundary.
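The rounding described in the last few posts can be sketched in a few lines of Python. This is only the arithmetic, not what any particular file system does internally; the function name `size_on_disk` and the choice to count an empty file as zero clusters are assumptions for illustration.

```python
def size_on_disk(size_bytes, cluster_bytes=4096):
    """Round a file size up to a whole number of clusters."""
    if size_bytes == 0:
        return 0  # assumption: an empty file needs no data clusters
    clusters = -(-size_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


for size in (1, 4096, 4097):
    print(size, "->", size_on_disk(size))
print(513, "->", size_on_disk(513, cluster_bytes=512))
```

With 4KB clusters, 1 byte and 4096 bytes both occupy 4096 bytes, while 4097 bytes occupies 8192. With 512-byte blocks, 513 bytes occupies 1024.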

As for why the system works this way, describing the starting position and length of every file (or every file fragment) in byte precision requires more storage space than describing these things in some larger unit of drive space. It’s an efficiency tradeoff, really. Using large cluster sizes reduces the amount of space that the filesystem structures themselves take up on disk, but increases the potential for wasted space because of files not ending on an even cluster boundary. Similarly, using small cluster sizes means less wasted space from over-allocation, but larger file system structures.
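To put rough numbers on that tradeoff, here is a small Python sketch that counts how many clusters the file system would have to track and how many bytes are lost to slack for a few cluster sizes. The list of file sizes is invented purely for illustration, and the cluster count is only a stand-in for the real size of the file system's bookkeeping.

```python
def allocation_cost(file_sizes, cluster_bytes):
    """Return (clusters to track, bytes wasted as slack) for a set of files."""
    clusters = sum(-(-size // cluster_bytes) for size in file_sizes)
    wasted = clusters * cluster_bytes - sum(file_sizes)
    return clusters, wasted


# Hypothetical mix of file sizes in bytes, chosen only for illustration.
files = [1, 700, 5000, 120000, 3000000]

for cluster in (512, 4096, 65536):
    clusters, wasted = allocation_cost(files, cluster)
    print(f"{cluster:>6}-byte clusters: {clusters:>5} to track, {wasted:>6} bytes of slack")
```

For this made-up set, going from 512-byte to 64KB clusters cuts the number of clusters to track from 6108 to 51, while the slack grows from 1595 bytes to 216635 bytes.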

More importantly, cluster size has various implications for file system performance, but that’s usually only seriously considered in applications like database servers and whatnot.

Finally, there’s another way that “size” and “size on disk” can vary: using NTFS filesystem-level compression. Files compressed in this manner will usually appear in blue instead of black. In this case, the “size on disk” will (hopefully) be considerably smaller than the “size”.

Edit: Also, filesystem-level encryption can cause this effect, too.

Why can’t a sector be 1 bit?

Primarily because the hard drive itself can’t fetch a single bit of data from the drive platter. You’re going to be getting your data from the hardware in byte-sized (hah!) chunks anyway, so there’s no point being more precise than that.

Because the drive’s table of contents has to be able to keep up with each sector, potentially individually. If each sector were 1 bit, the entry in the table of contents would be bigger than the sector it referred to. Even a drive which had sectors of one byte, essentially would fill itself with its own table of contents, assuming each TOC address could be expressed as a one-byte word.

Also the file system has to keep track of all the sectors. A 1 bit sector means most of the disk is used up keeping track of the tiny amount of space left for data, and the file system must also keep track of those sectors used to keep track of data sectors. OMG what a nightmare.
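A back-of-the-envelope Python sketch of that overhead, under a deliberately simple model: every sector costs exactly one table-of-contents entry of a fixed width. The 32-bit entry width is an assumption picked for illustration, not a property of any real file system.

```python
from fractions import Fraction


def usable_fraction(sector_bits, entry_bits=32):
    """Fraction of the disk left for data when each sector costs one table entry."""
    return Fraction(sector_bits, sector_bits + entry_bits)


sizes = (("1 bit", 1), ("1 byte", 8), ("512 bytes", 512 * 8), ("4 KB", 4096 * 8))
for label, bits in sizes:
    share = usable_fraction(bits)
    print(f"{label:>9} sectors: {float(share):.2%} of the disk holds data")
```

In this model, 1-bit sectors leave 1/33 of the disk for data, and 1-byte sectors with 1-byte entries leave exactly half, which is the "fills itself with its own table of contents" situation described above.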

Somewhat related to this . . .

Many file systems implement something called “sparse files”, in which blocks containing nothing but zero-bytes are not stored explicitly on the disk.
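A minimal Python sketch of how such a file can come about: seek past a stretch that is never written, then write one byte. Whether the hole is actually left unallocated depends on the file system and operating system, so this only guarantees the reported size; the helper name `make_sparse_file` is made up, and the `st_blocks` check assumes a POSIX-style system.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes=1024 * 1024):
    """Seek past a hole, then write one byte, without writing the zeros in between."""
    with open(path, "wb") as handle:
        handle.seek(hole_bytes)
        handle.write(b"x")


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sparse.bin")
    make_sparse_file(path)
    info = os.stat(path)
    print("size:", info.st_size)
    # Assumption: st_blocks is in 512-byte units (POSIX); absent on Windows.
    blocks = getattr(info, "st_blocks", None)
    if blocks is not None:
        print("allocated:", blocks * 512)
```

On a file system that supports sparse files, the allocated figure will typically be far smaller than the size, which is the reverse of the one-byte text file case.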

The easiest way to think of it is as a shoe box.

Let’s say you collect little glass animals and each type of glass animal has its own shoebox. All the glass animals are basically the same size.

So you have

  1. Shoebox with 1 elephant
  2. Shoebox with 3 rhinos
  3. Shoebox with 1 giraffe
  4. Shoebox with 10 Foxes
  5. Shoebox with 1 Ostrich

OK, since each shoebox will hold up to 10 glass animals, and you only store the same type of animal in the same shoebox, if you add another elephant, rhino, giraffe or ostrich you won’t need another shoebox, because you’ve already allotted space for those glass animals. But if you add even one more glass fox, you will need another shoebox.

Of course, since you have a total of 16 glass animals, you would only need 2 shoeboxes to be efficient instead of five. But five organizes the data better.
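The shoebox arithmetic, as a small Python sketch. The function names are made up, and the dictionary just restates the list of animals above.

```python
BOX_CAPACITY = 10  # each shoebox holds up to 10 glass animals

collection = {"elephant": 1, "rhino": 3, "giraffe": 1, "fox": 10, "ostrich": 1}


def boxes_per_type(counts, capacity=BOX_CAPACITY):
    """Boxes needed when each type of animal gets its own boxes (like files and clusters)."""
    return sum(-(-n // capacity) for n in counts.values())


def boxes_if_packed(counts, capacity=BOX_CAPACITY):
    """Boxes needed if all the animals were packed together regardless of type."""
    return -(-sum(counts.values()) // capacity)


print(boxes_per_type(collection), boxes_if_packed(collection))

# One more fox overflows the full fox box and forces a sixth shoebox.
with_extra_fox = dict(collection, fox=11)
print(boxes_per_type(with_extra_fox), boxes_if_packed(with_extra_fox))
```

The first line prints 5 and 2, matching the counts in the post; with the extra fox it prints 6 and 2.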