Single folder of MP3 files has almost a 1GB difference between "Size" and "Size on Disk"

I was looking through my MP3 folders and discovered that while most of them have “Size” and “Size on Disk” attributes that are the same (or close enough that only a few kilobytes differ), there’s a single folder with 37 MP3 files with a 4.21GB Size but a 3.44GB Size on Disk. No other folder has a discrepancy this large. They’re all on the same hard drive too, so they don’t have different Allocation Unit Sizes either. I also notice that when I click around in that folder to look at the individual MP3 file sizes, I get severe OS instability that I don’t see anywhere else.

Anyone know what the problem is?

Try copying them to a different disk. That should fix the problem.

You may have some bad disk sectors. Run chkdsk.

Okay I found an MP3 file that claimed it was 114MB, but 72MB Size on Disk.

I moved it to another folder, and it improved to 114MB Size but 107MB Size on Disk. Still wondering why it changed, though.

Sounds like your OS is doing some kind of compression, such as NTFS compression. It is a feature of Windows and you can check by right-clicking on your drive → Properties → look near the bottom on the “General” tab

I’m sure it is on other operating systems too.

I doubt it would be achieving any meaningful compression on MP3 files, since they are already compressed.

mp3 files can contain arbitrary amounts of empty space in the metadata section, and a sparse-file aware file-system will compress that very effectively.

There was an article recently that I read that I was having trouble finding that mentioned that all iTunes-sold mp3s had a big chunk of empty space in them. It was speculated by the author of the article that this was a holdover from when album art would be added to the files at download time (to match localization). The static mp3 file would reserve enough space in the metadata section to put album art, then the art would be added to the file. But then iTunes stopped embedding album art in files so they could make the Cover Flow feature work, and no one regenerated all the mp3 files, so they all have like 512kB of empty space in them.
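If anyone wants to check a file for that kind of reserved space, here is a rough Python sketch. It assumes the file starts with an ID3v2 tag, and it treats trailing zero bytes inside the tag as padding, which is an approximation (it ignores extended headers and footers). The function names are my own, not from any library.

```python
def synchsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 'synchsafe' integer: 7 useful bits per byte."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def id3v2_trailing_padding(data: bytes) -> int:
    """Return the number of zero bytes at the end of the ID3v2 tag body.

    Returns 0 if the data does not start with an ID3v2 header.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    # Bytes 6..9 of the header hold the tag size, excluding the 10-byte header.
    tag_size = synchsafe_to_int(data[6:10])
    tag_body = data[10:10 + tag_size]
    return len(tag_body) - len(tag_body.rstrip(b"\x00"))


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        head = f.read(4 * 1024 * 1024)  # tags are normally near the start
    print(id3v2_trailing_padding(head), "bytes of apparent padding")
```

Run it with the path to an MP3 as the only argument. A result in the hundreds of kilobytes would be consistent with the album-art theory, but it doesn't prove it.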

So it could be that (or something like that).

That doesn’t sound like an improvement to me :slight_smile:

I always understood that to be the difference in the actual file size, and whatever slop there is in the file system.

So for example, you may have a file that’s a certain size, say… 3.2 megabytes. But due to the way the hard drive/SSD stores files, it may actually take up 3.25 or 3.3 megabytes on disk. Part of this is solely due to the way that drives and file systems work - they store in “clusters”, which are basically fixed chunks that drive space is allocated with. The default for Windows NTFS is 4k, but sometimes larger cluster sizes are useful, as a bigger chunk is allocated/read at once. So database systems and other uses often use 64k cluster sizes.

What this means for files is that your file is broken up into chunks of whatever size, and stored. Any left-over space is just empty. So in the example where you have a 4k cluster size, you get a whole bunch of 4k chunks and usually one small one with a bit left over that’s empty space. But if you write a 1k file, you’re writing 1k to the cluster, and having 3k of empty space. You can imagine how that works for a 64k cluster size- lots left over. But in comparison to 4k cluster sizes, we’re talking 1/16th the number of clusters to seek and read.

All this makes a lot of difference on spinning disks, and a lot less on solid state drives- seek time is negligible, and so is read time as a result.

To use a personal example, I have an MP3 that is 7.63 MB (8,002,373 bytes) in file size and 7.63 MB (8,003,584 bytes) in size on disk, a difference of only 1,211 bytes. That’s the last-cluster slop I was talking about above. My cluster size is 8k, so I have 977 clusters allocated to that file (8,003,584 bytes at 8,192 per cluster), but the last cluster only has 6,981 bytes written to it, leaving 1,211 bytes allocated but unused.
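The arithmetic in that example can be sketched in a few lines of Python. This is just the rounding-up-to-whole-clusters calculation, assuming the file is stored normally (no compression or sparse regions); the function name is made up for illustration.

```python
def cluster_usage(file_size: int, cluster_size: int) -> tuple[int, int, int]:
    """Return (clusters, size_on_disk, slack) for a plainly stored file."""
    # Round up to a whole number of clusters using integer arithmetic.
    clusters = (file_size + cluster_size - 1) // cluster_size
    size_on_disk = clusters * cluster_size
    slack = size_on_disk - file_size  # allocated but unused bytes
    return clusters, size_on_disk, slack


# The MP3 from the example above, with an 8k cluster size.
print(cluster_usage(8_002_373, 8192))
```

Plugging in a 1k file with 4k clusters gives one cluster and 3k of slack, which matches the earlier description.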

This adds up over the course of a bunch of files, as you can imagine.

One thing you can do is look here and see about running this to find out your cluster size:

How to: Find out NTFS partition cluster size/Block size > Blog-D without Nonsense (dannyda.com)

Yes, that’s what it is. But note that in the OP’s example Size on Disk was smaller, and then he changed something and it became larger (but still smaller than the actual file size). Not an improvement to have your files taking up more space on disk than they used to.

The block size and fragmentation are ways that Size on Disk can end up larger than Size. Sparse file support is a way that Size on Disk can end up smaller.

The short answer to the OP is that there isn’t a problem with their computer. The discrepancy in file size is because the filesystem is being smart and not wasting a bunch of space storing big blocks of 0s.
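For anyone who wants to compare the two numbers for a whole folder without clicking through Properties, here is a hedged Python sketch. On Linux/macOS it uses `st_blocks` (counted in 512-byte units); on Windows it falls back to the Win32 `GetCompressedFileSizeW` call through `ctypes`, which I believe reports the allocated size for compressed and sparse files. The helper name is my own.

```python
import os


def logical_and_allocated(path: str) -> tuple[int, int]:
    """Return (logical size, bytes actually allocated on disk) for a file."""
    st = os.stat(path)
    if hasattr(st, "st_blocks"):
        # POSIX: st_blocks counts 512-byte units regardless of block size.
        return st.st_size, st.st_blocks * 512

    # Windows fallback (assumption: GetCompressedFileSizeW is what we want).
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_size = kernel32.GetCompressedFileSizeW
    get_size.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(wintypes.DWORD)]
    get_size.restype = wintypes.DWORD

    high = wintypes.DWORD(0)
    ctypes.set_last_error(0)
    low = get_size(os.fspath(path), ctypes.byref(high))
    if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return st.st_size, (high.value << 32) | low


if __name__ == "__main__":
    import sys
    for name in sorted(os.listdir(sys.argv[1])):
        full = os.path.join(sys.argv[1], name)
        if os.path.isfile(full):
            size, allocated = logical_and_allocated(full)
            print(f"{size:>12} {allocated:>12} {name}")
```

Files where the second column is much smaller than the first are the ones being stored sparsely or compressed.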

I hadn’t really considered sparse files, but that makes sense. Are MP3 files typically a sparse file format?

No.

MP3 files shouldn’t be a sparse file, but starting a download might create a sparse file, to reserve space, and then as the download progresses the sparse file gets filled in. I think I’ve encountered bittorrent clients that work like that. If the download stops unexpectedly, the sparse space might never get released, so it stays claimed, but isn’t actually taking up space on disk.

That would also have the result of making the MP3s appear large enough to be the full file, but because they have long stretches of empty space, they won’t play all the way through properly.
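One way to test the partial-download theory is to look for long stretches of zero bytes inside the file. A minimal Python sketch, reading in chunks so large files don't need to fit in memory (the function name and chunk size are arbitrary choices of mine):

```python
import re

_ZERO_RUN = re.compile(rb"\x00+")


def longest_zero_run(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the length in bytes of the longest run of 0x00 in the file."""
    longest = 0
    carried = 0  # zeros at the end of previous chunks that may continue
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            stripped = chunk.lstrip(b"\x00")
            leading = len(chunk) - len(stripped)
            longest = max(longest, carried + leading)
            if not stripped:
                carried += leading  # the whole chunk was zeros
                continue
            for match in _ZERO_RUN.finditer(chunk):
                longest = max(longest, match.end() - match.start())
            carried = len(chunk) - len(chunk.rstrip(b"\x00"))
    return longest
```

Real MP3 audio data should almost never contain megabytes of consecutive zeros, so a very large result suggests a hole that was never filled in.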

Cloud storage and syncing such as with OneDrive, DropBox, or iCloud Drive is another way that these differences can manifest. There’s a “file size” associated with every file, but if it hasn’t been downloaded/synced to your computer, then it’s essentially just an alias/shortcut, probably occupying close to the minimum block size of your hard drive, like 4 kilobytes or thereabouts. That’s why my work OneDrive folder reports a size of 512 GB on my home computer, but its actual size on disk is only 10 GB or so, since I haven’t downloaded too many of the files. I doubt that’s the case here though.

As I posted upthread (and apologies for not being able to find the cite), mp3s bought from Apple have a bunch of empty space in them, presumably for album art that’s no longer used. Other sources of mp3s may have similar empty sections of metadata.

Partially downloaded files is another possible explanation.

I remember the Napster-dialup days; you’d download a song, and it would zip right down in no time at all. Some troll had created an empty file, all zeroes, and the compression features of X-modem (or Y-modem) would compress 3MB of zeros to a single packet or two. (I recall someone telling me in the good old days of dialup Xmodem - “I don’t like downloading ZIP files - they download so much slower than text files.”)

Windows had the disk compression feature, but with modern disk capacities, it’s probably not worth the effort. Plus, as mentioned above, much of high volume content is already somewhat compressed - photos, video, music… so the advantage of additional compression is less. Compressed folders were displayed blue rather than black text.

I would enable auto-compression (e.g., lz4) on any kind of desktop system. You are not going to max out your CPU.

Yes, the “don’t enable compression” warning I recall was for servers.

OTOH, you are wasting time trying to compress MP3, Photos, video and I assume a decent amount of executables. However, a modern CPU has plenty of time to waste - usually. Just open Task Manager and see how often your CPU comes close to 100%.

Enabling compression at the filesystem level is usually a bad idea these days. Any kind of file that is actually going to take up lots of space on your drive (music, video, etc.) has compression already built into the file itself, so compression will likely do nothing or even perform slightly worse, and while you likely do have CPU cycles to spare, running the CPU more will mean more power consumption and more heat, which is more cost and higher likelihood of hardware failures. Not worth it for the approx 0 other benefits.

If you happen to have, like, tons of plaintext or bitmap files on a drive somewhere, then sure. But I bet you don’t.
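You can see the effect with a quick Python experiment. This is only an illustration: it uses zlib rather than whatever algorithm a given filesystem uses, and random bytes stand in for already-compressed media.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Random bytes behave like already-compressed audio/video: no redundancy left.
random_like = os.urandom(1_000_000)
# Highly repetitive plaintext is the best case for a general-purpose compressor.
plain_text = b"The quick brown fox jumps over the lazy dog.\n" * 20_000

print(f"random bytes:  {compression_ratio(random_like):.3f}")
print(f"repeated text: {compression_ratio(plain_text):.3f}")
```

The first ratio should come out at roughly 1 (no savings), while the second should be a tiny fraction.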

ISTM this depends on the speed of (de)compression compared to how long it takes to read something from the drive. If the files are highly redundant and the compression algorithm is fast and the drive is slow, perhaps enabling compression will improve performance. There should be some concrete “best practices” information, though?

Worse yet, if it’s a random access file rather than being read sequentially, then there would be significant overhead in determining the real location of a specific byte offset into the compressed file. Compressing a commonly accessed database, for example, would be a hilarious waste of computing power since the program essentially just wants to read certain assorted pages when accessing information in the database file.