So a couple of years ago I set up a 6TB RAID 1 system which is getting close to full; it's formatted in NTFS.
A month ago I bought a 10TB external HD and I've been copying everything across; this one is formatted in exFAT.
I've noticed that the 10TB drive is almost full and I haven't even finished copying everything from the 6TB drive. Should this be the case? If so, should I just reformat the 10TB drive as NTFS?
Is there any particular advantage to exFAT over NTFS anyway?
Do you have loads of really small files? What cluster size did you use for the exFAT system? Sometimes they are formatted with a cluster size of 256 kB. The 60% slack you are describing does not sound reasonable, though.
ETA: I don't know why exFAT would have any particular advantage in this case over just using XFS, NTFS, ZFS, ext4, etc.
This page says the default cluster size for a 32GB-256TB exFAT partition is 128 KB. In order for that to explain >4TB of wasted space, you'd have to have something like 60 million files (figuring each file wastes roughly half a cluster, ~64 KB, on average).
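Back-of-the-envelope, assuming the usual rule of thumb of about half a cluster of slack per file:

```python
# Rough estimate: how many files does it take to waste >4 TB of slack
# if each file wastes, on average, about half of a 128 KB cluster?
cluster = 128 * 1024              # bytes
avg_slack_per_file = cluster / 2  # ~64 KB
wasted = 4 * 1024**4              # 4 TB of slack to explain
print(wasted / avg_slack_per_file / 1e6)  # ~67 million files
```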
Yes, I have tons of small files. I’m not a hoarder in real life but I am one on the internet, I’ve saved tons of stuff ‘which I’ll read later’ pretty much from first starting to browse the web.
I've checked and the cluster size is set to 1024 KB. Good? Bad? Indifferent?
And yes, to be honest I used exFAT on this drive instead of NTFS pretty much just to try a different format; I figured that following the inevitable technological apocalypse I'd be able to use at least one of the drives.
1024 KB is pretty big, but that might be dictated by the size of the disk. If you have lots of small files there'll be a ton of wasted space; the cluster is the smallest unit of space the file system will allocate. So a 5 KB file will take up an entire 1024 KB cluster.
Each file is stored in one or more whole clusters. If the cluster size is 1024 KB and the file is 1025 KB, it takes up 2 clusters. So the larger the cluster size, the more wasted space.
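If it helps, here's a minimal sketch of that arithmetic in Python; the cluster and file sizes are just examples:

```python
import math

def allocated_bytes(file_size, cluster_size):
    """Space a file actually occupies: whole clusters, rounded up."""
    return math.ceil(file_size / cluster_size) * cluster_size

cluster = 1024 * 1024                 # 1024 KB cluster, as on the OP's drive
for size_kb in (5, 1024, 1025):
    size = size_kb * 1024
    alloc = allocated_bytes(size, cluster)
    print(f"{size_kb:>5} KB file -> {alloc // 1024} KB allocated, "
          f"{(alloc - size) // 1024} KB wasted")
```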
Isn’t that also true of NTFS?
So the issue would also exist on his old drive. Given lots of small files, both drives would be inefficient, though to different degrees depending on any difference in cluster size.
Doesn’t NTFS use a fixed cluster size?
In fact, don’t all file systems?
Nah; there are filesystems that don't use blocks at all, for example. But even among file systems designed for use on block devices, there are various tricks to reduce wasted space due to internal fragmentation, like variable block sizes, block suballocation, block compression, inode inline data (ext4 supports this, and NTFS does something similar with small files resident in the MFT), and similar.
If it’s really going to save multiple TB, maybe it’s worth switching to one of these filesystems with advanced features (ZFS even has its own software RAID built in). I gather the OP really has tens of millions of files? But even just NTFS’s smaller cluster size and use of the MFT may account for the bulk of the difference.
exFAT is also proprietary, AFAIK. At the same time, NTFS can also be read and written on Linux, Mac, etc. One interpretation is that exFAT is just meant to be a relatively simple file system that works on high-capacity SD cards.
The thing is, devices such as flash memory and hard disk drives in use today do have physical blocks or sectors that define a minimum record size that can be read or written. This makes some sort of sense: you can't write, say, a single bit or byte onto the end of a magnetic tape; it will be wrapped up in some sort of error-correcting code and other low-level data. So, in any case, typical filesystems have to deal with 512-byte or 4 KB sectors or whatever, depending on the medium and format. Even the CKD format mentioned before is these days virtualized on top of the physical layout.
What you do have are the tricks mentioned before: if the sector or block size is 512 bytes and you create 1000 files of 60 bytes each, then instead of allocating 512,000 bytes of data blocks plus metadata, a filesystem with inline data or block suballocation would probably save nearly all of that 512 kB. So it seems the OP should NOT use exFAT, and definitely not with 128 kB clusters.
I once designed a filesystem for a NOR flash device. NOR flash has the interesting property that you can overwrite one-bits with zero-bits, at any address granularity. You can overwrite a single bit in the middle of a word anywhere in the flash, fairly freely (I think there was some maximum number of times you could overwrite a bit without erasing, but it was fairly high). NAND flash doesn't work that way – you generally have to program a whole page and erase a whole block at a time. I took advantage of this feature of NOR flash in a number of ways in the filesystem design, one of which was that new data is simply appended to a sort of log of writes. So if you write 57 bytes to a file, a record is allocated containing those 57 bytes plus a header and is written to the end of a table of such records in flash (overwriting a bunch of 1s). This design doesn't really use any notion of allocation blocks.
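Very roughly, something like this; the header layout and names below are made up for illustration, not the actual design:

```python
import struct

class NorLog:
    """Toy model of an append-only record log on NOR flash.
    Header layout and names here are invented for illustration."""

    def __init__(self, size):
        self.flash = bytearray(b"\xff" * size)  # erased NOR flash is all ones
        self.tail = 0                           # next free byte in the log

    def program(self, offset, data):
        # Emulate NOR programming: bits can only go from 1 to 0 (AND clears)
        for i, b in enumerate(data):
            self.flash[offset + i] &= b

    def append(self, file_id, payload):
        # A record is a small header (file id + length) followed by the data
        header = struct.pack("<HH", file_id, len(payload))
        record = header + payload
        self.program(self.tail, record)
        self.tail += len(record)

log = NorLog(4096)
log.append(file_id=1, payload=b"57 bytes of new data for some file...")
```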
NTFS can only be read (not written) on Macs. So the only out-of-the-box filesystem that’s compatible between Mac and Windows is exFAT. The 3rd-party Paragon driver enables NTFS read/write on Mac. Likewise there is a similar one for Windows which enables read/write of the Mac HFS+ filesystem.
In general I’d recommend using NTFS on Windows because it’s a journaled or transactional file system and is more resilient to fragmentation. In case of an abrupt or uncontrolled shutdown there is less chance of filesystem damage with NTFS. On Mac the same situation exists with HFS+ vs exFAT.
The OP's issue is likely caused by the difference in default cluster size between NTFS and exFAT, combined with having many small files. For his 10TB drive the exFAT default cluster size is 128 KB; for NTFS it is 4 KB. So on exFAT each small file could be wasting nearly 128 KB.
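As a rough illustration (the file count and average size here are hypothetical, not the OP's actual numbers):

```python
import math

# Hypothetical numbers, just to show the scale: 10 million files of ~10 KB each
files = 10_000_000
avg_size = 10 * 1024

def total_slack(cluster):
    # each file occupies whole clusters, rounded up; slack is the unused tail
    allocated = math.ceil(avg_size / cluster) * cluster
    return files * (allocated - avg_size)

for label, cluster in [("NTFS 4 KB", 4 * 1024), ("exFAT 128 KB", 128 * 1024)]:
    print(f"{label}: ~{total_slack(cluster) / 1024**4:.2f} TB wasted")
```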
1024 KB, according to Post #4. He didn’t use the default cluster size. So, yeah, it may be worth him re-formatting as NTFS (which can store small files and directories in the Master File Table, anyway). If he really needs read/write interoperability with Mac, you’re right, it won’t work out of the box, but there are the free and paid drivers you mentioned.
Also, if the RAID is connected as a NAS, there is no reason to worry about compatibility since the Windows/Mac/Linux machines can all access the data over the network. I formatted mine as ZFS; I feel like it is more robust than NTFS for large sets of data, as each block is checksummed.
If by “tons of stuff you’ll read later” you mean tiny text files and HTML pages, file compression might make a huge difference for you (40-50%) at minimal performance cost, since you're mostly just hoarding and rarely reading them anyway. If you actually meant millions of images, like porn, file compression won't help (JPEG is already compressed). This should work with RAID too, though I suppose at some conceptual level it would be harder to ignore an error in a compressed file than in a plaintext one; you'd hope that's the sort of thing RAID error checking would catch and fix to begin with.
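To get a feel for it, here's a quick zlib demo (just a stand-in; real ratios depend entirely on what the files actually contain):

```python
import zlib

# Crude stand-in for a hoarded HTML page: repetitive markup plus some text
sample = b"<html><body><p>Saved to read later, honest.</p></body></html>\n" * 200

compressed = zlib.compress(sample, 6)
print(f"original:   {len(sample)} bytes")
print(f"compressed: {len(compressed)} bytes "
      f"({100 * len(compressed) / len(sample):.0f}% of original)")
# JPEGs and other already-compressed formats won't shrink like this.
```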
Well-implemented compression, such as LZ4 on ZFS, can actually increase performance, because the slowest part is reading and writing data on disk. Reading a smaller amount of data and decompressing it (or compressing before writing) is faster than reading the larger uncompressed data, because processors are so much faster than disks.
As a simple space saver, can you simply zip up whole directories? The wasted space is per file, so even if you used no compression, the transformation of many files into one should save an enormous amount.
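Something like this, with stored (uncompressed) entries; the directory and archive names are placeholders:

```python
import zipfile
from pathlib import Path

# Bundle a directory of small files into one archive with no compression,
# so cluster slack is paid once for the archive instead of once per file.
src = Path("saved_pages")  # placeholder directory
with zipfile.ZipFile("saved_pages.zip", "w", zipfile.ZIP_STORED) as zf:
    for path in src.rglob("*"):
        if path.is_file():
            zf.write(path, arcname=path.relative_to(src))
```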
Yeah, while there are circumstances where this may be true, for the OP's archival purposes (write once, read very rarely) it probably won't really matter either way. The performance change should be negligible, whether faster or slower.
Filesystem-level compression with a saner block size is better for this. Zipping up whole directories makes it harder to do things like search indexing, online backups (a lot of providers still don't do delta syncs on files), etc. It also makes that one huge zip file more vulnerable to data corruption, potentially affecting multiple files at once. Though, again, it's hopefully less of an issue with a good RAID array.