Sometimes called nmh. Point is, it does stuff like keep each message in a separate file, instead of all in one big file. Also, you're supposed to use it via command-line utilities, not a web browser.
Geeky design decisions like that bug me. They may seem “clever” for some arcane purposes but aren’t scalable. What if you had hundreds of thousands or millions of small messages, as email clients often do? Way to blow up a file system, or at least create huge file system overhead!
That’s the filesystem’s problem, not the mail agent’s. It is much superior for many things: for example, the entire folder doesn’t have to be rewritten when one message is deleted. Handling tens of thousands of files in a single directory is not that big a deal. It can also do tricky things with filesystem links, so the same message can be in multiple folders without taking up additional space.
The maximum number of files per directory on ext4 is 2^32 − 1, and if that isn’t enough, ZFS allows 2^48.
Modern Unix(-like) systems will use the Maildir format, which also stores one file per message, but structured in a much less human-readable way than mh. Both are generally more performant than mbox, where each folder is a single file. I don’t know how Exchange stores email.
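For what it’s worth, Python’s standard-library mailbox module can read and write all three layouts, which makes the structural differences easy to see. A minimal sketch, using throwaway temp paths (the addresses and folder names here are made up for illustration):

```python
import email.message
import mailbox
import os
import tempfile

tmp = tempfile.mkdtemp()

# A trivial message to store in each format.
msg = email.message.EmailMessage()
msg["From"] = "alice@example.com"
msg["Subject"] = "hello"
msg.set_content("test body")

# mbox: the whole folder is one flat file.
mb = mailbox.mbox(os.path.join(tmp, "inbox.mbox"))
mb.add(msg)
mb.close()

# Maildir: one file per message, delivered into cur/new/tmp
# subdirectories with machine-generated names.
md = mailbox.Maildir(os.path.join(tmp, "Maildir"))
md.add(msg)

# MH: one file per message, named with a bare sequence number.
mh = mailbox.MH(os.path.join(tmp, "mh_inbox"))
key = mh.add(msg)  # the key is the integer filename, e.g. 1
```

After this runs, an `ls` in the MH directory would show a file literally named `1`, which is exactly what the original MH tools leave on disk.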
One of the properties of a good file system is that it shouldn’t limit the user to artificially low constraints in terms of parameters like maximum volume size, maximum file size, or maximum number of files. Those are just a matter of how many bits are allocated to the parameter. It doesn’t mean that it’s wise to go wild with any of them, and it shouldn’t be the file system’s “problem” if it’s used in ways that take it to boundary conditions that were never intended. Millions of files will bloat file structures like index files, impact performance, and waste potentially huge amounts of disk space since each tiny file can be no smaller than the size of one allocation unit, and so may be mostly wasted space.
As for Exchange, we’re talking here about email clients, so let’s focus on Outlook. Outlook stores emails in structures called Personal Storage Tables, each of which is a PST file. There can be one large one for all emails, or many smaller ones, or anything in between depending on how the user wants to organize their data. Each PST appears in Outlook with the same basic folder structure which can be modified or extended by the user as they see fit. This seems to me to be a sensible approach that gives the user maximum flexibility without putting undue stresses and overhead on the underlying file system.
K9? OK, first glimpse is promising. Will check tomorrow cautiously.
Reminds me of a user here, but the name came, I believed, from a friendly affinity to dogs. Paging @k9bfriender just in case.
No. I missed that they only have a Visual Studio Windows version. Which is odd. I guess the guys working on it aren’t interested in Macs or Unix.
You could run it under Wine, but that doesn’t seem a great jump forward.
No updates since 2020 either. So looks as if they have abandoned development. At least it can be recompiled.
That shouldn’t be the case for keeping things in a single file, either. You can write to files by appending data; you don’t have to rewrite the entire file every time.
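A minimal sketch of the point, using a hypothetical mbox-style append helper (the `From ` separator line is simplified from real mbox delivery):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "inbox.mbox")

def append_message(path, sender, body):
    # Mode "a" positions every write at the end of the file, so the
    # bytes already on disk are never touched, let alone rewritten.
    with open(path, "a") as f:
        f.write(f"From {sender} Thu Jan  1 00:00:00 1970\n")
        f.write(body.rstrip("\n") + "\n\n")

append_message(path, "alice@example.com", "first message")
size_after_one = os.path.getsize(path)
append_message(path, "bob@example.com", "second message")
# The file only grew past its old end; message one is untouched.
```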
Granted, under the hood, an SSD does rewrite blocks, as a part of wear leveling. But that would apply in all cases.
Where I could see the filesystem helping is in defragmentation. Such would be handled by the OS or even the firmware (of an SSD). But I doubt this would be something that would matter for most users.
The main advantage I see of the filesystem approach is just that you can access the files outside of any mail client. And being able to make changes from the command line could be useful for the right type of user, like a network admin.
I hope it’s clear that “rewrite the file” is an irrelevant criticism of Outlook, anyway. Its Personal Storage Table (PST) files are not in any sense sequential files, but more closely analogous to a database. The ability to organize all your emails either in one large PST or in as many individual ones as you like – all of them having folder structures that you can change to your specific needs – is really one of Outlook’s outstanding features. Plus, I really like the UI, but I can only speak to the 2003 and 2007 releases.
Practically speaking, how do you backup a huge file that is updated dozens of times per day? (Probable answer: use MailStore Home, which I haven’t gotten around to figuring out.)
Bonus question for the OP. Thunderbird updated its UI last fall. I’m still using the classic version. I’m not an early adopter. When should I switch to the new UI?
The tension between a database and lots of files is nearly as old as computers.
Back in the day, Unix system managers knew about the intricacies of the Berkeley Fast File System, and in particular its fragmentation parameter. Here the last portion of a file (or, for small files, the entire file) was allocated from fragments of a file system block; by default fragments were one eighth of a block. So files could use as little as, say, 1024 bytes of disk. This really worked well for mail. But the cost was much more expensive file-append performance: as soon as the end of the file reached eight fragments, the file system would rewrite the fragments onto a single block. One learnt that for systems where lots of data was produced, you built a dedicated file system with fragmentation disabled. The performance boost was staggering. But for systems where there were lots of little files (and in the days before multimedia content, files were usually small, and usually read rather than written), fragmentation worked really well. The gain in available space was similarly staggering: the original FFS paper claimed savings of the order of 45%. When disk sizes were measured in megabytes, that was big.
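To put rough numbers on that, here is the fragment arithmetic for a small mail file versus whole-block allocation (the 8 KB block size is an assumption; FFS supported several):

```python
BLOCK = 8192       # assumed FFS block size in bytes
FRAG = BLOCK // 8  # fragments are one eighth of a block -> 1024 bytes

def ffs_space(size):
    # Full blocks for the body of the file, fragments for the tail.
    full_blocks = size // BLOCK
    tail = size - full_blocks * BLOCK
    tail_frags = -(-tail // FRAG)  # ceiling division
    return full_blocks * BLOCK + tail_frags * FRAG

def whole_block_space(size):
    # What the file costs if only whole blocks can be allocated.
    return -(-size // BLOCK) * BLOCK

# A 300-byte email consumes one 1024-byte fragment instead of a
# full 8192-byte block: one eighth of the space.
print(ffs_space(300), whole_block_space(300))
```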
The downside was that directories were simple lists of filename inode pairs. Large directories slowed down operations.
The modern era brings with it much more advanced filesystems. Log based file systems append modifications to the log in a very easily optimised manner. Directory structures become more efficient, and the entire file system behaves much more like a database.
But the underlying tension of structure remains. Databases have always brought with them the problem that they are large, impenetrable lumps of data. To be useful they need to provide their own backup, archiving, and recovery systems that work outside the file system. Databases don’t even need to run over a file system, and can simply run on raw disk; indeed they can perform better that way.
The great advantage of a mail system that uses lots of files is that you, as a user, can manage things without the mail tool. This can be a real win if something goes wrong, or you need to do things the tool is bad at. A corrupted mail database is not a happy thing. You may go back to the last backup, but you may never be able to recover emails that arrived in the gap.
The issue of lots of small files sort of went away as emails acquired so much more cruft. Emails of a few hundred bytes are a thing of the past.
The difficulty I have with email systems is that I want all my devices to act as a cache. This includes PCs. Any computer or device should be able to sustain a total failure, be lost, stolen, destroyed, and be restorable from a master copy of data that is managed in a robust and external system. I know of one episode where thieves stole not just the computers but the backup disks. We still don’t quite have that generally available without additional effort. Email with IMAP does get one a lot of the way there. But for a range of reasons it isn’t perfect.
On the first point, there are several possible solutions, the preferable one depending on the individual circumstances. But it begins with the principle that backups are always necessary anyway, and should be done with a frequency commensurate with the importance of the data and amount of tolerable potential loss, typically nightly. If you’re doing incremental backups and the fact of a very large Outlook PST file becomes a nuisance then the solution is probably to leverage Outlook’s ability to support multiple PST databases and create a hierarchy of dated archives. Only the currently active PST will be backed up in an incremental, but all the archives going back to the dawn of time will be instantly available when needed. IMHO Outlook’s ability to organize PSTs this way is one of its important strengths.
And obviously another complementary strategy is to retain emails on the server for some period of time greater than the backup time interval. Both Outlook and Thunderbird and probably most other email clients can do this.
On your second point that I quoted, yes, a corrupted PST is not a happy thing. I don’t recall ever seeing this in decades of running Outlook and in supporting it for my friend who uses it in business and whose email databases are absolutely immense. Microsoft does offer a tool that purports to fix PST file corruption but I don’t remember ever having to use it. In any case, any complex information structure is always theoretically subject to inconsistency. So is the file system itself. The usual solution is either to run a repair utility that exploits built-in redundancies to rebuild damaged structures, or recover from backup. None of this is new or unique to email clients.
On a final note, I should add that my previous posts on this board will show that I’m a vociferous critic of Microsoft and their complete cluelessness about building quality software. But when they do something right, I’m willing to acknowledge it.
That’s fine for adding a message to a folder, but isn’t the case when deleting a message. Deleting can be an expensive operation, so is often saved up and done when a folder is “compacted”, instead of done immediately when a message is deleted.
Each message as a file also appealed to the Unix philosophy of “everything is a file” — and a text file at that.
Email folders are usually going to max out in the tens of thousands of files, not millions. It really didn’t even stress filesystems on SunOS or Ultrix. Of course it is faster to list a directory containing one 20MB file than 10,000 2KB files, but for something like file-based backups the slowdown in checking each file’s metadata is made up for by not having to back up the entire 20MB, only new (or changed) files. So a daily backup might take slightly longer to index, but only have to back up 30KB instead of 20MB.
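The arithmetic behind those figures, with an assumed churn of 15 new 2 KB messages per day (the daily count is a made-up number chosen to give roughly 30 KB):

```python
MSG_SIZE = 2 * 1024   # 2 KB per message, as in the example above
N_MSGS = 10_000
NEW_PER_DAY = 15      # assumed daily churn

folder_total = N_MSGS * MSG_SIZE              # roughly 20 MB either way
single_file_backup = folder_total             # one big file changed -> copy all of it
per_message_backup = NEW_PER_DAY * MSG_SIZE   # only the new files -> about 30 KB

print(folder_total, per_message_backup)
```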
In practice, mh was extremely fast at dealing with large mailboxes. Much faster than pine or elm, which would have to read the entire mailbox each time it was opened. In mh it would only have to look in each file if you were performing a search. Something like showing the current message or the next message was as simple for the file system as reading the file named 21 or 22. mh could display several messages in the time it took pine to even open, if your inbox was too big.
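mh’s trick can be sketched in a few lines. This toy version (the filenames and the current-message pointer are simplified from how mh actually records its state) shows why “show” and “next” cost a single file read:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
# Toy mh folder: messages are just numbered text files.
for n, body in [(21, "message twenty-one"), (22, "message twenty-two")]:
    with open(os.path.join(folder, str(n)), "w") as f:
        f.write(body)

cur = 21  # mh keeps a "current message" number per folder

def show(folder, n):
    # Displaying message n is nothing more than reading the file
    # named n; no other message in the folder is ever opened.
    with open(os.path.join(folder, str(n))) as f:
        return f.read()

print(show(folder, cur))  # the current message
cur += 1                  # "next" just bumps the pointer
print(show(folder, cur))
```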
You also have to remember that this is from a time when the mail being read was stored on the same computer running the mail agent. The move to things like centralized mail servers running IMAP was one of the big downfalls of mh, because as designed it is simply incompatible with the concept (unless someone has written an abstraction layer in the 20 years since I’ve looked).
Oh, interesting. The guy who wrote it didn’t accept money. (“Make a donation to your local animal shelter if you like it.”) Mozilla probably does.
Apparently Thunderbird is not getting a lot of bugfixes from Mozilla. I am using Betterbird which is forked from Thunderbird but uses the same storage. It does do a better job with searches I think but is pretty similar. They list the improvements on the website. It was no trouble to switch since it picks up all of Thunderbird’s settings.
Thanks for that, I might give Betterbird a try (even though the name sounds like a brand of Thanksgiving turkey!). I’m still moving emails manually from Thunderbird to Outlook, but it looks like the app I’m using has the flexibility to get mailboxes from alternate locations, as long as it recognizes the storage structure.
The whole thing is just so frustrating. I’d still be happily using Outlook except for a stupid, arbitrary, and completely unnecessary insistence on Oauth2 by my perpetually useless ISP.
That sounds like you’re sending 500MB-2GB+ of data to your online backup service once per day. The “going back to the dawn of time” strategy doesn’t sound sustainable. The “keep ~14 backups” strategy sounds viable, but in practice would require cooperation from both the backup and mail program.
Also picking nits, Thunderbird spun off from Mozilla proper a while ago. Wiki: “Available cross-platform, it is operated by the Mozilla Foundation’s subsidiary MZLA Technologies Corporation. Thunderbird is an independent, community-driven project that is managed and overseen by the Thunderbird Council, which is elected by the Thunderbird Community. The project strategy was originally modeled after that of Mozilla’s Firefox web browser and is an interface built on top of that web browser.” So it’s separate from Firefox now.
“Service”? You seem to be thinking of backups to the “cloud”, which I understand is popular but neither I nor anyone I know does that. In any case, most folks have fast enough internet connections that it wouldn’t be a problem (mine is 700 Mbps). But my preferred strategy is to back up to external drives. My fellow Outlook user who runs a home business does backups to a rotating set of externals, some of which are kept offsite. A nightly incremental takes less than ten minutes over USB 3.
Why not? It’s sustainable for me. I have emails going back to the first one I ever sent when I first got dial-up internet back in the mists of ancient time, imported from Eudora (which I was using at the time) to Outlook, and subsequently migrated to multiple generations of new computers. As for my friend the business user with massive email volume, not all of the numerous PSTs need to be open at the same time, and in fact the oldest ones don’t even have to be online, but can be relegated to offline storage. This tiered approach is a great feature of Outlook. It can open any arbitrary PST file just like Word can open a document, and suddenly all its contents are integrated into the online structure.
FYI, for my oldest Thunderbird/Betterbird email account I am sending 1.7GB of data per day to an online backup service called SpiderOakONE. It has been running without issues for quite some time. (Although I have mostly switched to a newer gmail account and this account is growing very slowly.)
The point is sending 2GB/day of backups is not an issue with a reasonably fast ISP.
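As a sanity check on that claim, using the 700 Mbps figure mentioned upthread (real upload rates will be lower than line rate, so treat this as a floor):

```python
backup_bytes = 2 * 10**9       # a 2 GB nightly backup
link_bits_per_s = 700 * 10**6  # 700 Mbps, the connection cited upthread

seconds = backup_bytes * 8 / link_bits_per_s
print(f"{seconds:.0f} seconds at full line rate")  # under half a minute
```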
I’ve been using Thunderbird as my email reader for my AT&T U-verse account for years. The only problem I ever had with it was some years ago (maybe about five), when AT&T made some change in their email access coding, and while trying to update the settings I ended up deleting all my emails. There was supposed to be some way to recover them, but I managed to screw that up. I may look into some of the backup systems that have been mentioned here.
I also need to go through my inbox and delete all the useless emails that are in it.