I got to thinking about Gmail the other day and have a question. If I receive a funny 1 MB video clip as an email attachment and decide to send it to 10 of my friends who also have Gmail, how is that file stored?
Does a separate copy go into each person's mail file, increasing the total size of the Gmail store by 10 MB, or is there only one copy, with a link to it in each person's mail file?
If they do the latter, I can see them saving a lot of space… I wonder if their system is that smart, though.
I wish there were a Gmail statistics site somewhere.
It’s an interesting question, but probably not one that you’ll get an absolute answer on from Google.
Taking your question as "could they do this?", it probably wouldn't be too hard at some levels. For example, when Google processes a message with an attachment over some threshold size, it could look for the other Gmail recipients of that message and store the file in one place for all of them. The problem gets harder if anyone uses a redirection service, so that their address doesn't show up as a Gmail account. It would be harder still if, a few months down the line, somebody sent the same file to a Gmail recipient: would Gmail check whether that attachment is already stored somewhere?
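To make the "easy" per-message case concrete, here's a rough sketch in Python. This is not how Gmail actually works (Google hasn't said); the threshold value, the `@gmail.com` check, and the names `MailStore`, `deliver` and `bytes_used` are all made up for illustration. When one message with a big enough attachment goes to several local recipients, the file is stored once and each mailbox gets a reference; otherwise every mailbox gets its own copy. Note that a forwarding address that doesn't end in `@gmail.com` isn't recognized as local, which is exactly the redirection problem mentioned above.

```python
from dataclasses import dataclass, field

# Hypothetical values: nothing public says whether Gmail uses a cutoff, or what it is.
THRESHOLD_BYTES = 512 * 1024
LOCAL_DOMAIN = "@gmail.com"


@dataclass
class MailStore:
    blobs: dict = field(default_factory=dict)      # blob_id -> attachment bytes, stored once
    mailboxes: dict = field(default_factory=dict)  # address -> [("ref", blob_id) or ("inline", bytes)]

    def deliver(self, recipients: list[str], attachment: bytes) -> None:
        # Only addresses that visibly end in the local domain are handled here;
        # a redirection address hides the fact that the mail lands in a local mailbox.
        local = [r for r in recipients if r.lower().endswith(LOCAL_DOMAIN)]
        if len(attachment) >= THRESHOLD_BYTES and len(local) > 1:
            # Big attachment, several local recipients: one shared copy plus references.
            blob_id = len(self.blobs)
            self.blobs[blob_id] = attachment
            for r in local:
                self.mailboxes.setdefault(r, []).append(("ref", blob_id))
        else:
            # Otherwise each local mailbox gets its own full copy.
            for r in local:
                self.mailboxes.setdefault(r, []).append(("inline", attachment))

    def bytes_used(self) -> int:
        # Shared blobs count once; inline copies count once per mailbox.
        shared = sum(len(b) for b in self.blobs.values())
        inline = sum(
            len(payload)
            for items in self.mailboxes.values()
            for kind, payload in items
            if kind == "inline"
        )
        return shared + inline


if __name__ == "__main__":
    clip = bytes(1024 * 1024)  # stand-in for the 1 MB video clip
    friends = [f"friend{i}@gmail.com" for i in range(10)]
    store = MailStore()
    store.deliver(friends, clip)
    print(store.bytes_used())  # one 1 MB copy instead of ten
```

With the 10 friends from the original question, this stores 1 MB instead of 10 MB. But it only helps within a single message; the same clip arriving again next month would get stored all over again.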
At that point, they would be hashing every incoming file, comparing that hash against the hashes of every file already in storage, and then doing a bit-for-bit check whenever the hashes match (because you have to be sure). I have no idea whether the storage for such a database and the processing power to run it are worth the savings in disk space.
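Here's a minimal sketch of that hash-then-verify idea, again just an illustration and not anything Google has described. `DedupStore`, `put` and `get` are hypothetical names, and SHA-256 is my choice of hash. Each file is keyed by its digest; a digest match is only treated as a candidate, and the bytes are compared in full before an existing copy is reused.

```python
import hashlib


class DedupStore:
    """Stores each distinct attachment once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        # digest -> list of distinct blobs with that digest. A list, so that a
        # genuine hash collision would still be stored correctly.
        self._index: dict[str, list[bytes]] = {}
        self.bytes_stored = 0

    def put(self, data: bytes) -> tuple[str, int]:
        digest = hashlib.sha256(data).hexdigest()
        bucket = self._index.setdefault(digest, [])
        for slot, existing in enumerate(bucket):
            if existing == data:  # full byte comparison, because you have to be sure
                return digest, slot
        # No identical file yet: store this one and pay for its bytes.
        bucket.append(data)
        self.bytes_stored += len(data)
        return digest, len(bucket) - 1

    def get(self, ref: tuple[str, int]) -> bytes:
        digest, slot = ref
        return self._index[digest][slot]


if __name__ == "__main__":
    store = DedupStore()
    first = store.put(bytes(1024 * 1024))   # the clip arrives today
    later = store.put(bytes(1024 * 1024))   # the same clip, months later
    print(first == later, store.bytes_stored)
```

The second `put` of the same clip hands back the existing reference and adds nothing to `bytes_stored`. The cost side of the question shows up too: every incoming file has to be hashed, the index of digests has to live somewhere, and every hash hit triggers a full read of the stored copy for the comparison.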