Raytheon has a radio ad playing in the DC market that says (I’m going from memory here) that the world generates a zettabyte of data every 6 months, and that this is 5 times the total capacity of the world’s hard drives.
First, I don’t think this is correct. The article linked above implies that there are several zettabytes of storage capacity. But second, if it really is 5 times the hard drive capacity of the entire world, where is all that data being captured?
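For a sense of scale, here is a quick back-of-envelope sketch of what the ad's two numbers imply if taken at face value. It assumes the SI definition of a zettabyte (10^21 bytes) and a half year of 182.5 days; the function names are just made up for illustration.

```python
ZETTABYTE = 10**21                            # bytes, SI definition (assumption)
SECONDS_PER_HALF_YEAR = 365 * 24 * 3600 // 2  # 15,768,000 s

def implied_rate(total_bytes, seconds):
    """Average bytes per second needed to produce total_bytes in the given time."""
    return total_bytes / seconds

def implied_capacity(total_bytes, ratio):
    """Storage capacity implied if total_bytes is `ratio` times that capacity."""
    return total_bytes / ratio

rate = implied_rate(ZETTABYTE, SECONDS_PER_HALF_YEAR)
capacity = implied_capacity(ZETTABYTE, 5)

print(f"{rate / 1e12:.1f} TB/s")    # 63.4 TB/s
print(f"{capacity / 1e18:.0f} EB")  # 200 EB
```

So the ad's claim, as I remember it, works out to roughly 63 TB generated every second, and a worldwide hard drive capacity of about 200 exabytes, which is well short of the several zettabytes the linked article implies.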
Some of it really depends on your definition of “generated”. If I load something from HDD to main memory, is that “generated”? What about if I move it from CPU to GPU memory? Or if stuff from RAM is written to a swap file? Do state transitions count as “extra data generated”? That is, if I’m playing a game and someone is at location [2,3,7] and I transform this vector so that they’re at [2,4,7], is that extra data generated, even if it’s written in the same area of memory? What if it’s stored in a temporary area before it’s written to the vector? And does only the area of memory that actually changed count? (That is, did we regenerate the whole vector, or only the y value?)
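To make the ambiguity concrete, here is a small sketch that counts the "data generated" by the [2,3,7] to [2,4,7] move under three different definitions. It assumes each coordinate is stored as a 32-bit little-endian integer, which is my assumption and not something any real game is known to do; `pack_position` and `count_generated` are hypothetical helper names.

```python
import struct

def pack_position(pos):
    # Assumption: three 32-bit little-endian signed integers, 12 bytes total.
    return struct.pack("<3i", *pos)

def count_generated(old, new):
    """Return bytes 'generated' under three possible definitions."""
    old_bytes, new_bytes = pack_position(old), pack_position(new)
    whole_vector = len(new_bytes)  # the entire vector was rewritten
    changed_components = sum(4 for a, b in zip(old, new) if a != b)  # only the coordinates that changed
    changed_bytes = sum(1 for a, b in zip(old_bytes, new_bytes) if a != b)  # only the bytes that differ
    return whole_vector, changed_components, changed_bytes

counts = count_generated([2, 3, 7], [2, 4, 7])
print(counts)  # (12, 4, 1)
```

The same update counts as 12 bytes, 4 bytes, or 1 byte depending on which definition you pick, and that is before counting any temporary copies.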
Quite a lot of the internet is based on programmatically filling out templates with data. So every time you fetch a web page that’s not cached, the server is writing the latest news article into its web template and sending it to you. Even this message board is a really fancy way of shoving a database into a browser-readable format. But most of this data is very ephemeral.
The figure is, I suppose, plausible, but I’m not sure it really means anything.
For example, pretty much every operation that completes in a CPU will set various flags depending on the result (for branching, etc.). That is data, and most likely (without doing any calcs) way more than what is claimed in the OP.
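A rough version of that calculation might look like the sketch below. Every input is a made-up round number chosen only to show the arithmetic (a billion CPUs, a billion instructions per second each, 4 flag bits per instruction), not a measurement, and `seconds_to_reach` is a hypothetical helper.

```python
ZETTABYTE = 10**21  # bytes, SI definition (assumption)

def seconds_to_reach(target_bytes, cpus, instr_per_sec, flag_bits_per_instr):
    """Seconds for all CPUs together to produce target_bytes of flag data."""
    bytes_per_sec = cpus * instr_per_sec * flag_bits_per_instr / 8
    return target_bytes / bytes_per_sec

# Illustrative placeholder inputs only; swap in your own estimates.
t = seconds_to_reach(ZETTABYTE, cpus=1e9, instr_per_sec=1e9, flag_bits_per_instr=4)
print(f"{t:.0f} seconds")  # 2000 seconds
```

With those placeholder inputs, flag updates alone would add up to a zettabyte in about half an hour, so the conclusion holds even if the guesses are off by a couple of orders of magnitude.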