Raytheon claims about data generated

Raytheon has a radio ad playing in the DC market that says (I’m going from memory here) the world generates a zettabyte of data every 6 months, and that this is 5 times the total world capacity of hard drives.
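The claim as remembered can be turned into two quick numbers: the world hard drive capacity it implies, and the average rate the data would have to be produced at. This is a back-of-envelope sketch that assumes SI units (1 ZB = 10^21 bytes) and a 182.5-day half year; the function names are just for illustration.

```python
# Back-of-envelope check of the radio ad's claim, as remembered in the post.
ZETTABYTE = 10**21                              # bytes, SI definition
SECONDS_PER_HALF_YEAR = 365 * 24 * 3600 // 2    # 15,768,000 s, ignoring leap years

def implied_hdd_capacity(data_per_period: int, multiple: float) -> float:
    """World HDD capacity implied if data_per_period is `multiple` times that capacity."""
    return data_per_period / multiple

def average_rate(data_bytes: int, seconds: int) -> float:
    """Average bytes per second needed to produce data_bytes in the given time."""
    return data_bytes / seconds

capacity = implied_hdd_capacity(ZETTABYTE, 5)
rate = average_rate(ZETTABYTE, SECONDS_PER_HALF_YEAR)

print(f"implied world HDD capacity: {capacity / 10**18:.0f} EB")   # 200 EB
print(f"average generation rate: {rate / 10**12:.1f} TB/s")         # 63.4 TB/s
```

So the ad, taken literally, implies a world hard drive capacity of about 0.2 ZB, which is the number to compare against any published estimate of installed storage.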

First, I don’t think this is correct. The article linked above implies that there is storage capacity of several zettabytes. But second, if it is 5 times the hard drive capacity of the entire world, where is all that data being captured?

That doesn’t sound automatically unrealistic.

Lots of data is just lost. Even if it is generated, it’s not necessarily recorded, except in temporary memory.

Also, lots of data isn’t committed to hard drive but to archival-quality tape or other media. No reason to store everything on a hard drive.

I have no clue if the claims are right or not, but there’s nothing automatically suspicious about them, either.

Some of it really depends on your definition of “generated”. If I load something from HDD to main memory, is that “generated”? What about if I move from CPU to GPU memory? Or if stuff from RAM is written to a swap file? Do state transitions count as “extra data generated”? That is, if I’m playing a game and someone is at location [2,3,7] and I transform this vector so that they’re at [2,4,7], is that extra data generated, even if it’s written in the same area of memory? What if it’s stored in a temporary area before it’s written to the vector? And does only the area of memory that actually changed count? (That is, did we regenerate the whole vector, or only the y value?)
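To show how much the answer depends on the definition, here is a small sketch that counts the [2,3,7] to [2,4,7] move three ways: the whole vector, only the bytes that changed, and only the bits that changed. It assumes a hypothetical memory layout of three little-endian 32-bit integers; a real game engine might well use floats or a different layout.

```python
import struct

def pack_position(pos):
    # Hypothetical layout: three little-endian 32-bit ints (x, y, z), 12 bytes total.
    return struct.pack("<3i", *pos)

def bytes_changed(before, after):
    """Number of bytes in the packed vector that differ."""
    a, b = pack_position(before), pack_position(after)
    return sum(1 for x, y in zip(a, b) if x != y)

def bits_changed(before, after):
    """Number of individual bits in the packed vector that differ."""
    a, b = pack_position(before), pack_position(after)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

before, after = (2, 3, 7), (2, 4, 7)
whole_vector_bits = 8 * len(pack_position(after))   # 96 bits if the whole vector counts
changed_byte_bits = 8 * bytes_changed(before, after) # 8 bits: only y's low byte differs
changed_bits = bits_changed(before, after)           # 3 bits: 3 (011) -> 4 (100)

print(whole_vector_bits, changed_byte_bits, changed_bits)  # 96 8 3
```

The same one-step move counts as 96, 8, or 3 bits of “generated” data depending on which definition you pick.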

Quite a lot of the internet is based on programmatically filling out templates with data. So every time you fetch a web page that’s not cached, the server is writing the latest news article into its web template and sending it to you. Even this message board is a really fancy way of shoving a database into a browser-readable format. But most of this data is very ephemeral.
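A minimal sketch of that template idea, using Python’s standard `string.Template`. The page layout, article text, and request count are made up for illustration; the point is that the bytes sent grow with every uncached request while the stored article stays the same size.

```python
from string import Template

# Hypothetical page template; a real site's template would be far larger.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)

def render(article: dict) -> str:
    """Fill the template with one article, as a server would for an uncached request."""
    return PAGE.substitute(title=article["title"], body=article["body"])

article = {"title": "Example headline", "body": "Example article text."}
stored_bytes = len(article["title"].encode("utf-8")) + len(article["body"].encode("utf-8"))

requests = 1000  # hypothetical number of uncached page views
sent_bytes = sum(len(render(article).encode("utf-8")) for _ in range(requests))

print(stored_bytes, sent_bytes)
```

Whether those `sent_bytes` count as “generated” data, given that every copy is thrown away once the browser renders it, is exactly the definitional question.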

The figure is, I suppose, plausible, but I’m not sure it really means anything.

Dynamic content?

Ya, it seems pretty meaningless.

For example, pretty much every operation that completes in a CPU will set various flags depending on the result for branching etc. That is data, and most likely (without doing any calcs) way more than what is claimed in the OP.
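A rough way to put a number on the flags argument, under loudly stated assumptions: one core retiring one flag-setting operation per cycle at 3 GHz, each writing six status-flag bits (x86-style CF, PF, AF, ZF, SF, OF). Real CPUs differ on all of these, and not every instruction sets flags, so treat this as an order-of-magnitude sketch only.

```python
# Assumptions (hypothetical, for illustration only):
CLOCK_HZ = 3e9                 # one flag-setting operation per cycle at 3 GHz
FLAG_BITS_PER_OP = 6           # x86-style status flags: CF, PF, AF, ZF, SF, OF
ZETTABYTE_BITS = 8 * 10**21    # 1 ZB (SI) expressed in bits
HALF_YEAR_S = 365 * 24 * 3600 / 2

def cores_needed(target_bits: float, seconds: float) -> float:
    """Cores running flat out needed to produce target_bits of flag data in `seconds`."""
    bits_per_core = CLOCK_HZ * FLAG_BITS_PER_OP * seconds
    return target_bits / bits_per_core

cores = cores_needed(ZETTABYTE_BITS, HALF_YEAR_S)
print(f"{cores:,.0f} cores")   # roughly 28,000 cores
```

Under those assumptions, on the order of 28,000 busy cores would match the ad’s zettabyte every six months in flag bits alone, which is small next to the number of processors in use.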

If I copy some pictures from my hard drive to a backup service, am I generating data?