How can Wikipedia save all that data?

Putting it a different way, checking in on a torrent site, I see that season one of Game of Thrones in 4K video with Dolby Atmos sound is 331 GB. A single season of TV is ten times the size of the current text of Wikipedia.

See:

Ref that second item. It’s about 4" x 6" x 1", weighs a few ounces, and stores 100 TB with sub-millisecond access. And it costs what modern people think is a lot of money for a gizmo: US$12,500.

I can recall speccing 100 MB drives at $50K each, and we’d need a million of them to hold the same data. Each one is the size of a typical US home washing machine. A million washing machines stacked tightly is a rectangular block about 250 x 300 x 400 feet tall. Plus the home-refrigerator-sized controller box needed for every 6 washing machines, so toss about 167,000 of those on the pile too. And the even larger “channel devices” to control 15 controllers each, so about 11,000 of those. And lots (!!1!) of cables as wide as your forearm.

It’ll cost a mere 50 billion dollars for the drives before you buy the even more eye-wateringly expensive controllers and channels, or before you build the VAB-sized structure* to hold it, or pay an army of men (yes, all men) in crew cuts, white shirts, and narrow black ties to wire it all together. Gawd help the guy who has to pay the power bill. The thing would literally melt into a heap of slag from the concentrated heat unless we spread the boxes out to probably 2x or 3x the size in each dimension and provided insane amounts of HVAC capacity. Hell, we’d probably need cryogenics to prevent a meltdown.
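For anyone who wants to check the back-of-envelope numbers, here is a rough sketch of the arithmetic. The constant names are mine, decimal units (1 TB = 10^12 bytes) are an assumption, and the fan-out figures are just the ones quoted above, rounded up to whole boxes.

```python
import math

# Figures quoted in the post; decimal units are an assumption.
TARGET_BYTES = 100 * 10**12       # one modern 100 TB drive
OLD_DRIVE_BYTES = 100 * 10**6     # one washing-machine-sized 100 MB drive
OLD_DRIVE_COST_USD = 50_000       # price per old drive
DRIVES_PER_CONTROLLER = 6         # one controller box per 6 drives
CONTROLLERS_PER_CHANNEL = 15      # one channel device per 15 controllers

# How many old drives hold the same data, and what hangs off them.
drives = TARGET_BYTES // OLD_DRIVE_BYTES
controllers = math.ceil(drives / DRIVES_PER_CONTROLLER)
channels = math.ceil(controllers / CONTROLLERS_PER_CHANNEL)
drive_cost_usd = drives * OLD_DRIVE_COST_USD

print(f"drives:      {drives:,}")
print(f"controllers: {controllers:,}")
print(f"channels:    {channels:,}")
print(f"drive cost:  ${drive_cost_usd:,}")
```

That works out to a million drives, roughly 167,000 controllers, roughly 11,000 channel devices, and $50 billion for the drives alone.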

Or, for $12 grand (plus tax), stop by the store, pick one up, and stick it in your purse on the way out the door. With room for a bag of the cheap impulse buy candy they keep by the cash register. Easy peasy.

We live in miraculous times; truly we do.


* VAB: NASA’s Vehicle Assembly Building.

I often think about the irony of how useful something like the Cray 1 was for some scientific fields yet simultaneously how completely useless that amount of MIPS/RAM/storage is for most mundane real-world home computer uses. Sure, you can do weather prediction and nuclear weapons design, but can you imagine the patience required to attempt to edit a 50 MP JPEG on one of these turkeys?

I’m trying to imagine how much storage you would get if you filled a rack with those 100TB drives.

That is easy to calculate, though there are non-trivial considerations involved. An Internet Archive-style “PetaBox” has 240 disks per rack. If each drive contains 100 TB, that makes 24 PB per rack. The maximum amount you can actually store depends on the RAID configuration, number of disks that are spares, type of filesystem, etc.
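To make the raw-versus-usable point concrete, here is a minimal sketch. The 240 disks and 100 TB per drive come from the post; the layout (8 hot spares, groups of 8 disks with 2 parity disks each, roughly RAID 6-like) is purely a hypothetical example, and filesystem overhead is ignored.

```python
DISKS_PER_RACK = 240   # from the post
DRIVE_TB = 100         # from the post

def usable_tb(disks, drive_tb, spares, group_size, parity_per_group):
    """Usable capacity in TB for a simple parity-group layout.

    Hypothetical model: spares hold no data, the remaining disks are
    split into equal groups, and each group gives up `parity_per_group`
    disks' worth of space. Disks that do not fill a group are unused.
    """
    active = disks - spares
    groups = active // group_size
    data_disks = groups * (group_size - parity_per_group)
    return data_disks * drive_tb

raw_tb = DISKS_PER_RACK * DRIVE_TB
example_tb = usable_tb(DISKS_PER_RACK, DRIVE_TB,
                       spares=8, group_size=8, parity_per_group=2)

print(f"raw:    {raw_tb / 1000:.1f} PB")
print(f"usable: {example_tb / 1000:.1f} PB (assumed layout)")
```

Under those assumed settings the 24 PB of raw capacity drops to 17.4 PB usable; a different spare count or parity scheme moves that number quite a bit.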

The old-timers are always talking about how a 32-bit memory module cost $400 in the 1950s, or whatever. Although increasing speed and capacity can unequivocally make once-big data no longer big, one thing I have heard is that it is not exactly the physical size of the storage unit that makes a data set “big”, but the type and sophistication of the algorithms required to extract useful information from it; even sorting and searching start to run into bottlenecks if there are enough elements.

It’s all about algorithmic complexity. Better supercomputer ==> more lattice points in your physical simulation, or more pixels in your JPEG… You can crank up some parameter n.
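Here is a small sketch of how far you can actually crank n when the hardware gets faster. It assumes a machine that is 1000x faster, a starting size of a million elements, and four textbook cost functions; all of those are illustrative choices on my part, not measurements of any real workload.

```python
import math

def max_problem_size(cost, budget):
    """Largest integer n >= 1 with cost(n) <= budget.

    Assumes cost is nondecreasing and cost(1) <= budget.
    """
    lo, hi = 1, 2
    while cost(hi) <= budget:          # double until we overshoot
        lo, hi = hi, hi * 2
    while hi - lo > 1:                 # binary search between lo and hi
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative cost models (operation counts, constants ignored).
COSTS = {
    "n":       lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2":     lambda n: n ** 2,
    "n^3":     lambda n: n ** 3,
}

N0 = 1_000_000     # assumed problem size on the old machine
SPEEDUP = 1000     # assumed hardware speedup

results = {
    name: max_problem_size(cost, SPEEDUP * cost(N0))
    for name, cost in COSTS.items()
}

for name, n in results.items():
    print(f"{name:8s} n grows by a factor of about {n / N0:,.1f}")
```

With a linear algorithm the 1000x machine handles 1000x the data, but a quadratic one only gets you about 31.6x and a cubic one just 10x, which is why the algorithm matters at least as much as the box.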

Yeah. We’ve also made great strides in algorithms from the basics of the 1940s & 50s. But …

My never-very-expert-and-now-musty intuition is that we’ve made more progress in the capacity to gather data at scale and to store data at scale than we have in the ability to get useful results at scale.

We may soon be in a literal situation where our reach exceeds our grasp. We can observe everything and record everything, but we can’t make useful sense of most of it.

OTOH, quantum physics suggests we’ve wrung almost all the juice out of Moore’s law that there is to be had. He’s running dry. So something other than the hardware price/performance gains I outlined upthread will be needed to keep powering the IT miracle towards the future. If it isn’t quantum magic it’s got to be algorithm design.