Have we reached the peak of media compression technology, or is it still advancing?

I agree with those sentiments, as far as they go. But it’s worth pointing out that media doesn’t revolve exclusively around kids and their obsession with their phones, or around those with crappy internet connections content with over-compressed low-res streaming.

Consider, for instance, that virtually any new TV is now 4K resolution, and even older TVs will be at least 1080p and have a screen size that would have been considered remarkable just a few decades ago. Blu-Ray discs and, with suitable bandwidth, streaming services now deliver not just 1080p but 4K video.

Those of us who care about such things also have good quality home audio systems with excellent speakers and appreciate that great audio complements great video, and cinematic sound has progressed from the already excellent Dolby Digital to Dolby Digital Plus, Dolby Atmos, and other advanced technologies. So it’s not all bleakness and deteriorating quality! :slight_smile:

HD has been around for so long that some may be forgetting what the “p” in “480p” stands for. It means progressive scan, and in many cases there is a really substantial quality difference between interlaced 480i, the only thing the old NTSC broadcast standard was capable of, and 480p. So much so that in the early days of flat-screen TVs, there was an intermediate precursor to HDTV called “EDTV”, which stood for “Enhanced Definition TV”. These were basically sets capable of displaying 480p, and the results were often quite impressive.

Today the ATSC digital broadcast standard, which originally maxed out at 720p or 1080i, also supports 1080p, though I don’t know how many broadcasters are using it.

AI offers a different concept of compression for music. Given basic information such as the score for the instrumental and vocal parts, plus a background database capturing the artist’s style, the music could be reproduced as if it were another live performance. Transmitting the finished recording is still limited by ordinary compression technology, but a receiver that already holds the background database would need far less data transferred to it.

Well, sure. With lossless compression, you can’t go beyond the actual amount of unique information contained. And though lossy compression gives you more room by allowing nearly imperceivable differences to be discarded, there is a limit to that as well.

But, for all practical purposes, we’re never going to reach that limit, because you’d need effectively unlimited processing power and memory to hit it for all content. It’s a theoretical limit, not a practical one.

I don’t see the point of this kind of pseudo-“compression” at all. Yes, you’re eliminating potential distortions from microphones and the recording process, which with modern equipment and a good sound engineer are already undetectable. You’re potentially eliminating distortions from digital sampling and compression, but again, those can easily be made undetectable, and codecs like FLAC are completely lossless, while even lossy compression is transparent at a sufficient bitrate. And today storage and bandwidth are dirt cheap.

Furthermore, with all this well-executed technology, the overwhelmingly weakest point of the whole process chain is the listener’s audio equipment – their amplifier, speakers, and acoustic environment.

What you lose with AI reconstruction is all the nuances of the artists’ skills. In effect, you lose everything that mattered about the art – about the singer, the musician, the orchestra, the conductor. Why would anyone want to use a technology that gains them nothing and loses everything that’s truly important?

Rhetorical answer to your rhetorical question: Because it’s cheaper.

If it turns out to be significantly cheaper, an entire industry will develop around it. And the tastes of (most of) the public will follow.

There was a study a while back showing that young people preferred the “sizzle” sounds of MP3 compression artifacts:

And then there’s vinyl. That’s probably a mix: some people like the pop/hiss/etc artifacts. Others, I think, appreciate the ritualistic aspects of setting up the turntable. Neither one has anything to do with maximizing reproduction quality.

Another nuance is that you can compress things much better if you have a strong idea of what to expect. The epitome was a contest for image-compression algorithms, where the compressors would be tested with a library of 16 diverse sorts of images. The winning entrant put all 16 standard test images into the decoder, and then compressed them all down to only four bits.
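
To make the trick concrete, here’s a minimal Python sketch of that scheme (the “images” are placeholder bytes I made up): when the decoder already contains the whole library, the “compressed file” is nothing but a 4-bit index.

```python
# Toy version of the contest trick: the decoder ships with all 16 test
# images baked in, so the compressed representation is just a 4-bit index.
# The image contents below are placeholders, not real test images.
LIBRARY: list[bytes] = [f"placeholder image {i}".encode() for i in range(16)]

def compress(image: bytes) -> int:
    """Return a 4-bit index if the image is one the decoder already knows."""
    try:
        return LIBRARY.index(image)   # 0..15 fits in 4 bits
    except ValueError:
        raise ValueError("not in the decoder's library; the trick doesn't apply")

def decompress(index: int) -> bytes:
    return LIBRARY[index]             # all the real information lives here

assert decompress(compress(LIBRARY[3])) == LIBRARY[3]
```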

That’s obviously an extreme case, of course, but more realistic examples abound. Plenty of audio codecs are optimized for human voices, for instance, because that’s what the bulk of audio files are, and so they won’t work as well for anything else. JPEG and other image formats work well for the sorts of details found in human faces, but less well for many other kinds of images. Text encoders are based on the frequencies of letters and strings of letters in common languages such as English, and fare worse in other languages (even though a different algorithm could be devised that handles that other language better than the English-tuned one does).
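
A rough sketch of the same point with real tooling, using zlib’s preset-dictionary feature (the sample strings are stand-ins I picked, not anything from a real codec): a compressor primed with common English words does noticeably better on short English text than on unrelated bytes.

```python
import zlib

def deflated_size(data: bytes, zdict: bytes = b"") -> int:
    """Deflate `data`, optionally primed with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return len(c.compress(data) + c.flush())

# Dictionary of common English words; helps English, does nothing for the rest.
english_dict = b" the and of to in that it is was for with as his on be at"

english = b"it was the best of times, it was the worst of times"
other = bytes(range(64))  # stand-in for non-text data

for label, data in (("english", english), ("other bytes", other)):
    print(label,
          "- plain:", deflated_size(data), "bytes,",
          "with English dict:", deflated_size(data, english_dict), "bytes")
```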

Like any content-delivery system (which is what compression is used for), the results are going to be subjective. That ultimately means it’s up to the market to figure out what’s “good enough” at each price point. Some people are willing to pay to listen to a live performance. Others will accept recordings subjectively inferior to a live performance. And others more will be fine with AI-reconstructed sounds.

Good point - I suppose it’s also worth considering that, since we know human perception really does paint in a lot of the gaps all by itself, if we ever develop a good enough direct-to-brain interface, we may be able to feed in just the fragments that the brain normally accepts as sufficient.

It was a simple example of what’s possible. There will be many of the same nuances in the product, they just won’t be applied exactly the same way in any reproduction.

Music is a simple example, and there it doesn’t offer much benefit over transmitting the entire set of data, but what about an entire movie, or a complete TV series? How about truly realistic, immersive video? The amount of data required for that is massive. A movie where you experience virtual reality indistinguishable from real reality could be delivered this way.

See here:

I currently have sitting on my cheap Amazon Fire tablet (cost: about $100) two entire TV series and probably at least two dozen movies in high definition. These are all sitting on a 256 GB microSD chip in it (cost: about $25).

Yes, video compression is especially important compared to audio, but at this point in technological development I don’t see any advantage to further compression unless we get into the realm of virtual reality, which, although built from existing technologies, is basically a whole new medium with a vast new set of bandwidth demands.

Bandwidth and storage are dirt cheap now, yet development of compression technology continues. It’s all about the future. The future is always bigger than we imagine.

Much, much bigger.

Not really. Most current VR systems use less bandwidth than standard video systems. The Oculus Quest 2 is 1832x1920, which is less than half of 4K in each eye, so both eyes together are less than one 4K stream. The Apple Vision Pro, which I think is the highest resolution commercial VR system, is 3660x3200 in each eye, which is more than 4K but quite a bit less than 8K. The higher demand of VR compared to video streaming is that each frame needs to be generated on the fly, but the bandwidth isn’t necessarily any higher than conventional video.

It seems that in the video realm, the ability to capture and reproduce ever higher definition and a wider colour spectrum is outpacing compression ability. Then add in additional audio channels, although that seems to be topping out in the consumer realm. Compressing today’s highest-quality 4K video and audio into a file the same size as yesterday’s 1080p with basic surround sound isn’t possible. But storage, transmission speed, and computing power keep increasing, so in many cases the end-user experience and cost can be the same or even better. Often one roadblock is bypassed as other lanes are added or expanded. There are mathematical limits to compression, but other hardware advances can make those limits less of an issue.

Not so, due to the higher refresh rate. 8k@60 is 7680x4320x60=1.99 Gpx/s. The Vision Pro runs at at least 90 Hz, for 3660x3200x2x90=2.11 Gpx/s. And it’ll go up to 100 Hz for 2.34 Gpx/s.
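
For what it’s worth, the arithmetic checks out (pure multiplication of the figures quoted above):

```python
# Per-second pixel counts for the streams discussed above.
streams = {
    "8K @ 60 Hz":               7680 * 4320 * 60,
    "Vision Pro, 2 eyes @ 90":  3660 * 3200 * 2 * 90,
    "Vision Pro, 2 eyes @ 100": 3660 * 3200 * 2 * 100,
}
for name, px_per_s in streams.items():
    print(f"{name}: {px_per_s / 1e9:.2f} Gpx/s")
# 8K @ 60 Hz: 1.99 Gpx/s
# Vision Pro, 2 eyes @ 90: 2.11 Gpx/s
# Vision Pro, 2 eyes @ 100: 2.34 Gpx/s
```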

Bonus question:
For lossless compression how close is, say, 7z to this mathematical limit? 90%? 99%? Something else?

LSLGuy isn’t quite right that we know what the limits are and can achieve them.

Claude Shannon worked all this stuff out several decades ago, but he made a number of assumptions, statistical independence being a key one. You can work out the maximum compression of a sequence of characters in English and compress them based on their frequency, but that will be an overestimate since characters are not independent.

You can do better if you create your distribution from words instead of letters, but word sequences are also not independent and you can do much better than that.
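
As a rough illustration of that gap, here’s a small Python sketch (the sample text is my own stand-in): the zeroth-order, independent-character estimate comes out far larger than what even a crude context-aware compressor like zlib actually needs, precisely because characters aren’t independent.

```python
import math
import zlib
from collections import Counter

# Sample text (an assumption for illustration); heavily repetitive on purpose.
text = ("the quick brown fox jumps over the lazy dog " * 50).encode()

# Zeroth-order Shannon entropy: treat each character as independent.
counts = Counter(text)
n = len(text)
h0 = -sum((c / n) * math.log2(c / n) for c in counts.values())  # bits per char

print(f"independent-character estimate: {h0 * n / 8:.0f} bytes")
print(f"zlib, which exploits context and repetition: {len(zlib.compress(text, 9))} bytes")
```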

The real answer depends on too many things that aren’t fully understood. If you have a sequence of words, how many bits on average do you need to predict the next one successfully? Almost impossible to say. Who wrote the thing? That’ll affect the answer, among many other things.

You might have noticed something about the above: predicting the next word. That’s also what LLMs do. One can write (and people have written) compressors that use an LLM to predict the next word and store only enough information to nudge it to the right answer. In many cases you won’t have to store anything, because it will simply get the prediction right from the start. It’s AI, but it’s lossless because you still encode the differences.
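
Here’s a toy sketch of that idea, with a character bigram model standing in for the LLM (everything in it is an illustrative assumption, not any group’s actual implementation): predict the next symbol, store only the rank of the true symbol under the prediction, and let a conventional compressor squeeze the resulting stream of mostly-zero ranks. The model itself is shared out of band, just as the LLM would be.

```python
import zlib
from collections import Counter, defaultdict

def build_model(corpus: str):
    """Bigram predictor: for each character, rank the characters that follow it."""
    follows = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1
    alphabet = sorted(set(corpus))
    ranking = {}
    for a in alphabet:
        seen = [c for c, _ in follows[a].most_common()]
        ranking[a] = seen + [c for c in alphabet if c not in seen]
    return alphabet, ranking

def encode(text: str, alphabet, ranking) -> bytes:
    ranks = [alphabet.index(text[0])]                      # first char: plain index
    ranks += [ranking[a].index(b) for a, b in zip(text, text[1:])]
    return zlib.compress(bytes(ranks), 9)                  # mostly zeros -> tiny

def decode(blob: bytes, alphabet, ranking) -> str:
    ranks = list(zlib.decompress(blob))
    out = [alphabet[ranks[0]]]
    for r in ranks[1:]:
        out.append(ranking[out[-1]][r])                    # replay the predictor
    return "".join(out)

corpus = "the cat sat on the mat and the dog sat on the cat " * 20
alphabet, ranking = build_model(corpus)                    # shared out of band
blob = encode(corpus, alphabet, ranking)
assert decode(blob, alphabet, ranking) == corpus           # lossless round trip
print(len(corpus), "chars ->", len(blob), "bytes")
```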

There are groups working on this but I don’t know yet what their performance is. They at least have the potential to do much better than traditional compressors (like 7-zip).

The good old challenge to compress 1 GB of Wikipedia currently stands at 110 MB (total size of the compressor + decompressor as Linux/Windows executables), apparently taking 9.5 GB of RAM and 50 hours to run on the test machine.

It’s a good result. But the dataset is too small to really test the bounds. 110 MB is too small for an LLM. But if you’re compressing 100 TB of text, you can afford to stick in a 400B-token LLM.

I suppose the problem with using AI to fill in the blanks is: are you willing to listen to the version by Lawrence Welk or “The Surf Boys” cover band instead of the original? After all, the descriptors the AI needs to do a good job as a mimic would be fairly complex too. Maybe when the repetitive quantity rises to the level of a multi-season TV series, the data describing how to mimic can be sufficiently detailed that it works as convincingly as simple compression. After all, if the same 10 or so laugh-track instances are used throughout the season, they can be stored once and inserted as a simple element in the script for the AI to recreate. Details of the characters’ voice nuances are sent once but reproduced across 100 hours of episodes. Etc. Repetition is the key to compression.
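
A minimal sketch of that “store the laugh track once” idea (segment contents are placeholders): keep each unique segment once, keyed by hash, and represent the episode as a list of references.

```python
import hashlib

store: dict[str, bytes] = {}  # unique segments, keyed by content hash

def add_segment(segment: bytes) -> str:
    """Store a segment once and return a reference to it."""
    key = hashlib.sha256(segment).hexdigest()
    store.setdefault(key, segment)        # stored once, however often it's reused
    return key

laugh = b"<laugh track #3>"
episode = [add_segment(s) for s in (b"<scene 1>", laugh, b"<scene 2>", laugh)]

rebuilt = b"".join(store[key] for key in episode)           # lossless playback
print(len(store), "unique segments stored for", len(episode), "references")
```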

The other point to consider is the data equivalent of the criticism of the ultra-rich: you can still only use one Bentley or swimming pool or yacht at a time. The human mind can only absorb so much data: audio, visual, tactile. So if the world has a surplus of bandwidth, the problem is storage, not transmission. (And then I’d add data that gets generated, stored, and used without human intervention, like security surveillance footage that never gets viewed unless there’s an incident, or tens of thousands of points of weather data that are simply feed for a weather model.)