Was/is/shall there soon be a graphics format that’s optimized for sharing general “office work”-type screenshots? By that, I mean your typical web browser screen, MS Excel, or other common desktop apps, which are usually a mix of graphics, line art, and text.
As far as I can tell, none of the existing common graphics formats are particularly suited for this particular use case:
- JPEG compresses things into a grid of little squares and blurs text; it’s great for photos but mediocre at screenshots
- PNG can compress losslessly, but results in big files
- WEBP & AVIF are improvements in efficiency, but still not really optimized for “text on top of simple shapes, with sometimes graphics too” that most desktop app screenshots are
The closest current format I can think of that approximates what I’m imagining is actually a properly composited PDF. By “properly composited”, I don’t mean the simple rasterized output you get when you simply embed a screenshot into PDF, but the native output of something like Acrobat or InDesign, which will composite a PDF out of many layers: simple graphics will be saved as vectors, text will have the font subsetted and then laid out by position (as actual text, not just vector shapes), photos can be individually encoded as JPEG and embedded on top of the other layers, and then the final container overall can be internally zipped.
In that way, each part of the app could be optimally encoded with a scheme that makes the most sense for its given data type: text is zipped, pictures are JPEG-encoded, shapes are vectorized into line art, etc. It’s not just a “dumb” raster grid of pixels but an intelligent composition of different data types overlaid on one another, but still in a printer- and screen-ready format.
Of course, it’s a pain to both create and serve such a PDF, especially compared to a simple screenshot. To be able to create a graphics format that works similarly, the screenshot-taking tool would have to be aware of the app’s UI structure and be able to break it down into its constituent shape/graphics/text parts.
My understanding is that some versions of Microsoft’s RDP (Remote Desktop Protocol) is actually able to stream Windows apps this way, breaking down each app’s UI into “system widgets/frames/API calls” (that the remote system can receive algorithmically and recreate locally), text (sent as text), and graphics/pictures (sent as embedded bitmaps). Or something like that; that explanation probably isn’t 100% accurate, but gets the idea across. When done that way, RDP usually resulted in vastly superior remote desktop experiences (less lag, more responsive graphics) compared to things like VNC or Zoom, which just streamed a video of the pixels without any sort of per-data-type optimization.
Has such a thing ever been attempted with screenshots or screen recordings? I understand it’s quite a technical challenge compared to a simple screenshot, and today’s disk space and bandwidth resources makes it not a very high priority problem to solve… but it’s an interesting problem to me, still. As someone who grew up on 14.4k modems, even though we now live in the time of gigabit fiber, it still breaks my heart to see a megabyte-large screenshot that just shows a simple application window. Surely there’s got to be some better way?