Quick version: how can we tweak the presentation of random images to maximize the probability that a human viewer will ‘see something’ in the randomness?
Machine Elf asked how many images a monitor of a given resolution and color depth could theoretically display; the answer is a cosmically large but finite number. The vast majority of these images look like noise, but the set also includes every still frame from every possible movie of arbitrary length, every possible text in every Earth language, and so on. It's a version of the Library of Babel, but with exponentially more 'unreadable' junk in it.
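For concreteness, the count is (2^bits-per-pixel)^(width x height) distinct frames, a number too large to print in full but easy to size up by counting its decimal digits. A minimal sketch (the function name `image_count_digits` is mine, not from the thread):

```python
import math

def image_count_digits(width, height, bits_per_pixel):
    """Decimal digits in the number of distinct images a
    width x height display with the given color depth can show,
    i.e. in (2 ** bits_per_pixel) ** (width * height)."""
    total_bits = width * height * bits_per_pixel
    return math.floor(total_bits * math.log10(2)) + 1

# A 1920x1080 monitor at 24-bit color can show a number of
# distinct images that runs to roughly fifteen million digits:
print(image_count_digits(1920, 1080, 24))
```

Cosmically large, but still finite.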
Humans are pattern-seeking creatures: we see bunnies in clouds, Jesus on toast, and tigers in the motion of leaves; this tendency is called pareidolia. We pick out edges and movement better than subtle differences in color, and we seem to be tuned to respond to images that match the visual statistics of natural scenes. We aren't great at telling one image of visual snow from another.
So, if I wanted to mess with Machine Elf’s monitor in order to maximize the chances that a given human observer would ‘see something’ in a given image, what would be the best way to do that, and what would the resulting ‘somethings’ be?
For example, I suspect that making the images greyscale rather than full 24-bit color would ease the load on the human visual system. Maybe showing two, three, or four images in rapid succession would trick the eye into detecting a shape or motion in the aggregate, persistence-of-vision result. Or maybe the images should be strobed at the limit of perception, forcing the viewer to make a judgement from memory rather than examining the image in detail.
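One way to think about the rapid-succession idea: averaging independent noise frames pixel-by-pixel is a crude model of that persistence-of-vision blend, and it shrinks the contrast of the result (variance drops roughly as 1/n), pulling everything toward mid-grey. A minimal sketch under that assumption; `average_frames` is a hypothetical name of my own:

```python
import random

def average_frames(width, height, n_frames, seed=None):
    """Crudely model persistence-of-vision blending: generate
    n_frames independent greyscale noise frames (pixel values
    0-255) and average them pixel-by-pixel."""
    rng = random.Random(seed)
    frames = [[rng.randrange(256) for _ in range(width * height)]
              for _ in range(n_frames)]
    return [sum(px) / n_frames for px in zip(*frames)]
```

Whether the lowered contrast helps or hurts pareidolia is exactly the open question; the blend has smoother statistics, but also less for an edge detector to grab onto.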
We could tweak the statistics of the images by starting with a lower-resolution random 'seed' and then expanding that seed to fill the entire test image using an algorithm that mimics the statistics of natural scenes; the natural world is full of smooth gradients, not black pixels right next to white ones. Images made up of short overlapping line segments of random length and orientation, like toothpicks thrown across a floor, might also be more likely to evoke a perceived meaningful image. That would stretch the notion of 'random pixel images', though, and might not be acceptable.
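The seed-expansion step could be as simple as bilinear interpolation: pick a small grid of random values, then stretch it up so neighboring output pixels form smooth gradients instead of independent noise. A minimal sketch, assuming that approach (the function name `natural_noise` and all parameters are mine):

```python
import random

def natural_noise(seed_w, seed_h, scale, rng_seed=None):
    """Expand a low-resolution random 'seed' grid into a
    (seed_h*scale) x (seed_w*scale) image by bilinear
    interpolation, so adjacent pixels change gradually,
    closer to natural-scene statistics than raw pixel noise."""
    rng = random.Random(rng_seed)
    seed = [[rng.random() for _ in range(seed_w)] for _ in range(seed_h)]
    out = []
    for y in range(seed_h * scale):
        fy = y / scale
        y0 = int(fy)
        y1 = min(y0 + 1, seed_h - 1)
        ty = fy - y0                      # vertical blend weight
        row = []
        for x in range(seed_w * scale):
            fx = x / scale
            x0 = int(fx)
            x1 = min(x0 + 1, seed_w - 1)
            tx = fx - x0                  # horizontal blend weight
            top = seed[y0][x0] * (1 - tx) + seed[y0][x1] * tx
            bot = seed[y1][x0] * (1 - tx) + seed[y1][x1] * tx
            row.append(top * (1 - ty) + bot * ty)
        out.append(row)
    return out
```

A nice property of this scheme: adjacent output pixels can differ by at most 1/scale of the full value range, which is exactly the 'gradients rather than black-next-to-white' constraint above.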
As for results, given the way I think human visual systems are wired, I would expect people to predominantly see human faces and figures, then animals or natural scenes, then structures, technology, or writing. What do others think?