AI image recognition and optical illusions

How do image recognition programs interpret optical illusions?

I’ve come across a few articles describing how AI was ‘fooled’ into seeing rotation where none existed, but nothing on how they interpret illusions like the old/young woman in profile. Do we know whether they see the old woman, the young one, or both?

Are there even any that would see either?

This entirely depends on:

  • The type of image recognition
  • What image elements it is trained to recognize
  • The training data

Some image captioning software, for instance, labeled absolutely all green meadows/fields/prairies as “field with sheep”, despite there being nothing remotely sheep-like there, because that combination was predominant in the training data. Once the training data was improved, it got to the point where some sort of objects in the field were required for it to go from “meadow” to “meadow with sheep”, but they could be deer or scattered boulders. The same type of software started out insisting that any animal held by a child was a dog or cat and that any swimming animal was a seal.

Face-finding programs famously find more faces than exist, and miss some, which led to some hilarious mistaken Snapchat face swaps. The best one I remember is the one where the guy’s face got swapped with a car’s hubcap instead of with his girlfriend’s face.

If an image recognition program is trained to find young women or old crones in silhouettes, it may classify the picture as containing one, both, or neither, depending on its similarity to the training data and on how the program is programmed to render its conclusion, but it’s debatable whether that means it “sees” anything.
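As a rough illustration of that last step (purely hypothetical numbers, not output from any real model), the same raw scores can be reported as “young woman only” or as “both”, depending on whether the program reports just the top label or everything above a confidence threshold:

    import numpy as np

    def softmax(scores):
        exp = np.exp(scores - np.max(scores))
        return exp / exp.sum()

    # Hypothetical raw scores for the ambiguous old/young woman picture.
    labels = ["young woman", "old woman", "no person"]
    probs = softmax(np.array([2.1, 1.9, 0.3]))   # roughly [0.50, 0.41, 0.08]

    # Conclusion rule 1: report only the single best label.
    print("top-1:", labels[int(np.argmax(probs))])            # "young woman"

    # Conclusion rule 2: report every label above a confidence threshold.
    # The very same scores now yield both interpretations.
    print("above 0.3:", [l for l, p in zip(labels, probs) if p > 0.3])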

The first description reminds me of the issue with auto-colorizing B/W pics and videos. If the AI determines that the lower part of the frame is a grassy field, it colors it green, and it assumes the part above is sky, so it colors that blue.
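As a toy sketch of that heuristic (an assumed rule of thumb, not how any real colorizer works), once the frame is split into “field below, sky above”, the colors are simply painted onto the two regions:

    import numpy as np

    gray = np.random.rand(100, 150)              # stand-in for a grayscale frame
    rgb = np.stack([gray, gray, gray], axis=-1)  # start from a neutral color image

    horizon = 40                                 # assumed boundary between sky and field
    rgb[:horizon] *= [0.6, 0.8, 1.0]             # tint everything above it blue ("sky")
    rgb[horizon:] *= [0.5, 1.0, 0.5]             # tint everything below it green ("field")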

Are there AI which are trying to identify young vs. old people, or trying to identify age?

If there are, I see no reason to think we could predict their results. I suspect there would be as many different answers as there are AIs. And, in this case, I’m counting multiple passes of the same AI while it is learning as “different AIs”, since each pass technically leaves it a different program.

I also suspect many of them would take a third option–not identifying a person in the image at all. These types of illusions rely on simplifications and our human tendency to see patterns in general and faces in particular.

All AIs do is try to figure out some pattern in the data they are given that corresponds to what they are told is the correct answer. They keep trying until they get the answers right on the training set; then you release them on data where you don’t know the answer and check their results. The better they get, the more useful they are, and you can start to trust they’ll get it right.
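In code, that cycle looks roughly like this (a minimal sketch using scikit-learn with synthetic data standing in for labeled images, not any particular image recognizer):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Synthetic features and labels; in practice these would come from real images.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Keep adjusting until the answers on the known data look right...
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # ...then check the results on data the model has never seen.
    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))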

What pattern they pick up on is a mystery. We often don’t know even after they’ve learned it. So I can’t see any way to predict what answer they would give to any particular illusion.