How does image recognition work?

Say I wanted to create an iPhone app that, at its simplest, compared pictures taken of product X to a database of images to find a match. How does that work? Are there pre-made modules that do that sort of thing, where you just need to provide parameters and the database?

Almost certainly. Use those if you can - this isn’t something you want to write yourself.

There is an approach that uses principle component analysis. This is a statistical method for looking at a number of dependent variables to see how they vary with one another. In the case of imaging, each pixel would represent a variable (its brightness), so even a tiny 100x100 image gives you 10,000 variables. Now imagine a 10,000-dimensional space. One image maps to one point in that space. Consider a thousand images: they form a 1000-point cloud in that multidimensional space. There is some longest axis through that cloud, angled to capture the largest possible share of the variance (an eigenvector). You project all the points onto that axis, record each one’s projected position (its first principle component), collapse the cloud in that dimension, and repeat. You’ll get 10,000 different vectors and components, but the first ones are always disproportionately interesting. Image recognition systems sometimes perform this analysis and reject all but the first few components, to simplify things. You might wind up with a 5- or 10-dimensional space, and images you take will cluster in the same area if they are of the same subject.
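To make that concrete, here’s a rough NumPy sketch of the idea - toy random data standing in for real images, and the sizes (1000 images, 100 pixels) picked just for illustration:

```python
import numpy as np

# Hypothetical data: 1000 "images" of 100 pixels each (a flattened
# 10x10 image), so each image is one point in a 100-dimensional space.
rng = np.random.default_rng(0)
images = rng.normal(size=(1000, 100))

# Center the cloud so the axes pass through its mean.
centered = images - images.mean(axis=0)

# The eigenvectors of the covariance matrix are the axes of the cloud;
# the ones with the largest eigenvalues capture the most variance.
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; take the top 10 axes.
order = np.argsort(eigvals)[::-1]
top10 = eigvecs[:, order[:10]]

# Project every image onto those axes: each image is now described
# by just 10 numbers instead of 100.
components = centered @ top10
print(components.shape)  # (1000, 10)
```

Matching a new photo then means projecting it the same way and seeing which cluster of database points its 10 numbers land near.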

Look up “eigenfaces” to see an example.

Guaranteed, somebody will come along and point out that I muffed one part of the explanation or another, and they will be right, and welcomed. Yet and still, you should look up “eigenfaces”.

Love the way you just casually throw that in. :wink:

And here it is: “principal”, not “principle”. Other than that, it’s a good explanation, but still not something that you want to implement yourself unless you know a lot about numerical linear algebra.

Why, thank you, ultrafilter - this is part of what makes the SDMB so great!

It’s also more complicated than just clouds of points in that space, since you’ve got to deal with things like rotations and translations: A picture of an object at the left side of the field of view will have a completely different set of pixels than a picture of the same object at the right side of the field of view, and would occupy a completely different “cloud” in the image space, but should nonetheless be recognized as inherently the same thing.
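You can see the problem with a toy example - the same made-up 3x3 bright square placed at the left versus the right edge of a 10x10 image:

```python
import numpy as np

# The "same object" at two different positions in the frame.
left = np.zeros((10, 10))
left[4:7, 0:3] = 1.0
right = np.zeros((10, 10))
right[4:7, 7:10] = 1.0

# Flatten each image to a point in the 100-dimensional pixel space.
a, b = left.ravel(), right.ravel()

# Their bright pixels don't overlap at all, so as raw pixel vectors
# they are completely dissimilar...
print(np.dot(a, b))  # 0.0

# ...even though a position-independent summary, like the sorted pixel
# values, is identical for both.
print(np.array_equal(np.sort(a), np.sort(b)))  # True
```

That’s why real systems normalize for position, scale, and rotation (or use features that ignore them) before comparing anything.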

You’d also probably want to have a multi-layered approach, with specialized recognition techniques for specific kinds of objects. For instance, a general-purpose image recognition tool might be able to identify something as a human face, but it probably won’t be able to distinguish one human face from another, because really, in absolute terms, we all look an awful lot alike. So it might then transfer it over to a facial recognition program, which would specifically look for all of the various small details by which human faces differ.

There already is an app that tries to do this - it’s called Google Goggles and comes free with Google’s suite of iPhone apps. You take a picture with your phone camera, then it does its image recognition by connecting to a Google server. It then returns images that are almost, but not entirely, unlike what you’ve photographed. I presume it’s still learning, but it’s astonishingly poor at the moment. I still have absolutely no idea what the point of it is.

I dunno, compared to previous attempts at general image recognition, I think Goggles does an amazingly good job. I mean, it figures out what it’s looking at at least half the time.

As for what it’s good for, right now, not much, but it’s a step towards something much bigger.

Yeah, I’m being inspired by Google Goggles and want to create a specialty app for a market that I doubt they’ll ever be supporting. This isn’t it, but say you saw a movie poster that was in Japanese, and wanted to know what the movie was and when it was coming out in the US. So you take a picture and the app compares the poster with its database.

That’s the same sort of thing that I want to do. I could probably program an app minus the image recognition part, so I was hoping there was some kind of public algorithm to do that. Probably wishful thinking.

Is this something where there’s a well-defined set of objects you’re trying to distinguish between? And they’re all inherently two-dimensional images? That’s a much simpler problem than general image recognition. Look up “template matching”.
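The gist of template matching is just sliding the known image over the photo and scoring the overlap at each position. A from-scratch NumPy sketch (real code would use a library routine such as OpenCV’s cv2.matchTemplate, and the sizes here are made up for illustration):

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return the (row, col) where
    the normalized correlation with the underlying patch is highest."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * np.linalg.norm(t)
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Toy example: hide a small pattern inside a noisy image and find it.
rng = np.random.default_rng(1)
img = rng.random((30, 30))
tmpl = rng.random((5, 5))
img[12:17, 8:13] = tmpl  # paste the template at row 12, column 8
print(match_template(img, tmpl))  # (12, 8)
```

For a closed set of 2-D images you’d run this (or a faster library version) against each database entry and take the best overall score.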

Yes, all 2-D images, and a closed set of them. (Album covers, not landscapes, so to speak). I’ll look into that, thanks.

OK, so this is far beyond what you want, but it’s necessary to mention the SIFT algorithm when discussing image recognition. (I suppose that’s better characterized as ‘object recognition in images’.)