How does image recognition work?

Say I wanted to create an iPhone app that, at its simplest, compared pictures taken of product X to a database of images to find a match. How does that work? Are there pre-made modules that do that sort of thing, where you just need to provide parameters and the database?

Almost certainly. Use those if you can - this isn’t something you want to write yourself.

There is an approach that uses principle component analysis. This is a statistical method for looking at a number of dependent variables to see how they vary with one another. In the case of imaging, each pixel would represent a variable (its brightness), so even a tiny 100x100 image gives you 10,000 variables. Now imagine a 10,000-dimensional space. One image maps to one point in that space. Consider a thousand images: they form a 1000-point cloud in that multidimensional space. There is some longest axis through that cloud, angled to capture the largest possible share of the variance (an eigenvector). You project all the points onto that axis, record each one’s projected position (its first principle component), collapse the cloud in that dimension, and repeat. You’ll get 10,000 different vectors and components, but the first ones are always disproportionately interesting. Image recognition systems sometimes perform this analysis and reject all but the first few components, to simplify things. You might wind up with a 5- or 10-dimensional space, and images you take will cluster in the same area if they are of the same subject.
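To make that concrete, here’s a rough NumPy sketch of the idea - toy random data standing in for real images, and the sizes (1000 images, 100 pixels) picked just for illustration:

```python
import numpy as np

# Hypothetical data: 1000 "images" of 100 pixels each (a flattened
# 10x10 image), so each image is one point in a 100-dimensional space.
rng = np.random.default_rng(0)
images = rng.normal(size=(1000, 100))

# Center the cloud so the axes pass through its mean.
centered = images - images.mean(axis=0)

# The eigenvectors of the covariance matrix are the axes of the cloud;
# the ones with the largest eigenvalues capture the most variance.
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; take the top 10 axes.
order = np.argsort(eigvals)[::-1]
top10 = eigvecs[:, order[:10]]

# Project every image onto those axes: each image is now described
# by just 10 numbers instead of 100.
components = centered @ top10
print(components.shape)  # (1000, 10)
```

Matching a new photo then means projecting it the same way and seeing which cluster of database points its 10 numbers land near.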

Look up “eigenfaces” to see an example.

Guaranteed, somebody will come along and point out that I muffed one part of the explanation or another, and they will be right, and welcomed. Yet and still, you should look up “eigenfaces”.

Love the way you just casually throw that in. :wink:

And here it is: “principal”, not “principle”. Other than that, it’s a good explanation, but still not something that you want to implement yourself unless you know a lot about numerical linear algebra.

Why, thank you, ultrafilter - this is part of what makes the SDMB so great!

It’s also more complicated than just clouds of points in that space, since you’ve got to deal with things like rotations and translations: A picture of an object at the left side of the field of view will have a completely different set of pixels than a picture of the same object at the right side of the field of view, and would occupy a completely different “cloud” in the image space, but should nonetheless be recognized as inherently the same thing.
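You can see the problem with a toy example - the same made-up 3x3 bright square placed at the left versus the right edge of a 10x10 image:

```python
import numpy as np

# The "same object" at two different positions in the frame.
left = np.zeros((10, 10))
left[4:7, 0:3] = 1.0
right = np.zeros((10, 10))
right[4:7, 7:10] = 1.0

# Flatten each image to a point in the 100-dimensional pixel space.
a, b = left.ravel(), right.ravel()

# Their bright pixels don't overlap at all, so as raw pixel vectors
# they are completely dissimilar...
print(np.dot(a, b))  # 0.0

# ...even though a position-independent summary, like the sorted pixel
# values, is identical for both.
print(np.array_equal(np.sort(a), np.sort(b)))  # True
```

That’s why real systems normalize for position, scale, and rotation (or use features that ignore them) before comparing anything.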

You’d also probably want to have a multi-layered approach, with specialized recognition techniques for specific kinds of objects. For instance, a general-purpose image recognition tool might be able to identify something as a human face, but it probably won’t be able to distinguish one human face from another, because really, in absolute terms, we all look an awful lot alike. So it might then transfer it over to a facial recognition program, which would specifically look for all of the various small details by which human faces differ.

There already is an app that tries to do this - it’s called Google Goggles and comes free with Google’s suite of iPhone apps. You take a picture with your phone camera, then it does its image recognition by connecting to a Google server. It then returns images that are almost, but not entirely, unlike what you’ve photographed. I presume it’s still learning, but it’s astonishingly poor at the moment. I still have absolutely no idea what the point of it is.

I dunno, compared to previous attempts at general image recognition, I think Goggles does an amazingly good job. I mean, it figures out what it’s looking at at least half the time.

As for what it’s good for, right now, not much, but it’s a step towards something much bigger.

Yeah, I’m being inspired by Google Goggles and want to create a specialty app for a market that I doubt they’ll ever be supporting. This isn’t it, but say you saw a movie poster that was in Japanese, and wanted to know what the movie was and when it was coming out in the US. So you take a picture and the app compares the poster with its database.

That’s the same sort of thing that I want to do. I could probably program an app minus the image recognition part, so I was hoping there was some kind of public algorithm to do that. Probably wishful thinking.

Is this something where there’s a well-defined set of objects you’re trying to distinguish between? And they’re all inherently two-dimensional images? That’s a much simpler problem than general image recognition. Look up “template matching”.
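The gist of template matching is just sliding the known image over the photo and scoring the overlap at each position. A from-scratch NumPy sketch (real code would use a library routine such as OpenCV’s cv2.matchTemplate, and the sizes here are made up for illustration):

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return the (row, col) where
    the normalized correlation with the underlying patch is highest."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * np.linalg.norm(t)
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Toy example: hide a small pattern inside a noisy image and find it.
rng = np.random.default_rng(1)
img = rng.random((30, 30))
tmpl = rng.random((5, 5))
img[12:17, 8:13] = tmpl  # paste the template at row 12, column 8
print(match_template(img, tmpl))  # (12, 8)
```

For a closed set of 2-D images you’d run this (or a faster library version) against each database entry and take the best overall score.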

Yes, all 2-D images, and a closed set of them. (Album covers, not landscapes, so to speak). I’ll look into that, thanks.

OK, so this is far beyond what you want, but it’s necessary to mention the SIFT algorithm when discussing image recognition. (I suppose that’s better characterized as ‘object recognition in images’.)