Where can I find thousands of images of toddlers for free? (not as creepy as it sounds)

I have been messing around with machine learning lately. I am currently quite interested in the deep dream (google search) concept.

For those who don’t know: the idea is that you train a network to identify certain images. Then you sort of run the network backwards to produce an image that maximizes the activation of some node, usually an output node. Say you have a network that can classify different kinds of animals. One of the possible output classifications is a peacock. You decide that you want to find an image that maximizes the activation of the peacock node. You start by selecting an initial image (can be anything: static noise, a picture of a tree etc), then you find out how to adjust each pixel slightly so as to increase the activation of the peacock node a bit. Then you do it again and again and again until the peacock node is highly activated. The result is an image that is some sort of trippy blend of whatever your starting image was and peacock. Most of the time you can still see what the original image is.
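For anyone who prefers code, here is a minimal sketch of that "adjust the pixels to raise one output node" loop. It is not the real deep dream code: the "network" is just a random linear layer with a softmax on top, and the names `W`, `maximize_class`, `steps` and `lr` are made up for illustration. A real network would need a framework to compute the gradient, but the loop has the same shape.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def maximize_class(W, x0, target, steps=200, lr=0.05):
    """Gradient ascent on the input pixels to raise log p[target]."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        p = softmax(W @ x)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        grad = W.T @ (onehot - p)    # gradient of log p[target] w.r.t. x
        x += lr * grad               # nudge every pixel slightly
        x = np.clip(x, 0.0, 1.0)     # keep pixel values in a valid range
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))     # toy stand-in network: 3 classes, 16 "pixels"
x0 = rng.uniform(size=16)        # starting image (static noise here)
target = 1                       # index of the "peacock" node, say
x = maximize_class(W, x0, target)
p_before = softmax(W @ x0)[target]
p_after = softmax(W @ x)[target]
```

Comparing `p_before` with `p_after` shows how much the target node's activation rose, and `x - x0` shows which pixels were changed to get there.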

I want to try a variation on this. I want to train a network to identify a specific person, vs some other similarish looking people. I then will run the above procedure to maximize the output of the node indicating this specific person. I am expecting that the result will show what features of the specific person the network has learned to be idiosyncratic to that person. In other words, which features collectively uniquely identify that person.

To do this, I need to get a bunch of input images. Also, I would find it more interesting if the specific person is someone I know. The only person I know that I happen to have a lot of images of is my son (2 and a bit years old). I figure that if I train the network on images of a wide range of people, then the network will learn only obvious features (such as short, snotty, giggling etc). Although that might be interesting, I think it will be more interesting if the network has to differentiate between various toddlers who all share these obvious features. So now the network has to identify really special features (who knows what these might be). So to accomplish this I need a large number of images of toddlers. I suppose I could do a Google search for ‘toddler’ but that would be a lot of work to save them one by one. I am hoping there is some sort of free stock photo database I can download.

As I write this, I realize it might also be interesting to do this with celebrities. I might do that first because it will probably be easier to get a large number of images. I could do a ‘Face Off’ thing with John Travolta vs Nicolas Cage. Nick Nolte vs Gary Busey might be too similar for the network though!

Go crazy and put ‘toddler’ into Google images?

Yeah, there is another reason why I am hesitant to use Google images. Why don’t you do it and let me know how it goes?

I did. What do you think the problem would be?

I just did. The first few pages seem to be pictures of toddlers and young children from stock photos and ads, except the 16th picture which shows a child with a cranial abnormality, from the Daily Mail. Why are you hesitant?

Well I need thousands of photos and going through thousands of non-curated search results could turn up something I would rather not see. Anyway, the main reason I am hesitant is it would be much more work to save them one by one, whereas if there is a stock database I can download from, it could be significantly less work.

It seems to me that downloading thousands of images of children is just a bad idea, whether or not the intention is harmless.

Yeah Google is monitoring your search terms and ONE SLIP UP ANYTHING TO DO WITH KIDS ANYTHING AT ALL EVEN ONCE YOU MAKE THE LIST THE FBI WILL BE AT YOUR DOOR INSIDE THE HOUR because that’s how it works.

There are many stock photo agencies that will allow you to download watermarked “comps” for free, eg Getty Images; these places are easy to search and you know you’re not going to stumble across anything creepy. I’m not sure how badly the watermarks will affect what you want to do. I also think the risk is pretty low here.

I’m thinking about the type of people that do Google searches for thread titles like this one, then join up.

Well, Neural Network Programming was pretty big in the eighties. I remember attending a few lectures. Is it still going on? With all the data online these days, it might be poised for a comeback.

You could die tomorrow. What do you want found on YOUR computer?

You missed the baby boat this year, but it will be back again next year – just get a job as a mall santa’s photographer/helper.

A brilliant novel that will get published posthumously.

If it doesn’t have to be toddlers, why not pick something with zero probability of creepiness?

The celebrity thing is a good idea, or just do “40-55 year old white middle-class males”. And I agree that Getty Images or Shutterstock would have lots.

Unless you are also doing something that legitimately brings the attention of the Feds, what your relatives would find is lots of unexceptional pictures of toddlers, which your significant other or relatives or friends will understand as part of your facial recognition project–unless you share about it on public message boards but not with the people in your life.

You think the FBI is keeping an eye on people who buy Anne Geddes pictures?

Screenplay for a “Baby Geniuses” movie. Since they’ve already been to Europe, Egypt, and space, I’m thinking “Baby Geniuses: In The Hood”.

You’ll want to look into image databases used for research. I don’t know offhand any kid ones, but one example I found is CAFE. It has ages 2-8. One advantage of this type of set is that the images are all standardized in size/ratio, lighting, etc. Once you get an account and permission, you can get the full set without having to worry about potentially “stealing” images that you’d have with GIS. A disadvantage is the same: you won’t find “real world” images with crappy lighting.

I’m reminded of an attempt by the military to use neural networks to distinguish between allied tanks and hostile tanks. The program performed beautifully on the test images, but when deployed in the field, it called every tank hostile. The problem was that in the training images, all of the images of friendly tanks were well lit and framed, but the pictures of hostile tanks were all taken under poor battlefield conditions… and so the program naturally learned that high-quality images were friendly and low-quality images were hostile.
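One cheap way to catch this kind of leak before training is to check whether a single nuisance statistic already separates the two sets. This is only a sketch on synthetic data: the arrays `friendly` and `hostile` and the helper `best_threshold_accuracy` are invented for illustration, and mean brightness is just one of many possible nuisance features.

```python
import numpy as np

def best_threshold_accuracy(a, b):
    """Best accuracy of a single-threshold rule separating values a from b."""
    values = np.concatenate([a, b])
    labels = np.concatenate([np.zeros(len(a)), np.ones(len(b))])
    best = 0.0
    for t in np.unique(values):
        pred = (values >= t).astype(float)
        acc = float((pred == labels).mean())
        best = max(best, acc, 1.0 - acc)   # the rule may point either way
    return best

rng = np.random.default_rng(0)
friendly = rng.uniform(0.5, 1.0, size=(50, 8, 8))   # synthetic well-lit images
hostile = rng.uniform(0.0, 0.6, size=(50, 8, 8))    # synthetic poorly lit images

# Mean brightness per image is the nuisance feature being tested.
leak = best_threshold_accuracy(friendly.mean(axis=(1, 2)),
                               hostile.mean(axis=(1, 2)))
```

If `leak` comes out close to 1.0, brightness alone tells the classes apart, so a network has no reason to learn anything about the tanks themselves.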

If you’re trying to use your single-individual images from your own collection, then you absolutely need to make sure that all of the other images you use are as like those as possible. If you use a standard database like CAFE where they’re all the same dimensions, then the program will pick out your images as being different based on the dimensions. If you’re using sample images from stock libraries, then the program will pick out your images as being different based on lacking the watermark. What you really want to do is to either start with a standardized database that has multiple pictures of each subject, and just pick one of those as your focus, or find a set of pictures that were all taken under similar haphazard conditions. Celebrity shots might work for the latter, but it’d be tough to assemble an appropriate cohort for your son’s pictures.
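As a rough illustration of removing at least the dimension cue, every image (yours and the comparison set) can be pushed through the same crop-and-resize step before training. This is a sketch with plain NumPy arrays standing in for decoded images. The function name `standardize` and the 64-pixel target size are arbitrary choices, and it does nothing about lighting or watermarks.

```python
import numpy as np

def standardize(img, size=64):
    """Center-crop to a square, then nearest-neighbour resize to size x size."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    idx = np.arange(size) * side // size   # source row/column for each output pixel
    return crop[idx][:, idx]

phone_photo = np.zeros((480, 640, 3))   # stand-in for a home snapshot
stock_photo = np.zeros((300, 200, 3))   # stand-in for a database image
a = standardize(phone_photo)
b = standardize(stock_photo)
```

After this step both arrays have the same shape, so size and aspect ratio can no longer give your own pictures away.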

If the program is really learning, it might learn to disregard watermarks, as they are always the same.