Maybe statistics? Maybe the captcha only shows you images the engine already has 75%+ confidence about, and as more humans do the test and answer similarly, that confidence goes up.
e.g. “AI says to itself: I think most of these are storefronts. There’s three I know for sure are, and two I’m not sure about. If the human gets 3/5 right, I’ll accept it.” Then, of the remaining two, eventually most humans agree that one is a storefront and the other isn’t. Over time, the AI gets good enough at identifying storefronts and moves on to other things like, “Is this a restaurant or a bar?”
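Something along these lines, as a rough sketch - the threshold, field names, and update rule here are all made up for illustration, not anything Google has published:

```python
# Hypothetical sketch of the "grade on the confident tiles, learn from the rest"
# idea described above. Not reCAPTCHA's actual logic.

CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff, echoing the comment above

def grade_challenge(tiles, user_selected):
    """tiles: list of dicts like {"id": 3, "p_storefront": 0.92}
    user_selected: set of tile ids the user clicked."""
    known = [t for t in tiles
             if t["p_storefront"] >= CONFIDENCE_THRESHOLD
             or t["p_storefront"] <= 1 - CONFIDENCE_THRESHOLD]
    unknown = [t for t in tiles if t not in known]

    # Score the user only on tiles whose label the engine already trusts.
    correct = sum(
        1 for t in known
        if (t["p_storefront"] >= CONFIDENCE_THRESHOLD) == (t["id"] in user_selected)
    )
    passed = correct >= max(1, int(0.6 * len(known)))  # e.g. 3 of 5 right

    # If they passed, treat their clicks on the uncertain tiles as weak votes.
    if passed:
        for t in unknown:
            vote = 1.0 if t["id"] in user_selected else 0.0
            t["p_storefront"] = 0.9 * t["p_storefront"] + 0.1 * vote
    return passed
```

The side effect is the point: every human who passes nudges the uncertain tiles toward a consensus label.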
Edit: And street signs really aren’t always that easy to see. Sometimes they’re red, sometimes yellow, sometimes green or brown or white. Sometimes rectangular, sometimes in English, sometimes only with shapes, sometimes in two languages, sometimes half-broken. Sometimes they’re beside the road, sometimes above it, sometimes big, sometimes small, in various lighting and weather conditions… even for humans it’s really easy to miss a poorly placed road-name sign. Each state has its own signs, the federal government another set, some cities or counties have different styles, etc., and each of those gets weathered over time or slowly changed, with some subset still grandfathered in. It’s really not the worst thing to train your AI on, given that it’s sufficiently complex but each sign probably still has only one right answer.
Verrrry interesting. I wonder just how mouse movements can distinguish between sexes, as is claimed. That seems a little more unlikely than just determining if you are human. I’d sure like to see the code behind either of those.
Would this be one of those cases where the four (or forty, or four hundred) thousandth generation of a genetic algorithm seems to have guessed all the previous sexes correctly but no human actually knows exactly how it does it beyond generalities? Like, maybe it weights time to click 3.7655544567 times as much as fastest instantaneous mouse velocity and 2.577876555 times less than mouse tremor but nobody can explain why those particular coefficients work?
As though the algorithm has stumbled upon a baroque mashup of cycles and epicycles and maybe stock market prices or weather reports to predict the motions of planets, with no underlying theory to explain why it should work.
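Pretty much. Stripped of the evolutionary machinery, you could imagine the end product collapsing to something as bluntly uninterpretable as this - the weights just echo the made-up numbers above, and the feature names and cutoff are invented for illustration:

```python
# Purely illustrative: what an "evolved" classifier might reduce to. The
# coefficients "work" on the training data but come with no theory at all.

def opaque_gender_guess(time_to_click, peak_velocity, tremor):
    w_velocity = 1.0                        # baseline: fastest instantaneous mouse velocity
    w_time = 3.7655544567 * w_velocity      # time-to-click, weighted 3.76... times as much
    w_tremor = 2.577876555 * w_time         # mouse tremor, weighted 2.57... times more again
    score = (w_time * time_to_click
             + w_velocity * peak_velocity
             + w_tremor * tremor)
    return "M" if score > 42.0 else "F"     # the threshold is as arbitrary as the weights
```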
Why would that be so mysterious? Since men have longer fingers and bigger hands than women, it makes sense that men and women click at different speeds.
Of course there are many other variables, such as the style and weight of the mouse (I hate the flimsy, lightweight ones with small buttons), the surface of the tabletop, the amount of light in the room, the size of the font on the screen (if it’s hard to read, it takes longer to click on), the type of clothing worn, etc. But can’t these things be traced by the same programmers who wrote the program?
I think everyone is missing the parsimonious answer here - the developers know that robots are clearly incapable of lying, and therefore would never click the box.
reCAPTCHA doesn’t require absolutely perfect responses every time - they’re not comparing your answers against a known ‘correct’ answer - they are comparing your responses with responses from other humans.
In the earlier versions that used words from digitised books, you would be presented with one computer-generated piece of text, and one scanned piece - you had to get the computer-generated one right, but for the scanned word:
If you were one of the first to see it, your response would just be accepted
If lots of other people saw it before you, your response would be checked against stats from those earlier answers (but this still might have accepted multiple ‘right’ answers if the scan was unclear and different people had read it differently)
For the street signs, I imagine the same is probably true - some of them are probably not real photos (and so their content can be absolutely known by the system) - for those that are real photos, your responses are being evaluated statistically against the herd.
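Roughly the kind of bookkeeping I’m imagining, for either the word pairs or the image tiles - the thresholds and names here are invented, since the real pipeline isn’t public:

```python
from collections import Counter

# Sketch of a consensus check: one "control" item with a known answer,
# one "unknown" item whose answer is established by agreement among users.

MIN_VOTES = 5        # assumed: answers needed before we grade against the herd
AGREEMENT = 0.7      # assumed: fraction of earlier answers that must agree

votes = {}           # item_id -> Counter of answers seen so far

def check_response(control_answer, control_truth, unknown_id, unknown_answer):
    # The control item (e.g. the computer-generated word) must be answered correctly.
    if control_answer.strip().lower() != control_truth.strip().lower():
        return False

    ans = unknown_answer.strip().lower()
    tally = votes.setdefault(unknown_id, Counter())
    total = sum(tally.values())
    accepted = True

    if total >= MIN_VOTES:
        # Enough people have seen this item: compare against the consensus.
        top, count = tally.most_common(1)[0]
        if count / total >= AGREEMENT and ans != top:
            accepted = False

    tally[ans] += 1   # either way, record the vote for future comparisons
    return accepted
```

Early viewers get accepted automatically, and genuinely ambiguous scans never reach the agreement threshold, so multiple readings can stay acceptable.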
A diligent programmer might sit down and grind through all the variables that the algorithm is juggling, but they would not know whether those variables are all valuable or just random associations. It seems ‘reasonable’ to look at mouse movement speed, but what if the algorithm is factoring in the fourth derivative of mouse x-position modulo the trailing digit of CPU temperature? We may grant that it seems to work to postdict the gender of all the samples used to generate the algorithm, and maybe it even works to predict future samples, but can you explain why it works? If you let the algorithm look at 1000 variables, it might use 50 or 60 or all 1000 in its decision tree when only 10 or 20 are actually causally linked - the rest might just be the result of pouncing on coincidences. On the other hand, maybe what we think of as ‘reasonable’ is misguided, and there is actually some hitherto undetected phenomenon that makes men’s computers run cooler than women’s. That would be mysterious.
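You can watch this happen with a toy model. Here only the first two of fifty features carry any signal, yet an unconstrained decision tree will typically split on a pile of the pure-noise features anyway, just to fit the training set (made-up data, obviously, not anyone’s real mouse logs):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Only features 0 and 1 are causal; the other 48 are pure noise.
rng = np.random.default_rng(0)
n_samples, n_features = 500, 50
X = rng.normal(size=(n_samples, n_features))
signal = X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n_samples)
y = (signal > 0).astype(int)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
used = np.flatnonzero(tree.feature_importances_ > 0)
spurious = [f for f in used if f >= 2]
print(f"features the tree split on: {len(used)}, of which spurious: {len(spurious)}")
```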
With enough data, the spurious correlations should eventually drown in the noise, to the point where, if the algorithm is assigning them any weight at all, it’s only a very small weight. The trick is in knowing when you have enough data to make that happen.
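A quick back-of-the-envelope way to see it: generate a random target and a few hundred pure-noise features, and watch the strongest apparent “correlation” among them shrink as the sample size grows (illustrative numbers only):

```python
import numpy as np

# The biggest spurious correlation among 500 noise features falls off
# as the sample size grows, so a sensible model should weight them less.
rng = np.random.default_rng(1)
n_noise_features = 500

for n in (200, 2_000, 20_000):
    y = rng.normal(size=n)
    noise = rng.normal(size=(n, n_noise_features))
    # Pearson correlation of each noise column with y, computed by hand.
    yc = (y - y.mean()) / y.std()
    nc = (noise - noise.mean(axis=0)) / noise.std(axis=0)
    corrs = np.abs(nc.T @ yc) / n
    print(f"n={n:>6}: strongest spurious |correlation| = {corrs.max():.3f}")
```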
You’re in a desert walking along in the sand when all of a sudden you look down, and you see a tortoise, crawling toward you. You reach down, you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?