Are captchas more trouble than they are worth?

Inspired by THIS thread…

Apparently the hackers and spammers have little trouble cracking these anymore. Is there a better way, I wonder? How about a quick question on some pop-culture reference (à la American GIs who would quiz suspected German agents about who Babe Ruth or Fay Wray was)? Not only would a bot be unable to provide the answer, the armies of Indian captcha-breakers would likely have trouble with such questions too. (Who is the star of the show “24”? Who plays third for the Yankees?)

Or, a simple little mathematical equation maybe? Or are such “security” measures ultimately pointless and always subject to countermeasures?

I tried a mathematical anti-bot measure. It seems to work well.
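For anyone curious what such a measure can look like, here is a minimal sketch in Python. It is not the code from any particular site; the names `make_challenge` and `check_answer` are made up for illustration, and it assumes the question is shown as plain text next to a form field.

```python
import random
import operator

# Hypothetical helper names; not taken from any real captcha library.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def make_challenge(rng):
    """Return (question_text, expected_answer) for one small arithmetic puzzle."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    symbol = rng.choice(sorted(OPS))
    return f"What is {a} {symbol} {b}?", OPS[symbol](a, b)

def check_answer(expected, submitted):
    """Accept the submission only if it parses to the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False

# The server would keep `answer` in the session and show only `question`.
question, answer = make_challenge(random.Random())
```

Since the question is plain text, a bot written for that specific site could parse and solve it easily, so this kind of check mostly filters out generic, untargeted bots.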

Of course, if the site is being targeted by the determined hacker there’s nothing it can do.

You also have to keep language barriers and the like in mind.

One interesting measure I saw is to ask users to type in only the words of a certain color (like ‘all the red words in the image’), or to use some sort of spatial recognition that bots would be hard pressed to handle (how many circles are there in the image?), but you risk accessibility issues.

What I wonder is how the spamming generates enough revenue to be worthwhile. The come-ons for web-cams that I saw in Yahoo chatrooms want you to create an account, which I assume leads to paying money in some way. But if a guy’s horny and desperate and wants to see some pornographic webcams, he’s probably already signed up for something that suits his particular pathetic needs.

What is it about this particular crime – which might reasonably be considered minor in the hierarchy of crimes – that makes me imagine finding the person behind it, and putting my hands around his throat, and squeezing, squeezing. . . .

I’d be fucked. :smiley: I’m a pop culture illiterate.

Best captcha I’ve seen is lots of small pictures of fluffy animals, and you have to identify all the puppies or all the kitties. It really needs human intelligence to do that.

ETA: here it is.

They’re even cracking that. The last paragraph on the first page of this article discusses it:

Jack somebody. If you mean the actor’s name, no idea.

I don’t even know what sport you’re referring to.


That depends on the number of images in the database. If the database is too small, it’s relatively easy to just recognize “OK, the last time I got this set of images, image#17362 was identified as a fox, so I’ll click on that one as a fox.” For a captcha to be useful, it has to be able to automatically generate a vast number of different puzzles. For a pop-culture trivia test, you’d need to have a text file somewhere of all of the questions and answers, and if the crackers got ahold of that text file (which they would), the system would be useless. Nor could you generate your question file automatically from Google, because then the crackers could likewise automate the answers through Google.
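To make the database-size point concrete, here is a toy simulation in Python. Everything in it is invented for illustration (the image ids, the labels, the grid of nine), and it assumes the bot somehow learns the correct labels of every image it is shown, for example from whether its clicks were accepted. It only counts how often a bot gets a puzzle made entirely of images it has already seen.

```python
import random

# Toy model: a captcha site with a fixed pool of labelled images.
# Image ids and labels are invented for illustration.
def make_pool(size, rng):
    return {image_id: rng.choice(["kitten", "puppy", "fox"]) for image_id in range(size)}

def serve_puzzle(pool, rng, grid=9):
    """Pick a grid of image ids and a target label that appears in the grid."""
    ids = rng.sample(list(pool), grid)
    target = pool[rng.choice(ids)]
    return ids, target

def run_bot(pool, rounds, rng):
    """Return the fraction of puzzles the bot could answer purely from memory."""
    memory = {}
    solved_from_memory = 0
    for _ in range(rounds):
        ids, _target = serve_puzzle(pool, rng)
        if all(i in memory for i in ids):
            solved_from_memory += 1  # every image already seen: no guessing needed
        # Assumption: the bot ends up knowing the true label of each image shown.
        for i in ids:
            memory[i] = pool[i]
    return solved_from_memory / rounds

rng = random.Random(1)
small = run_bot(make_pool(20, rng), rounds=200, rng=rng)
large = run_bot(make_pool(100_000, rng), rounds=200, rng=rng)
```

With a pool of 20 images the bot has seen everything after a handful of rounds, while with 100,000 images it almost never gets a fully familiar grid in 200 tries.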

Captcha weeds out A LOT of noise. Not all, but it certainly helps. I put one on my friend requests on Myspace and I went from ~15 spam requests per day down to maybe ~2 per month.

Truly? They are probably the most famous American sports franchise.

I know nothing at all about, say, Manchester United, except what sport they play.
I can well believe most people wouldn’t know who plays where for the Yankees, though.

(When I clicked on jjimm’s link it asked me to identify the pictures with hedgehogs…I bet many North Americans wouldn’t know what they look like).

Baseball - but I must admit I only got that because of the “plays third”. It’s the only American sport I know of that has a playing order. I doubt if many people this side of the pond would know for sure.

…And I just tried the kitten link again, and got two of the same picture show up in the grid. That’s pretty good evidence that their database is too small.


Right answer for the wrong reason. When one speaks of a baseball player “playing third” or “playing first”, it refers to position on the field (first, second, or third base), not batting order. A team will typically have a particular batting order they like to use (conventional wisdom is to put your best home-run hitter in the #4 spot), but a player isn’t inherently associated with any particular batting order. The different positions, though (first, second, third, shortstop, pitcher, catcher, left field, center field, right field) do use different skill sets, and a player who’s a pretty good first baseman is likely to be a lousy pitcher (with the exception of Babe Ruth).

Thanks for clarifying. One of these days I’ll have to try to figure out baseball. (If only for interest’s sake - it gets precisely zero media coverage over here.)

While purchasing tickets on this weekend, I noticed the following on their captcha screen:

I thought it was kind of cool, and it made me resent them a little less, although I still find it very irritating when I get a completely illegible one.


That’s about the same as me, for Manchester United too.

It would be easy enough to get a server to present the same set of images as randomly-named each time, subtly alter pixel values, or apply filters, or randomly partially crop them on demand, so as to make it very difficult for a bot to tell that they were the same images each time.
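Roughly what I have in mind, as a Python sketch. To keep it self-contained the "image" is just a grid of 0-255 grayscale numbers rather than a real picture file, and the function names and the filename scheme are made up for illustration.

```python
import random
import secrets

def jitter_pixels(image, rng, max_delta=2):
    """Return a copy with each 0-255 grayscale value nudged by a small random amount."""
    return [
        [min(255, max(0, px + rng.randint(-max_delta, max_delta))) for px in row]
        for row in image
    ]

def random_crop(image, rng, max_trim=1):
    """Trim up to max_trim rows/columns from each edge, chosen at random."""
    top, bottom = rng.randint(0, max_trim), rng.randint(0, max_trim)
    left, right = rng.randint(0, max_trim), rng.randint(0, max_trim)
    height, width = len(image), len(image[0])
    return [row[left:width - right] for row in image[top:height - bottom]]

def serve_variant(image, rng):
    """Produce a fresh random filename and a perturbed copy for one request."""
    name = secrets.token_hex(8) + ".png"  # hypothetical per-request naming scheme
    return name, random_crop(jitter_pixels(image, rng), rng)

original = [[128] * 8 for _ in range(8)]
name_a, variant_a = serve_variant(original, random.Random())
name_b, variant_b = serve_variant(original, random.Random())
```

This defeats exact matching on filenames or file contents. A bot that compares images approximately rather than byte for byte could still recognize repeats, so it raises the cost rather than closing the hole.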

But if you were digitizing the image from the book, then it would need something to compare it to, and it would therefore already have to be digitized somewhere. I don’t see how it could function as both a captcha and a humanized OCR, if that’s what’s being said.

Read this excellent article about how they do it:

The bottom line:

Baseball does have a playing order, normally called a batting order, but Alex Rodriguez, who “plays third [base]”, usually bats fourth. No American baseball fan would understand the question “who plays third” to refer to the batting order rather than the fielding position.

Interesting. Reading the Technology Review article, it sounds like they are using neural network software to learn the CAPTCHAs. I used one of these during my postgrad when I was attempting to get a computer to correctly identify different types of cells based on certain criteria (seven different physical measurements taken at different ages and/or environmental conditions), and it worked incredibly well: greater than 95% accurate, if I remember rightly. The only reason that the results never made it into my thesis was that neither I nor anyone I worked with could explain ‘how’ the neural network got the results it did. Bear in mind, though, that in this case the NN was working on the actual physical measurements rather than the image of the cell itself. Still, in the roughly ten years since I did my postgrad, I’m not surprised that this sort of thing is being applied directly to the images.