Are captchas more trouble than they are worth?

John_DiFool · October 14, 2008, 4:16pm

Inspired by THIS thread…

Apparently the hackers and spammers have little trouble cracking these anymore. Is there a better way I wonder? How about a quick question on some pop-culture reference (a la American GIs who would quiz suspected German agents about who Babe Ruth or Fay Wray was)? Not only would a bot be unable to provide the answer, the armies of Indian captcha-breakers likely would have trouble with such questions. (Who is the star of the show “24”? Who plays third for the Yankees?)

Or, a simple little mathematical equation maybe? Or are such “security” measures ultimately pointless and always subject to countermeasures?

Crowbar_of_Irony_3 · October 14, 2008, 4:23pm

I tried a mathematical anti-bot measure. It seems to work well.
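
For anyone curious what such a measure can look like, here is a minimal sketch. It is not the code from my site: the function names `make_question` and `check` and the choice of single-digit addition are assumptions made purely for illustration.

```python
import random


def make_question(rng=None):
    """Return (question_text, expected_answer) for a small addition problem."""
    rng = rng or random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check(expected, submitted):
    """Return True if the submitted form text is the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input simply fails the check.
        return False
```

The server would keep the expected answer on its side (in the session, say) and compare it with what comes back in the form. A bot written specifically for the site could parse the question easily, so at best this stops generic form-fillers.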

Of course, if the site is being targeted by the determined hacker there’s nothing it can do.

You also have to keep language barriers and the like in mind.

One interesting measure I saw is to ask the users to type in words of a certain color only (like ‘all the red words in the image’) or to do some sort of spatial-recognition task that bots would be hard pressed to solve (how many circles are there in the image?), but you risk accessibility issues.

Baldwin · October 14, 2008, 4:30pm

What I wonder is how the spamming generates enough revenue to be worthwhile. The come-ons for web-cams that I saw in Yahoo chatrooms want you to create an account, which I assume leads to paying money in some way. But if a guy’s horny and desperate and wants to see some pornographic webcams, he’s probably already signed up for something that suits his particular pathetic needs.

What is it about this particular crime – which might reasonably be considered minor in the hierarchy of crimes – that makes me imagine finding the person behind it, and putting my hands around his throat, and squeezing, squeezing. . . .

CookingWithGas · October 14, 2008, 4:35pm

I’d be fucked. I’m a pop culture illiterate.

jjimm · October 14, 2008, 5:13pm

Best captcha I’ve seen is lots of small pictures of fluffy animals, and you have to identify all the puppies or all the kitties. It really needs human intelligence to do that.

ETA: here it is.

Erasmus_Darwin · October 14, 2008, 6:56pm

They’re even cracking that. The last paragraph on the first page of this article discusses it: http://www.technologyreview.com/web/21519/page1/

Mangetout · October 14, 2008, 7:02pm

Jack somebody. If you mean the actor’s name, no idea.

I don’t even know what sport you’re referring to.

Fail.

Chronos · October 14, 2008, 7:06pm

That depends on the number of images in the database. If the database is too small, it’s relatively easy to just recognize “OK, the last time I got this set of images, image #17362 was identified as a fox, so I’ll click on that one as a fox.” For a captcha to be useful, it has to be able to automatically generate a vast number of different puzzles. For a pop-culture trivia test, you’d need to have a text file somewhere of all of the questions and answers, and if the crackers got ahold of that text file (which they would), the system would be useless. Nor could you generate your question file automatically from Google, because then the crackers could likewise automate the answers through Google.

Cisco · October 14, 2008, 7:07pm

Captcha weeds out A LOT of noise. Not all, but it certainly helps. I put one on my friend requests on Myspace and I went from ~15 spam requests per day down to maybe ~2 per month.

Laughing_Lagomorph · October 14, 2008, 8:11pm

Truly? They are probably the most famous American sports franchise.

I know nothing at all about, say, Manchester United, except what sport they play.
I can well believe most people wouldn’t know who plays where for the Yankees, though.

(When I clicked on jjimm’s link it asked me to identify the pictures with hedgehogs…I bet many North Americans wouldn’t know what they look like).

Alive_At_Both_Ends · October 14, 2008, 8:21pm

Baseball - but I must admit I only got that because of the “plays third”. It’s the only American sport I know of that has a playing order. I doubt if many people this side of the pond would know for sure.

Chronos · October 14, 2008, 8:23pm

…And I just tried the kitten link again, and got two of the same picture show up in the grid. That’s pretty good evidence that their database is too small.

EDIT:

Right answer for the wrong reason. When one speaks of a baseball player “playing third” or “playing first”, that refers to position on the field (first, second, or third base), not batting order. A team will typically have a particular batting order they like to use (conventional wisdom is to put your best home-run hitter in the #4 spot), but a player isn’t inherently associated with any particular spot in the batting order. The different positions, though (first, second, third, shortstop, pitcher, catcher, left field, center field, right field) do use different skill sets, and a player who’s a pretty good first baseman is likely to be a lousy pitcher (with the exception of Babe Ruth).

Alive_At_Both_Ends · October 14, 2008, 8:34pm

Thanks for clarifying. One of these days I’ll have to try to figure out baseball. (If only for interest’s sake - it gets precisely zero media coverage over here.)

spoike · October 14, 2008, 8:43pm

While purchasing tickets on Ticketmaster.com this weekend, I noticed the following on their captcha screen:

I thought it was kind of cool, and it made me resent them a little less, although I still find it very irritating when I get a completely illegible one.

Mangetout · October 14, 2008, 9:08pm

Truly.

That’s about the same as me, for Manchester United too.

Mangetout · October 14, 2008, 9:10pm

It would be easy enough to get a server to present the same set of images as randomly-named each time, subtly alter pixel values, or apply filters, or randomly partially crop them on demand, so as to make it very difficult for a bot to tell that they were the same images each time.
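
A rough sketch of what I mean, using a plain grid of grayscale values in place of a real image file. The function name `perturb` and the parameters `max_jitter` and `max_crop` are made up for illustration, and a real server would do this with an image library rather than nested lists.

```python
import random
import secrets


def perturb(pixels, rng=None, max_jitter=3, max_crop=2):
    """Return (random_name, altered_copy) of a grayscale image.

    `pixels` is a list of rows of 0-255 ints and must be larger than
    2 * max_crop in both dimensions. The input is not modified.
    """
    rng = rng or random.Random()
    height, width = len(pixels), len(pixels[0])

    # Crop a random number of rows and columns off each edge.
    top, bottom = rng.randint(0, max_crop), rng.randint(0, max_crop)
    left, right = rng.randint(0, max_crop), rng.randint(0, max_crop)
    cropped = [row[left:width - right] for row in pixels[top:height - bottom]]

    # Nudge every pixel value slightly, clamped to the valid range.
    altered = [
        [min(255, max(0, v + rng.randint(-max_jitter, max_jitter))) for v in row]
        for row in cropped
    ]

    # Serve the result under a fresh random file name each time.
    name = secrets.token_hex(8) + ".png"
    return name, altered
```

Each request then gets a copy with a different name, different dimensions and different bytes, so remembering a file name or a checksum no longer identifies the picture. Whether that also holds up against a bot that compares images approximately is not something this sketch demonstrates.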

Troy_McClure_SF · October 14, 2008, 10:23pm

But if you were digitizing the image from the book, then it would need something to compare it to, and it would therefore already have to be digitized somewhere. I don’t see how it could function as both a CAPTCHA and a humanized OCR, if that’s what’s being said.

tofergregg · October 15, 2008, 12:37am

Read this excellent article about how they do it:

http://arstechnica.com/news.ars/post/20080814-captchas-workfor-digitizing-old-damaged-texts-manuscripts.html

The bottom line:

According to the authors, humans handle over 100 million CAPTCHAs every day. “This mental effort is precious,” they write, “since deciphering CAPTCHAs requires people to perform a task that computers cannot.” Their automated system attempts to harvest this precious effort. Scanned text is subjected to analysis by two optical character recognition programs; in cases where the programs disagree, the questionable word is converted into a CAPTCHA. It, along with a control word of known identity (used for cases where a bot is trying to crack the CAPTCHA) are then distributed to participating websites. Currently, over 40,000 sites are using reCAPTCHA.

The identification performed by each computer program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification hits 2.5 votes, the word is considered called. Those words that are consistently given a single identity by human judges are recycled as control words.
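
To make the scoring concrete, here is a minimal sketch of that tally as I read the description above. It is not reCAPTCHA's actual code: the function name `tally`, the input format, and the assumption that readings are compared as exact strings are all mine.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5    # each OCR program's reading counts for half a point
HUMAN_WEIGHT = 1.0  # each human reading counts for a full point
THRESHOLD = 2.5     # score at which a word is considered called


def tally(ocr_readings, human_readings):
    """Return (word, score) for the first reading to reach THRESHOLD, else None."""
    scores = defaultdict(float)
    for reading in ocr_readings:
        scores[reading] += OCR_WEIGHT
    # Human answers arrive one at a time; stop as soon as one reading is called.
    for reading in human_readings:
        scores[reading] += HUMAN_WEIGHT
        if scores[reading] >= THRESHOLD:
            return reading, scores[reading]
    return None
```

So if the two OCR programs read a word as "modem" and "modern", two humans typing "modern" are enough to call it (0.5 + 1 + 1 = 2.5), while a reading that neither program produced needs three humans to agree.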

OldGuy · October 15, 2008, 4:35am

Baseball does have a playing order, normally called a batting order, but Alex Rodriguez, who “plays third [base]”, usually bats fourth. No American baseball fan would understand the question “who plays third” to refer to the batting order rather than the fielding position.

grey_ideas · October 15, 2008, 7:44am

Interesting. Reading the Technology Review article, it sounds like they are using neural network software to learn the CAPTCHAs. I used one of these during my postgrad when I was attempting to get a computer to correctly identify different types of cells based on certain criteria (seven different physical measurements taken at different ages and/or environmental conditions) and it worked incredibly well, greater than 95% accurate if I remember rightly. The only reason that the results never made it into my thesis was that neither I nor anyone I worked with could explain ‘how’ the neural network got the results it did. Bear in mind, though, that in this case the NN was working on the actual physical measurements rather than the image of the cell itself. Still, in the roughly ten years it’s been since I did my postgrad, I’m not surprised that this sort of thing is being applied directly to the images.
