I know that Captcha is designed to keep computer programs from accessing things that you only want humans to access, but why are the challenge words/phrases so hard to read? It sometimes takes me 2 or 3 tries to get it right… and I am a college-educated human. And will the inevitable advancement in software’s ability to read text make this technology eventually obsolete?
Pretty much. Bots get better and better at reading them, so the words have to become more and more illegible. It’s a pain, but the sheer volume of spam necessitates it - for example, this board would probably be 50% spam without such things.
What’s worse is that bots these days can grab the captcha and, via viruses, relay it to actual humans, who solve it and essentially tell the bot the answer. I’m not sure of the technical specifics, but I’ve heard that people unwittingly solve captchas for bots this way.
No cite on that, though, so take it as you will.
Bots use OCR to solve captchas. They have to be illegible enough to prevent OCR from working. It’s an arms race - seems to me that they’ve already been pushed to the point where humans can barely read them.
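To make the arms race concrete, here’s a toy sketch of the template-matching idea at the heart of simple OCR, and why distortion defeats it. The 3x3 glyph bitmaps are made up purely for illustration; real OCR is far more sophisticated, but the principle is the same: compare the unknown image against stored patterns and pick the best match.

```python
# Toy template-matching "OCR": each glyph is a 3x3 bitmap (1 = ink).
TEMPLATES = {
    "T": [1, 1, 1,
          0, 1, 0,
          0, 1, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
    "O": [1, 1, 1,
          1, 0, 1,
          1, 1, 1],
}

def recognize(pixels):
    """Return (best_letter, score), where score = fraction of matching pixels."""
    def similarity(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)
    return max(((letter, similarity(pixels, template))
                for letter, template in TEMPLATES.items()),
               key=lambda pair: pair[1])

clean_T = [1, 1, 1,
           0, 1, 0,
           0, 1, 0]
# The same T with two pixels flipped ("distortion"): the match score drops,
# and with enough noise the wrong letter can start winning.
noisy_T = [1, 1, 0,
           0, 1, 0,
           1, 1, 0]

print(recognize(clean_T))   # ('T', 1.0)
print(recognize(noisy_T))   # still 'T', but with a lower score
```

Captcha distortion is essentially an attempt to push every glyph far enough from any stored template that the best match is no longer reliable, while a human can still recognize the letter.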
Candyman74, I don’t think that trick of duping others into solving captchas is commonly used in practice. That’s because solving the captcha is only one step towards a goal (typically, creating an email account or social network account). It seems to be more common to hire people to do that whole job, where the captcha is just one part, than to farm out captcha solving as an isolated step.
Regardless, they’re not very effective these days. You can buy bulk accounts, with the captchas already bypassed, from “data entry services” for a few cents each.
Why not have captchas ask questions that are easy for a human and virtually impossible for a computer, e.g. what color is the sky? If John has three dimes and a nickel, how much money does he have? Is it because they are labor-intensive to create?
Plenty of message boards do that. It’s a stock vbulletin option. The admin sets the questions and answers. Bots still get through.
They are labor intensive to create (if you only have a few, it would be easy for the machine to recognize them and have stock answers), but there are also substantial difficulties in making them work well for people with different language skills, cultural backgrounds, etc. A question that might seem trivial to you could be completely unanswerable by someone in Peru, for example. Perhaps that doesn’t matter for you on your personal blog, but it certainly matters to Google, Amazon, etc., and they tend to be the ones driving the state of the art in this type of thing.
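The question/answer scheme described above can be sketched in a few lines. All names here are hypothetical (real forum software like vBulletin stores these in its admin panel), and the tiny pool illustrates exactly the weakness mentioned: with only a few questions, a bot can simply memorize the answers.

```python
import random

# Hypothetical admin-configured question pool, like vBulletin's
# human-verification option. A real deployment needs a large pool.
QUESTIONS = {
    "What color is a clear daytime sky?": {"blue"},
    "If John has three dimes and a nickel, how many cents does he have?":
        {"35", "thirty-five", "thirty five"},
}

def ask():
    """Pick a random question and its set of accepted answers."""
    question = random.choice(list(QUESTIONS))
    return question, QUESTIONS[question]

def check(accepted, reply):
    """Case- and whitespace-insensitive answer check."""
    return reply.strip().casefold() in accepted

print(check(QUESTIONS["What color is a clear daytime sky?"], "  Blue "))  # True
```

Note the cultural-knowledge problem is baked right in: accepted answers have to be enumerated in advance, in one language, by the admin.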
That’s pretty cool. I’m in the process of reading some out-of-copyright (pre-1923) books on my Android phone that came with Google Books. There are quite a lot of ‘typos’ in them, and when I flip to the original pages it’s obvious why. Like the article says, it’s usually because the word is missing, smudged, ripped, someone wrote over it, etc. I always figured it was because they’re giving the books out for free, so they just scanned them in, maybe ran them through a spellchecker, and sent them out without putting a lot of effort (money) into it. It didn’t really dawn on me that they might not have the manpower to actually have someone read all these and clean them up (there are 2 million+ free books for e-readers). On top of that, I’m sure wrong words that are spelled correctly are going to get missed anyway.
Also, if Google really is using two sets of OCR software, I’m curious how they sometimes miss words altogether.
I always wonder if they’d let users tag errors that need to be fixed. Just to keep it simple, they could even disallow any sort of comments or corrections, and only let the user tag where the error is. Then, if Google sees a certain number of errors in one spot (say more than 100, or 1,000, or 10,000, or whatever they consider ‘a lot’), they can fix it.
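The tag-and-threshold idea above boils down to a counter keyed by location. This is a hypothetical sketch (nothing is known about how Google’s actual pipeline works), just to show how little machinery the scheme needs:

```python
from collections import Counter

# Hypothetical crowd error-reporting: readers tag a (book, page, word_index)
# location where the OCR text looks wrong; no free-text comments allowed.
REVIEW_THRESHOLD = 100  # tags needed before a human reviews the spot

error_tags = Counter()

def tag_error(book_id, page, word_index):
    """Record one reader's tag; return True once the spot needs review."""
    spot = (book_id, page, word_index)
    error_tags[spot] += 1
    return error_tags[spot] >= REVIEW_THRESHOLD

# Simulate 150 readers tagging the same smudged word:
flagged = [tag_error("moby-dick", 212, 7) for _ in range(150)]
print(flagged.index(True) + 1)  # the 100th tag trips the threshold -> 100
```

Because readers only tag a location rather than submit text, there’s nothing for spammers or vandals to inject, which fits the “keep it simple” constraint in the post.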
Either way, I’m sure it’s much easier now that most (all?) books start out in digital format so the OCR software isn’t needed.
Computers can answer both of those questions exactly as phrased:
If John has three dimes and a nickel, how much money does he have?
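To illustrate the point, even a toy keyword parser can answer that question exactly as phrased. This is nothing like how real question-answering systems work, just a demonstration that the question carries an obvious machine-readable pattern:

```python
import re

# US coin values in cents, and the number words a toy parser might meet.
COINS = {"penny": 1, "pennies": 1, "nickel": 5, "nickels": 5,
         "dime": 10, "dimes": 10, "quarter": 25, "quarters": 25}
NUMBER_WORDS = {"a": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def cents(question):
    """Sum up every 'COUNT COIN' phrase in the question; return total cents."""
    total = 0
    for count_word, coin in re.findall(
            r"(\w+)\s+(pennies|penny|nickels?|dimes?|quarters?)",
            question.lower()):
        count = NUMBER_WORDS.get(count_word)
        if count is None:
            count = int(count_word) if count_word.isdigit() else 1
        total += count * COINS[coin]
    return total

print(cents("If John has three dimes and a nickel, "
            "how much money does he have?"))  # -> 35
```

Which is exactly why fixed-form trivia questions are a weak defense: the effort to answer them mechanically is trivial once the question template is known.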
Note that I likely would have failed the second one because I can never remember which of those coins is which. (I have a dime here in my desk drawer, but it just says “one dime”, which makes me wonder who does your UX).
Touché! Point taken.
I definitely would have. I don’t have the foggiest what a dime or a nickel is.
One reason pictures are used the most, and are the most successful so far, is that even images as small as captchas hold enough pixels to simulate true randomness. Coming up with an algorithm that can scan every single pixel of the captcha and determine without a doubt that a letter is, say, an A - when it looks like an A and a B combined with an apostrophe, and still looks like an inkblot test - is very difficult. Especially since the people creating captchas know what the exploiters are looking for, so they specifically make them as LEAST distinguishable as possible while still being distinguishable. Any obvious pattern is the key to captchas being auto-solved: the less of a pattern there is, the harder it is for computers to solve them automatically.
That’s reCAPTCHA, and it’s a cute idea, but it blows. I wish so many sites didn’t use it. Because the unknown portion of the captcha comes straight from a scan nobody has verified, you have no way to control the quality or readability of the image delivered, which relentlessly confuses users who don’t know how reCAPTCHA works and why it’s so sucky.
As an example, I offer this 100% genuine reCAPTCHA that I got from Facebook a couple days ago:
No, Facebook, I don’t have a definite integral key on this here keyboard.
You can click on that little link there to get a different set of words.
That’s not the point. The point is any solution that requires such a feature is shit.