Captcha and the NY Times. Are these really words?

Quoting Captcha:

*The words that you see were taken directly from old texts that are being scanned and stored in digital format in order to preserve them and make them more accessible to the world.

… Currently, we are helping to digitize old editions of the New York Times and books from Google Books.*

I’ve collected a few of these words.

barmonical

traisaf

eadioe

ericol

ndndiauyk

webnot

orgella

unitypl

sistrydg

These are all words that passed the captcha test, so they aren’t incorrect responses on my part. Are these words any of you have ever heard or seen used? Am I missing something here?

Well, a link to what you’re talking about would help.

If that isn’t a link to help explain it, please post one to explain where you got the words.

I assume that the captcha (reCaptcha actually) project still produces error hits and these are the words you’re asking about?

Not necessarily. If it’s using the standard reCaptcha, there are two words. One of them is “known” to the system, and spelling that one wrong will fail. The other one is either only partially known or unknown, and the system is gathering data on it. If you spell that one closely enough, it’ll still pass you.
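For anyone who thinks better in code, here is a rough sketch of that two-word check. It is only an illustration of the idea described above, not reCaptcha's actual code: the function name `check_response`, the `votes` counter, the optional `guess` (say, an OCR reading of the unknown word) and the `min_similarity` threshold are all made up for the example.

```python
from collections import Counter
from difflib import SequenceMatcher


def check_response(control_answer, control_truth, unknown_answer,
                   votes, guess=None, min_similarity=0.6):
    """Return True if the user passes (hypothetical sketch, not the real system)."""
    # The "known" word must be typed correctly, or the whole test fails.
    if control_answer.strip().lower() != control_truth.lower():
        return False

    unknown = unknown_answer.strip().lower()

    # If the system has a partial guess for the other word, only require
    # the answer to be "close enough" to it (threshold is an assumption).
    if guess is not None:
        similarity = SequenceMatcher(None, unknown, guess.lower()).ratio()
        if similarity < min_similarity:
            return False

    # Record the answer as one more vote for what the unknown word says.
    votes[unknown] += 1
    return True


votes = Counter()
print(check_response("canonical", "canonical", "barmy", votes, guess="barmy"))
print(check_response("canonicle", "canonical", "barmy", votes, guess="barmy"))
print(votes)
```

The point is that only the known word really gates access. The second word is mostly there to collect votes, which is why a near miss on it can still get you through.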

I’m pretty sure that they sometimes combine portions of two unrelated snippets to help figure out where the letter breaks are. Like, if the system knows that one image means “canonical” and one means “barmy”, it might randomly give different chunks concatenated together to people.

When you tell it that the word you see is “barmonical”, it knows where those two letter breaks in the original words are.

The last bit is total speculation based on my experience seeing reCaptchas, and on some basic knowledge of how such systems are designed. It does look like it could explain all or almost all the “words” you have.
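To make the guess concrete, here is a toy version of the splicing idea. It works on plain strings, whereas a real system would be cutting up scanned images, and the function names `splice` and `explain` are invented for the example. It is just as speculative as the paragraph above.

```python
def splice(left, right, left_cut, right_cut):
    """Join the first left_cut letters of `left` to `right` from right_cut onward."""
    return left[:left_cut] + right[right_cut:]


def explain(observed, left, right):
    """Return every (left_cut, right_cut) pair whose splice equals `observed`."""
    return [
        (i, j)
        for i in range(1, len(left))      # keep at least one letter of `left`
        for j in range(1, len(right))     # drop at least one letter of `right`
        if splice(left, right, i, j) == observed
    ]


print(splice("barmy", "canonical", 4, 3))
print(explain("barmonical", "barmy", "canonical"))
```

With those two source words, the only cut that gives “barmonical” is “barm” + “onical”, so a typed answer of “barmonical” would confirm where those letter boundaries fall.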

I collected the words from those presented in the Captcha test used to prove the user is a human. I’ve done this many times and just assumed they were random letters, not words. When I clicked the help option offered on the Captcha box, it led to the text I quoted in the OP, stating it was from old text and referencing the NY Times. Each of the words I’ve listed passed the Captcha test and allowed access to the site, registering, etc. I assumed these are the words they refer to as “old text” (the text I’ve quoted above). I’m curious as to whether these are old, abandoned words.

barmonical

  • an archaic term for schizophrenia

traisaf

  • a small wicker talisman often seen in Moroccan homes

eadioe

  • Lowland Scots for a person whose face is asymmetrical

ericol

  • an archaic patent medicine made primarily of lampblack and linseed oil

ndndiauyk

  • a Yupik building material made of dried intestines stuffed with grass

webnot

  • a fact that cannot be corroborated electronically

orgella

  • a botanical compound sometimes claimed to have aphrodisiac properties

unitypl

  • trade name for alginium pyroclasticate, a chemical solution used to clean printing plates

sistrydg

  • Gaelic, meaning incestuous

Thank you iamthewalrus. That makes sense.

And you got these, HOW? They don’t exist in the OED.

Actually, I think he got “traisaf” wrong… that’s the sound of something flying over a person’s head :smiley:

It’s like every single definition in the list is a great big webnot.

Sorry. Forgot what forum I’m in.