How Hard Would It Be To Determine Someone's Language?

I was watching one of those old-timey movies where the con gets sent to prison and doesn’t speak English, so the warders have to figure out what language he is speaking.

In the end it turns out to be Estonian, which has only about 1.1 million speakers today (this was in the 40s, so there were probably a few more back then).

It got me thinking: how hard would it be for a trained and educated linguist to figure out what language someone is speaking? I’m not saying the linguist could understand them, but, as in this case, an educated linguist could say “I bet he’s speaking Estonian,” and then they could try to find someone who speaks it to verify.

I imagine that with certain languages, like those spoken in New Guinea, where I read that Ethnologue lists over 1,000 languages, it would be harder, but would a linguist be able to say “That sounds like a dialect or language from New Guinea”?

I’ve been teaching ESL for 13 years now. I can tell your original language and the accent of your first ESL teacher in about 30 seconds.

Hmm, theoretically:

If I were brought in to figure out someone’s language, the first thing I would do is try to get a geographic fix on where they grew up. (1-20 minutes.)

If they were from one of several linguistic “hot-spots” like New Zealand, then I would have to track down an expert and ask them to help. (1-7 days.)

If they were unable to identify their home country on a map, then I would play audio samples for them until they heard a sound they recognized. (2-3 days.)

Could you experiment by demonstrating to them through pantomime that you’d like them to say “pick up the book”, “pick up the pencil”, “sit on the chair”, “sit on the table” and so on, and figure out the basics of the grammar that way? Determining whether a language was SVO vs. SOV, for example, might narrow things down and reduce the time needed to listen to audio samples.

Of course, these days you might just have them write something (anything) on a piece of paper and then Google it.

This thread is concerned with people who don’t speak English so your ability to tell someone’s original language would not help you out in the scenario under discussion.

ETA: I remember a case quite recently in which they found some kid who’d been abandoned by her mother. No one knew what language she spoke. I seem to recall that finding out the kid’s language actually took quite a while. I believe that the way they went about it was something like this:
‘hm. Kid looks Asian. Let’s find someone who speaks Chinese and then work our way from there’.

I doubt this would work: Romanization systems vary wildly for the same language, and hitting on one by accident just from transcribing by ear is extremely unlikely. Look at the differences when the same word in Mandarin is transcribed in Pinyin versus Wade-Giles. (‘Zih’ in (Tongyong) Pinyin and ‘tzŭ’ in Wade-Giles represent the same sound!)

Maybe the rise of the International Phonetic Alphabet has changed this. In that case, you’d be betting that someone has made a substantial corpus of the language you’re looking for available online in IPA form. Still seems like a long shot.

Superhal writes:

> . . . linguistic “hot-spots” like New Zealand . . .

I presume that you mean New Guinea. The only languages spoken in New Zealand with any frequency are English and Maori.

Who said anything about romanization? Let them write something in their own script (which is probably the only one they know anyway). The characters can be looked up in a Unicode table if necessary, which alone might be sufficient to determine the language. For example, the Cherokee syllabary is used only to write the Cherokee language, so no further Googling would be necessary. For other scripts, such as Latin or Cyrillic, some Googling would be necessary, but unless the person is a terrible misspeller, chances are you’ll get a hit right away.
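The Unicode-table lookup can be partly automated once the sample has been typed in or scanned to text. Here is a minimal Python sketch using the standard `unicodedata.name()` function; `guess_scripts` is a hypothetical helper name, and taking the first word of each character name as the "script" is a rough heuristic, not the official Unicode Script property (CJK ideographs, for instance, just come back as "CJK").

```python
import unicodedata
from collections import Counter


def guess_scripts(text: str) -> Counter:
    """Roughly tally which script each letter in `text` belongs to.

    Heuristic (an assumption, not the Unicode Script property): use the
    first word of the character's Unicode name, e.g. 'CHEROKEE LETTER TSA'
    -> 'CHEROKEE', 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        name = unicodedata.name(ch, "")  # "" if the character has no name
        if name:
            counts[name.split()[0]] += 1
    return counts


print(guess_scripts("ᏣᎳᎩ"))         # Counter({'CHEROKEE': 3})
print(guess_scripts("Привет, мир"))  # Counter({'CYRILLIC': 9})
```

A result dominated by CHEROKEE settles the question outright; one dominated by LATIN or CYRILLIC only narrows the field, which is where the Googling comes in.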

Now, if the person is illiterate, or if their language has no widely used writing system, of course this method won’t work.

You could, provided you had a fairly comprehensive database of languages sorted by sentence structure. If the database also contained things like phoneme inventories, you could use that as well, just by listening to the subject talk.

However, I’m not aware of any such databases that can be queried in this manner. You’d probably have to end up flipping through a book on languages of the world, eliminating the candidates one by one.
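The elimination process described in the last two replies (word order from the pantomime test, plus sounds picked up by listening) amounts to a simple filter. The sketch below assumes a hypothetical toy table standing in for the database the poster wishes existed: six languages, their basic word order, and only two tracked sound features (clicks and implosive consonants). A real reference would cover thousands of languages and far more features.

```python
# Hypothetical toy table for illustration only: basic word order plus the
# only two sound features this sketch tracks ("clicks", "implosives").
LANGUAGES = {
    "English": ("SVO", frozenset()),
    "Hindi": ("SOV", frozenset()),
    "Japanese": ("SOV", frozenset()),
    "Sindhi": ("SOV", frozenset({"implosives"})),
    "Welsh": ("VSO", frozenset()),
    "Zulu": ("SVO", frozenset({"clicks", "implosives"})),
}


def narrow_candidates(word_order, required_features=frozenset()):
    """Return the languages whose word order matches the observed one and
    whose feature set contains every feature heard in the subject's speech."""
    return sorted(
        lang
        for lang, (order, features) in LANGUAGES.items()
        if order == word_order and set(required_features) <= features
    )


print(narrow_candidates("SOV"))                  # ['Hindi', 'Japanese', 'Sindhi']
print(narrow_candidates("SOV", {"implosives"}))  # ['Sindhi']
```

Each observation just shrinks the candidate list, the same way you would eliminate entries one by one while flipping through a book on languages of the world.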

If you’re in prison, I wonder how long it’d take before you learned to speak the language that everyone else was using.

If you were going to play sound samples to your illiterate, geographically-ignorant subject, where would you get them? (Btw, Google helps, but if you can get anything in writing you can fax it or mail traced copies of it to linguists with ease.)

Start by asking their name. Once you have their name, hit a good encyclopedia and see who has a similar name. See where they are from ethnically. Work from there.

Just working from the name John, you get a hell of a lot:

Wasn’t there an episode of Barney Miller revolving around this? Some old lady got picked up babbling something, and the police originally thought she was senile, until the smart, cynical one recorded her and played the recording at the local Lithuanian cafe.

Depends on how much contact you’re allowed with the inmates and/or guards. Obviously if you’re in a maximum security facility in Ashgabat with near-24-hour lockdown, you’re not going to get much of a chance to practice your Turkmen.

You could start with The Jesus Film, which has been dubbed into over 1000 languages, many of which are streamed for free online. You could also try various Internet radio stations from around the world.

As a trained and practicing linguist, the first thing I would do with such a sample would be to submit it to Google anyway. It’s not as if we linguists are fluent in all the world’s languages or have some magical ability to identify a language just by looking at a written sample. If we don’t recognize something right away, we’d turn to the Internet, books, and other references, just like anyone else would. We might find the answer faster than a layman, since we might have the resources closer at hand and be more familiar with the terminology used, but we wouldn’t be using any method unavailable to anyone else.

This was implicit in the premise, as far as I could tell, just like the assumption they won’t have any form of ID and won’t be traveling with books or maps in their own language.

There’s an interesting use of this trope in the film Inside Man.

The police ask random bystanders in New York about the language some bank robbers are using; one guy recognizes it and gets his ex-wife, who immediately exclaims “It’s Albanian all right, and I know this guy!”

It turns out to be a recorded speech by the ex-president, which the robbers used to fool the cops’ surveillance; they deliberately invoked the cliché to waste the cops’ time.

And in the film Amistad, which is based on the following real-life event: A number of newly arrived slaves from Africa were being detained by the US pending the outcome of a court case. As the language of the slaves was unknown, a professor solicited from them their cardinal numbers from one to ten, and began shouting the sequence in the New York City harbour. Eventually someone recognized the sequence as Mende; he was then employed as a translator.

A lot depends on the ground rules you set.

If you’re allowed to take in a map or globe, the process gets a lot shorter!

If not, it would most likely be a process of guessing the basic language family, and bringing in an expert to narrow it down. You might well have to repeat the process several times to nail it exactly if it’s a really obscure language.

You might even be able to get close enough to communicate without having the exact same language. For example, if you had someone who spoke only Scots Gaelic, an Irish speaker would be able to talk to him, or at least tell you he was a Scot. Probably not the best example, because I’m not sure there even are Gaelic speakers who don’t speak English anymore, but hopefully you get the point.

Asking your basic man on the street isn’t necessarily a good idea. There are English-speakers I have trouble understanding*, and a Mexican friend of mine tells me there are places that speak Spanish that’s incomprehensible to her. I can just imagine what it’s like when you get to areas with hundreds of dialects. You’ll need someone who’s actually studied the languages to figure it out.

* I once asked a friend in a restaurant what language the people at the next table were speaking. Turned out to be English. They were from Jamaica.

In a couple of my linguist positions they had me doing exactly this (for audio recordings and written texts, not in person). I was a success at it mostly because we weren’t dealing with a wide-open worldwide scope, but with certain fairly broad regions of the world, and I had concentrated on studying as many languages as possible from those regions. I would start by first identifying the language family, and then picking out more specific clues that distinguished languages within the family.

To take a random example, all Indo-Aryan languages share a range of common phonological, lexical, and syntactic characteristics. Someone who’d studied at least a few of them could recognize the Indo-Aryan group immediately. Then, if you heard implosive consonants, that would pinpoint it as either Sindhi or Lahnda. At this point I would ask them to call in the Sindhi specialist to verify and begin translating (if it was Lahnda, she knew that too). I was able to nail it 100% of the time, because the scope of the work included a limited number of language families, all of which I’d already familiarized myself with, down to the specific identifying features of each language.

There were maybe only 100 or so languages within the geographical scope we worked in. If it had been worldwide, this would have been much harder, though I could make educated guesses based on my sketchy knowledge of the phonology of Eskimo-Aleut, Mon-Khmer, Niger-Congo, etc. The written texts whose languages I was asked to identify were much wider in scope, but in the few cases where I couldn’t recognize one right off, Google (as explained by psychonaut) would reveal it in short order. The only times the Google method failed me were when it wasn’t a real language (gibberish) or was encrypted. I got a good laugh every time they brought me the “Lorem ipsum” text, which completely baffled the other staff; no one else had heard of it. I could also recognize all the writing systems right off (I had a copy of The World’s Writing Systems on my desk), which helped greatly.

The DC metro area where I live is very ethnically diverse with immigrants from around the world, so anywhere you go you can overhear conversations in many different languages. I could make educated guesses like “that’s probably Quechua” or “that’s definitely Slavic,” but it’s harder trying to identify snippets of overheard speech from passersby on the fly, without stopping them to ask, which I don’t. Once at a picnic, though, I asked some nearby women what language they were speaking and they told me Yoruba, and by listening, paying attention, and remembering examples like this, I’ve continually built up my knowledge base.

During WWII, did the enemy not know what language the Navajo and Hopi were using to communicate, or did they just not understand it?

Not sure about WWII, but there was a German linguist in WWI who succeeded in identifying and translating the Choctaw used in American comms. I believe that experience is what led to the Navajos in WWII being used as code talkers, i.e. another level of encryption that disguised the meaning even if the Navajo language was translated.