Will language-translation software ever be perfected?

Voyager · November 22, 2006, 5:25pm

That’s a great example. Clearly, no one sentence will give the context that would guarantee a correct translation. If the sentence were in a book, the translation program would have to understand the setting of the book to be correct. Speakers in a modern college cafeteria or a 1930s big house might mean very different things by the same sentence. I don’t see how a data-driven search is going to get this right either.

Natural speech understanding programs have the same problem, except they don’t have as big a problem in the one word to many meanings dimension. They work fine in a very specific context, for example flight information, but are far from being generally useful.

I think to get this right, you’d have to get AI right first. That’s going to take a while.

Valteron · November 22, 2006, 6:12pm

That is the whole point. And speaking of “data-driven”, do you remember the running joke on Star Trek about the inability of Data to understand humour?

Even if you were to program the translation software to recognize the racist use of the word “boy”, for purposes of the fictional novel about the KKK that I am postulating, what if in the next chapter, the Klansman who was just terrorizing the black man Jim comes home to find his teen-aged son has a pamphlet from the civil rights movement?

He grabs the kid by the collar and shouts “You been hanging around with them Yankee, commie perverts, boy?” Now, a human translator would know without being told that the son of the Klansman is very, very likely to be white and that the term “boy” carries no racial connotation here as it did in the previous chapter. It does, however, carry connotations of anger and authority. The father, by sticking the word “boy” in at the end of the accusatory sentence, is stressing that the youth is under his control and had better answer quickly and truthfully.

Any human translator would grasp these facts without even really thinking about them. The word “garçon” would probably NOT be used at all. Instead, it would be more like "You worthless little brat, have you been. . . . . " Or the word “boy” might be replaced with “you answer me NOW!”.

I realize that that is very different from the original “boy”. But translations are like lovers. The beautiful ones are rarely faithful and the faithful ones are rarely beautiful.

I think the problem is this. Translation and language generally is an art, not a science. It arises from deep artistic impulses and feelings that even human beings cannot define.

Why did Lincoln say “Four-score and seven” instead of “eighty-seven”? Until AI can understand pathos, humour, sarcasm and all the other irrational but very human flavourings in human language, it will be able to handle only simple sentences like “I want an apple.”

Valteron · November 22, 2006, 6:34pm

I have taken courses in German and to be fair to Kennedy, his mistake was really minor. Native-speaking Germans have assured me it was mainly unnoticed by the majority of his German audience, and that the rest quickly put it down to a normal, minor mistake and forgot about it.

The way it works is that “Berliner” means both a Berlin resident and a type of jelly donut popularized in Berlin.

Just like “Wiener” means both a resident of Wien (Vienna) and a type of Vienna sausage. Lucky thing Kennedy wasn’t in Vienna.

In German, you leave out the article when you say you are a resident of a city. You say “Ich bin Berliner”. “Ich bin ein Berliner.” sounds like you are saying you are the donut known as a Berliner.

But if I had to compare it to some minor faux-pas in English, it would be like an Afghani who speaks little English saying “I am an Afghan.” You could say he made a mistake and said he was a small knitted blanket, but most people would not even notice the slip and would quickly make the transfer to “Afghani” in their minds.

I very much doubt if any Germans who heard Kennedy that day looked around and scratched their heads in confusion. As I remember, they cheered loudly.

biqu · November 22, 2006, 9:39pm

Maybe Germany has its share of prescriptivist grammarians who insist on an intrinsic distinction between Berliner and ein Berliner (independent of context, especially location – in Berlin and its environs the jelly doughnut is simply known as Pfannkuchen), but according to Wikipedia the majority of those in attendance at Kennedy’s speech found nothing to quibble about with his choice of words.

matt_mcl · November 22, 2006, 9:46pm

It all depends on what you’re willing to settle for, in my opinion. Until you develop some method for computers to understand what you’re writing, you will not replicate the translation process, but rather have more and more sophisticated variants of the Chinese room.

cckerberos · November 22, 2006, 10:03pm

I’ve never heard of such a form (though I wouldn’t be surprised if such a thing existed in archaic Japanese). There is a word for I that only the Emperor can use, though.

Voyager · November 22, 2006, 10:14pm

I didn’t think about the lookahead problem. I assume translators of written works read it first to understand the context before translating, and translators of speech (like those in the UN) know the context. A machine translator of text would have to “understand” the book to do a good job, thus the need for an AI for a general solution.

I suck at human languages, except my own, but I did translate an SF magazine from Spanish to English for my own amusement, so I have some slight appreciation of how hard it is. I did not produce literature, even when compared to what I started with.

Valteron · November 23, 2006, 3:40pm

[QUOTE=cckerberos]
I’ve never heard of such a form (though I wouldn’t be surprised if such a thing existed in archaic Japanese). There is a word for I that only the Emperor can use, though.[/QUOTE]

Thank you, now that I think of it, that is really what I had heard. A form of “I” that only the Emperor can use. Cool! Does it sound very different from the other forms of “I”? Is it derived from some other word?

Valteron · November 23, 2006, 4:20pm

That is precisely why, as a translator and interpreter (the former is written, the latter is spoken), I seriously do not expect any software to give me a run for my money in my lifetime. Mind you, I am 58, so I can afford to say that. Given the amazing advances in cybernetics every single year, I might not say that if I were 20!

You are right that context is everything. The other problem computers have is that every word has a connotation and a denotation. The denotation is what it really means. The connotation is all the emotional and intangible baggage attached to it that makes a word good or bad.

Take again one small example: Why did Lincoln say “Four-score and seven” instead of “eighty-seven” years? Americans of the 1860s did not normally use such an archaic formulation in everyday life. Nobody came into a market and ordered “fourscore apples”.

Lincoln knew this of course. But he also knew that “fourscore” was an archaic formulation found in old English and, most importantly, in the King James Bible, with which most of his listeners were familiar. He knew that while the expression was not in common use, it would be just well-known enough not to leave his listeners going

So why use this expression at all? Because the words immediately tell his listeners that he, the President, is about to say something important, something solemn and akin to a prayer. He is not just asking people to vote for a fellow Republican for Governor, for example.

Lincoln was about to state in a very short speech the essence of the democratic ideals he and the honoured dead were defending.

It was like a form of secular, national “Our Father” for a nation torn by war.

The use of an old English expression not only gave his listeners a slight feeling of hearing something sacred, but it also hearkened back to the English origins of American political institutions, to Magna Carta and even to the Declaration of Independence.

Now, my guess is that Lincoln probably opted for that expression after only a few seconds’ thought. And he probably did not consciously stop and dwell on all the connotations of those words as I have just done. More than likely, the expression popped into his brain, and was examined by BOTH his brain and his gut feelings, and he found it appropriate.

He might have agreed, after the fact, with the analysis I just offered of those words, but he would probably say that the words just seemed “right” to him.

Then again, there is another factor in literature: the overall rhythm and sound of the words, used to create moods and effects.

In his “Elegy Written in a Country Churchyard” Gray says “The plowman homeward plods his weary way”. Wow! Can’t you just hear his tired steps in that line? Can’t you just FEEL the slow shuffle of a poor working man who is exhausted after a hard day? The effect is created with the use of long syllables like “plods” and letters like w in sequence.

Imagine getting a computer to understand that? :dubious: Good luck!

DragonAsh · November 23, 2006, 10:10pm

I am a professional translator (Japanese-English). I’m 38, and I also do not expect machine translation (ML) to give me anything resembling a run for my money in my lifetime, and not just for literary work. Japanese-English in particular will likely remain almost impossible for a machine to translate. Sentence subjects are optional, and are often omitted. There are no ‘plural’ forms of verbs. It can sometimes be hard to figure out what the text said unless you can actually ask the person who wrote it, not just another native speaker. Plus – the quality of machine translated text depends significantly on the quality of the source text.

And, as noted by others, context is everything. Consider a sentence in a report I worked on this morning:

光ファイバー獲得が順調なNTTに注目したい。

Babel Fish gives this as: We would like to observe to NTT whose optical fiber acquisition is favorable.

That’s what the text says. What it means is something a bit different, of course: the ‘optical fiber’ in the text is NTT’s B FLET’S, a fiber-based (FTTH) broadband service. The ‘favorable acquisition’ means strong subscriber growth. It would be almost impossible for a computer to ‘read between the lines’, as it were, to accurately fill in the missing information. And that’s just a basic, basic example that I didn’t even bat an eye at when I came across it.

I suspect there will be uses for ML, depending on either language pair (I would guess that closely related languages such as Spanish/French/Italian/English would be comparatively easier) or context (menus, documents consisting only of lists of words in a tightly defined field, etc.). Or maybe ‘general gist’ translations, where text is run through software to see if it should be properly translated.

But I, for one, am not losing any sleep over ML at this point.

cckerberos · November 23, 2006, 11:20pm

The word is ‘chin’ ( 朕 ), and doesn’t sound like any of the other words for I (though Japanese has a large number, many of which sound completely different). According to a Japanese dictionary I checked, the word was a common word for I in Ancient China, but by the time of the Qin Dynasty (221 BC - 206 BC) was used only by royalty.

DragonAsh · November 24, 2006, 2:06am

This has actually been in the news recently: Tomohiko Tomita, a ‘grand steward’ (whatever that meant) of the Imperial Household Agency during the Showa era, kept memos of comments by the Showa Emperor Hirohito. In one of the memos, made back in 1978, the Emperor is said to have commented on the fact that class-A war criminals are enshrined in Yasukuni Shrine:

「だから私あれ以来、参拝していない、それが私の心だ」

“That’s why I haven’t paid a visit to the shrine since then. That’s how I feel about it”

There was considerable debate over whether the memos were authentic or not. Indeed, some people noted that the text used the common first-person singular, 「私」, instead of 「朕」. However, there are earlier records of Emperor Hirohito using 私 (watakushi) in private and public settings, at least in the post-WWII period.

Aeschines · November 24, 2006, 4:38am

Another Japanese-to-English translator here. In 2001 I had the opportunity to see what state-of-the-art medical translation software could do when I was working for a drug company. It was utter crap.

I agree with the other J-to-E translators here. Here’s the thing. Often the Japanese we work with is so poorly or idiosyncratically or ambiguously written that you have to use a lot of brain-power and intuition to figure out what is meant. I’ve done a lot of work for auto makers in which they used technical terms that, quite simply, were only being used in that company or even a single factory in the company.

The I-form “chin.” My understanding is that this was only used in written communication/documents. Perhaps formal speeches. For example, the Meiji Constitution uses it. Did Tenchan use it during his broadcasts to tell the Japanese that WWII was over? I don’t know.

Japanese newspapers continue to use the passive as an honorific when talking about the royal family (I despise the continuing existence of the royal family in Japan, btw, and the media’s ass-kissing of them even more). But the passive is very typical usage (even with intransitive verbs). What the newspapers used to use is the–oh what the hell’s it called?–the verb form where you’re made to do something.

So,

Modern: Tennou ha jinja ni sanpai sareta.
Pre-War: Tennou ha jinja ni sanpai saserareta (or something close to this.)

There were also other verb forms used exclusively for the royal family.

Johanna · November 24, 2006, 4:46am

The Malay language has a special royal register that must be used in the presence of the king and queen of Malaysia (their formal titles are Seri Paduka Baginda Yang di-Pertuan Agong and Seri Paduka Baginda Raja Permaisuri Agong). A non-royal has to replace the first person pronoun with the words patik, literally meaning ‘humble slave’, or pacal ‘slave of a slave’, which are not otherwise used in Malay.

There is a whole register of specialized vocabulary used in the royal presence. For example, you don’t tell the queen you’re going to the restroom. You literally say you’re going “to the river” (ke sungai). Instead of saying the common word for “eat” (makan), use the special word santap which could be translated ‘partake of food fit for a king’. The Malaysian monarchy website admits “the Royal language is seldom used these days.”

cckerberos · November 24, 2006, 5:54am

The causative?

The Showa Emperor did use chin in his surrender broadcast (Wiki has images of the Japanese text).

For what it’s worth, the Japanese Wiki article on personal pronouns says:

So I think you’re correct that chin was mainly for written documents.

Voyager · November 24, 2006, 6:01am

This is an example of how some humans aren’t good enough for a true translation. I love the Ciardi translation of Dante, because I think it takes a poet to do justice to it. Even then, the notes give all the puns and double meanings that are untranslatable - both from the fact that there are double meanings in Italian and from history. The Gettysburg Address is more of a prose poem than a speech, so it is the same problem. In some cases a computer would not just have to understand the text, it would have to be creative also, which is a bit much to ask.

Aeschines · November 24, 2006, 9:23pm

Yes, thanks. The causative-passive was used for the Emp.

Thanks for that info.

Tibby · November 24, 2006, 11:48pm

Let’s assume that the translation software never gets perfected beyond a certain point. Now, think of typical business enterprises that require translation accuracy greater than that which software can provide (e.g. translating manuscripts in a publishing house). Does such a business still benefit from this less than perfected software? In other words, is the professional need for artificial language translation accuracy an all or nothing proposition? Assume a translation inaccuracy rate of 10%. Human translators are still needed to clean and doctor the artificially translated manuscript to a point eligible for publication. Is it advantageous for this business to use artificial translation as the first step, or should they use human translation only? Does the human translator require more time and effort starting with a fresh, un-translated manuscript as opposed to a translated manuscript with errors? If human translators do benefit from less than perfect translating software, then I’m all for improving the software as much as possible. The software doesn’t need to be perfect; it simply needs to be better. The hypothetical publishing house may always need human translators, but they will need fewer and fewer of them as the software improves. Do businesses requiring professional level translations use translating software as a starting point today?

Aeschines · November 25, 2006, 4:53am

My take is that a two-stage process is unworkable. I’ve done it (or had it done to me). I “edited” the English output of a mediocre Japanese-to-English translator. Now, if the translation was 100% correct in such a circumstance but the writing was less than stellar, I could edit it OK. But if the translation was doubtful and the writing poor at the same time, then I pretty much had to go through the original document sentence by sentence, spending nearly as much time as if I had done the thing myself in the first place.

The problem is this: Translation software isn’t just called upon to translate accurately; it’s also called upon to write excellent English in particular styles. It can’t do either well at all right now.

Johanna · November 25, 2006, 5:14am

Not where I work. Not at all. The nature of the texts we translate is often so irregular that machine translation would fail completely much of the time. It’s all human for us.
