Help me design a universal translator

Like many not-quite-Trekkies, I have always been faintly bugged by the Star Trek universal translator. Oh, I understand its dramatic necessity, but every so often, the frankly magical way it’s presented as working takes me out of the story. (How it worked at all on Voyager was beyond me, for instance, since they were supposed to have zero information on the species of the Delta Quadrant when they started.)

So let’s program a more realistic, non-magical one. Assume for purposes of this thought experiment that we’re talking about a spoken-word translator. It won’t be telepathic; instead it will be a tiny computer with audio input and output elements, a prodigious database, and processing speed a hundred times faster than anything available now. It won’t be truly universal; instead it will have to be set for one input language and a different output language at a time. Assume that we’re beginning by translating Japanese to English.

What problems do you foresee in programming this device? What caveats do you include in the user manual?

There was a serious project to do machine translation using an intermediary language as a nexus. That way, you only need to provide modules to connect the input/output languages to the intermediary language, rather than having to provide for all possible combinations of the input/output languages. That reduces the complexity enormously.

I opened this thread intending to suggest this approach - it’s a very sensible one, and works well for other kinds of translations too (for example, conversions from imperial to metric).
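To make the unit-conversion analogy concrete, here is a minimal sketch. The table name `TO_METERS` and the function `convert` are made up for illustration, and the meter is simply assumed as the intermediary. Each unit only knows how to get to and from the intermediary, never directly to another unit:

```python
# Hypothetical pivot-style converter: each unit has exactly one entry,
# its size in meters (the assumed intermediary). No unit-to-unit table.
TO_METERS = {
    "inch": 0.0254,
    "foot": 0.3048,
    "mile": 1609.344,
    "centimeter": 0.01,
    "kilometer": 1000.0,
}


def convert(value, source, target):
    """Convert by going source -> meters -> target."""
    meters = value * TO_METERS[source]   # into the intermediary
    return meters / TO_METERS[target]    # out of the intermediary


print(convert(1, "mile", "kilometer"))
print(convert(12, "inch", "foot"))
```

Adding a new unit means adding one entry to the table rather than one entry per existing unit. The catch for real languages is that numbers survive the round trip through meters intact, while meaning may not survive a round trip through an intermediary language.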

One of the biggest problems with designing a universal translator is the detection and proper handling of idiom and imagery. Many languages use them, but not all in the same way or in the same places, and I don’t have any solution to propose for that.

Yeah, that one’s a brain-breaker. Consider the situation where you arrange to meet someone, and the other person doesn’t show. In English, we might say, “He stood me up.” I have heard that one idiom in French is “Il m’a posé un lapin.” (“He has given me a rabbit.”) Imagine translating from the French idiom, keeping track of the degree of casualness, etc., finding the root meaning, putting that into the bridge language, coming out the other side into English, then taking note of the degree of casualness, and picking the right English idiom.

And that’s ignoring larger-scale issues like overall tone and style.

Am I understanding you to mean that in my Japanese-to-English example, the UT would actually translate Japanese-to-French, say, and then French-to-English? Why does that make it simpler?

It’s theoretically simpler to use an intermediate language if you plan to use the translator to translate any of a significant number of languages into any other of those languages. Instead of 2^n possibilities for translation you have 2xn possibilities. However, there would be the problem of determining an intermediate go-between that didn’t cause information to be lost during one or more of the combinations.

“My hovercraft is full of eels.”

Stranger

I have read that Turkish, due to its fairly strong, simple structure, would be a good candidate for the intermediate language. As I can only count to 10 and use a few basic words/phrases in Turkish, I am not sure how true this is.

The problem is that any computer we could program today, regardless of memory and processing capacity, is essentially a blind robot parrot, with no innate ability to comprehend the intended meaning of what’s being said. It isn’t always easy to translate subtle nuances between languages even when a conscious human being who fluently speaks both languages is doing the translating. I suppose it could be done if you had enough processing power to take a brute-force approach: have a ginormous lookup table of practically every valid combination of words in both languages. But personally I think it would take something close to true artificial intelligence to work properly.

“I will not buy this record, it is scratched”

“Drop your panties, Sir William; I cannot wait until lunchtime!”

Picard: “Number One, did that Klingon commander just get fresh with me?”
Riker: “I recommend Evasive Maneuver Epsilon, followed by a full array from the forward photon torpedo launchers.”
Picard: “Make it so.”

Stranger

A Wicked Deception

A YouTube clip where actors act out a little skit that has been run through a few machine translators. It gets funnier every time I watch it.

2^n ? Don’t you mean n*(n-1)/2 ?

This is clearly the same process George Lucas uses to filch dialog from other films into his Star Wars screenplays.

Stranger

Sorry - I had a brain fart (I wonder how that would fare through a universal translator :) ?). What I intended to say was n^2. If you allow for a null translation (e.g. English to English) then you can pick any of your n languages and translate it to any of your n languages. Hence n^2. If you are ignoring null translations then it seems to me it would be n*(n-1). I don’t think division by 2 enters into it. Translating English to French is different from translating French to English.
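For anyone who wants to check the arithmetic, here is a quick sketch. The function names are made up for illustration, and it assumes the intermediary is a separate language that is not one of the n, so each language needs one module into it and one module out of it:

```python
def direct_translators(n, include_null=False):
    """One translator per ordered (source, target) pair of languages."""
    return n * n if include_null else n * (n - 1)


def pivot_modules(n):
    """One module into the intermediary and one out of it, per language."""
    return 2 * n


for n in (3, 10, 100):
    print(n, direct_translators(n), pivot_modules(n))
# prints:
# 3 6 6
# 10 90 20
# 100 9900 200
```

So under these assumptions the two approaches tie at three languages, and the intermediary only starts paying off beyond that, since n*(n-1) is greater than 2n exactly when n is greater than 3.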

I think some kind of specially constructed artificial language would probably be better - it need not even be capable of being spoken or written with ease by humans.

The increasing trend in AI nowadays is that we’re beginning to solve hard problems not with clever algorithms but with massive amounts of data. Google Translate is a good example of something from within this paradigm.

D’oh! You’re right, of course. (I was still closer.)