Can LLMs (large language models) be used to decipher ancient languages?

I don’t totally understand how LLMs are created/trained, but if they are largely statistical analyses of large corpora of written text, would you be able to train a model on a bunch of undeciphered ancient documents and have it tease out some level of meaning, grammar, etc.?

And then, I suppose, whether it could then translate that into an extant human language would be a separate question? Even if the LLM itself learned to speak in that undeciphered language, there may be no way to prompt it in English…?

Or if you trained a model on both the undeciphered language and a large English corpus at the same time, would the model be able to infer statistical relationships between words & concepts even if there’s no existing prior translation between the two?

No. You need some sort of “Rosetta stone,” or at least some context clues, to start the process.

Key here is “large corpora” and “a bunch of undeciphered documents”. We don’t have that for unknown ancient languages. Our entire corpus for Linear A, for instance, is only about 7000 characters, and most of that only a few words at a time.

ChatGPT gave a pretty detailed, interesting, and readable response to this question: ChatGPT - LLMs and Ancient Scripts. It’s a fun read.

For what it’s worth, it largely agrees with you, with the caveat that it doesn’t mean LLMs are altogether useless in computational linguistics (not that you ever said that). (EDIT: Actually, I don’t think these reports are meaningfully differentiating between LLMs and other ML and computational linguistics techniques. Sorry about that.)

It references a few studies and reports, like An ancient language has defied decryption for 100 years. Can AI crack the code? - Rest of World and Translating lost languages using machine learning | MIT News | Massachusetts Institute of Technology, that discuss how LLMs and machine learning in general can be used to assist humans in decipherment.

Many of these AI-assisted translations use transformer models, the same technology that underlies LLMs, but they are not LLMs themselves, and a general-purpose LLM like ChatGPT would be a worse tool than a model trained specifically for language translation. Earlier translation work was done with convolutional neural networks and recurrent neural networks, though transformer models have largely taken their place. In a more general sense, these are all deep learning algorithms, which form the basis of what we usually think of as flexible “AI”.

So the same general technology as LLMs is being used to decipher ancient texts (though not from scratch, with no ‘key’), but they’re not using the major commercial LLMs that most people interact with.
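
If you want a feel for that distinction, here’s a rough sketch of what a purpose-built translation transformer looks like in code, using the Hugging Face transformers library. The French-to-English checkpoint is just a stand-in for illustration (I don’t know offhand whether a usable off-the-shelf model exists for any ancient language); the point is that it’s a small, dedicated seq2seq model, not a chat LLM.

```python
# A small dedicated translation transformer, not a general chat LLM.
# The checkpoint is a standard French-to-English model, used purely as
# a stand-in; swap in whatever language pair you care about, if a
# trained model for it exists at all.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
print(translator("Les langues anciennes sont difficiles à déchiffrer.")[0]["translation_text"])
```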

Theoretically, if we had tons and tons of Linear A texts but no Rosetta Stone of any kind, I’d expect that an LLM could be trained to talk back and forth with a Linear A fluent person, but not to translate, no?

With a large enough corpus but no Rosetta Stone, you probably could even work out translations, eventually. But it’d be a lot easier with dialog.

I was recently tinkering with something on a tangentially related topic out of curiosity, and learned a bit about how LLMs “translate” that might help shed some light on the subject for you.

Specifically, after a recent trip to Rome, I’ve been reading (and re-reading) some history. I noticed that the same source material happened to be covered in two of the books, but the Latin was translated slightly differently. Obviously, translators always have some discretion; there’s no such thing as a “pure” rendering from one language to another, because languages don’t map directly onto one another, so the translator is obligated to make interpretive choices, trying to convey the sense of the original text.

In any case, to make a long story short, I started fiddling with chatbot translations from Latin to English (using ChatGPT, Gemini, and Grok, to compare the outputs). I asked the 'bots to include parallel or embedded annotations about subjective interpretation, why each thought its chosen rendering was preferable, and what other translation options were possible.

But where it got really interesting was when, again out of curiosity, I started asking for English-language texts to be translated into Classical Latin. One of the 'bots (I don’t remember which) included some caveats, saying this task was challenging and cautioning me against relying on the output. I dug into this a bit, and based on the responses, I asked similar questions of the other 'bots. When confronted directly, they all confirmed the difficulty of the task, on the same basis.

What is that, specifically? It’s the fact that there’s a very large corpus of Latin-to-English translations, but an almost nonexistent corpus of original English writing having been translated to Classical Latin. And if you think about it, that makes perfect sense. Lots of English speakers want to read, say, Cicero, or Plutarch, and there’s no shortage of alternative translations to choose from. But basically nobody wants to read, say, John Grisham in Classical Latin. So there’s essentially zero material to provide translation references in that direction.

And that exposes something really fundamental about the way the LLMs “translate.” They don’t “understand” either language, in the sense of rendering meaning from one to the other. Instead, they have a big statistical model which supports probabilistic transformation. When it sees “Romani ite domum,” it has enough comparative examples of how that phrase is rendered in English to derive the translation “Romans go home.” But it doesn’t know that “Romani” means “Romans,” it doesn’t know that “ite” is imperative, and so on. Instead, it’s able to map the Latin to English because it has lots and lots and lots of references that allow it to model the transformation.

Comparatively, though, it has no meaningful corpus to model the transformation the other way. If you give it a sample of English text that has never been translated to Latin (say, “It is a truth universally acknowledged…”), it will take a stab at it, making its best guess, and the output will probably be reasonably comprehensible to a Latin reader. But will it be good Latin? Will it have been translated in any meaningful sense? No, not really, for (I hope) obvious reasons.

Go ahead, try it yourself. Ask the 'bots about translating from Latin to English, and the other way around. Ask them to explain why the former is easy and the latter is hard. Take a short English excerpt like the Austen above and ask the 'bot to render it in Latin, including annotations about challenging phrases and alternative renderings. It will become quite clear, very quickly, how the 'bots are actually handling this kind of task, under the hood, and why an English-to-Latin translation request is so difficult, just from the standpoint of a system that requires a significant foundation of training references to construct its predictive model. (And this is completely aside from the subjectivity involved in translating to Latin specifically, with its moods and other expressive forms that require the writer to know what they’re trying to say.)

Anyway, I thought this was an interesting insight into the operation of these LLMs, which I think makes obvious why they would be essentially helpless trying to work on a “dead” language that has no existing translation references.
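
If you’d rather poke at the same idea with code than through the chat window, here’s a rough sketch of that “probabilistic transformation” in action, using the Hugging Face transformers library. I’m using a French-to-English checkpoint because I don’t know of a solid off-the-shelf Latin one; the sentences are my own stand-ins. The point is just that the model scores candidate renderings statistically rather than parsing anything.

```python
# Sketch: a translation model doesn't "know" grammar; it assigns
# probabilities to candidate renderings of the source sentence.
# Checkpoint and sentences are illustrative stand-ins only.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Helsinki-NLP/opus-mt-fr-en"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

src = tok("Les Romains rentrent chez eux.", return_tensors="pt")
for candidate in ["The Romans are going home.", "The Romans admire fish."]:
    labels = tok(text_target=candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**src, labels=labels).loss  # average negative log-likelihood per token
    print(f"{candidate!r}: loss {loss.item():.2f} (lower = judged more probable)")
```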

This is very important, and I fear that most of the people who rely on LLMs, thinking they really are AI, don’t understand this simple fact.
You can use LLMs for some tasks, but you can’t rely on them as being authoritative the way you can a knowledgeable human; at best you are getting regurgitated human input that may or may not be correct.
But the LLM doesn’t understand; it is simply a machine.
I sometimes visit Twitter to see how the old place is doing, or to get some input about sports (there seems to be very little of that on Bluesky), and when I don’t flee in panic at the racism and fascism on overt display, I have to close it in disgust at the inevitable appeal to “Grok” (the Twitter AI) to explain something, to prove or disprove something, etc.

Yeah, I think it must be theoretically possible to just brute-force the translation if you had enough samples and unlimited time and resources (assuming the language can be tokenised in some way): assume a trial meaning for a small fragment and observe what that implies about the meaning of other samples that share some of the same tokens.

I don’t imagine this would be any kind of practical method, and any kind of context would help, but in theory, with the rest of the corpus as the context plus unlimited trial and error, you might stumble on the translation before all the stars burn out.
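
To make the loop concrete, here’s a toy sketch of that trial-and-error idea in Python: guess a mapping, score how plausible the result looks, keep changes that improve the score, repeat. It treats the “unknown language” as a mere substitution cipher over English and uses a laughably crude plausibility score, so it’s many orders of magnitude easier than real decipherment, but the shape of the search is the same. Everything in it is made up for illustration.

```python
# Toy "brute force with feedback" decipherment: the unknown script is
# just a substitution cipher here, and the score is a crude measure of
# how English-like a candidate reading looks.
import random
import string

ALPHABET = string.ascii_lowercase
PLAINTEXT = "you need some sort of rosetta stone or context clues to start the process"

def substitute(text, key):
    return text.translate(str.maketrans(ALPHABET, key))

COMMON_LETTERS = {"e": 3, "t": 3, "a": 2, "o": 2, "n": 2, "s": 2, "h": 2, "r": 1}
COMMON_BIGRAMS = ("th", "he", "on", "st", "or", "ne", "es", "to")

def plausibility(text):
    score = sum(COMMON_LETTERS.get(c, 0) for c in text)
    score += 5 * sum(text.count(b) for b in COMMON_BIGRAMS)
    return score

random.seed(1)
true_key = "".join(random.sample(ALPHABET, 26))
ciphertext = substitute(PLAINTEXT, true_key)          # our "undeciphered corpus"

guess = list(ALPHABET)
random.shuffle(guess)
best = plausibility(substitute(ciphertext, "".join(guess)))
for _ in range(50000):
    i, j = random.sample(range(26), 2)                # try swapping two letters of the guess
    guess[i], guess[j] = guess[j], guess[i]
    trial = plausibility(substitute(ciphertext, "".join(guess)))
    if trial >= best:
        best = trial                                  # keep the improvement
    else:
        guess[i], guess[j] = guess[j], guess[i]       # revert

# With such a tiny corpus and crude score this usually only gets partway
# to the original text, which is rather the point about corpus size.
print(substitute(ciphertext, "".join(guess)))
```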

The thing that amused me the most about this little experiment is the fact that the 'bots understand (for lack of a better term) their own limitations well enough to acknowledge that this is a task for which they’re poorly suited and that they’ll be bad at, and yet they’re programmed for compliance and cooperation so they all go ahead and give you the crappy output anyway. :laughing:

OK, I LOL’ed. :slightly_smiling_face:

Also wondering how it would translate “Romanis eunt domus.”

Yes, that would be a Linear A room. We could know with confidence that 17% of the time the word glengwoop is closely placed to the word bizildabop and 9% of the time to the word teebleweeble but still never know what a glengwoop actually is if we never find a primer with the entry “g is for glengwoop” with a little drawing of a fig.
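
That “we can count the statistics but still never know what a glengwoop is” situation is easy to mock up. The toy below tabulates co-occurrences from a made-up three-tablet corpus, and note that nothing in the program, or in its output, has any way of saying anything about figs.

```python
# Count which made-up words co-occur on the same "tablet". Raw text
# alone gives you these statistics, and nothing more.
from collections import Counter
from itertools import combinations

corpus = [
    "glengwoop bizildabop teebleweeble",
    "glengwoop bizildabop",
    "teebleweeble glengwoop",
]

pair_counts = Counter()
for tablet in corpus:
    words = set(tablet.split())
    pair_counts.update(frozenset(p) for p in combinations(words, 2))

for pair, n in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    a, b = sorted(pair)
    print(f"{a} appears alongside {b} on {n} of {len(corpus)} tablets")
```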

You can say that… but can you demonstrate that humans do understand, and that we’re not just doing the same thing the computer does?

And there’s another reason that translation from English to Latin is difficult, for humans and computers alike. There are a lot of concepts in the modern world that simply didn’t exist, in the time that Latin was an active, living language. You can, of course, find ways to express those ideas, but they’re usually pretty clunky: You could call an airplane a “flying chariot”, for instance, but it’s a little awkward… and even more so if you need to distinguish between airplanes, helicopters, and blimps. Or you can invent new Latin words for these new concepts, but given that no Roman ever used those words, what’s to say your new word is the “right” one?

This is not a useful question. A useful question would be whether an LLM possesses the same level of understanding of the world as a human, and the answer is pretty clearly “no,” because the LLM will produce a lot more hallucinations and false inferences than a human with expertise in a comparable area. It has no intellectual habits like curiosity, skepticism, uncertainty, or productive discourse.

I know this because I frequently interact with CoPilot to generate code, and it routinely makes errors and incorrect inferences because it doesn’t ask basic clarifying questions and isn’t aware of its own limitations. This forces me to craft massive prompts to frame the question in exactly the way I need, and by the time I’ve done that exercise, I’ve already done most of the heavy lifting.

It’s alright for helping me figure out how to express certain idioms and patterns in languages I don’t know, but it’s not much help figuring out the right problem modeling, let alone the idioms to express it, simply because it doesn’t care that it’s ignorant, it possesses no apparent ability for higher-order problem modeling, and it doesn’t have the intellectual means to address its ignorance, because an LLM has no intellectual habits at all.

Depends on the meaning of “understand”
I can demonstrate that some human translators know what Rome is, what an imperative is and how it’s used.
I cannot demonstrate it beyond all doubt, but I think most people would agree with me that humans “grok” things (in the original Heinlein sense of the term) where LLMs don’t.

Others have already mentioned it, but if the machine has no mapping between English and Language X, it can’t produce a translation.

I do think that if an LLM is trained on Language X, on a sufficiently large corpus, and it’s provided with a small & incomplete mapping of Language X to English, it would do a surprisingly good job of helping complete the known mappings. It might be a very helpful accelerator to deciphering a language from a small amount of text. So the concept probably isn’t entirely without merit, but it would need considerable massaging.
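
For what it’s worth, the usual trick along those lines is to learn word vectors for each language separately and then use the handful of known pairs to rotate one vector space onto the other; once the spaces line up, nearest neighbours suggest candidate matches for everything else. Here’s a toy sketch of that idea with purely synthetic data (real systems, like the MIT work linked upthread, are far more involved):

```python
# Toy cross-lingual alignment: two sets of word vectors with the same
# structure but different "orientations", plus a few known word pairs
# used to learn the rotation (orthogonal Procrustes). All vectors and
# the seed dictionary are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 50, 8
english = rng.normal(size=(n_words, dim))              # pretend English embeddings
hidden_rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
language_x = english @ hidden_rotation                 # same words, unknown script

seed = np.arange(20)                                   # the tiny "Rosetta stone": 20 known pairs
u, _, vt = np.linalg.svd(language_x[seed].T @ english[seed])
rotation = u @ vt                                      # best orthogonal map into English-space

aligned = language_x @ rotation
# Candidate match for an *unseen* word: its nearest English neighbour.
query = 30
distances = np.linalg.norm(english - aligned[query], axis=1)
print("suggested match for word 30:", int(distances.argmin()))   # prints 30
```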

I don’t think it’s controversial to say that the current AIs have a lesser degree of understanding of the world than humans do. If nothing else, humans have a huge advantage in multimodality. But there’s a difference between saying that we have a greater degree of understanding, and saying that what humans have is understanding but what LLMs have isn’t understanding at all.

The computers will give reasonable, cogent answers if you ask them “What is Rome?”, or “What is an imperative?”, including pictures of the city, and examples of use of the imperative. What other sorts of tests for those things did you have in mind?

I’m failing at explaining my point.
You are correct in your objections, but there’s something missing that I can’t seem to describe, which may be a symptom of my ideas being incorrect; I’ll have to think more about it.

In machine learning in general, there are two major categories of training: supervised and unsupervised learning. For most things, such as LLMs, the computer needs a “Rosetta stone,” as @Darren_Garrison mentioned. Supervised learning, in a nutshell, uses labelled data, such as “this picture is a cat” or “this hieroglyph represents the evening,” while in unsupervised learning you throw a ton of data into the pot, but none of it is labelled. Unsupervised learning does limit the types of operations you can perform. You can do things like clustering, where this item belongs in group A and that one belongs in group B, but the model won’t tell you what group A and group B represent.

There are also offshoots, such as “one-shot learning,” where a model can “learn” from very limited data, but that is considered a subset of supervised learning.
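
Here’s a tiny illustration of that supervised/unsupervised difference in scikit-learn, with made-up data (nothing language-specific):

```python
# Unsupervised vs supervised, in miniature. The data is synthetic: two
# blobs of points standing in for things that pattern together.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# Unsupervised: the model splits the data into group 0 and group 1,
# but nothing tells you what either group *is*.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: the labels are the "Rosetta stone" -- the model learns a
# mapping from features to named categories.
y = np.array(["cat"] * 20 + ["evening"] * 20)
classifier = LogisticRegression().fit(X, y)

print(clusters[:3], clusters[-3:])                              # arbitrary cluster ids
print(classifier.predict(X[:3]), classifier.predict(X[-3:]))    # named labels
```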