Can LLMs (large language models) be used to decipher ancient languages?

Human brains produce conceptual models of the world that are in essence a linkage of highly abstracted associations and expectations. This is why a person can recognize a new type or class of object after only a handful of experiences, from which they can infer other associations that have not been observed, while an LLM or other large data model needs many, many sources of information (text, images, video) to make a comparable set of broad associations. For instance, if you hand someone a kind of fruit that they’ve never seen before but explain that it is “like an orange, only it has skin like an apple and tastes like a grape”, a human can put these associations together into a fairly accurate and robust new mental concept which will at least be an approximation of the actual experience of touching, smelling, and tasting the fruit. LLMs ‘understand’ the world in the sense that they encode explicit associations and expectations within their language model, but many of the errors that are observed from LLMs come from a lack of ability in implicit associativity that a human would recognize immediately. Being innately ‘multi-modal’ and experiencing the world in an embodied fashion certainly aids in generating a more comprehensive set of models than LLM-powered chatbots can, but that isn’t the fundamental reason that LLMs are so weak and expensive at building conceptual models.

There is a school of thought within developmental linguistics that the intrinsic development of language is not tied to communication but rather to facilitating more rigorous conceptualization; language comes about in small children to make explicit mental connections between concepts and needs, and only secondarily provides the ability to express these in a shared vocabulary. Of course, small children get along fine for the first few years of their life with minimal vocabulary and an at-best shaky grasp of syntax and grammar, don’t fully master the fundamentals of language for several more years of development, and humans not raised in a shared language environment will not master language in adulthood but will still express language-like vocalizations. LLMs appear very impressive to us because they immediately jump to a competent use of language by design (and through the brute-force approach of learning from a massive corpus of text, far more than a person could read in many lifetimes), and because of that we tend to infer other mental capacities even though there is nothing going on in a chatbot that even roughly approximates human (or animal) cognition.

Stranger

Yup, and that’s also how LLMs work.

If you disagree, then propose a test by which one can distinguish what a human can do from what an LLM can do.

This is similar to questions about whether other animals are “sentient” or have “consciousness” or speak “language.” We assume that humans do, along with understanding, because we are the default: we self-define as having those capacities. Trying to isolate them to apply to other species or machines has been unprofitable; few can give coherent definitions or meaningful tests.

Long and often contentious threads here have centered on the meaning of words. Does that mean that some of us understand them and some of us don’t? Almost certainly not in most cases. (Certainly, in a few.) Arguing probably corresponds to some definitions of understanding, because a good argument does not rely on easily accessible dictionary definitions but on a range of nuances and contexts and histories, and on an ability to interpret the contrary allegations of the opponent and respond with original thoughts. I haven’t heard of AIs debating one another in this fashion, though surely someone has done it, so I don’t know what that outcome would be like.

Just as with AIs’ inability to answer questions when the corpus is small, the issue may be that it’s too early to make definitive statements about their capacities. Or we may need a different definition of understanding, just as machines can detect the color red without knowing what a color is. Is understanding therefore as critical as we self-centered humans believe it to be?

This is absolutely not how LLMs work. LLMs make a very large number of explicit associations between tokenized elements of their textual corpus, and then use Bayesian statistical methods and backpropagation to analyze and refine those associations to produce a model that can produce a cromulent response to a query. They use language (at least, the English language) very well because the grammatical and metasemantic rules of language emerge from such an analysis, and they are able to make ‘weak’ associations between fairly rigorous concepts, but they have struggled with metaphor and abstraction without increasing the volume of the training corpus and the extent of backpropagation during training.
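To make “explicit associations between tokenized elements” a bit more concrete, here is a deliberately crude sketch: a bigram frequency table over a made-up corpus, plus a greedy continuation. A real LLM learns distributed representations across billions of parameters by gradient descent rather than keeping a lookup table, so treat this strictly as an analogy.

```python
from collections import Counter, defaultdict

# A made-up scrap of text standing in for the training corpus.
corpus = (
    "the fruit tastes like a grape . the fruit has skin like an apple . "
    "the orange has skin like an orange ."
).split()

# Count which token follows which: the crudest possible version of
# "explicit associations between tokenized elements".
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def continue_text(start, length=6):
    """Greedily emit the most frequent continuation of each token."""
    out = [start]
    for _ in range(length):
        choices = follows.get(out[-1])
        if not choices:
            break
        out.append(choices.most_common(1)[0][0])
    return " ".join(out)

print(continue_text("the"))  # -> "the fruit tastes like an apple ." on this toy corpus
```

The point is only that plausible-looking text can be stitched together purely from co-occurrence statistics, with nothing resembling a concept of fruit behind it.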

The difficulty with making a test that can objectively and quantitatively “distinguish what a human can do from what an LLM can do” is that the only way we have of interacting with an LLM-powered chatbot is through textual language (or converted speech-to-text), and LLMs are very good at manipulating text in the typical ways it is used, by dint of their large artificial neural network (ANN) of associations based upon their corpus, which is much larger than (but not as complex as) the neural connections in the human brain and the internal ‘processing’ done within neurons. We can see anecdotal examples of how they often fail at basic comprehension in ways that no mentally competent human would ever err, but as advocates point out, as models have become more sophisticated (i.e. more text fed into them) those lapses become less evident, in no small part because training methods are modified in ad hoc ways to address those deficiencies.

But humans are general-purpose reasoning machines, not ‘language models’, and are capable of making deep and highly abstracted conceptualizations from sparse examples. If you took a naive LLM and trained it using only a corpus of text equivalent to the information that a human takes in during the first 18 years of life, it would not respond as well as a high school graduate, if indeed it could master language at all.

Stranger

Which is just a restatement of “produce conceptual models of the world that are in essence a linkage of highly abstracted associations and expectations.”

No, it quite explicitly is not.

Stranger

I think that was what I was trying to say: human beings have a “model of the world” running inside their heads; it may be more or less correct, but it’s there.
I remember an early SF story where a robot drives a spaceship. On getting the order to “get to $Destination as fast as possible”, it takes off at such acceleration that it kills everybody on board. A human pilot would understand that such an order is lethal because their internal model of the world would tell them so (hopefully); an LLM has no such thing.

This is part and parcel of the alignment problem: you give some overarching goal or directive to an AI with the expectation that it will understand (and not subvert to its own ends) all of the implicit directions that are fundamentally value judgements, like “Don’t kill your passengers even if it makes you late.” In symbolic AI systems these get codified in explicit rules about how the system is supposed to work, but for any real-world task these become very complicated and often result in conflicting requirements that have to be adjudicated (as anyone who has ever written a process specification knows). Humans do pretty well at figuring out the basic unwritten rules for such goals because their highly abstracted concepts are understood within the larger overarching context of “don’t get people killed”, at least until a system becomes really complex or behaves in unexpected ways.
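To illustrate the conflicting-requirements point with the spaceship example, here is a toy sketch; the rules, field names, and numbers are all invented, and real symbolic systems are vastly more elaborate:

```python
# A toy version of "explicit rules that conflict and must be adjudicated".
MAX_SAFE_ACCEL_G = 3.0     # hypothetical crew-survivability limit
FASTEST_POSSIBLE_G = 12.0  # what "as fast as possible" naively demands

rules = {
    "arrive_as_fast_as_possible": lambda plan: plan["accel_g"] >= FASTEST_POSSIBLE_G,
    "do_not_harm_passengers":     lambda plan: plan["accel_g"] <= MAX_SAFE_ACCEL_G,
}

def violated_rules(plan):
    """List every rule the proposed plan breaks."""
    return [name for name, ok in rules.items() if not ok(plan)]

# No value of accel_g satisfies both rules, so something has to adjudicate:
print(violated_rules({"accel_g": 12.0}))  # ['do_not_harm_passengers']
print(violated_rules({"accel_g": 3.0}))   # ['arrive_as_fast_as_possible']
```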

Any AI trained using Bayesian inference (not just an LLM, although those are the most complex systems that currently exist) is always going to have significant gaps in its inference because it doesn’t have a way of explicitly codifying such rules in its training. The current approach is to apply some kind of post hoc filtering to make sure that any action or command doesn’t violate those rules, but that becomes problematic in real-world interactions because of the speed of decisions and the difficulty of interpreting the consequences of actions through a filter. In theory, if you run the system through enough simulations it will ‘understand’ how to make decisions the way a human would make value judgements, but the threshold at which an AI would make ‘better’ or more appropriate value judgements about complex scenarios is doubtless very high and not easily determined.
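For what it’s worth, the post hoc filtering approach amounts to something like the sketch below: the model proposes an action and a separate hand-written check vetoes it before anything is executed. The phrases, fields, and thresholds are all invented; production guardrails are far more involved.

```python
# A minimal sketch of post hoc filtering of a model's proposed action.
FORBIDDEN_PHRASES = ("kill", "harm passengers")
MAX_SAFE_ACCEL_G = 3.0

def post_hoc_filter(proposed_action):
    """Return the proposed action if it passes the checks, otherwise a refusal."""
    description = proposed_action.get("description", "").lower()
    if any(phrase in description for phrase in FORBIDDEN_PHRASES):
        return {"description": "refused: violates safety policy"}
    if proposed_action.get("accel_g", 0.0) > MAX_SAFE_ACCEL_G:
        return {"description": "refused: exceeds crew g-limit"}
    return proposed_action

print(post_hoc_filter({"description": "burn at maximum thrust", "accel_g": 12.0}))
# -> {'description': 'refused: exceeds crew g-limit'}
```

The obvious weakness, as noted above, is that the filter has to anticipate the consequences of an action faster than the action unfolds.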

Stranger

Can a human learn a language just by examining a sufficiently large corpus, with no Rosetta Stone? If a human can, I see no convincing argument that an AI couldn’t, although not necessarily an LLM.

As for my first question, I will report a story that a good friend of mine told me. I found it hard to believe, but what do I know. My friend knew someone who decided to test that question. He took a language that he didn’t know anything about (not Indo-European), but for which there was a large corpus written in the Roman alphabet, so he didn’t have to learn a strange script: Hungarian. He got a lot of text, novels I think, and started working on it and, according to my friend, succeeded in learning the language.

Human language-learners aren’t working from just text, though. There are things like pictures, and other people pointing at things, and other people demonstrating things, and facial expressions, and all of those help. It’s also not a static dataset: other humans interact with the learner, and let them know when they’ve got it right, and when they’re getting close, and when they’re kind of vaguely close-ish.

In order to answer this, can I ask whether you are monolingual or multilingual? If you are multilingual, do you have more than one native language or did you learn other languages as a teen / adult?

I can tell you, from teaching experience, that people who are natively bilingual have real difficulty translating the words and / or the grammar, and have to go by their understanding. They are certainly not doing what a machine does.

The successor to the aforementioned Linear A, Linear B, was rather famously deciphered without any sort of Rosetta stone. I don’t doubt that some ML system could make the same inferences.

Native bilingualism (which occurs in households where parents speak different native languages) is actually a really interesting phenomenon from a developmental standpoint because children often grow up speaking a mishmash of the two (or more) languages. This isn’t seriously problematic when the languages are relatively closely related (both Romance languages, or Punjabi and Sindhi, et cetera), where the grammar is essentially the same and the child just mixes vocabulary; but when they are languages with very different syntax and grammatical construction, it can cause measurable delays (but not permanent impairment) in competent language acquisition, and translating between them is tricky because there is some kind of mental ‘code switching’ that makes it difficult to directly convert thoughts in one into words of the other, whereas translating from a native language to one learned later in life, or vice versa, is more straightforward (assuming proficiency in the non-native language). It would be an interesting experiment to train an LLM on a combined corpus of text from two very unlike languages without distinction and see what it does; I’m guessing that it would produce a mishmash of vocabulary and synthesize some hybrid of syntax and grammar that would be odd and perhaps unintelligible to people proficient in either language.
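For the curious, the shape of that experiment might look something like the sketch below: one model, one corpus with both languages mashed together and no label telling it which is which. I’ve used a toy character-trigram model instead of an actual LLM, and my Welsh sentence is rough at best, so this only illustrates the setup, not the result I’m guessing at.

```python
import random
from collections import Counter, defaultdict

# Two "corpora" concatenated with no marker separating the languages.
# (The Welsh here is rough and only for flavour.)
english_text = "the cat sat on the mat and looked at the dog "
welsh_text = "mae'r gath yn eistedd ar y mat ac yn edrych ar y ci "
mixed_corpus = (english_text + welsh_text) * 50

# Character-trigram model: which character tends to follow each 3-character context.
model = defaultdict(Counter)
for i in range(len(mixed_corpus) - 3):
    model[mixed_corpus[i:i + 3]][mixed_corpus[i + 3]] += 1

def sample(seed="the", n=80):
    """Generate text by sampling the next character from the trigram counts."""
    out = seed
    for _ in range(n):
        choices = model.get(out[-3:])
        if not choices:
            break
        chars, weights = zip(*choices.items())
        out += random.choices(chars, weights=weights)[0]
    return out

print(sample())  # often drifts between the two vocabularies mid-"sentence"
```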

Stranger

As I said in a post on another thread:

Interestingly, it’s easier for a child to learn several languages starting from birth. I recall asking an expert in this at one point and being told that a child could learn three languages starting at birth and end up sounding like a native in each. It’s necessary to have a native speaker of each of those three languages to talk to a lot. So a person could have a mother, a father, and the neighborhood kids who each speak to them frequently in one of those three languages. As they grow up, they will sound like a child of that age who’s a native speaker.

Just for clarity, that isn’t the case in any of the examples I am thinking of. Rather, all of these people learned English and Welsh (or other languages, but I have the most experience with Welsh) in different domains, for a variety of different reasons, and usually were monoglot until the age of five or so. The two languages have very different syntax, but I didn’t know any of these people as children, so I can’t speak to that. I do think it’s one of the reasons why they have difficulty translating, and why AI isn’t very good with this language pair (so far), either.

Well, to be fair to the LLMs, there isn’t a very large corpus of text in Welsh, and it is quite a difficult language to learn coming from English.

Stranger

I claim that the machine translator is doing what the natively bilingual person does. It is reading the input, forming a semantic model of what is being described, and then describing the semantic model in the other language. The internal state of the LLM is a representation of the situation being described by the input, not just the input itself.

The reason these things arose out of translation models is that to properly translate text, you need to have a good semantic model of the situation being described.
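One way to poke at this claim is to embed a sentence and its translation with a multilingual encoder and compare the internal representations. The sketch below uses the sentence-transformers library and one of its public multilingual checkpoints; a high similarity between the pair shows that similar situations land in similar internal states, which is suggestive but obviously not proof of a human-like semantic model.

```python
from sentence_transformers import SentenceTransformer, util

# A public multilingual sentence-embedding model (small, for illustration).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = "The robot accelerated so hard that it killed everyone on board."
french = "Le robot a accéléré si fort qu'il a tué tout le monde à bord."
unrelated = "I would like a cup of tea and a biscuit, please."

embeddings = model.encode([english, french, unrelated])
print(util.cos_sim(embeddings[0], embeddings[1]))  # translation pair: typically high
print(util.cos_sim(embeddings[0], embeddings[2]))  # unrelated sentence: typically much lower
```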

LLMs do not have a ‘strong’ (conceptual) semantic model. They have a ‘weak’ semantic model based upon a large array of textual associations, but as demonstrated by @Pasta in this post and this post, it makes basic errors and often just fills a response with ‘word salad’ when a subject is too complicated or highly technical to be represented by high-frequency word and sentence associations.

Stranger

I don’t have a cite, but my experience is very different from this. Kids with one Chinese native-speaking parent and one English native-speaking parent (in China) were perhaps on the slower end of starting to speak but quickly matched or exceeded their monolingual peers. This did require that the English speaker spend significant time speaking English exclusively with the child.

In my kid’s case, at 2 or 3 years old: if the listener was a “grandparent” type, they would speak Shanghainese dialect (unintelligible to Mandarin speakers but a reasonably close language), Mandarin to younger Chinese, and English to a Caucasian. And they would code-switch between all three at the drop of a hat, and never had issues nor delays translating. It wasn’t just my kid, but many kids in my circle. The code switching was natural, automatic, and included mannerisms and cadence to match the language being spoken.

Back to the OP: LLMs are simply pattern-recognition engines based on a huge amount of data. There is zero experience and zero intelligence. Therefore, LLMs can recognize patterns in ancient language scraps, and those patterns may (or may not) be useful for deciphering and translation. As pointed out upthread, training an LLM as a decipherment engine should improve the results considerably. (Think of the garbage-in, garbage-out problem a standard LLM has for this kind of pattern recognition.) On top of that, there needs to be “enough” of the ancient language to provide enough patterns to be useful.
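As a rough sketch of what that pattern recognition starts from, you can count how often each sign occurs and which signs occur next to each other; these are the most basic statistics a decipherment-oriented model would build on. The “inscriptions” below are invented placeholders, not real Linear A data.

```python
from collections import Counter

# Invented placeholder "inscriptions": lists of sign names, not real data.
inscriptions = [
    ["DA", "RO", "KA", "TI"],
    ["KA", "TI", "SE", "DA"],
    ["RO", "KA", "TI", "DA", "RO"],
]

# Sign frequencies and adjacent-sign pairs: the only patterns available
# when there is no Rosetta Stone to anchor meanings.
sign_freq = Counter(sign for line in inscriptions for sign in line)
pair_freq = Counter(pair for line in inscriptions for pair in zip(line, line[1:]))

print(sign_freq.most_common(3))  # most frequent signs (candidate particles or vowels?)
print(pair_freq.most_common(3))  # recurring sign pairs (candidate morphemes or names?)
```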

Out of interest I asked ChatGPT whether AI could have cracked Enigma. The answer was yes, and much faster than the humans at Bletchley Park, even without training data, because it could:

  • Guide brute-force searches more efficiently (e.g. reinforcement learning),
  • Spot promising decryption paths,
  • Evaluate likely plaintexts based on linguistic models — something Turing’s team had to do manually.
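The third bullet is the easiest to make concrete. Below is a sketch of a “does this look like English?” fitness function built from letter-trigram statistics, the kind of score a search over candidate machine settings could use to rank decryptions automatically. It only illustrates the idea; the actual Bletchley Park methods (and any serious modern attack) involve far more structure, and the tiny reference text here is just a placeholder for a large English sample.

```python
import math
from collections import Counter

# Placeholder reference text; a real scorer would use a much larger English sample.
reference_text = (
    "the quick brown fox jumps over the lazy dog and the weather report "
    "for the north atlantic convoy is expected at dawn"
)

# Letter-trigram counts from the reference text (spaces kept, everything lowercased).
chars = [c for c in reference_text.lower() if c.isalpha() or c == " "]
trigram_counts = Counter("".join(chars[i:i + 3]) for i in range(len(chars) - 2))
total = sum(trigram_counts.values())
floor = math.log(0.01 / total)  # penalty for trigrams never seen in the reference

def english_score(candidate):
    """Higher (less negative) score means the text looks more like the reference language."""
    text = candidate.lower()
    score = 0.0
    for i in range(len(text) - 2):
        tri = text[i:i + 3]
        score += math.log(trigram_counts[tri] / total) if tri in trigram_counts else floor
    return score

print(english_score("weather report at dawn"))   # plausible plaintext: higher score
print(english_score("wxqzr kjv ptl mnbq aaxz"))  # gibberish: much lower score
```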