Can languages be compared mathematically?

iiandyiiii · April 29, 2018, 8:55pm

By this I mean: can two languages be shown, through some methodical comparison, to be X% similar or to have X% in common? If so, how is this done?

It came to me that one possible method of comparison would be the following (and maybe this has been done, but a minute of googling didn’t reveal it):

A survey of a large number of monolingual people - for example, 100 monolingual French adults are given a simple multiple choice test in Spanish, along with 100 Spanish adults in French, and the average score is tallied. It wouldn’t give a percent value, but it could give a value that could be related to other two-language comparisons, such as French and German, so that we could say French and Spanish are closer together than French and German by a score of x to y on this comparison test.

Is anything like that already a part of comparative linguistics?

Voyager · April 29, 2018, 9:23pm

I don’t know if anyone has done this mathematically, but it should be possible to compare the words for similar concepts across languages. Many languages share common roots (like French and Spanish) so you could get some level of similarity. I don’t know how such a metric would handle languages such as English, with many near synonyms from the influence of many languages, or how loan words would be counted.
I think that would be more precise than tests.

Dr_Paprika · April 30, 2018, 1:58am

There are lots of ways to quantify language, but this approach isn’t that useful.

Comparative linguists already know which languages are similar on the basis of root words, proximity, history and percentage of polyglots in a given area.

Finding Spanish is more similar to Portuguese than Sanskrit would require an awful lot of such tests to compare many languages. And the results wouldn’t necessarily be straightforward to interpret. If you could get 25% by guessing, what do scores of 29%, 31% and 33% mean in different comparisons… anything?
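To make the guessing problem concrete, here is a minimal sketch of two standard adjustments, assuming a four-option multiple choice test (so a 25% chance rate). The function names and the 40-question test length are illustrative assumptions, not part of the proposal above.

```python
from math import comb

def chance_corrected(score, chance=0.25):
    """Rescale a raw proportion so that pure guessing maps to 0 and a perfect score to 1."""
    return (score - chance) / (1.0 - chance)

def p_at_least(k, n, p=0.25):
    """One-sided binomial tail: probability of k or more correct out of n by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# The three raw scores from the post, rescaled against the 25% chance rate.
for raw in (0.29, 0.31, 0.33):
    print(raw, round(chance_corrected(raw), 3))

# Hypothetical: one person gets 13 of 40 questions right (32.5%).
# How often would guessing alone do at least that well?
print(p_at_least(13, 40))
```

After rescaling, scores that close to the chance rate all land near zero, which is one way of seeing why small differences between them would be hard to interpret without a lot of test-takers.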

jtur88 · April 30, 2018, 3:07am

I would say no, because language depends so heavily on a presumption that both speakers know the context and the cultural subtexts.

Here’s an example: The word “rep” can be used in conversational and even informal written English to be shorthand for reputation, representative, or repetition. In nearly every situation, your listener would know exactly which one is being simplified to rep, which a mathematical scheme could not decipher. With difficulty, it is possible to construct a scenario in which a clarification is needed, such as a traveling salesman who brings a trainee along, the farmer’s daughter doing two reps.

bibliophage · April 30, 2018, 3:36am

Linguists sometimes use a measure called lexical similarity. It is of limited usefulness for a variety of reasons.
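As a rough illustration of the idea (not the procedure linguists actually use, which relies on expert cognate judgments over standardized word lists), here is a sketch that swaps in normalized edit distance between spellings as a crude proxy. The five-word list and the use of spelling rather than pronunciation are simplifying assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the longer word's length, so the result lies in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

def mean_distance(words_a, words_b):
    """Average normalized distance over aligned word lists (same meanings, same order)."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)

# Tiny illustrative list: night, water, hand, sun, moon.
french = ["nuit", "eau", "main", "soleil", "lune"]
spanish = ["noche", "agua", "mano", "sol", "luna"]
german = ["nacht", "wasser", "hand", "sonne", "mond"]

print("French-Spanish:", mean_distance(french, spanish))
print("French-German: ", mean_distance(french, german))
```

Even this toy version shows one of the limitations: spelling distance cannot tell inherited cognates from loan words or chance resemblance, and a five-word list is far too short to mean anything.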

septimus · April 30, 2018, 4:21am

I think OP is asking about comparing languages that are close cousins, e.g. are Spanish and Catalan likely to be mutually intelligible? Linguists also explore long-distance relationships, e.g. do Basque and the languages of Dagestan descend from a common Neolithic language? Using arithmetical measures to guess about such questions has a long, controversial history. A very simple approach is to count cognates. Recently the same software used to derive phylogeny (taxonomy) in biology has been applied in historical linguistics to arrange languages into a phylogenetic tree.

A recent book, Language Classification by Numbers (which I’ve not read) explores this area. Reviews are mixed.

eschereal · April 30, 2018, 4:45am

Exactly this. You cannot effectively learn a language without learning the cultural substrate that belongs with it. Language and culture affect each other together, so a technical analysis of structural and vocabulary similarities would be inadequate and uninformative.

DavidwithanR · April 30, 2018, 7:57am

You can compare absolutely any things mathematically. A numerical answer always presents itself, if you make sufficient effort to look for it.

Not only could you quantify language differences - while you were at it, you could hire someone to itemize and compare [the way your parents raised you] with [the way your friends’ parents raised them].

The question in each case is, is that result helpful, valid for the purpose, statistically significant, and whatever else.

Being guaranteed an answer can lead to a fallacious line of thinking - it probably has a name but I don’t know the name - in which people far overestimate the value of definite answers, and underestimate the value of uncertain or partial answers.

clairobscur · April 30, 2018, 11:26am

iiandyiiii:

By this I mean: can two languages be shown, through some methodical comparison, to be X% similar or to have X% in common? If so, how is this done?

It came to me that one possible method of comparison would be the following (and maybe this has been done, but a minute of googling didn’t reveal it):

A survey of a large number of monolingual people - for example, 100 monolingual French adults are given a simple multiple choice test in Spanish, along with 100 Spanish adults in French, and the average score is tallied. It wouldn’t give a percent value, but it could give a value that could be related to other two-language comparisons, such as French and German, so that we could say French and Spanish are closer together than French and German by a score of x to y on this comparison test.

Is anything like that already a part of comparative linguistics?

It seems that a higher level of mutual intelligibility doesn’t necessarily show that two languages are more closely related, even though it seems it should be the case.

If you look at language families as linguists classify them, Spanish and French belong to the same group, while Italian doesn’t. And French would also be closer to Italian than Spanish is. However, in practical terms, Italians and Spaniards can understand each other much, much better than either can understand French (or than French people can understand either Spanish or Italian). And if you study both languages, the high level of similarity is indeed obvious.

(I assume that this situation is the result of some significant shift(s) that happened in French but not in the two other languages, but I wouldn’t know.)

ftg · April 30, 2018, 11:58am

A more humorous way of comparing natural languages was given by IBM researcher Arnold Rosenberg in 1978. Finally someone put it up online (PDF).

It’s based on Arnie collecting a lot of expressions of the “It’s Greek to me.” form from various languages.

While not so much statistical, it is graph-based, which is very mathematical.

(I first mentioned this paper here back in 2001.)

EdelweissPirate · April 30, 2018, 1:25pm

Computational linguistics is totally a thing, of course. Much of that field is devoted to automated translation and things like parsing natural-language questions/statements for intelligent assistants like Siri. But the field is a lot broader than that, and as far as I can tell (IANALinguist) there’s a fair amount of intersection between computational and comparative linguistics.

One example of that intersection is the paper discussed here:

http://languagelog.ldc.upenn.edu/nll/?p=3090

It compares the phonemic diversity of different languages to their geographical location and attempts to apply the result to confirm anthropology’s out-of-Africa hypothesis. It got a lot of press at the time. The anthropologists I hung out with back then generally thought it was a really interesting result, for what that’s worth.

filmore · April 30, 2018, 1:52pm

It seems like you could quantify the framework aspects of how a language is built and then quantify how similar two languages match for those components. For example, does it put adjectives before or after the noun (“the squishy ball” or “the ball squishy”)? Do the nouns have gender (le/la thing)? Do nouns have articles (“give me the ball” or “give me ball”)? Are the articles plural to match the noun (“give me les balls” or “give me the balls”)? It seems like something like that could allow such a comparison based on how many similar constructs they shared.
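A minimal sketch of that kind of comparison, assuming each language is reduced to a handful of yes/no structural features and similarity is simply the fraction of features with matching values. The feature names and the simplified values below are illustrative assumptions; real languages are messier (French, for instance, puts some adjectives before the noun).

```python
# Each profile answers the four questions from the post, in this order.
FEATURES = [
    "adjective_before_noun",
    "grammatical_gender",
    "has_articles",
    "article_marks_plural",
]

# Deliberately simplified, hand-entered values.
PROFILES = {
    "English": (True, False, True, False),
    "French": (False, True, True, True),
    "Spanish": (False, True, True, True),
    "German": (True, True, True, True),
}

def shared_fraction(lang_a, lang_b):
    """Fraction of features on which the two languages have the same value."""
    a, b = PROFILES[lang_a], PROFILES[lang_b]
    return sum(x == y for x, y in zip(a, b)) / len(FEATURES)

for pair in [("French", "Spanish"), ("French", "German"),
             ("English", "German"), ("English", "French")]:
    print(pair, shared_fraction(*pair))
```

With only four coarse features, French and Spanish come out identical, so a usable version would need many more features and some way of weighting them.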

Ruken · April 30, 2018, 1:57pm

Language tree models are (these days) constructed mathematically. They don’t handle continua well, so while you could see that French and Spanish are more closely related than either is to German, they’re not going to tell you much about the degree of relatedness between individual modern Romance languages.
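For a feel of how a tree can be built from numbers, here is a sketch using UPGMA, a simple average-linkage clustering method. It is swapped in for illustration only; it is not what current phylogenetic software in linguistics does, and the pairwise distances below are made-up placeholders rather than measurements.

```python
from itertools import combinations

def upgma(names, dist):
    """Repeatedly merge the two closest clusters; return the tree as nested tuples.

    dist maps a pair (a, b) of names to a distance; every pair must be present once.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of languages in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the new cluster to each other cluster is the size-weighted average.
        for c in sizes:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))

# Hypothetical distances, chosen only to illustrate the procedure.
distances = {
    ("French", "Spanish"): 0.30,
    ("German", "English"): 0.40,
    ("French", "German"): 0.80,
    ("French", "English"): 0.70,
    ("Spanish", "German"): 0.80,
    ("Spanish", "English"): 0.75,
}

tree = upgma(["French", "Spanish", "German", "English"], distances)
print(tree)
```

The output is a strict series of two-way splits, which shows the limitation mentioned above: a gradual continuum of related varieties still gets forced into discrete branches.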

eschereal · April 30, 2018, 4:28pm

How about comparing one language? The “Chinese Language” is almost entirely consistent in structure and semantics across a whole spread of mutually unintelligible spoken dialects. A person from Shanghai cannot verbally communicate with a person from Hong Kong or Shaoshan or Beijing in their native dialects, but the written form works in all parts of the country (with some minor variations). Chairman Mao had to have an interpreter in order to communicate with his staff – and it seems a tiny bit odd that he supported the standardization of the Han dialect nationwide when it was not even the language he spoke.

griffin1977 · April 30, 2018, 4:34pm

Much of what we know about the original Indo-Europeans (the people whose language is the ancestor of a massive percentage of the languages in both Europe and India) is based on this kind of analysis. We know almost nothing about them from traditional sources, so instead we can analyse them based on the words that are shared between all the Indo-European languages. For example, words relating to bees (and mead) exist in similar form in most of the languages, so we know they came from somewhere with bees.

septimus · April 30, 2018, 9:58pm

Some syntactical elements, e.g. word order, are not particularly useful at guessing either filiation (Old English and Middle English have different word orders) or comprehensibility (word order is one of the easiest-to-learn aspects of a language).

Yes; note that mead was associated with religious rituals in ancient India, ancient Greece, and ancient Ireland. Words associated with wheeled wagons help locate Proto-Indo-European in time—and some of these words are missing from Hittite, as predicted from other clues that Hittite’s subfamily split off first, before the invention of the wheeled wagon.

One of the more famous inferences from a Proto-Indo-European word relates to *bhagos (beech tree). The absence of beech from most of the East European steppes was used as evidence that that was not the PIE Homeland… But then paleopalynologists (specialists in fossil pollens) determined that thousands of years ago beech trees were present at least as far east as the Don River!

hibernicus · April 30, 2018, 10:41pm

Absolutely, and such analysis leads to the surprising result that neighbouring languages often share many grammatical and phonological features even if they are not closely related. Sprachbund.

Chronos · May 1, 2018, 12:22am

The OP’s test wouldn’t be very useful. Greek is an Indo-European language, but most native speakers of other Indo-European languages couldn’t make heads or tails of a Greek text, just because it uses a different alphabet. And I imagine that similar difficulties would exist between, say, Arabic and Hebrew (both Semitic languages). But most Japanese speakers would be able to get at least some of the gist of written Chinese, because one of the Japanese forms of writing is related to Chinese (even though the spoken languages aren’t related, other than loanwords). And a Spanish speaker with a Portuguese text, or vice versa, would probably be able to understand almost all of it; it would just look like bad Spanish.

Tom_Tildrum · May 1, 2018, 12:52am

I can’t watch Trainspotting without subtitles.

Ruken · May 1, 2018, 1:01am

The Chinese Language is a political fiction.
