Language families

Does anyone know how languages are classified? I’m in China, at the moment, and I know that Mandarin, having a tonal structure, is different from English, but I’m not sure how those differences are measured. That said, which languages are the most used, from the categories that exist, and how many world languages are there, in total?


An excellent source of information on languages and language families is Ethnologue.

In really general terms, one can look at language classification as akin to biological taxonomy.

Dialects are like varieties and subspecies, which are grouped together to form languages. The distinction between dialect and language is slightly nebulous, but includes mutual intelligibility and lack of separate national history and literature. E.g., Australian English is a dialect of English because it is mutually intelligible and quite obviously derived from British English. Dutch and Flemish, mutually intelligible, have different national standards and a long history of separate existence with different literature, and so are usually considered separate languages.

Languages are grouped into families where close connections of vocabulary, grammar, and syntax are obvious. For example, English, Dutch, Flemish, and German are in the West Germanic subfamily of the Germanic family (as opposed to the North Germanic tongues like Danish and Swedish).

Families are in turn grouped into phyla (singular, phylum) on the basis of more distant relationships. The Germanic languages are grouped with the Romance languages (French, Spanish, Italian, etc.), the Slavic languages, the Celtic languages (Irish, Welsh, etc.), the Iranian languages (Persian, Tadzhik, etc.), and the Indic languages constituting most of the languages spoken in India, Bangladesh, and Pakistan as the Indo-European Phylum.
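To make the dialect/language/family/phylum nesting concrete, here is a rough Python sketch that uses only the examples given above. The dictionary layout and the helper names `lineage` and `closest_common_group` are my own invention for illustration, not any standard classification scheme, and the tree is nowhere near complete.

```python
# Toy classification tree built only from the examples mentioned above.
# Nesting is phylum -> family -> (optional subfamily) -> list of languages.
# The structure and all names here are illustrative assumptions.
CLASSIFICATION = {
    "Indo-European": {
        "Germanic": {
            "West Germanic": ["English", "Dutch", "Flemish", "German"],
            "North Germanic": ["Danish", "Swedish"],
        },
        "Romance": ["French", "Spanish", "Italian"],
        "Celtic": ["Irish", "Welsh"],
        "Iranian": ["Persian", "Tadzhik"],
    },
}


def lineage(language, tree=CLASSIFICATION, path=()):
    """Return the groups containing `language`, outermost first, or None."""
    if isinstance(tree, list):
        # Leaf level: a plain list of language names.
        return path if language in tree else None
    for name, subtree in tree.items():
        found = lineage(language, subtree, path + (name,))
        if found is not None:
            return found
    return None


def closest_common_group(a, b):
    """Return the innermost group shared by two languages, or None."""
    la, lb = lineage(a), lineage(b)
    if la is None or lb is None:
        return None
    common = None
    for x, y in zip(la, lb):
        if x != y:
            break
        common = x
    return common


print(lineage("English"))
print(closest_common_group("English", "Danish"))
print(closest_common_group("English", "French"))
```

The deeper the shared group, the closer the relationship: English and Dutch meet at West Germanic, English and Danish only at Germanic, and English and French only at the Indo-European level.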

The term “language stock” is sometimes used, with varying meanings depending on the preferences of the anthropologist or comparative linguist using the term.

Some of the major language phyla are:

[ul]
[li]Indo-European – A large group of major world languages, spoken in Europe and South Asia and having spread from there to a lot of places, including the Americas and Australia.[/li]
[li]Sino-Tibetan – The Chinese languages and a lot of smaller East Asian languages that people without a special interest in the area are not familiar with.[/li]
[li]Afro-Asiatic, formerly called Hamito-Semitic – Arabic, Hebrew, most of the now-extinct languages of the Ancient Middle East, and a large group of languages throughout the northern half of Africa, including Tuareg, Amharic, and Somali.[/li]
[li]Finno-Ugric and Altaic, which may or may not be joined as Ural-Altaic, are two other groups of some importance. The first includes the Finnic family – Finnish, Estonian, and smaller relatives – and a group comprising Hungarian and two West Siberian languages. The second includes the Turkic languages of Turkey and the steppe republics, along with Uighur (the language of Sinkiang); the Mongol languages; and the Tungus languages such as Yakut and Manchu.[/li]
[li]Niger-Congo – Includes all the Bantu languages of Africa from Gabon and Kenya south to the Cape, and several smaller families across West Africa. Ibo, Fulani, Mandingo, and Ewe are examples of the latter.[/li]
[li]Malayo-Polynesian – A very broad though thinly stretched group, it includes Malagasy, Malay, Indonesian, Maori, the Filipino languages such as Cebuano and Tagalog, and most of the languages of the Pacific islands.[/li]
[li]Dravidian – Geographically small but important, the Dravidian languages are spoken in South India and Ceylon, with a few outliers in northern India and Pakistan.[/li]
[li]Uto-Aztecan – Native languages spoken from the American Southwest through most of Mexico.[/li]
[li]Athabascan – Native languages of much of Western Canada, with groups spread out from there.[/li]
[li]Macro-Siouan – Includes the Iroquois and Sioux languages of America, with a quite widespread original territory.[/li]
[li]Andean-Equatorial – A lot of the major Indian languages of South America are in this group: Guarani, the co-official language of Paraguay; Quechua, the language of the Inca Empire; Aymara, common in Bolivia and southernmost Peru; Tupi, something of a lingua franca among the Indians of much of Brazil. Arawak also belonged to this group.[/li]
[/ul]
Many languages are members of much smaller phyla or are language isolates with no clear relatives. Georgian (in the Caucasus, not around Atlanta) is a member of the small Kartvelian phylum; Basque (famously), Korean, and Japanese are examples of isolates.

Not quite as rigid as biological taxonomy however. Wasn’t it said that a language is a dialect with an army?

The other difference is that taxonomic groups generally have a single ancestor, because descent is governed by the DNA that you get from your parents. With languages, descent can be quite complex. For example, English is a Germanic language, but even the Germanic ancestry is complex, because of multiple invasions of people speaking different Germanic languages into Britain. However, English has been deeply influenced by French – a Romance language, coming from Latin, but a language which in turn has been deeply influenced by Germanic invaders of France. So English is not a pure Germanic language, and French is not a pure Romance language.

The Indo-European language group, which includes Hindi, English, Greek, Kurdish, French and around 440 others, has the most speakers at about three billion. Italian may look nothing like Bengali today, but if you look at Latin and Sanskrit you can see how they might be distantly related. Sino-Tibetan has the second most speakers and about 250 languages.

The correct term is Austronesian (Southern Islands). Malayo-Polynesian is a subgroup within the Austronesian family. There are two subgroupings: the Formosan Austronesian languages and Malayo-Polynesian. Austronesian is also considered the biggest family (and one of the most widespread), as it has over 1,000 languages within it. The family stretches from Madagascar off Africa to Easter Island near Chile, up to Hawaii in the north and down to New Zealand in the south.

The origin of the family is thought to be Southern China. From there the speakers of Proto-Austronesian moved to Taiwan, and from there spread south into the Philippines, out to the eastern Pacific, and west into the islands of Malaysia, Indonesia, and finally into Madagascar (though this isn’t a strict linear movement… there is some movement back and forth).

[QUOTE]Finno-Ugric and Altaic, which may or may not be joined as Ural-Altaic, are two other groups of some importance. The first includes the Finnic family – Finnish, Estonian, and smaller relatives – and a group comprising Hungarian and two West Siberian languages. The second includes the Turkic languages of Turkey and the steppe republics, along with Uighur (the language of Sinkiang); the Mongol languages; and the Tungus languages such as Yakut and Manchu.[/QUOTE]

Nitpick: Finno-Ugric is usually accepted as a single family, with the Finnic, Volga-Oka, and Permian languages on the Finnic side, and Hungarian and the West Siberian languages on the Ugric side. The Finno-Ugric family belongs to the Uralic group along with another language family, Samoyedic, which includes several North Siberian languages. Although Finno-Ugric and Samoyedic are relatively close linguistically, there seems to be very little genetic relation between their speakers, which suggests that the Samoyeds may originally have spoken some other languages. As regards ‘supergrouping’, it’s quite likely that the Uralic and Altaic groups are close enough to be seen as relatives, but there have also been attempts to make connections between Uralic and Indo-European. However, it’s also possible that the Germanic and Slavic families evolved at some point as speakers of old Finno-Ugric languages learned some Indo-European language such as Celtic.

This is correct, though according to some, the Austronesian languages include four or even ten distinct subgroups. But even under the theory of ten subgroups, nine of them are Formosan, and the Malayo-Polynesian subgroup comprises all the hundreds of Austronesian languages outside Taiwan.

I appreciate the corrections to my summary.

Short note to nitpickers: As I hope was obvious, I tried to give examples of relatively well-known languages rather than striving for completeness in my summaries of major phyla; you’ll note that I completely skipped Armenian and Shqipni, along with Dardic and the Hittite languages, in my Indo-European summary, for example.

I’m glad to know about Austronesian – I was vaguely aware that there was a relationship between the “aboriginal” languages of Taiwan and Malayo-Polynesian, but thought it was merely one subfamily of M-P (and a small enough one that I merely omitted it).

Regarding Ural-Altaic, I grew up interested in languages, and with it being “common knowledge” among those who cared that Finno-Ugric and Altaic were two major components of the same phylum, though the split was early and deep. The modern sense that this is not proven is a later development.

Note that the Altaic hypothesis really is still a hypothesis, and most historical linguists don’t unreservedly accept it. The Altaic hypothesis holds that the Turkic, Mongolian, and Tungusic language families belong to a single macrofamily. Language classifications are based on shared properties and on words that exist in each subgroup traceable back to a single root. There aren’t many root words that seem to be shared by all three language groups, and linguistic features like vowel harmony seem to be present among a group of languages that have had historical contact.

The difficult part is ascertaining which features are due to contact and which to common descent; root words seem to be shared between Turkic and Mongolian, and between Mongolian and Tungusic, but there aren’t many candidates for sharing among the entire proposed classification. So it’s not really certain.

The Ural-Altaic hypothesis, linking the Uralic languages (including Finnish and Hungarian) to the proposed Altaic macrofamily, is far more tenuous yet. This is not a widely-accepted idea, though most linguists aren’t hostile to it. Again, it’s mostly based on grammatical features that are also explainable as the result of contact between languages, and are shared only by a few languages in each family.

Check for a very brief explanation.

From what I’ve read, the two main subgroups are the Formosan languages and the Malayo-Polynesian languages. I was basing that on what SIL has in their classification schema. Here’s what SIL says:

Formosan includes: Atayalic, Paiwanic, Tsouic
Malayo-Polynesian includes: Central Eastern, Unclassified, Western

SIL are notorious splitters (those who make very fine distinctions between languages). For Formosan they list 23 languages, and for Malayo-Polynesian 1,239. They’re so specific that they list about 160 languages for the Philippines, when the usual number I’ve heard is about 80 to 90 languages total.

Since we seem to have a few knowledgeable linguists about, has the “Amerind” hypothesis (i.e. that all Native American language phyla except Na-Dene and Eskimo-Aleut are really one large phylum) gained any credibility? Or do most linguists roll their eyes just as far back as they used to when you mention it?

Interestingly, Paul K. Benedict proposed that Austronesian, Tai-Kadai (which includes Thai) and Hmong-Mien families be grouped together into “Austro-Tai”. This would make it a very large family (and it’s not too inconceivable… or at least in the sense that they may have had early contact, being that Austronesian is originally from southern China).

An even bigger grouping would include Austroasiatic, creating the Austric family. This would include everything from Mon, Khmer, Munda, Thai, and Vietnamese to Indonesian, Tagalog, Hawaiian, and Maori, and would then become the biggest family. Of course, there is not a lot of convincing evidence yet, but some believe this family has merit.

The only mainland Austronesian speakers are the Cham, who have one of the earliest recorded writing systems among Austronesian peoples. They live in Vietnam and, I think, Laos as well.

I am an amateur at this, with no professional credentials whatsoever, but the strong impression I’ve gotten is that the Trans-Amerind Hypothesis (which I just coined to describe it, since I’ve never seen an actual name given it) is on a par with “Nostratic” as “quite possible, but definitely bluesky, and needing a great deal of work to adduce enough evidence to weight it as anything more than a hypothesis.”

It is, however, interesting that the American continents have the most isolates and language families – California having more unrelated groups than all of Africa, for example. Probably the relative lack of historical background regarding American Indian groups accounts for this – a great deal of “linguistic sleuthing” in the Old World depends on written records of no-longer-extant languages, which is in general not available in the New World.

The Cham are pretty much split between Vietnam (where the Champa Empire once existed; you can still visit ruins) and Cambodia. In Cambodia they’re also known as the “Khmer Islam.” Interesting religion… mix of Shi’a Islam via the Malay and the Hindu cultures that influence most of SE Asia, excepting the Vietnamese.

Re the italicized paragraph: Well, except for the Malay, who do have a large mainland presence in the old Malaya part of Malaysia. :slight_smile:

I find the complex of language families in Southeast Asia to be very difficult to comprehend – can you please summarize the family-and-phylum-affiliation of the major tongues of the Thailand-Indochina area, in the absence of proof of Austroasiatic?

Well, make that the last paragraph – I carefully coded in italics on that paragraph, forgetting that quotes are automatically italicized! :smack:

No, it’s got about the same credibility as all the revolutionary new language classifications do - zilch. Nostratic, Indo-Pacific, Amerind - it’s the sort of thing that can’t be proven, and barring great advances in linguistics, probably never will be provable. These claims are simply made without real evidence to back them up.

A lot of the proposals are justified using “mass comparison”, which purports to show a relation between two languages if enough words having even vaguely similar meanings and sounds are found. But these relations are incredibly tenuous, since the words are usually fairly dissimilar, and these correspondences don’t show patterns that would be expected. For instance, three languages from the same roots ought to in many cases have cognates shared among all three of them, and more distantly related languages ought to have some of these same cognates. But as far as I’ve heard, broader correspondences don’t seem to show up.

And real historical linguistics depends on the development of sound change rules that explain exactly how a word changed in the evolution of a language. Even fairly dissimilar cognates, such as the English “tooth” and the French “dent”, can usually be explained on the basis of complex rules governing the changes that occurred over time. These changes aren’t haphazard; generally, they’re quite regular, so correspondences can be set up: the Germanic ‘f’ (seen in the English “father”) corresponds to ‘p’ in the Italic languages (thus the Latin “pater”) - and these hold true for most or all of the words in each language.
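As a very rough illustration of what “regular correspondence” means, here is a toy Python sketch. The word list is a handful of standard textbook English/Latin cognate pairs, and the function names are made up for this post. Comparing first letters of the spelling is a crude stand-in (my simplification) for the sound-by-sound comparison that actual historical linguists do.

```python
from collections import Counter

# (English word, Latin cognate) pairs. Latin "dens" is the source of
# French "dent". Spelling stands in for sound here, which is a simplification.
COGNATES = [
    ("father", "pater"),
    ("foot", "pes"),
    ("fish", "piscis"),
    ("tooth", "dens"),
    ("ten", "decem"),
    ("two", "duo"),
]


def initial_correspondences(pairs):
    """Count how often each (English initial, Latin initial) pairing occurs."""
    return Counter((eng[0], lat[0]) for eng, lat in pairs)


def is_regular(pairs):
    """True if every English initial always pairs with the same Latin initial."""
    mapping = {}
    for eng, lat in pairs:
        # setdefault records the first Latin initial seen for this English one.
        if mapping.setdefault(eng[0], lat[0]) != lat[0]:
            return False
    return True


for (eng, lat), count in initial_correspondences(COGNATES).items():
    print(f"English {eng!r} ~ Latin {lat!r}: {count} pairs")
print("regular:", is_regular(COGNATES))
```

On this list the English ‘f’ lines up with Latin ‘p’ in all three f-words, and English ‘t’ with Latin ‘d’ in all three t-words. That kind of repeating pattern, rather than any one lookalike pair, is what counts as evidence.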

These rules even allow for the reconstruction of dead languages. Based on the descendants found in daughter languages, the roots in the original language can be tentatively figured out. Thus, much of Proto-Indo-European, the language that English (and friends) descend from, has been reconstructed. Mass comparison yields no such protolanguage. It yields no system of regular correspondences between languages, no historical forms, no branching diagrams to explain the history of each language. Mass comparison is simply not evidence; it hasn’t been proven effective to the degree required for real scientific research.

These hypothetical families, Nostratic, Amerind, Indo-Pacific, and others, then, are laughable not because they’re impossible, but because there’s simply no evidence to suggest that they existed. Even if all languages are descended from a single fount, the Greenbergs of the world who are using this silly technique aren’t helping us find it. So while certainly any of these macrofamilies is possible, there’s nothing in particular to suggest that they exist. Believing in them is like believing in UFOs - sure, you can demand that the scientists prove you wrong, but they’re just gonna laugh.

In what way do you mean “Summarize”? Do you mean a rundown of similar traits, or a grouping of the families and the major languages affiliated with each?

The latter, although interesting stuff about traits would be welcome – the material I’ve read does a little on families, but not much, and doesn’t show how/if they are related to the larger phylum groups – and it omitted Vietnamese. I know that the Munda, Tai and Mon-Khmer families are there, but whether they’re connected with phyla like Sino-Tibetan or Austronesian isn’t clear, what other groups are represented, and exactly what major languages are in what families is likewise not very well explained.