Let’s say I want to collect one book in every language in the world, past or present. Each one must be a commercially-printed mass-produced book in its original language, not a translation, and must be a proper book, not a tract or pamphlet.
How many books might it be possible to place in such a collection?
You can’t, of course, actually collect a book in every language that’s ever existed. There are well over 6,000 languages currently existing, and many of them have never had a written form. There have been at least several tens of thousands of languages throughout history (and probably more like hundreds of thousands of them). Many of the languages which do have written forms have never had a book written in that language (although they may have had a translation of the Bible or some such).
As noted, most languages are not written. If you restrict yourself to written languages, you’re looking at around two hundred languages. So it’s a pretty manageable task.
Wiki says the entire Bible has been translated into 518 languages, and partially into almost 2,800.
Far fewer languages have any original written literature, let alone even one mass-produced commercially printed book.
I would guess offhand that most languages with more than 1 million speakers have at least some commercially published books. This table indicates there are about 400 of them, plus another 900 with more than 100,000 speakers.
It’s a figure I’ve heard John McWhorter use on several occasions. Coincidentally, I’ve been listening to his “Story of Human Language” on tape as I’ve been driving around this week.
He would seem to be using a different definition of “written language” than other sources, since his figure differs by an order of magnitude. The figure of 2,800 based on scriptural translations seems to give a minimum estimate.
Sources that I’ve looked at say that about 4,000 of the currently existing languages have a written form, although there are few or no books in most of those languages.
Incidentally, I wrote:
> There have been at least several tens of thousands of languages throughout
> history (and probably more like hundreds of thousands of them).
“Throughout history” taken literally means since there have been written records of anything. What I meant was more like since mankind has been able to speak anything like full-fledged languages, which probably goes back somewhere around 50,000 to 200,000 years.
I’m speculating but he probably doesn’t include transcribed languages.
Imagine you have a group of people living in New Guinea and they speak their own language. Let’s call it Nemo. Nemo is a spoken language only.
Then an American missionary shows up and lives with the group. He learns to speak Nemo and he begins writing down what people are saying. He eventually translates the Bible into Nemo.
But no native Nemo speaker can read that Bible. It’s basically a transcription of spoken Nemo onto paper using English phonetics and the English alphabet. Would you say that Nemo has now become a written language?
To further complicate the issue, the missionary is also an avid bird watcher. He keeps a journal in which he describes the birds he sees and writes down the calls they make. Now that he’s written down the calls the birds make, would you describe those bird calls as a written language?
Ok, understood, but that the number would be so low seems impossible to me. As the reference I cited above indicates, there are something like 1,300 languages with more than 100,000 speakers. Groups that I am familiar with in that size range in Panama include the Ngabe (about 200,000-250,000 speakers) and Kuna/Guna (50,000-70,000). I am sure that many people in both groups are able to read and write their own languages, since when I have visited their homelands I have regularly seen signs or other information written in these languages. I have also seen books produced in these languages by the indigenous groups themselves. (Not necessarily “mass produced,” as suggested by the OP, but locally printed compilations of legends or other cultural history.) I would expect most groups with more than a few tens of thousands of speakers would be similar.
Irrelevant to the question, since the birds under no circumstances will be able to read the language. This is not the case with humans.
Just to extend a little on what I previously posted, many speakers of minority languages are going to be literate in the majority language of their area. The Ngabe and Kuna get instruction in Spanish in primary school, and most younger people can read and write it. Once you are literate in one language, it’s going to be easy enough to read your own language if it is transliterated into the same script. After all, even I can read Ngabe and Kuna in the sense that I can pronounce words written in the language sufficiently well for a native speaker to understand me (even though I don’t understand the words myself). I’m sure I would mispronounce many words, but a native speaker could surely understand a transliterated text in his own language if he were literate in the same script in another language.
I would guess that most languages with more than 10,000 speakers (maybe even smaller) would have some members literate in some other language who would be able to read their own language if it were transliterated.
But as has been pointed out, birds then sing in a written language, because a phonetic script can be used to represent “Oh sweet Canada Canada Canada”, which in the case of the white-throated sparrow is remarkably accurate. The term “written language” has a generally agreed-upon definition among linguists: the people who speak the language have become to some degree literate and would understand each other when reading the phonetic transcription. Even the Inuit language of Arctic America was not recognized as a written language until Peacock transliterated it into the Roman alphabet in the 1960s; that alphabet has since been abandoned, and the language now has a unique alphabet.
The question is not about languages that can be represented in books, but about languages that have been.
(1) languages that have no written form
(2) languages whose written forms were current before mass-market book production (with the presumed exception of languages whose works have been re-issued in the modern era)
(3) languages whose written form only includes documentation by linguists / anthropologists etc.
(4) languages whose published books only include translations of other world literature such as the Bible or Harry Potter.
I’d say it should be fairly easy to make a list of these languages. Europe and Asia will have the most, with far fewer among the indigenous languages of Africa, Oceania, and the Americas. There’s a mega-list of languages here, but there are probably more manageable regional lists elsewhere, by population. Depending on one’s definition of “mass market,” it would almost have to be a language with over 10,000 speakers, leaving out the vast majority of world languages.
Of course any human language can be written down. The question is about human languages that have been written down (and have had books published in them).
Majority, but perhaps not the vast majority. According to the table I linked to above, about 43% of all world languages have more than 10,000 speakers. (About 24% have fewer than 100 speakers.)
Basque has coexisted with other languages for about 2000 years and hadn’t developed a written form before encountering languages which had one; people learned to write as part of learning these other languages but didn’t write their own (they simply didn’t have any need to, as long-distance communication or communication with The Authorities worked better in these other languages anyway). Its first known written document happens to be the same one as for Spanish, notes in the margin of a book in Latin, and by the same hand (so, that defacing monk was at least trilingual) - end of the 10th or beginning of the 11th century. There is little more before the 19th century brought Nationalism (names, a few small documents; even letters would be sent in “Erdera”, “the other language” whichever that one happens to be).
Since everybody who speaks Basque is at least bilingual (with Spanish or French most commonly, but there are groups in the US and Australia as well), a lot of the time it can be difficult to figure out whether a simultaneously-published double edition was “originally” in Basque or in Erdera. One of my uncles has published self-translated bilingual books and he says some fragments would come out in one language, some in the other. My Basque-Spanish coworkers report the same; I often had to check their documents in Spanish or English and I could tell which language they’d been thinking in at the time they wrote a given part, but ideally it would be unnoticeable post-correction.
So how do you count that? And how would you have counted it before the 19th century, or before the 10th? Mass-produced documents where the original is evidently in Basque are pretty much nonexistent.
> Ethnologue has a very narrow definition of “language”: they tend to classify
> any identifiable dialect as a language.
Not quite, although your basic point is correct. There’s no exact dividing line between two related varieties being dialects of one language that are just barely mutually comprehensible and their being two closely related languages that are not quite mutually comprehensible. It’s just too hard to make that distinction even at a given time, and there are slow changes where such varieties move from being dialects to being separate languages. Ethnologue has chosen to be slightly slanted toward speaking of such varieties as two different languages. They don’t consider all dialects as different languages. You can see in the entries there that they often list what the various dialects of a language are.
For reference, there are 285 languages for which official Wikipedias have been created. In English, there are some 4.2 million articles. In Herero there are no articles, just the home page (defunct, I see). Herero is spoken in Botswana, Namibia and Angola by 240,000 people. I see there are at least 10 languages whose pages have been locked for insufficient activity.