The Straight Dope

#1
09-11-2011, 06:23 PM
Kimstu
Guest
 
Join Date: Dec 1999
Does the English language have many more words than, say, Hindi/Urdu?

I frequently see claims to the effect that English has by far the largest vocabulary of all languages, given that it has over 170,000 words theoretically in current use (although something over 98% of all written English texts employ a vocabulary of less than 20,000 words).

However, the comparisons I've seen between the vocabulary size of English and that of other languages all, IMHO, pit English against competitors well below its weight class. Yes, I'm not surprised that English may be approximately twice as big, vocabulary-wise, as Spanish or German or French, say. But Spanish and German and French are not what spring to my mind when I think of languages with large vocabularies.

The reasons commonly adduced to explain the unusually large size of English vocabulary generally include the following:

- major influence from multiple language families, especially its fundamental mix of Germanic and classical tongues;

- hundreds of years of exposure to and borrowing from other languages due to Anglophone political and cultural influence worldwide;

- a large number of native speakers and second-language speakers;

- absence of formal academic oversight of its linguistic development.

French and German certainly aren't comparable to English in these respects, but it's not clear to me that other languages aren't. In particular, I would think that Hindi/Urdu would have many similar factors favoring large vocabulary size. E.g., it has two separate major linguistic influences from extremely prolific ancient languages: Persian (itself a hybrid of two distinct linguistic streams from different families, Arabic and Middle Persian) and Sanskrit (also hybridized, from proto-Indo-Aryan and Dravidian and other South Asian language families). Hindi/Urdu has also borrowed like crazy from more recent linguistic sources, and has a huge number of speakers, many of whom enrich its content with words from other languages.

However, I can't seem to find an authoritative source on the size of Hindi/Urdu vocabulary (although I've seen it stated without cite or explanation as 120,000 words), nor can I find any explicit comparison between English and Hindi/Urdu vocabulary size. Anybody got the Straight Dope?
#2
09-11-2011, 06:46 PM
AK84
Guest
 
Join Date: Apr 2008
Please do not ever again conflate Hindi and Urdu; they are separate languages!

Re the OP; the answer is that Urdu has a much larger vocabulary than English; of course the reason for this is that Urdu began life as an amalgamation* of many tongues and to this day every Farsi word is also an Urdu word.

*The official term for it was in fact Zaban-e-Urdu; "the language of the army".
#3
09-11-2011, 06:53 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
Lots of Persian expressions have been disqualified from modern Hindi, just as tons of tatsama (loans from Sanskrit) were ruled out of Urdu. Nowadays you won't find the full extent of combined vocabulary on either side. The place to go for the consolidated vocabulary would be a monumental work dating from before the split: the Dictionary of Urdū, Classical Hindī, and English (1884) by J. T. Platts. Which is one huge-ass book. I don't have a copy of my own, though.

My Standard Twentieth Century Dictionary: Urdu into English (Delhi, 1980) claims it has "Over 50,000 words, phrases, and proverbs used in spoken and literary Urdu." Obviously if you don't count the phrases and proverbs, the number of words will fall short of 50,000. I'll say one thing for this dictionary, which I've had for over 20 years now: it's rarely been stumped. However, when a word isn't found in it, I can get it either from the literary Persian-English dictionary by Franz Steingass, or from the Hindi dictionary. One or the other.

I searched through the front matter of my Oxford Hindi-English Dictionary, but they aren't letting on how many words they stuffed into it.

My money is on Platts, if you can access a copy of it somehow. Good luck.
#4
09-11-2011, 06:55 PM
coremelt
Guest
 
Join Date: Jan 2009
Quote:
Originally Posted by AK84
Please do not ever again conflate Hindi and Urdu; they are separate languages!
From Wiki:
Standard Urdu is mutually intelligible with Standard Hindi. Both languages share the same Indic base and are so similar in phonology and grammar that they appear to be one language.[5]

and:

Because of religious nationalism since the partition of British India and continued communal tensions, native speakers of both Hindi and Urdu frequently assert them to be completely distinct languages, despite the fact that they generally cannot tell the colloquial languages apart.

As to the size of vocabulary, the estimate I see online is 150,000 words, which is approximately equal to English.
#5
09-11-2011, 07:21 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
Quote:
Originally Posted by AK84
Please do not ever again conflate Hindi and Urdu; they are separate languages!
This is ahistorical. The split between them is relatively recent. And IIRC the split was instigated by British imperial divide-and-rule policies. For centuries they were a shared language called Khaṛī Bolī or Hindūstānī. I think Khaṛī Bolī refers more to spoken language (bolī means 'speaking'), with Hindūstānī referring more to the literary language. If Platts is any evidence, the combined language could be and was written equally well in both Devanāgarī and Perso-Arabic Nasta‘līq. For historical political reasons, the Perso-Arabic script was dominant throughout the time of the Mughal Empire and perhaps also the Delhi Sultanate.

Hindi and Urdu are not even two dialects of the same language; they are two divergent elaborations of one and the same dialect, Khaṛī Bolī, native to the Delhi-Agra area. I mean Hindi in the narrow sense, the official standard language of India. Hindi in the wider sense covers a vast area of dialects that are not mutually comprehensible. In the latter sense you could technically say that Hindi is not even the same language as Hindi. But aside from linguists, I think everybody pretty much understands the name Hindi in the narrow sense.
#6
09-11-2011, 09:28 PM
AK84
Guest
 
Join Date: Apr 2008
It might have been the case 60 years ago. I don't know. However today I as a native speaker of Urdu when watching Indian television am stumped about half the time. Formal Hindi is even more difficult. I also know that cross-border movies, songs, and drama serials, of which there are a lot, have to be carefully scripted lest they become incomprehensible.
#7
09-11-2011, 09:56 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
More like 100 years ago is when they were, quite deliberately and with malice aforethought, split apart because of communal politics. As noted above, this was instigated by the British for their divide-and-rule strategy. They put people to work with scissors (metaphorically), cutting Persian words out of the Hindi dictionary and pasting in Sanskrit tatsamas to replace them. It was manmade tinkering, not natural language evolution. It has only partially taken. Lots of Persian expressions that the policymakers tried to eliminate have remained in popularly spoken Hindi. The simpler the level of language being used, the more similar and even identical Hindi and Urdu get. The more prestigious, learned, and specialized the register, the more Urdu veers away from native Indic speech and uses literary Persian expressions, and the more Hindi gets away from both native speech and Persian, using Sanskrit vocabulary. It used to be that Bollywood Hindi films were a broad linguistic common ground. Maybe in recent years the gulf has been widening there too; I don't keep up with them like I used to.

When I first learned Urdu/Hindi and watched a lot of Bollywood, I noticed specifically that the songs used a vocabulary best described as Urdu. It's no wonder, considering the weight of centuries of Urdu poetic tradition that filtered down into film songs, and the fact that Muslims are heavily represented in the entertainment business in India, especially in music. Who are the two most renowned film music composers in India? Naushad and A. R. Rahman. Who are the two most beloved playback singers of all time? Lata Mangeshkar and Muhammad Rafi. That makes 3 out of 4 Muslims at the topmost level of the film music biz.
#8
09-11-2011, 10:07 PM
AK84
Guest
 
Join Date: Apr 2008
Re the British, I don't really think that was the case. If anything, the number of people speaking Urdu and Hindi has increased since 1947. Before that you would know your own regional language, then English, and then Urdu or Hindi. Look at Gandhi and Jinnah: both were Gujarati speakers and their professional language was English. Neither had a good command of Urdu or Hindi.
#9
09-11-2011, 10:21 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
The actions of the language tinkerers 100 years ago (which is what I was talking about) have nothing to do with the total number of speakers today. The latter, post-1947, I'd attribute to the increase in elementary school attendance levels, national language policies intended to promote national integration using official languages, and most of all the huge growth of mass media.
#10
09-11-2011, 10:30 PM
Exapno Mapcase
Charter Member
 
Join Date: Mar 2002
Location: NY but not NYC
Posts: 22,824
The OP omitted this paragraph from the link:
Quote:
This suggests that there are, at the very least, a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED, or words not yet added to the published dictionary, of which perhaps 20 per cent are no longer in current use. If distinct senses were counted, the total would probably approach three quarters of a million.
Most people treat the OED as the authoritative source. Yet the Second Edition mentioned in the link is hopelessly obsolete; it omitted all words that its editors found only a single mention of - except when certain favored authors used them; neglected to research the vast majority of writers because they weren't highbrow enough; was almost comically ignorant of technical and scientific terms, along with various trade argots, slang, and dialects; took totally unrepresentative samplings of usage from English-speaking countries outside the UK when they bothered to notice them at all; disdained any use of pidgins, creoles, and English as a second or world language; and corrected only a bare minimum of the millions of errors in the First Edition. The Third Edition will correct many of these flaws, but the Internet overflows its banks daily and the OED can't catch up, because the Internet is mostly English as she is spoke by several billion speakers and writers whom the editors would have committed suicide before allowing to sully the pages of their precious volume.

And that's not even getting at a problem that the early editors of the OED barely even realized was an issue. Everyday language and schooling treats a "word" as a low-level, easily-understood entity. Lexicographers can't do this. Semanticists can't do this. Historical linguists can't do this. When the most basic count varies by 400-500% then a "word" can't be pinned down exactly any more than the simultaneous momentum and position of an electron can be.

A vast territory lies between 170,000 and three quarters of a million. The meaning of "word" itself breaks down into technicalities. The way different languages treat the slices that occur between spaces also varies so much that comparisons between and among languages for the technical definition of a word are matters for academic debates that last lifetimes. That way lies madness.

Get out now while you can still save yourself!
#11
09-12-2011, 10:48 AM
Floater
Guest
 
Join Date: May 2000
According to a linguistics professor who was asked the question in a radio program there is absolutely no way whatsoever to say how many words a given language has.
#12
09-12-2011, 11:44 AM
Acsenray
Charter Member
 
Join Date: Apr 2002
Location: U.S.A.
Posts: 25,431
Quote:
Originally Posted by AK84
Please do not ever again conflate Hindi and Urdu; they are separate languages!
As Johanna has explained, this position is a political one, not a linguistic one. Hindi and Urdu are two standardized registers of one language, known either as Hindustani or Hindi-Urdu.

The Hindi used in television news is a different standardized register of Hindi-Urdu. However, the movie and music industries use a form of Hindi-Urdu largely intelligible across borders, and it doesn't require much effort to keep it that way.

Quote:
to this day every Farsi word is also an Urdu word.
Then every Sanskrit word, and every Urdu word, and every Farsi word is also a Hindustani word. It works both ways.

Quote:
*The official term for it was in fact Zaban-e-Urdu; "the language of the army".
More precisely, "language of the camp."

"Urdu" being cognate with "horde," you might also say "language of the horde."
#13
09-12-2011, 12:00 PM
Really Not All That Bright
Guest
 
Join Date: May 2003
Quote:
Originally Posted by Johanna
Who are the two most beloved playback singers of all time? Lata Mangeshkar and Muhammad Rafi. That makes 3 out of 4 Muslims at the topmost level of the film music biz.
Lata Mangeshkar is (at least nominally) a Hindu, as is her sister, Asha Bhosle, who should certainly be on your list.
#14
09-12-2011, 12:09 PM
Acsenray
Charter Member
 
Join Date: Apr 2002
Location: U.S.A.
Posts: 25,431
Quote:
Originally Posted by Really Not All That Bright
Lata Mangeshkar is (at least nominally) a Hindu
I think that's why Johanna said "3 out of 4," the three being Naushad Ali, A. R. Rahman (a convert to Islam), and Mohammed Rafi.
#15
09-12-2011, 12:24 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
Quote:
Originally Posted by Really Not All That Bright
Lata Mangeshkar is (at least nominally) a Hindu, as is her sister, Asha Bhosle, who should certainly be on your list.
I implicitly acknowledged Lata being non-Muslim by saying "3 out of 4" on my list were Muslims. But I could have phrased that more clearly, so it isn't your bad if it was hard to parse. If I'd named the three most beloved singers, of course Asha would have been a close third.
#16
09-12-2011, 12:24 PM
Really Not All That Bright
Guest
 
Join Date: May 2003
Quote:
Originally Posted by Acsenray
I think that's why Johanna said "3 out of 4," the three being Naushad Ali, A. R. Rahman (a convert to Islam), and Mohammed Rafi.
Oh, I see. I didn't know Rahman was a Muslim. Sorry, Johanna.
#17
09-12-2011, 12:29 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
Of course, A. R. Rahman is a native Tamil speaker, so it's meaningless to cite him on a question of Urdu. I ought to have thought of that.
#18
09-12-2011, 03:49 PM
Kimstu
Guest
 
Join Date: Dec 1999
Thanks! Other heavy-hitting vocabulary contenders?

Quote:
Originally Posted by AK84
However today I as a native speaker of Urdu when watching Indian television am stumped about half the time.
Well, even I as a native speaker of American English can easily be stumped when watching UK television using unfamiliar dialects of British English. (And not necessarily rare or obscure dialects, either.) Different dialects or variants within a single language can evolve very rapidly in different directions while still being technically considered the same language.

Anyway, bahut shukriya and dhanyavad to all respondents. While I quite agree that it's hopeless to attempt to pin a precise number on the size of the vocabulary of almost any language, this discussion has reinforced my suspicion that English is not as utterly exceptional in its huge vocabulary size as is often claimed. In particular, English vocabulary compared to that of other European languages looks a lot bigger than it does when compared to Hindi/Urdu.

Now I'm wondering: are there other languages whose vocabulary size is closer to that of English than that of, say, French or German?
#19
09-12-2011, 06:39 PM
Exapno Mapcase
Charter Member
 
Join Date: Mar 2002
Location: NY but not NYC
Posts: 22,824
Quote:
Originally Posted by Kimstu
While I quite agree that it's hopeless to attempt to pin a precise number on the size of the vocabulary of almost any language, this discussion has reinforced my suspicion that English is not as utterly exceptional in its huge vocabulary size as is often claimed.
The huge vocabulary of English is predicated on a size of 750,000 to 1,000,000 "words," which is far greater than that claimed for any other language. Nothing in this thread should have led you to believe otherwise.
#20
09-12-2011, 07:20 PM
Really Not All That Bright
Guest
 
Join Date: May 2003
I'm a little surprised that German is being discounted. It's famously accretive.
#21
09-12-2011, 10:09 PM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
For that matter, Yiddish is the closest analogue to Urdu, in terms of its vocabulary being built up from a combined set of two Indo-European languages and one Semitic. And it's German to boot. No idea of how big it gets, though.
#22
09-13-2011, 01:56 AM
Nava
Guest
 
Join Date: Nov 2004
Quote:
Originally Posted by Kimstu
Now I'm wondering: are there other languages whose vocabulary size is closer to that of English than that of, say, French or German?
I'm still trying to parse the concept that "English has more words than French or German". How is that measured? Are phrasal verbs counted as separate words? Do obscure words borrowed from Swiss French and used once by Byron in a single poem count? Do identical-sounding and spelled words which actually come from different origins count as two, or as one? Do "will" and "would" count as different words, or as one? How old does a borrowed word or neologism have to be, or how extended geographically, for it to count? Does leetspeak count as separate words?

Several posters have mentioned that "it's difficult to define 'word', much less to count how many a language has", but if there was any explanation of how the initial comparison defined words and counted them, I've missed it.




Just as an aside, Catalan and Valenciano are undergoing a politics-based separation similar to the one between the two branches of Hindi. Isn't politics grand?

Last edited by Nava; 09-13-2011 at 01:59 AM..
#23
09-13-2011, 06:36 AM
Johanna
Charter Member
 
Join Date: Oct 1999
Location: Altered States of America
Posts: 11,305
Quote:
Originally Posted by Nava
Do "will" and "would" count as different words, or as one?
That's an excellent question. Would is originally just an inflection of will. But now I think it's perceived as an independent word. Should has become even more lexicalized—I don't think anyone even remembers that it used to be just an inflection of shall.

Quote:
How old does a borrowed word or neologism have to be, or how extended geographically, for it to count?
That is such an arbitrary process. The question comes up in editing: loanwords get italicized, naturalized words don't. Editors pick an English dictionary and use it as the criterion: if a foreign word has become a dictionary entry, it doesn't get italicized. Unless the dictionary marks it as a foreignism. Yeesh, this gets complicated.

Last edited by Johanna; 09-13-2011 at 06:38 AM..
#24
09-13-2011, 10:06 AM
Exapno Mapcase
Charter Member
 
Join Date: Mar 2002
Location: NY but not NYC
Posts: 22,824
Quote:
Originally Posted by Nava
Several posters have mentioned that "it's difficult to define 'word', much less to count how many a language has", but if there was any explanation of how the initial comparison defined words and counted them, I've missed it.
In simplest terms - because I'm no expert, although I've read a lot on the process - that count of 170,000 words in the OP's link comes from the number of headings that appear in the OED. Headings are the basic version of the word. Define is a heading. Defined, defines, defining are not. Basically, inflected forms are not treated as separate headings unless they have become separate words in their own right. The word will would have at least two entries, one as a verb and one as a noun.

In English, you can stick suffixes and prefixes on just about anything. A dictionary will list many words beginning with un- separately, but most dictionaries note that this is only a sampling of all the possible words that can be formed this way. Suffixes are often fads. There were fads for words ending in -rama, -wise, or -gate at various times. Most of these words never become common or lasting enough to rate a dictionary heading.
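To make the headings-versus-forms distinction concrete, here is a toy Python sketch (not how the OED actually works) showing that the same handful of tokens gives two different "word counts" depending on whether you count every surface form or only dictionary headwords. The `LEMMAS` table and the `count_words` function are hypothetical, made up for illustration; a real count would need a full morphological dictionary and a policy for every un- and -wise form.

```python
# Toy illustration: surface forms vs. dictionary headwords.
# LEMMAS is a hypothetical, hand-made mapping, NOT real OED data.
LEMMAS = {
    "define": "define", "defined": "define", "defines": "define",
    "defining": "define",
    "undefined": "undefined",  # assumption: this un- form gets its own heading
    "will": "will",
    "would": "would",          # assumption: listed as its own headword
}

def count_words(tokens: list[str]) -> tuple[int, int]:
    """Return (distinct surface forms, distinct headwords) for the tokens."""
    surface = {t.lower() for t in tokens}
    # Tokens missing from the table are treated as their own headword.
    headwords = {LEMMAS.get(t, t) for t in surface}
    return len(surface), len(headwords)

text = "Define defined defines defining undefined will would".split()
print(count_words(text))  # prints (7, 4)
```

Flip either of the two "assumption" entries and the headword count changes, which is exactly the editorial judgment call described above.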

What then is a "word"? There are several possible definitions by this standard. Dictionaries are inherently conservative - or at least used to be. Each heading had a cost, in time, in expertise, and most especially in paper and print. Words that were obvious extrapolations didn't need to have separate entries.

In the age of the Internet, that conservatism is fading. UrbanDictionary.com has hundreds of thousands of terms that no dictionary has ever bothered with. Will they ever be counted as part of the language? That's a subjective decision. Slang is normally ignored until it becomes too widespread. The Dope has its own slang. Og is a word here, if nowhere else.

What does this mean for comparisons to other languages, many of which are even more agglutinative than English? Beats me. That's a subjective argument for experts.

If you'd like to go deeper into the process, many books have been written about the making of the OED and the multitudes of decisions that the editors had to make. I'd recommend Treasure-House of the Language: The Living OED, by Dr. Charlotte Brewer because she takes it into the present and exposes the deficiencies of those decisions and the struggle to correct them. Not fun reading, but constantly fascinating.
#25
09-13-2011, 10:30 AM
Nava
Guest
 
Join Date: Nov 2004
So the count on the OP would come from comparing the number of headers in the OED vs the number of headers in, say, the Diccionario de la Real Academia?

I can see why someone would do that (it's simple), but there are several problems with the method, including that the people who have created both dictionaries may have used different criteria to separate headers. OED is saying it's down, come back later, but m-w has "put up" as its own header: RAE would have set a structure like that under the header "put". That alone creates a difference of many thousands of items between the number of headers you get in the DRAE by its current methodology and the number you'd get by giving multi-word structures separate headers.
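Here's a hypothetical sketch of how much that one editorial choice can move a header count. The mini-dictionary `ENTRIES` and the function `header_count` are invented for illustration only; they don't reflect the actual contents of M-W or the DRAE, just the two policies: nest multi-word expressions under their base word, or give each one its own header.

```python
# Hypothetical mini-dictionary: each headword maps to the multi-word
# expressions filed under it. Invented data, for illustration only.
ENTRIES = {
    "put": ["put up", "put off", "put out"],
    "take": ["take in", "take off"],
    "run": [],
}

def header_count(entries: dict[str, list[str]], split_phrasal: bool) -> int:
    """Count headers under one of two editorial policies.

    split_phrasal=False: multi-word expressions sit under their base
    headword (the nesting described for the RAE).
    split_phrasal=True: each multi-word expression gets its own header
    (like m-w's separate entry for "put up").
    """
    count = len(entries)
    if split_phrasal:
        count += sum(len(phrases) for phrases in entries.values())
    return count

print(header_count(ENTRIES, split_phrasal=False))  # 3
print(header_count(ENTRIES, split_phrasal=True))   # 8
```

Same words, same dictionary content, and the header count nearly triples; that's the apples-to-oranges problem with comparing header totals across dictionaries built on different rules.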

Another problem is that different comprehensive dictionaries, created by the same organization, use different criteria: the Diccionario Panhispánico de Dudas is thicker than the DRAE.

Last edited by Nava; 09-13-2011 at 10:32 AM..
#26
09-13-2011, 10:47 AM
WordMan
Charter Member
 
Join Date: Apr 2001
Posts: 15,103
Quote:
Originally Posted by Johanna
For that matter, Yiddish is the closest analogue to Urdu, in terms of its vocabulary being built up from a combined set of two Indo-European languages and one Semitic. And it's German to boot. No idea of how big it gets, though.
I don't know from how big it could get, but the thought is giving me shpilkes in my geneckteckessoink.

#27
09-13-2011, 11:17 AM
Exapno Mapcase
Charter Member
 
Join Date: Mar 2002
Location: NY but not NYC
Posts: 22,824
Sheesh, hock mir nicht kein chinik.

So, WordMan, can you define your own name?
#28
09-13-2011, 11:30 AM
WordMan
Charter Member
 
Join Date: Apr 2001
Posts: 15,103
Quote:
Originally Posted by Exapno Mapcase
Sheesh, hock mir nicht kein chinik.

So, WordMan, can you define your own name?
I wasn't trying to get on your nerves; sorry if I did.

Define my name - you mean in Yiddish? Hmm, I don't think so. My dad typically just called me pisher or schmendrick.

In Hebrew, maybe sofer - for scribe or word-notator???
#29
09-13-2011, 11:58 AM
Exapno Mapcase
Charter Member
 
Join Date: Mar 2002
Location: NY but not NYC
Posts: 22,824
I meant, is WordMan a word? Or is it a name? Are names words? What about Exapno and Mapcase? Names can become words. A john is a bathroom. And the word used for a prostitute's client. Are those two separate words even though the dictionary puts them under one header?

Oy vey!

Last edited by Exapno Mapcase; 09-13-2011 at 11:59 AM..
#30
09-13-2011, 12:06 PM
Kimstu
Guest
 
Join Date: Dec 1999
Quote:
Originally Posted by Exapno Mapcase
The huge vocabulary of English is predicated on a size of 750,000 to 1,000,000 "words," which is far greater than that claimed for any other language. Nothing in this thread should have led you to believe otherwise.
Well, your extremely inclusive definition of "vocabulary" is getting somewhat far afield from the typical vocabulary size comparisons that I mentioned in my OP.

For example, googling queries like "which language has the most words" or "does English have more words than [language name]" routinely turns up hits like this one:
Quote:
All that said, it is probably fair to say that English has about twice as many words as does Spanish. Large college-level English dictionaries typically include around 200,000 words. Comparable Spanish dictionaries, on the other hand, typically have around 100,000 words. Of course, many of those words are seldom used.
Now, of course this bluntly approximate statement (which however was qualified with caveats earlier in the linked cite) ignores all the linguistic arcana that, as you quite rightly point out, make it difficult to determine what a "word" is or how many "words" a particular language has. But it does reflect the general consensus that I mentioned in my OP: namely, that English is widely claimed to have, shall we say, a much bigger "standard dictionary word list" than various other European languages do.

And, as I noted in my previous post, this thread's discussion has reinforced my suspicion that English's "size advantage" for vocabulary defined in such a narrowly restricted sense may indeed be significantly less with respect to other world languages such as Hindi/Urdu than with respect to, say, Spanish or French or German.

I do not dispute that if you use instead a definition of "vocabulary" broad enough to ascribe to English, as you say, 'a size of 750,000 to 1,000,000 "words,"' then English may indeed have by far the biggest "vocabulary" of any language on the planet. (Although I haven't actually seen any quantitative evidence for that claim: after all, other languages can add prefixes and suffixes to basic word-forms too, and many of them have many more inflectional forms for each word than English does, so maybe there are other languages out there that also have on the order of a million "words".) But that isn't the sort of sense of "vocabulary size" that I opened this thread to ask about.
#31
09-13-2011, 12:13 PM
Really Not All That Bright
Guest
 
Join Date: May 2003
Some of this is inherently subjective. Shakespeare coined hundreds of words that are in common use today, but thousands more would have been forgotten if kids didn't still have to read Shakespeare in English literature classes.
#32
09-13-2011, 12:31 PM
WordMan
Charter Member
 
Join Date: Apr 2001
Posts: 15,103
Quote:
Originally Posted by Exapno Mapcase
I meant, is WordMan a word? Or is it a name? Are names words? What about Exapno and Mapcase? Names can become words. A john is a bathroom. And the word used for a prostitute's client. Are those two separate words even though the dictionary puts them under one header?

Oy vey!
Yeah - framing language in quantitative terms is like trying to nail jello to a wall.
#33
09-13-2011, 03:45 PM
Exapno Mapcase
Charter Member
 
Join Date: Mar 2002
Location: NY but not NYC
Posts: 22,824
Kimstu, nobody can read your mind to know what you meant to say in the OP. If what you meant is the number of headers in an unabridged dictionary you now know to frame it in those terms.

Oddly, nobody even agrees on that number for the OED itself, much less the English language as a whole. See this chart.

Whether the question you've asked is answerable, even under those limits, is doubtful. Finding apples to apples comparisons across different language families is nigh impossible.

My subjective answer is that English, as the only world language today, is larger and richer - partly because it paradoxically contains more technical terms and more nonstandard terms - than any other language in history. I can't give you a number, though. If you don't want to accept that, feel free to argue. But you need to argue against the known and minutely examined history of English to do so, and you need to keep in mind that those you argue against will use that against you.
#34
09-13-2011, 04:29 PM
Kimstu
Guest
 
Join Date: Dec 1999
Quote:
Originally Posted by Exapno Mapcase
Kimstu, nobody can read your mind to know what you meant to say in the OP. If what you meant is the number of headers in an unabridged dictionary you now know to frame it in those terms.
There there, sorry if I upset you. The previous respondents such as Johanna seemed to find the context of the OP's query fairly unambiguous, so I figured (mistakenly, it appears) that we were all pretty much on the same page as far as that was concerned.
Quote:
Originally Posted by Exapno Mapcase
Whether the question you've asked is answerable, even under those limits, is doubtful. Finding apples to apples comparisons across different language families is nigh impossible.
I agree. Which further confirms my hypothesis that sweeping claims about the uniquely large size of the English language among all world languages, of the simplistic "200K versus 100K words" sort that I've been referring to here, are essentially unsupported overgeneralizations.

It looks as though people who don't know much about non-European languages have been pretty much just eyeballing the thickness of dictionaries in English as compared to, say, Spanish or German, and extrapolating from that the now-widespread claim that standard English vocabulary far exceeds that of any language in the world.

Since, as you point out, similar comparisons with non-European language vocabularies haven't been made and can be much more difficult to make, there's no reason to take such a claim at face value.
Quote:
Originally Posted by Exapno Mapcase
My subjective answer is that English, as the only world language today, is larger and richer - partly because it paradoxically contains more technical terms and more nonstandard terms - than any other language in history. I can't give you a number, though.
That's okay. What I was trying to find out was whether any of the sources who seem to be putting such a response forward as an allegedly objective answer were basing it on any numbers.

And at present it appears that---except for a few rough estimates eyeballed from a few modern European languages---they're not.
#35
09-14-2011, 01:27 AM
Nava
Guest
 
Join Date: Nov 2004
Quote:
Originally Posted by Kimstu
It looks as though people who don't know much about non-European languages have been pretty much just eyeballing the thickness of dictionaries in English as compared to, say, Spanish or German
Not eyeballing the thickness: your own quote in post 30 confirms what Exapno said in post 24, it's header count. But Spanish dictionaries (and German ones, inasmuch as I'm familiar with them, which is little) place under a single header items which would go in separate headers in English. Use the same criteria to establish headers in Spanish that are used in English and shazam, suddenly the Spanish dictionary gets a lot more headers. Twice as many? Probably not, but 40-50% more, yes.

OED wants me to subscribe, so back to M-W. "Would (verb)" is listed separate from two different "will (verb)", and from "will (noun)" (in a Spanish dictionary, "would" wouldn't be listed and "will" would have a single entry). Shall we list Spanish conjugates separately as well? That Spanish dictionary is looking thicker and thicker - and French ones have exploded out of their shelves.

Comparing amount of words in a language by the number of entries in their dictionary is like comparing the income of a tailor in Boston in 1811 and of another one, same city, in 2011, without taking inflation into account.

Last edited by Nava; 09-14-2011 at 01:27 AM..
#36
09-14-2011, 02:33 PM
hibernicus
Charter Member
 
Join Date: Oct 2000
Location: Dublin, Ireland
Posts: 1,916
Quote:
Originally Posted by AK84
Re the OP; the answer is that Urdu has a much larger vocabulary than English
There may be a lot of words in Urdu, but they must be all quite simple. To quote Elton John on the subject, "sari seems to be the hardest word".
All times are GMT -5.

Copyright © 2013 Sun-Times Media, LLC.