Does the English language have many more words than, say, Hindi/Urdu?

For that matter, Yiddish is the closest analogue to Urdu, in terms of its vocabulary being built up from a combined set of two Indo-European languages and one Semitic. And it’s German to boot. No idea of how big it gets, though.

I’m still trying to parse the concept that “English has more words than French or German”. How is that measured? Are phrasal verbs counted as separate words? Do obscure words borrowed from Swiss French and used once by Byron in a single poem count? Do identical-sounding and spelled words which actually come from different origins count as two, or as one? Do “will” and “would” count as different words, or as one? How old does a borrowed word or neologism have to be, or how extended geographically, for it to count? Does leetspeak count as separate words?

Several posters have mentioned that “it’s difficult to define ‘word’, much less to count how many a language has”, but if there was any explanation of how the initial comparison defined words and counted them, I’ve missed it.

Just as an aside, Catalan and Valenciano are undergoing a politics-based separation similar to that of the two branches of Hindi. Isn’t politics grand?

That’s an excellent question. “Would” was originally just an inflection of “will”. But now I think it’s perceived as an independent word. “Should” has become even more lexicalized; I don’t think anyone even remembers that it used to be just an inflection of “shall”.

That is such an arbitrary process. The question comes up in editing: loanwords get italicized, naturalized words don’t. Editors pick an English dictionary and use it as the criterion: if a foreign word has become a dictionary entry, it doesn’t get italicized. Unless the dictionary marks it as a foreignism. Yeesh, this gets complicated.

In simplest terms - because I’m no expert, although I’ve read a lot on the process - that count of 170,000 words in the OP’s link comes from the number of headings that appear in the OED. Headings are the basic version of the word. “Define” is a heading. “Defined”, “defines”, and “defining” are not. Basically, conjugated forms are not treated separately unless they have become separate words on their own. “Will” would have at least two entries, one as a verb and one as a noun.
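To make the heading-versus-form distinction concrete, here is a toy sketch in Python. The word list and the form-to-heading mapping are made up for illustration and are not taken from the OED or any real dictionary; it only shows how the same handful of forms yields different "word counts" depending on what you decide to count.

```python
# Toy data (an assumption for illustration, not real dictionary data):
# each surface form is mapped to the heading it would be filed under.
FORMS_TO_HEADING = {
    "define": "define",
    "defined": "define",
    "defines": "define",
    "defining": "define",
    "will": "will",
    "would": "would",  # treated here as lexicalized, so it gets its own heading
}


def count_surface_forms(forms):
    """Count every distinct spelled form as its own 'word'."""
    return len(set(forms))


def count_headings(forms, heading_map):
    """Count only the distinct headings that the forms fall under."""
    return len({heading_map[form] for form in forms})


forms = list(FORMS_TO_HEADING)
surface_count = count_surface_forms(forms)
heading_count = count_headings(forms, FORMS_TO_HEADING)
print(f"surface forms: {surface_count}, headings: {heading_count}")
```

Change the mapping so that "would" is filed under "will" and the heading count drops by one, which is exactly the kind of editorial decision that makes cross-dictionary counts hard to compare.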

In English, you can stick suffixes and prefixes with just about anything. A dictionary will list many words beginning with un- separately but most note that this is a sampling of all the possible words that can be formed this way. Suffixes are often fads. There were fads for words ending in -rama, or -wise, or -gate at various times. Most of these words never become common or lasting enough to rate a dictionary heading.

What then is a “word”? There are several possible definitions by this standard. Dictionaries are inherently conservative - or at least used to be. Each heading had a cost, in time, in expertise, and most especially in paper and print. Words that were obvious extrapolations didn’t need to have separate entries.

In the age of the Internet, that conservatism is fading. UrbanDictionary.com has hundreds of thousands of terms that no dictionary has ever bothered with. Will they ever be counted as part of the language? That’s a subjective decision. Slang is normally ignored until it becomes too widespread. The Dope has its own slang. Og is a word here, if nowhere else.

What does this mean for comparisons to other languages, many of which are even more agglutinative than English? Beats me. That’s a subjective argument for experts.

If you’d like to go deeper into the process, many books have been written about the making of the OED and the multitudes of decisions that the editors had to make. I’d recommend Treasure-House of the Language: The Living OED, by Dr. Charlotte Brewer because she takes it into the present and exposes the deficiencies of those decisions and the struggle to correct them. Not fun reading, but constantly fascinating.

So the count in the OP would come from comparing the number of headers in the OED vs the number of headers in, say, the Diccionario de la Real Academia?

I can see why someone would do that (it’s simple), but there are several problems with the method, including that the people who created the two dictionaries may have used different criteria to separate headers. OED is saying it’s down, come back later, but m-w has “put up” as its own header; RAE would have put a structure like that under the header “put”. That alone creates a difference of many thousands of items between the number of headers you get in DRAE by the current methodology and the number you’d get by having multi-word structures as separate headers.

Another problem is that different comprehensive dictionaries, created by the same organization, use different criteria: the Diccionario Panhispánico de Dudas is thicker than the DRAE.

I don’t know from how big it could get, but the thought is giving me shpilkes in my geneckteckessoink.

;):smiley:

Sheesh, hock mir nicht kein chinik.

So, WordMan, can you define your own name?

I wasn’t trying to get on your nerves; sorry if I did.

Define my name - you mean in Yiddish? Hmm, I don’t think so. My dad typically just called me *pisher* or *schmendrick* :smiley:

In Hebrew, maybe *sofer* - for scribe or word-notator???

I meant, is WordMan a word? Or is it a name? Are names words? What about Exapno and Mapcase? Names can become words. A john is a bathroom. And the word used for a prostitute’s client. Are those two separate words even though the dictionary puts them under one header?

Oy vey!

Well, your extremely inclusive definition of “vocabulary” is getting somewhat far afield from the typical vocabulary size comparisons that I mentioned in my OP.

For example, googling queries like “which language has the most words” or “does English have more words than [language name]” routinely turns up hits like this one:

Now, of course this bluntly approximate statement (which however was qualified with caveats earlier in the linked cite) ignores all the linguistic arcana that, as you quite rightly point out, make it difficult to determine what a “word” is or how many “words” a particular language has. But it does reflect the general consensus that I mentioned in my OP: namely, that English is widely claimed to have, shall we say, a much bigger “standard dictionary word list” than various other European languages do.

And, as I noted in my previous post, this thread’s discussion has reinforced my suspicion that English’s “size advantage” for vocabulary defined in such a narrowly restricted sense may indeed be significantly less with respect to other world languages such as Hindi/Urdu than with respect to, say, Spanish or French or German.

I do not dispute that if you use instead a definition of “vocabulary” broad enough to ascribe to English, as you say, ‘a size of 750,000 to 1,000,000 “words,”’ then English may indeed have by far the biggest “vocabulary” of any language on the planet. (Although I haven’t actually seen any quantitative evidence for that claim: after all, other languages can add prefixes and suffixes to basic word-forms too, and many of them have many more inflectional forms for each word than English does, so maybe there are other languages out there that also have on the order of a million “words”.) But that isn’t the sort of sense of “vocabulary size” that I opened this thread to ask about.

Some of this is inherently subjective. Shakespeare coined hundreds of words that are in common use today, but thousands more would have been forgotten if kids didn’t still have to read Shakespeare in English literature classes.

Yeah - framing language in quantitative terms is like trying to nail jello to a wall.

Kimstu, nobody can read your mind to know what you meant to say in the OP. If what you meant is the number of headers in an unabridged dictionary, you now know to frame it in those terms.

Oddly, nobody even agrees on that number for the OED itself, much less the English language as a whole. See this chart.

Whether the question you’ve asked is answerable, even under those limits, is doubtful. Finding apples to apples comparisons across different language families is nigh impossible.

My subjective answer is that English, as the only world language today, is larger and richer - partly because it paradoxically contains more technical terms and more nonstandard terms - than any other language in history. I can’t give you a number, though. If you don’t want to accept that, feel free to argue. But you need to argue against the known and minutely examined history of English to do so, and you need to keep in mind that those you argue against will use that against you.

There there, sorry if I upset you. The previous respondents such as Johanna seemed to find the context of the OP’s query fairly unambiguous, so I figured (mistakenly, it appears) that we were all pretty much on the same page as far as that was concerned.

I agree. Which further confirms my hypothesis that sweeping claims about the uniquely large size of the English language among all world languages, of the simplistic “200K versus 100K words” sort that I’ve been referring to here, are essentially unsupported overgeneralizations.

It looks as though people who don’t know much about non-European languages have been pretty much just eyeballing the thickness of dictionaries in English as compared to, say, Spanish or German, and extrapolating from that the now-widespread claim that standard English vocabulary far exceeds that of any language in the world.

Since, as you point out, similar comparisons with non-European language vocabularies haven’t been made and can be much more difficult to make, there’s no reason to take such a claim at face value.

That’s okay. What I was trying to find out was whether any of the sources who seem to be putting such a response forward as an allegedly objective answer were basing it on any numbers.

And at present it appears that—except for a few rough estimates eyeballed from a few modern European languages—they’re not.

Not eyeballing the thickness: your own quote in post 30 confirms what Exapno said in 24, that it’s header count. But Spanish dictionaries (and German ones inasmuch as I’m familiar with them, which is little) place under a single header items which would go under separate headers in English. Use the same criteria to establish headers in Spanish that are used in English and shazam, suddenly the Spanish dictionary gets a lot more headers. Twice as many? Probably not, but 40-50% more, yes.

OED wants me to subscribe, so back to M-W. “Would (verb)” is listed separately from two different “will (verb)” entries, and from “will (noun)” (in a Spanish dictionary, “would” wouldn’t be listed and “will” would have a single entry). Shall we list Spanish conjugated forms separately as well? That Spanish dictionary is looking thicker and thicker - and French ones have exploded off their shelves.

Comparing the number of words in different languages by the number of entries in their dictionaries is like comparing the income of a tailor in Boston in 1811 with that of another one, same city, in 2011, without taking inflation into account.

There may be a lot of words in Urdu, but they must be all quite simple. To quote Elton John on the subject, “sari seems to be the hardest word”.

Reported.