Is Japanese really the hardest to learn E.Asian language, and Vietnamese the easiest?

I know by experience that 1000 characters is too few to be able to read the newspaper, specifically. The problem with newspapers is that the text is so condensed that there is virtually no redundancy. The few characters you can’t read really hamper your understanding of the text. There is a lot of debate about the validity of the joyou kanji list. Some of the characters on it are essentially never used, like 畝. Other very common characters aren’t on it, like 丼. Newspapers use their own list, the shimbun kanji-hyou. I don’t have the exact number of kanji on that list, but it’s about the same as the joyou.

That being said, I still find the number 4000 inflated and Sage Rat’s math is iffy, for the reasons you outlined. Looking at kanji lists, anything beyond the most common 3000 characters is extremely rare. Sure, the average person would recognize a few of them, but 1000?

By the way, I found a thread on a Japanese message board where a few native speakers are debating the issue. The method they used was to take a random sample of the JIS character list, which has all the characters you can use on a computer, and see how many they knew. While they all scored above 3000 characters, they also self-identified as better than average readers.
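For anyone who wants to try the same kind of test, the arithmetic behind it can be sketched roughly as below. This is only an illustration of the sampling idea, not what the people in that thread actually did: the function name, the list size of 6000, the sample of 200 and the score of 100 are all made-up numbers, and the margin of error uses a simple normal approximation.

```python
import math


def estimate_known(total_chars, sample_size, known_in_sample):
    """Scale the share of recognized characters in a random sample
    up to the size of the full character list.

    Returns (estimate, margin), where margin is an approximate 95%
    margin of error (normal approximation, no finite-population
    correction).
    """
    share = known_in_sample / sample_size
    estimate = share * total_chars
    # Standard error of a sample proportion, scaled to the full list.
    std_err = math.sqrt(share * (1 - share) / sample_size)
    margin = 1.96 * std_err * total_chars
    return estimate, margin


if __name__ == "__main__":
    # Hypothetical quiz: 200 characters drawn at random from a
    # 6000-character list, of which the reader recognized 100.
    est, moe = estimate_known(total_chars=6000, sample_size=200, known_in_sample=100)
    print(f"Estimated characters known: {est:.0f} +/- {moe:.0f}")
```

Note that this says nothing about what counts as "knowing" a character, and a small sample gives a wide margin, so self-reported scores like the ones in that thread should be read loosely.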

That’s not a very good test. Not “knowing” a kanji, you can often still tell what the word is based on knowing the other kanji in the word, recognising what sort of meaning the kanji should have based on its radicals, and the pronunciation it might have comparing it to kanji with similar non-radical parts. People also probably recognize a significant number of kanji that are used in personal names and place names, which wouldn’t be commonly used. And they’re going to know terms that are particular to their hobbies and professions. I couldn’t tell you what “precipitation” is in Japanese, but I do know “mercenary” and probably hundreds of other random, uncommon words that are particular to my interests and common within that realm though uncommon for an overview of all writings.

To do a proper test, what they need to do is to take a sample of ~18 year olds writing at their very best (essays for school and such) and then determine about how many kanji, on average, were used per student based on the average length, and compare that to longer works (novels) to see how many kanji they are likely to have used if they had written long enough works to have exhausted all of their repertoire, and then calculate from that how many kanji each of them probably knows.

The distribution of word frequencies is going to be the same regardless of language and how you chop things up. You could compare the curve of the most popular nouns in English to the curve of the most popular verbs, and while the single most frequent noun might appear twice as often as the single most frequent verb, the shape of the distribution curve is going to be the same. And more importantly, people write words based on their level of fluency. If you compared my Japanese writing to a Japanese person’s Japanese writing, the distribution curve of the words we use is going to be the same even though theirs will encompass more words. What this means is that if you take a sample of a hundred people’s writings, the appearance frequency of words, nouns, verbs, and kanji is going to match the average fluency of those one hundred people.

So if you take samples of writing and compare distributions, that’s going to tell you understanding as well.

For an English speaker to bump his knowledge of English up one level, he has to learn something like an exponentially greater number of words. This might seem non-intuitive, but it’s what a long-tailed distribution of writing/understanding means. So while it doesn’t seem like a person should have to double, or could somehow double, his knowledge count to hit average, looking at word usage distributions, he really does. I can only offer suggestions for where all those words come from–like I did above–but it’s almost certainly so.
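To make the long-tail point concrete, here is a rough sketch. It assumes word frequencies follow a Zipf-style curve (frequency proportional to 1/rank), which is a common approximation and not something measured here; the exponent of 1 and the 50,000-word vocabulary are illustrative guesses, and the function name is made up.

```python
def zipf_coverage(known, vocab_size, exponent=1.0):
    """Fraction of running text covered by the `known` most frequent
    words, assuming the word at rank r appears with weight 1 / r**exponent."""
    weights = [1 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:known]) / sum(weights)


if __name__ == "__main__":
    vocab = 50_000  # assumed total vocabulary size
    for known in (500, 1000, 2000, 4000, 8000):
        share = zipf_coverage(known, vocab)
        print(f"{known:>5} words known -> {share:.1%} of text covered")
```

Under those assumptions each doubling of the known vocabulary adds about the same slice of coverage, so every equal step up in understanding costs twice as many new words as the step before it.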

Now I fully agree that my ~4000 number could be wildly incorrect, but this is going to be an issue of eyeballing rather than the methodology. If you can match two language distributions up based on what people write and know the average from one, you’ll know what the average of the other will be.

I was simply addressing the fact that to me, 4000 characters is too high an estimate for average readers. There’s a significant difference, however, between the number of characters people can read and the number people use themselves in writing, and a still greater difference with the ones they can handwrite. Now, of course, the thread I linked to was far, far from a scientific study; it was just somewhat relevant to the discussion.

To do any sort of study, you would have to specify what you actually mean by “knowing” a character. A lot of the really tough questions on the kanji quizzes on television that Sublight was talking about involve unusual readings of very common characters. If you can’t read 会ま, does it mean you don’t know 会? How about 集る or 存える? Furthermore, like you point out, you might guess, based on context, that 会ま might be tamatama, but can you really be said to know this character usage if you would never be able to use it correctly in your own writing?

I think what this highlights is just how much the complexity of Japanese writing makes a simple character count much less meaningful than for Chinese. Or a vocabulary count in any language, as a matter of fact. In Japanese, it’s character usage that’s most relevant.

That’s the take-home lesson, I think. To get a good measure of literacy you need to count words known, not merely characters known. Guaranteed for Chinese this would result in a much higher number of characters than Japanese. But even for Japanese it’s much higher than is really necessary (say a nice round number like 26 :D).

No, not just word count, character usage. One of the particularities of written Japanese is that you can write some words several different ways. Sometimes, these are not interchangeable, for instance “to work” at a job is 勤める, to “work” by making efforts is 努める. Spoken, they’re both the same word (tsutomeru), but in writing you have to know which to use in context. Some other characters are interchangeable, for example, “to say” is almost always written 言う nowadays. Still, some authors use 云う instead. Sometimes there are very common words (tashika – “certain”) written with a rare alternate character (慥か instead of 確か). And, finally, like in my previous post, there are common words uncommonly written with common characters. Tamatama is basic vocabulary that either means “by chance” or “precisely.” 会 is a beginner-level character that’s used extensively. However, probably very, very few people know that you can write tamatama 会ま, and still fewer that this usage is only valid for the second meaning of the word but not the first.

I’ve studied both Japanese and Vietnamese, and am pretty good with languages. Japanese isn’t easy, but I always felt I was making headway and could communicate in a very basic way. Mastering Japanese would be more difficult than mastering Vietnamese.

That said, after years of trying, I gave up on learning Vietnamese. I can’t break through the wall to understand/speak in even the most basic way. It’s not just the tones. Vowels and consonants have too many subtle variations in pronunciation and emphasis. Plus each speaker seems to have his/her own radically varying regional pronunciation of each word. I still can’t really hear tones, which is probably my own handicap. (When I listen to music, I can either hear the music or comprehend the words, but not both at the same time.)

A native English-speaking friend who is fluent in Japanese said that learning Vietnamese was much more difficult. I haven’t tried myself.

I’m trying to learn Chinese, but under vastly different circumstances than when I learned Japanese, which was in a formal school. The tones are more difficult for me, but if I were to study full time they may be something you get used to. It doesn’t help that my Chinese teacher doesn’t speak English and teaches me in Japanese. Since the word order is closer to English, I have to translate the Japanese to English and then think in Chinese.

For the kanji, Sage Rat’s reasoning doesn’t seem to hold. As others point out, there isn’t a one-to-one relationship between words and kanji. The relationship doesn’t take into consideration the number of combinations of kanji.

I’ve been wondering about this. Since starting to learn Chinese myself, I’ve done a fair number of language exchanges with native Chinese speakers, and they’ve brought up that you’re supposed to pronounce “the” differently depending on vowels or consonants, but I’ve never done this, nor do I recall ever hearing it mentioned in school. Is this something that’s true in British English but not American English, or was I just not paying attention that day in class?

I think it may be dialectal, but it may also be something we as native speakers just haven’t thought much about and don’t recognize when we do it. There are all sorts of things about English that seem so natural to us that we don’t spend much time examining them. For instance, the ‘th’ sound is very common in English, but even native speakers might not notice that we actually have two ‘th’ sounds: the voiced dental fricative /ð/ and the unvoiced dental fricative /θ/.

I was in Vietnam recently and I couldn’t get half the pronunciations correct, and I can speak/write Mandarin. Vietnamese has far more tones than Mandarin and the tones are quite different, at least different enough to sound odd to my ear.

Really? If you say “the apple” out loud, the “the” doesn’t sound different from when you say “the grapefruit”?

Doesn’t, no. I use “the” sounding like “thee” when I want to stress the singlehood of something. Otherwise the is “thuh”.

“Man this isn’t the pizza. This is thee pizza!”

I don’t remember it in school, but it’s certainly pretty common to my experiences. You’re saying you say the “the” in “the bat” and “the ant” identically? To me, that’s just as difficult to pronounce as the “a” in “a ant.” I would never pronounce “the ant” as anything other than “thee ant.”

The only other time I’d use the “thee” pronunciation is when emphasizing something. “He is THEE best baseball player in the whole world.”

ETA: Sage Rat, why switch between a and an if not the and thee?

shrugs Cause it’s how people where I learned to talk, talked.

Yup, I say them all the same. Sage Rat apparently uses it the same way as I do. Thee is reserved for emphasis. Unless I’m singing along with the radio, then I mimic whatever the original singer is using.

Personally, it’s just “harder” to use the regular “the” sound when it’s in front of a vowel. It’s a natural thing; I don’t believe I’ve ever been taught to do that. Since I’ve noticed it, I’ve often wondered why that is. I believe it has to do with how your mouth forms the sound. “Thee” just flows easier before a vowel.