Typing in Chinese

The Straight Dope web page recently featured a Straight Dope Classic column about how to use a keyboard to type Chinese characters into a computer: http://www.straightdope.com/columns/951208.html

The article captured much of the difficulty of dealing with Chinese characters, but it didn’t capture one important mitigating factor: each Chinese character is, linguistically, about equal to a word in Western languages.

So it takes a lot of keystrokes (or pen strokes, or brush strokes if you’re really into it) before you get a finished character on your screen (rice paper, bamboo strip). But when you’re done you have the information content of a whole word.

I’ve done some informal comparisons, and it takes me an average of just about 5 keystrokes to enter a Chinese character into my computer. This is almost exactly the same as the English standard of assuming a 5-character word for counting words-per-minute in typing.

The data-entry rate is still slower than English, because there’s a recognition step where you have to look at your choices and select a character - this blows the whole touch-typing approach. But overall it’s not as bad a situation as it seems at first.

There’s even less difference when it comes to handwriting. This is particularly true of mainland Chinese, who use simplified characters that have fewer strokes than the traditional characters used in Taiwan or Hong Kong. Also, when writing by hand, the Chinese use a script form (similar in nature to our cursive writing) that elides some of the strokes and rearranges them to allow for a minimum number of times that you have to lift the pen from the paper.

(Note to aspiring Chinese learners: yes, this means that you not only have to memorize upwards of 5,000 characters to be literate, there are also up to 4 different forms of each character in common use. Prepare for some serious flash card work.)

Anyway, if I ask my Chinese friends to hand-transcribe something, and I just count words-per-minute, there is no appreciable difference in writing speed as compared to English. Sometimes I actually think they’re faster.

(Background: I’m an American who has learned to speak, read, write (and type) Mandarin).

Nice post.

From the article, fourth sentence:

But that was an informative bit about the Chinese “cursive”.

Yeah, I had seen that Uncle Cecil had mentioned the fact that Chinese characters = English words.

It just seemed that many people might not see how far this fact goes in leveling the field when it comes to data-entry throughput.

In particular, I thought a line later in the article might prove especially misleading:

…the alternative is to write out your damn language longhand. This is even more of a pain, since one Chinese character can have as many as 36 strokes. (Max per English character: four.)

I’m sure Cecil was just trying to impress upon us the complexity of Chinese characters. But w.r.t. information throughput, it’s important to keep the character/word equation in mind.

(I also noticed a mistake I made in comparing Chinese typing to English typing. In English, you need to add a space character after each word, which you don’t need in Chinese. This brings the average keystrokes for English words up to 6 - 20% more than the average for Chinese.)
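
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The two 5-keystroke figures are the informal averages quoted in this thread, not measured data, and the function and variable names are made up for illustration.

```python
# Informal figures from this thread, not measured data.
CHINESE_KEYSTROKES_PER_CHAR = 5   # average keystrokes to enter one character
ENGLISH_LETTERS_PER_WORD = 5      # standard word length used for WPM counts


def english_keystrokes_per_word(letters=ENGLISH_LETTERS_PER_WORD):
    """Letters in the word plus the one space typed after it."""
    return letters + 1


def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


english = english_keystrokes_per_word()
chinese = CHINESE_KEYSTROKES_PER_CHAR
print(english, chinese, round(percent_more(english, chinese)))
```

This ignores the recognition step for Chinese input, so it compares keystroke counts only, not actual typing speed.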


None of this is meant to detract from Uncle Cecil’s explanation. I’ll take a moment to express my admiration for Cecil - and not just because it’s part of the schtick.

I’ve been reading TSD since 1980. Until I found the web site, I kept a mail-order subscription to the Reader just so I’d always get the column.

I think Cecil’s column is nothing less than an asset to society. The most obvious thing, of course, is the sheer amount of information that’s just hard to find anywhere else. But I think even more important is the example he sets of a reasoned approach to understanding the world around us.

He provides an example of how to be intellectual while keeping a solid common sense. He shows how to maintain an interest across a wide variety of academic disciplines, while showing real insight into what makes people tick. And he keeps a clear and balanced perspective on any issue, even those that are traditionally the most obfuscated and divisive.

In a way that’s hard to define entirely, I consider myself a better person for reading TSD. And I certainly have a lot more interesting tidbits to use in conversations around the coffee machine.

Reading your last post, charizard, reminded me of a question I had from Cecil’s column. He says:

How does he define “strokes”? If I define a stroke as “not lifting the pen from the paper”, and if I assume you don’t want to write over the same line more than once, I can’t think of an English letter that takes more than three strokes.

If by “strokes” he means different straight lines, then would, for example, capital B count as only three strokes, and capital M and W count as 4?

The way I write, I would count capital M and W as one “stroke”, since I don’t have to lift the pen from the paper.

Arnold, the question of how to count strokes is tougher than it seems at first:

ENGLISH BLOCK LETTERING: After a little experimenting, it seems to me that we probably have to use a definition similar to “Number of distinct (straight or curved) lines in the finished character”. So, the letter M has four strokes, and the letter L has two. In everyday writing, many people would make several of these strokes without lifting their pen, but I don’t think we want a count that depends on how individuals write.

CHINESE BLOCK LETTERING: Here, it’s very well defined: you start a new stroke each time you lift your pen. Strokes can change direction; the most complex single stroke has 2 right angles plus a hook at the end.

CURSIVE LETTERING: I think that, for both Chinese and English, the most useful definition of a stroke is “what you can do with one smooth motion.” That is, you count a new stroke every time you have to stop and change direction.

But I think we have to modify this a bit, and say that you have to count at least one stroke per character. Otherwise, a word like “eel” only has one stroke, since it consists of loops that run together without hard stops. Intuition tells me that this word has three strokes.
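
One way to read that modified cursive rule, sketched in Python. This is only my interpretation: the idea of recording how many smooth motions begin inside each letter, and the name `cursive_stroke_count`, are assumptions made for illustration.

```python
def cursive_stroke_count(motions_started_per_char):
    """Count cursive strokes for one word.

    Each entry is the number of smooth motions that begin inside that
    letter. Every letter is counted as at least one stroke, even if the
    pen simply flows into it from the previous letter.
    """
    return sum(max(1, n) for n in motions_started_per_char)


# "eel": one continuous motion starts in the first "e" and runs
# through the other two letters without a hard stop.
eel = [1, 0, 0]

raw_count = sum(eel)                        # unmodified rule
adjusted_count = cursive_stroke_count(eel)  # with the one-per-letter floor
print(raw_count, adjusted_count)
```

With the floor applied, “eel” comes out at three strokes instead of one, which matches the intuition above.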