Chinese keyboards

A few corrections to the column (recently reprinted in the email newsletter) about how Chinese is entered on keyboards. It mostly gets the answer right, but is a bit outdated.

Phonetic input methods have gotten a lot smarter over the years and now have awareness of Chinese grammar rules and word frequencies. So in a lot of cases you don’t have to select each character individually; you enter the phonetic representation (pinyin) for the entire sentence or phrase you want to type, and the software automatically figures out which characters would form a comprehensible sentence that would be pronounced that way. It doesn’t work 100% perfectly and sometimes you do need to select characters (especially when entering proper nouns) but it’s reliable enough to be a significant time saver.

Even when you’re not entering full sentences you will usually enter an entire multisyllabic Chinese word rather than just a single syllable of a word, which almost always eliminates the need to scroll through lots of possible matches.

For common words, you can usually just type the first letter of each syllable. There are enough shortcuts for enough heavily-used words that phonetic text entry is often quite fast. And of course you can combine the shortcuts and sentence/phrase entry to further speed things up.

An example of what a dramatic difference this can make: here’s a very simple sentence, “I like to listen to classical music,” entered using the Mac phonetic input system. The Windows one is comparable.

Simplified Chinese characters: 我喜欢听古典音乐

Keystrokes to enter this one character at a time as described in the column: wo2xi6huan3ting1gu2dian<down arrow><down arrow>6yin1yue<down arrow><down arrow>5

Keystrokes to enter this using modern-day phonetic input: wxhtgudyy
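
For anyone wondering how a string like wxhtgudyy can be resolved at all, here is a toy Python sketch of the general idea. It is only an illustration under loud assumptions: the tiny lexicon, the frequency numbers, and the function names (matches, best, convert) are all made up for this example, and nothing here is taken from the actual Mac or Windows input methods, which are far more sophisticated. The sketch treats each run of keystrokes as a prefix of a syllable, then picks the split of the whole input whose words have the highest combined frequency.

```python
from functools import lru_cache

# Hypothetical toy lexicon: word -> (pinyin syllables, invented relative frequency).
# A real input method would ship a large dictionary with measured frequencies.
LEXICON = {
    "我": (("wo",), 0.9),
    "喜欢": (("xi", "huan"), 0.8),
    "听": (("ting",), 0.6),
    "他": (("ta",), 0.3),          # distractor: also starts with "t"
    "古典": (("gu", "dian"), 0.5),
    "音乐": (("yin", "yue"), 0.7),
    "一样": (("yi", "yang"), 0.2),  # distractor: also abbreviates to "yy"
}


def matches(syllables, keys):
    """True if keys splits into one non-empty prefix per syllable, in order."""
    if not syllables:
        return keys == ""
    first, rest = syllables[0], syllables[1:]
    for n in range(1, min(len(first), len(keys)) + 1):
        if first.startswith(keys[:n]) and matches(rest, keys[n:]):
            return True
    return False


@lru_cache(maxsize=None)
def best(keys):
    """Return (score, words) for the highest-scoring split of keys, or None."""
    if keys == "":
        return (1.0, ())
    top = None
    for end in range(1, len(keys) + 1):
        tail = best(keys[end:])
        if tail is None:
            continue  # the remainder cannot be typed with this lexicon
        for word, (syllables, freq) in LEXICON.items():
            if matches(syllables, keys[:end]):
                candidate = (freq * tail[0], (word,) + tail[1])
                if top is None or candidate[0] > top[0]:
                    top = candidate
    return top


def convert(keys):
    """Join the best split into a string, or return None if nothing fits."""
    result = best(keys)
    return "".join(result[1]) if result else None


print(convert("wxhtgudyy"))
print(convert("woxihuantinggudianyinyue"))
```

With this toy lexicon, the abbreviated and the fully spelled-out input both resolve to the same sentence, because "t" scores higher as 听 than as 他 and "yy" scores higher as 音乐 than as 一样. Real software also weighs grammar and neighboring words, which this sketch ignores entirely.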

The column alluded to alternate input methods, but I think it oversold the difficulty somewhat. A popular one in Taiwan, for example, is “Cangjie” which is based on assigning keystrokes to the shapes of different sections of the characters. If you know how to handwrite the characters (which an educated native speaker does) and a small number of rules, you can do the equivalent of English’s “sounding out a word” to figure out how to type it. Most characters take four or fewer keystrokes in Cangjie, comparable to the number of keystrokes required for an English word.

Using Cangjie or one of the other non-phonetic input methods, a good typist can enter Chinese text at about the same speed as a good English typist entering the equivalent English text – written Chinese is usually much terser than English so it makes more sense to think of this in terms of “number of keystrokes per idea” rather than “number of keystrokes per character.”

The article in question

Interesting response, but it seems to me that in creative writing (keep in mind that the only language I can speak is English) you might want to write a certain sentence a different way than the computer is going to write it. Wouldn’t that make writing (for example) a novel a lot more difficult?

The column is from 1995, so it is no doubt outdated. I’ll see if Cecil thinks it’s ripe for a revision.

I asked my niece, who knows from Chinese (and then some), what she thought of the column. Reprinted with her permission:

*Fairly well done, but he makes it sound way harder than it is. We’re all familiar with autocorrect software at this point. When I was taught to type Chinese in high school, the same kind of software was already long since standard (so standard it came bundled with every Microsoft computer).

You don’t actually stop and calibrate after every syllable, rather, you type a sentence (using pinyin input), and the program guesses the most likely characters. At the end of each sentence (or sooner, if you choose), you check back and fix the wrongly guessed ones. The standard software–even in 2005–would learn from a given user, so folks doing technical writing with lesser-used terminology might have a slower time at first, but would eventually level out. The software (incorporated into things like Word, and browsers) leaves a wiggly underline under your most recent text to indicate you have yet to OK it. As I remember it, confirmation was as simple as a double-tap on the space bar.

Japanese is a whole other thing; all I know is that friends can still type quite fast in either.*

Oh, and an earlier discussion: http://boards.straightdope.com/sdmb/showthread.php?t=705760

Maybe the Chinese people are scrutable after all!

Generally, Japanese type in hiragana (which is just 46 characters), and software will suggest kanji (Chinese characters) for them. They can even do this on phone keypads, with each number corresponding with 5 hiragana, just as in English each number corresponds with 3 letters. https://en.wikipedia.org/wiki/Japanese_input_methods
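
A rough sketch of the phone-keypad idea in Python, for illustration only. The key-to-kana rows below follow the usual layout in which each digit stands for one row of the kana table; the function name multitap and the use of spaces to separate one kana from the next are assumptions made for this example. Real phones also handle small kana, voicing marks, and punctuation, which are left out here, and two of the rows have only three kana rather than five.

```python
# Each digit maps to one row of the kana table (simplified; no small kana,
# voicing marks, or punctuation).
KEYPAD = {
    "1": "あいうえお", "2": "かきくけこ", "3": "さしすせそ",
    "4": "たちつてと", "5": "なにぬねの", "6": "はひふへほ",
    "7": "まみむめも", "8": "やゆよ", "9": "らりるれろ", "0": "わをん",
}


def multitap(presses):
    """Decode space-separated groups of repeated digits into hiragana.

    Pressing a key k times selects the k-th kana in that key's row,
    wrapping around if the key is pressed more times than the row has kana.
    """
    out = []
    for group in presses.split():
        key = group[0]
        if key not in KEYPAD or set(group) != {key}:
            raise ValueError(f"not a run of one keypad digit: {group!r}")
        row = KEYPAD[key]
        out.append(row[(len(group) - 1) % len(row)])
    return "".join(out)


print(multitap("5555 22222"))  # four presses of 5, then five presses of 2
```

The output is hiragana only; offering kanji for it is the separate suggestion step described above.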

Most people under 50 use romaji input on computer keyboards, not hiragana. Japanese keyboards come with hiragana printed on the keycaps, but I’ve only seen old people actually use that layout, which is a vestige from typewriters. The romanized input shows up as hiragana on-screen, and you hit space to cycle down through the kanji options. Like Chinese, there’s some degree of prediction. As far as programmers are concerned, Japanese, Korean, and Chinese are conceptually lumped together and referred to as CJK input.
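
To make the romaji step concrete, here is a minimal Python sketch of turning typed Latin letters into hiragana by always taking the longest chunk that matches. The table is deliberately partial, the function name to_hiragana is invented for this example, and this is not the algorithm of any particular input method. Real ones cover the whole syllabary, doubled consonants, long vowels, and the ambiguity of "n", and then go on to suggest kanji.

```python
# Partial romaji-to-hiragana table (basic syllables only, plus "ji").
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ya": "や", "yu": "ゆ", "yo": "よ",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "wa": "わ", "wo": "を", "n": "ん", "ji": "じ",
}


def to_hiragana(romaji):
    """Convert romaji to hiragana using greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(romaji):
        for size in (3, 2, 1):  # try the longest chunk first
            chunk = romaji[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"cannot convert input at position {i}: {romaji[i:]!r}")
    return "".join(out)


print(to_hiragana("nihon"))
print(to_hiragana("sushi"))
```

Trying the longest chunk first is what lets "shi" become a single kana instead of failing on "s", and it is also why a bare "n" only turns into ん when no longer match such as "ni" is available.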

The offhand remark that the written form of Chinese doesn’t vary much leads to an interesting behavior you sometimes see in groups of Chinese people who speak different dialects. During conversation, sometimes one person speaking a particular version of Chinese won’t know how to say a particular word so that another person speaking a different dialect will understand it. In that case, they sometimes draw the character in the air to make the meaning clear. Interesting to watch, particularly between Taiwanese and Hong Kong folks, who have quite a dialect gap to jump, but otherwise similar cultural references…

For a time, long before the modern methods were feasible, IBM tried to get Japan simply to switch outright to Katakana.

MODERATOR NOTE: Since there were two threads on the same topic, I’ve merged them (at Post #4).