There’s a thread going on right now about why the alphabet is ordered the way it is. It is, of course, a result of accidents of history; the order of the alphabet is completely arbitrary and does not incorporate any kind of organizing principle. (As I mentioned on that thread: If we were to re-order the alphabet we should do it along phonetic lines, with all the vowels first, then the velar consonants (k and g), then working forward through the mouth to the labial consonants (p, b and m). But I digress.)
Wherever it came from, alphabetical order sure is useful, for organizing AND RETRIEVING large quantities of data of any kind, from names in telephone directories to subjects in encyclopedias. It’s so useful that it seems hard to imagine how any modern literate civilization could get along without it.
Yet some cultures, most notably China and Japan, do not use alphabetic script; they use a very large set of ideogram characters, one for every word in the language (more or less). So how do they organize their phone books and encyclopedias and so on?
Not sure about Japanese, but in China, characters can be ordered by a) number of strokes and b) the order in which strokes are written to yield a given character. That’s the way it’s done in Chinese dictionaries, anyway.
You can go to http://www.mandarintools.com/chardict_rs.html and see the radicals for the characters laid out. Japanese Kanji are Chinese characters and thus use the radicals, traditionally, for ordering them. In addition to the radicals, one counts the strokes used to make the rest of the character and then, armed with that information, can find the character in the dictionary.
IMHO, a far better system is the SKIP (Simplified Kanji Indexing by Pattern), explained here & with a sample here. If those links don’t work, go to http://www.kanji.org and browse; there’s plenty of information there!
Japanese also uses Kana, which are a syllabary, and that’s put into a particular order which one could call alphabetic.
Chinese is alphabetized when it needs to be, e.g. for sorting and retrieving data. That’s what Pinyin is used for. Ever wonder how to type in Chinese on a computer? The typist enters the Pinyin for a character, and a menu pops up with the different characters corresponding to that Pinyin spelling; the typist selects the desired character and hits Enter. There it is. The characters can always be sorted according to the alphabetical order of their Pinyin alter egos.
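To make the Pinyin idea concrete, here is a minimal Python sketch. The `PINYIN` table, `pinyin_key` and `candidates` are all made up for illustration (a real input method uses a full dictionary and ranks candidates by frequency), so treat it as a toy under those assumptions:

```python
# Tiny hand-made table: character -> Pinyin with a tone digit (illustrative only).
PINYIN = {
    "爱": "ai4", "北": "bei3", "国": "guo2", "京": "jing1", "中": "zhong1",
    "妈": "ma1", "麻": "ma2", "马": "ma3", "骂": "ma4",
}

def pinyin_key(char):
    """Sort key: the character's Pinyin spelling, tone digit included."""
    return PINYIN[char]

def candidates(syllable):
    """Characters a typist might be offered for a toneless Pinyin syllable."""
    return sorted(
        (ch for ch, py in PINYIN.items() if py.rstrip("12345") == syllable),
        key=pinyin_key,
    )

# Characters sorted by the alphabetical order of their Pinyin.
print(sorted(["中", "国", "北", "京", "爱"], key=pinyin_key))
# One Pinyin syllable, several candidate characters.
print(candidates("ma"))
```

The second call also shows why the menu is needed at all: a single Pinyin spelling can stand for several characters.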
I have a couple Chinese-English dictionaries sorted by Pinyin alphabetical order. There’s also a set of tables for looking up a character’s Pinyin if you don’t know it already. The first table is the radicals, sorted by number of strokes; in the second table under each radical are the full characters, again sorted by number of strokes.
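The two-table lookup described above can be sketched roughly like this in Python. The tables hold only four characters, the stroke counts are the ones I would expect in a simplified-character dictionary, and breaking ties between radicals by code point is purely my assumption; the names `RADICAL_STROKES`, `CHARACTERS`, `radical_stroke_key` and `lookup` are hypothetical:

```python
# Table 1: radical -> its stroke count (as listed in simplified-character dictionaries).
RADICAL_STROKES = {"亻": 2, "女": 3, "氵": 3, "木": 4}

# Table 2: character -> (radical, strokes in the rest of the character).
CHARACTERS = {
    "他": ("亻", 3),
    "好": ("女", 3),
    "河": ("氵", 5),
    "林": ("木", 4),
}

def radical_stroke_key(char):
    """Order by radical stroke count, then radical, then remaining strokes."""
    radical, residual = CHARACTERS[char]
    return (RADICAL_STROKES[radical], radical, residual, char)

def lookup(radical, residual):
    """Find the characters filed under a radical with a given residual stroke count."""
    return sorted(ch for ch, entry in CHARACTERS.items() if entry == (radical, residual))

print(sorted(["林", "河", "好", "他"], key=radical_stroke_key))
print(lookup("氵", 5))
```

Picking the wrong radical or miscounting the strokes simply returns nothing, which is the hit-and-miss experience of using such tables by hand.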
I have seen old Chinese dictionaries from before Pinyin was invented. They are sorted in stroke order, but whew! it takes me forever to look up a character that way. First you have to figure out which part of the character is the radical. In some characters the radical is clearcut, but in other characters there could be found more than one candidate for “the” radical. So if you don’t find it under one, you have to try the other. To really know the exact number of strokes, you have to know the order in which they’re written. So if you haven’t taken any Chinese calligraphy classes, looking up characters can be excruciatingly hit-and-miss. Pinyin is a Great Leap Forward in information-science efficiency.
This can get to be a nasty problem, actually. You might want to do research under “collating sequences”. Sometimes, ordering used in a particular language or culture depends on multicharacter sequences, or is preferential to characters in the middle of strings, and cannot be determined character by character.
For instance, Thai is tonal, and its script writes some vowels before the consonant they follow in speech. Alphabetical order is based on the first consonant in the spelling of a word, even if the spelling begins with a vowel. Vowels are sorted second and tone marks third.
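A rough Python sketch of that Thai rule, under simplifying assumptions: only the five prefixed vowels (U+0E40 to U+0E44) are moved behind the consonant that follows them, tone marks (U+0E48 to U+0E4B) are pulled out into a secondary key, and everything else is compared by code point. `thai_key` is a made-up helper, not a full Thai collation:

```python
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")  # vowels written before the consonant
TONE_MARKS = set("\u0e48\u0e49\u0e4a\u0e4b")            # the four tone marks

def thai_key(word):
    """Primary key: consonant-first spelling without tones; secondary key: tone marks."""
    base = [ch for ch in word if ch not in TONE_MARKS]
    tones = [ch for ch in word if ch in TONE_MARKS]
    i = 0
    while i < len(base) - 1:
        if base[i] in LEADING_VOWELS:
            # Swap the prefixed vowel with the consonant it is written before.
            base[i], base[i + 1] = base[i + 1], base[i]
            i += 2
        else:
            i += 1
    return ("".join(base), "".join(tones))

KA = "\u0e01\u0e32"         # ko kai + sara aa
KAI = "\u0e44\u0e01\u0e48"  # sara ai maimalai + ko kai + mai ek
KHA = "\u0e02\u0e32"        # kho khai + sara aa

print(sorted([KAI, KHA, KA], key=thai_key) == [KA, KAI, KHA])  # consonant-first order
print(sorted([KAI, KHA, KA]) == [KA, KHA, KAI])                # naive code point order
```

The word spelled with a leading vowel lands among the words starting with its first consonant instead of at the very end.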
The ß character in German is considered equivalent to “ss” in ordering as well as pronunciation, so it should come between “sch” and “st”, character sequences that occur often in German words.
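In code, the ß rule amounts to rewriting the letter before comparing. A minimal Python sketch, where `german_key` is just an illustrative name and umlauts are deliberately left out:

```python
def german_key(word):
    """Sort key that treats ß as if it were spelled 'ss'."""
    return word.lower().replace("ß", "ss")

words = ["Mast", "Maß", "Masche"]

# With the key, 'Maß' (as 'mass') lands between the 'sch' and 'st' words.
print(sorted(words, key=german_key))
# Without it, ß sorts after every unaccented letter.
print(sorted(words))
```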
A minor version of this sort of thing sometimes occurs in English - you sometimes see names beginning with “Mc” moved to the front of the M’s on lists of names, so that “McGruder” comes before “Mathews”.
yabob: I believe the last point you mentioned is called “Library Filing” or something along those lines. Years ago, way back before the desktop computer came into being, let alone became prevalent, I did volunteer work at the local library wherever I happened to live. The first thing the Librarian told me was always, “Make sure you file the Mac, Mc, and Ma names in the front of the M.”
This was formerly done more than it is now; done in Britain more than in America; and done in Scotland more than in England. The name of the rule was “Mc as if Mac.” The Mcs and Macs were treated as the same spelling and interfiled. For some reason, I don’t know why, the Scots indexers argued vehemently to preserve this rule, after the other delegates to international indexing conventions decided to drop it. (I used to index books for a publisher, can you tell?)
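For what it’s worth, the “Mc as if Mac” interfiling rule is easy to sketch in Python. `mc_as_mac_key` is a hypothetical helper, and it assumes every leading “Mc” is a name prefix:

```python
def mc_as_mac_key(name):
    """File names starting with 'Mc' as if they were spelled 'Mac'."""
    lowered = name.lower()
    if lowered.startswith("mc"):
        lowered = "mac" + lowered[2:]
    # The original spelling breaks ties between otherwise identical keys.
    return (lowered, name)

names = ["Mathews", "McGruder", "MacDonald", "Mabry", "McAdam"]

print(sorted(names, key=mc_as_mac_key))  # Mcs and Macs interfiled
print(sorted(names))                     # plain order: all the Mcs after Mathews
```

Note that this interfiles the Mcs among the Macs; it does not move the whole group to the front of the M’s, which is a separate convention.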
Gotta watch out for how different languages alphabetize characters with diacritics. For example, in German, ä, ö, and ü are alphabetized right along with regular a, o, and u respectively. But in Turkish, ö, ü, and the dotless i are treated as separate letters with their own places in the alphabetical order. One thing that always trips me up is how in Finnish and Swedish, ä and ö are displaced to the very end of the alphabet, following z. I can never get used to looking for ä at the end instead of the beginning. What were they thinking? In Pilipino, they write the /k/ sound with k, not c. However, k is alphabetized in third place, between b and d, where you would expect c to go. I think I know why they do this; it’s a hangover from Spanish influence, as the letter c is used to write the sound of /k/ in Spanish.
For Chinese, bopomofo (which equates to the Japanese katakana) never really caught on because it was used only as a learning tool, unlike the Japanese katakana, which is widely used to write foreign words.
I’ve heard that Taiwan has dropped bopomofo (zhuyin fuhao) in favor of pinyin.
A lot of common stuff in China, like phone books, is arranged in pinyin order rather than stroke order. As pointed out in an earlier post, this is so much easier, at least for a non-native speaker of Chinese. When looking up words I know how to pronounce in the dictionary, I can find them much faster with pinyin than a native speaker can using radicals or stroke order.
The main problem with Pinyin is the same as that of all other phonetic scripts: the memory of a word (character) is not the same as the memory of the sound of said word (character). That’s why when I look at things written in Pinyin, it takes me far longer to realise what they are about.
Not least because any given Pinyin could stand for several different characters! That’s why it hasn’t replaced characters entirely, and why it never will.
I thought Taiwan was still using Wade-Giles romanization. When did they start using Pinyin?
The Library of Congress romanized their enormous Chinese holdings according to Wade-Giles up until only a few years ago, when they finally switched to Pinyin. They had been wishing to convert to Pinyin for a long time, but the superhuman size of the task daunted them. Of course, the longer they put it off, the huger it got. I guess they eventually developed the technology that made the conversion easier. When I cataloged in a library with Chinese holdings a few years before LoC switched to Pinyin, I felt guilty having to use Wade-Giles, because it meant nothing to the young Chinese college students who had never seen it. So I made added title entries all in Pinyin just to accommodate them. I’m glad LoC finally got that 800-ton gorilla out of the way.
For European languages, the sort order can be found at this page (warning, the links on that page go to short pdf pages).
One thing to note is that some European languages have what they call secondary order. What this means is that two different forms (say an A and A with a diacritic) are interfiled unless the word is otherwise spelled the same. Then one of them will always be sorted first.
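A minimal Python sketch of that two-level idea: compare words with the diacritics stripped first, and only fall back to the accented spelling to break ties. `strip_diacritics` and `two_level_key` are illustrative names, and breaking ties by plain code point is my simplification; real national rules pick the winner differently:

```python
import unicodedata

def strip_diacritics(word):
    """Drop combining marks so 'é' compares like 'e'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def two_level_key(word):
    """Primary: unaccented spelling. Secondary: the accented spelling itself."""
    composed = unicodedata.normalize("NFC", word)
    return (strip_diacritics(composed).lower(), composed)

words = ["côté", "cotton", "cote", "coté", "cot", "côte"]

print(sorted(words, key=two_level_key))  # accented forms interfiled
print(sorted(words))                     # plain order: accented forms pushed back
```

With the key, all four spellings of “cote” sit together before “cotton”, and the unaccented one comes first among them.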
To be specific, umlauted vowels in German are alphabetized as if they were spelled with an E following them.
Yes, and the capital dotless i is the same as the regular capital I, while the dotted i has a dot on its capital. Recently, I was doing some automatic searches for palindromes among Turkish placenames and had to make some special mods to my program just because of this. The regular tolower() and toupper() functions do not work correctly for the letter I in Turkish.
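The same problem shows up in Python, whose `str.lower()` and `str.upper()` are not locale-aware. Here is a small sketch of the kind of special-casing involved; `tr_lower`, `tr_upper` and `is_palindrome_tr` are hypothetical helpers, and the test string is made up rather than a real placename:

```python
def tr_lower(text):
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    # Handle the two Turkish-specific pairs before the generic conversion.
    return text.replace("İ", "i").replace("I", "ı").lower()

def tr_upper(text):
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()

def is_palindrome_tr(name):
    """Case-insensitive palindrome check using Turkish case folding."""
    folded = tr_lower(name)
    return folded == folded[::-1]

print(tr_upper("izmir"))        # dotted capital İ in both positions
print(tr_lower("ILIK"))         # dotless ı in both positions
print(is_palindrome_tr("Ilı"))  # True with Turkish folding
print("Ilı".lower() == "Ilı".lower()[::-1])  # False with the default lower()
```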
They’re separate letters, but why they had to put them at the end is unknown to me. Maybe it was just to make sure that they are treated as separate letters and not as a variant of the regular letter.
One thing I wish is that the four Scandinavian languages had agreed on which extra letters to use. Swedish and Finnish use å ä and ö in that order, while Norwegian and Danish use æ ø and å in that order.
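Sorting with the extra letters at the end boils down to ranking each letter by its position in the national alphabet. A minimal Python sketch, assuming plain 29-letter alphabets and ignoring details such as how w or ü are filed; `alphabet_key` is an illustrative helper:

```python
import unicodedata

SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"
DANISH_NORWEGIAN = "abcdefghijklmnopqrstuvwxyzæøå"

def alphabet_key(alphabet):
    """Build a sort key that ranks letters by their position in `alphabet`."""
    rank = {letter: i for i, letter in enumerate(unicodedata.normalize("NFC", alphabet))}

    def key(word):
        composed = unicodedata.normalize("NFC", word.lower())
        # Characters outside the alphabet sort after all known letters.
        return [rank.get(ch, len(rank)) for ch in composed]

    return key

print(sorted(["öl", "zebra", "ål", "apa", "äpple"], key=alphabet_key(SWEDISH)))
print(sorted(["ål", "øl", "æble", "zebra"], key=alphabet_key(DANISH_NORWEGIAN)))
```

Note how å is the first of the extra letters in the Swedish table but the last in the Danish and Norwegian one.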
Most probably it was because the main part of the alphabet was regarded as an international standard.
Alphabetising as a whole is a very tricky matter. Just think of all the varieties with different diacritics. In which order do you place them? (The answer is that we disregard the diacritics; at least I do. Perhaps I should put them in some internal order, but I don’t think anyone cares.) To my knowledge there is no standard as such in Sweden. I once sat in a working group with representatives from all sorts of establishments with an interest in sorting, with the goal of establishing one. In the end it just petered out, as we all had different rules because of our different needs. Libraries, for instance, do not group ü together with y as is shown in the link you provided, but with u. The telephone directory, however, does.
How do people who speak Cantonese and no Mandarin type? I’m shocked by the number of people in Hong Kong who can’t type Chinese. If I can type Chinese they should be able to! The lack of a standardized Cantonese romanization system (and the fact that, despite knowing the alphabet, native speakers’ stabs at romanization are generally horrible; who told restaurant owners to romanize 記 as “kee”?) seems like it would make things difficult.
In Japanese dictionaries, order is determined by the first syllable of each entry, then by the subsequent syllables, according to their position in the syllabary, much like the alphabet. This is called the “aiueo” order,
i.e. a i u e o
ka ki ku ke ko
sa shi su se so etc.
However, the syllables with voiced consonants eg ga gi gu ge go are not given their own row; instead they are treated as though the consonants are unvoiced, and so you would see “kaku” (write, draw) near “gakusei” (student).
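The voiced-as-unvoiced rule can be sketched in Python by stripping the voicing marks before comparing. This leans on two assumptions of mine: that Unicode decomposition separates the voicing mark from the base kana, and that hiragana code points follow the aiueo order closely enough for a demo. `fold_voicing` and `gojuon_key` are made-up names:

```python
import unicodedata

def fold_voicing(kana):
    """Drop the voicing marks, so 'が' compares like 'か'."""
    decomposed = unicodedata.normalize("NFD", kana)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def gojuon_key(word):
    """Primary: kana with voicing folded away. Secondary: the original spelling."""
    return (fold_voicing(word), word)

words = ["かさ", "がくせい", "かく", "あさ"]  # kasa, gakusei, kaku, asa

print(sorted(words, key=gojuon_key))  # gakusei files right next to kaku
print(sorted(words))                  # plain order puts gakusei after kasa
```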
There’s also the iroha order, based on an old Japanese poem that never repeats the same hiragana.
“Iro wa nioedo, chirinuru wo / Waga-yo tare-zo tsune-naran? / Ui no okuyama kyoo koete / Asaki yume miji, ei mo sezu” (as spoken)
The actual order is as follows:
“i ro ha ni ho he to chi ri nu ru wo wa ka yo ta re so tsu ne na ra mu u wi no o ku ya ma ke fu ko e te a sa ki yu me mi shi we hi mo se su”,
of which the syllables “wi” and “we” have gone out of fashion. “n” is ignored and that is ok because “n” does not start any word in Japanese.
The discrepancies between the hiragana and the spoken form of the poem are largely due to the evolution of the Japanese language since the poem was first written.
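If you wanted to sort by iroha order in a program, one way is to rank each kana by its position in the poem. A minimal Python sketch, with the 47 kana of the poem typed out as I know it; `IROHA`, `IROHA_RANK` and `iroha_key` are illustrative names, and kana outside the poem (such as “n”) are simply pushed to the end:

```python
# The 47 kana of the iroha poem, in order.
IROHA = "いろはにほへとちりぬるをわかよたれそつねならむうゐのおくやまけふこえてあさきゆめみしゑひもせす"
IROHA_RANK = {kana: i for i, kana in enumerate(IROHA)}

def iroha_key(word):
    """Rank each kana by its position in the poem; unknown kana sort last."""
    return [IROHA_RANK.get(kana, len(IROHA)) for kana in word]

words = ["はな", "いぬ", "ろく"]  # hana, inu, roku

print(len(IROHA), len(set(IROHA)))   # 47 kana, none repeated
print(sorted(words, key=iroha_key))  # i, ro, ha order
print(sorted(words))                 # aiueo-style order for comparison
```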
Floater, this may be of interest to you. Vietnamese (which uses a latinate alphabet called Việt Ngữ) does consider the letters with diacritics (discounting tones for the moment) to be separate letters from the ones without diacritics, and thus those letters come in a particular order in the Việt Ngữ. The tone markers are also considered, in a manner of speaking, to be letters.
For example, to spell the Vietnamese word for the First Person Singular Pronoun, tôi, one recites its spelling this way:
ô
ô, i
ôi
t, ôi
tôi
One word for ‘bad’ has the same basic letters, but has the falling (huyền) tone, so the recitation for its spelling is:
ô
ô, i
ôi
t, ôi
huyên tôi
BTW, it took me many years to learn that spelling recitation, which Vietnamese children have no problem at all learning.
What xejkh is referring to, I believe, is the so-called “Syllabic N” in Japanese, which is a specific kana used to denote the /n/ at the end of a syllable. Words that begin with ‘n’ in alphabetic systems begin, in the Japanese syllabary, with na, ni, nu, ne, no.