Rules on alphabetization

Forgive me if this seems dopey, but it has been over a quarter century since I’ve been to school (college wasn’t school. It was a kegger!).

What is the rule when making a list alphabetically, when one word contains a hyphen?
Why does Wal-Mart come before Walgreens in the business white pages of the telephone book?

Symbols are usually sorted first, before numbers and letters. Thus, “-” sorts ahead of “g”.

My last name is McCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
and it always gets screwed up on alphabetizing

Sometimes it’s alphabetized as two words as in Mc Cxxxxxxxxxx
and sometimes it’s alphabetized as one word… McCxxxxxxx

Where that becomes a problem is when searching on a computer by last name. Since you never know how it was keyed in, I frequently just have to search by Mc, and my first name.

E3

Many alphabetizations are now done using all characters in a string, according to their values in the ASCII table.
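As a rough illustration of that character-value ordering, here is a minimal Python sketch. It assumes nothing beyond Python's default string comparison, which compares code points (identical to ASCII values for these characters). The two store names come from the question above.

```python
# Default string sorting compares character codes one position at a time.
names = ["Walgreens", "Wal-Mart"]

by_code_point = sorted(names)
print(by_code_point)

# The first difference is at the fourth character: "-" versus "g".
# The hyphen has the smaller code, so "Wal-Mart" lands first.
print(ord("-"), ord("g"))
```

The hyphen is 45 and the lowercase g is 103 in ASCII, which is why a plain character-code sort files Wal-Mart ahead of Walgreens.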

Prior to computerization, the general rule was punctuation followed by lower case letters followed by upper case letters. (It was that convention that contributed to how the ASCII assignments were made for the various characters.) Subsequent to computerization, the lists follow the ASCII (or, on IBM or Burroughs mainframes, EBCDIC) patterns. Interestingly, the rules for numbers were not consistent, as demonstrated by the fact that ASCII places numbers before letters while EBCDIC places numbers last of all. Find text books from thirty years ago and you will see numbers placed at either the beginning or end of the index in different books.

There are no “rules” for alphabetization. There are only styles.

Styles come in all shapes and sizes. Some call for strict letter order across entire entries. Some separate out entries with breaks - whether a space or a hyphen or other not-letter character - from those that are longer.

So whoever is putting Wal-Mart before Walgreen is emphasizing the Wal over the Walgreen.

As long as you apply a style consistently it doesn’t matter which one you use.

Yeah, computerization has made the whole thing a lot more complicated.

Before, I was always taught that Mc and Mac were both alphabetized as if they were “Mac.”

So, alphabetical order would be:

Macadam
McDonald
Macpherson
Marsden

Now, of course, a computer would order them:

Macadam
Macpherson
Marsden
McDonald
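The two orderings can be reproduced with a short Python sketch. The helper name mc_as_mac is made up for this example, and it encodes only the one style described above (Mc filed as if spelled Mac); it is not a general rule.

```python
names = ["Marsden", "Macpherson", "McDonald", "Macadam"]

def mc_as_mac(name):
    """Sort key for one filing style: treat a leading 'Mc' as 'Mac'."""
    lowered = name.lower()
    if lowered.startswith("mc"):
        lowered = "mac" + lowered[2:]
    return lowered

# Plain character-by-character comparison, as a computer does by default.
computer_order = sorted(names)

# The taught style, where Mc and Mac interfile.
taught_order = sorted(names, key=mc_as_mac)

print(computer_order)
print(taught_order)
```

The key function changes only how names are compared; the names themselves are stored and displayed unchanged.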

With numerical order, it’s got even more complicated within computing. For example, on my old computer, running Windows ME, files with the following numbers would have been ordered in the folder thus:

01.jpg
011.jpg
1.jpg
12.jpg
23.jpg
4.jpg

But Windows XP’s file arranging system attempts to sort numerically-named files in a more intuitive manner, so on my current computer these files are ordered:

01.jpg
1.jpg
4.jpg
011.jpg
12.jpg
23.jpg
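A sketch of how that kind of "intuitive" ordering can be approximated in Python follows. This is an assumption about the general technique (often called natural sorting), not a description of what Windows XP actually does internally. The helper name natural_key is hypothetical.

```python
import re

def natural_key(name):
    """Split a name into text and digit runs; compare digit runs as numbers."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd positions are always the digit runs.
    key = [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
    # The raw name breaks ties such as "01.jpg" versus "1.jpg".
    return (key, name)

files = ["1.jpg", "23.jpg", "011.jpg", "4.jpg", "01.jpg", "12.jpg"]

plain_order = sorted(files)                      # character-by-character
natural_order = sorted(files, key=natural_key)   # numbers compared by value

print(plain_order)
print(natural_order)
```

Under these assumptions the plain sort reproduces the first listing and the natural sort reproduces the second.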

It’s all very confusing.

How would you alphabetize the artist formerly known as Prince? :smiley:

Again, no matter what you were taught, this is just one example of a style. Not every alphabetical style put Mc and Mac together. Many did not.

The confusion comes from bad teachers teaching styles as if they are rules.

And you’ll notice that I never said “the rule was.” I said “I was taught.”

You yourself have said that “As long as you apply a style consistently it doesn’t matter which one you use.” Well, the best way to be consistent is to stick with a single style unless given a good reason to do otherwise. This is probably why my school teachers figured that they would teach us one style, so at least we would all be consistent in school. That’s not bad teaching, it’s smart teaching.

To clarify: Does this mean that the order was abc…xyzABC…XYZ, or that it was aAbBcC…xXyYzZ?

I’ve also seen M[sup]c[/sup] treated as a single character, which falls in between M and N. Occasionally, one even sees names starting in “Mac” treated as if it were this special character.

And to follow on to mhendo, OSX 10.4 uses the “new” system of sorting numbers, too. It’s nice when foo10 follows foo9 instead of being between foo1 and foo2, but it’s annoying when you get bar01 bar1 bar02 bar2 bar03 bar3.

abc…xyzABC…XYZ
Thus:

benAvram
binAbdul
Binabally

Subject to Exapno’s caveat regarding styles vs rules, you can see how they tended to work by looking at the ASCII chart. As noted, M[sup]c[/sup] and M[sup]ac[/sup] were often given separate treatment that did not conform to a consistent pattern across all usages. Similarly, O’Donnell would tend to have appeared before Odalisque (the apostrophe trumping the capital D), but it really did depend on the style of the publisher. (For example, benAvram and binAbdul were liable to be treated as Benavram or BenAvram and Binabdul or BinAbdul by publishers unfamiliar with Middle Eastern nomenclature.)

But what happens when you get out in the real world and run into a style different than the one you are taught?

Teaching one thing when the world uses a hundred is bad teaching, no matter how you try to justify it.

Teaching (and requiring students to remember) 100 different styles for alphabetizing would be even worse teaching.

Actually, many of those will sort numbers as if they were written out. That is, the number 1 will be sorted between ond and onf[sup]1[/sup] even if written as a digit. Dictionaries still do this. For example, M-W Collegiate Dictionary has the entry 12-step immediately after twelve-month and 24/7 immediately after twenty-fourmo. Note that most dictionaries also ignore punctuation and internal spaces in their alphabetizations.
[sup]1[/sup] M-W doesn’t have any words starting with either ‘ond’ or ‘onf’. The words immediately before and after the ‘one-’ section are oncoming and ongoing, which look like they should be antonyms, but aren’t.

What, no Ondine?

Only with the undine spelling.

Back about 25 years ago, part of my job was filing cards into library catalogues, so I can remember what the rules were then, and I’ve seen them change as computers took over the job.

Basically, it has always been filing word by word. (Though most dictionaries file letter by letter, so that it doesn’t matter if you have “black bird”, “black-bird” or “blackbird”). However, back in those card filing days, the library had a rule that hyphens were treated as spaces if each part was a word, and ignored if one of the parts wasn’t a word. Of course, that’s a rule that you can’t give to computers.
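The difference between word-by-word and letter-by-letter filing can be sketched in Python. The entries "blackberry" and "blackboard" are made up for illustration, and both key functions are simplified assumptions: the word-by-word key treats every hyphen as a space, so it does not implement the library's "is each part a word?" test.

```python
import re

def letter_by_letter(entry):
    """Ignore spaces, hyphens and other punctuation; compare letters only."""
    return "".join(ch for ch in entry.lower() if ch.isalnum())

def word_by_word(entry):
    """Compare one word at a time; a shorter first word files first."""
    return [w for w in re.split(r"[\s\-]+", entry.lower()) if w]

entries = ["blackboard", "black bird", "blackberry"]
stores = ["Walgreens", "Wal-Mart"]

print(sorted(entries, key=letter_by_letter))
print(sorted(entries, key=word_by_word))
print(sorted(stores, key=letter_by_letter))
print(sorted(stores, key=word_by_word))
```

Under word-by-word filing "black bird" and "Wal-Mart" move to the front because "black" and "wal" are complete words that sort ahead of anything longer. Under letter-by-letter filing the break is ignored and they file further down.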

We used to file “Mc” as “Mac” – that’s another rule that’s gone.

Numbers were filed as if written in full in the language of the title*. So you had to be a bit of a language expert: the movie title “8 1/2” was filed as “Otto e mezzo”, because, of course, it was in Italian. I remember having a Serbian title starting with a number, and having to work out what the number was in Serbian … computers have taken all that fun away from us.

And then there is the problem of abbreviations like “U.S.A.”: do you file that as if it is “Usa” or as if it is “U S A”?

This gets even more fun when you begin to consider non-US character sets.

The new international standard for encoding and storing alphabetical (& syllabic) characters is something called Unicode, which includes details on such things as how to alphabetize Thai.

The technical term for alphabetizing is “collation”, and here’s more than you’d ever want to know. http://www.unicode.org/reports/tr10/ . The first couple of pages-worth of text give a good non-technical introduction to the problem of internationally standardizing something that’s far from standard. Even with a common alphabet, the Germans collate differently from the Swedes.
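To give a feel for why one alphabet can yield different orders, here is a toy Python sketch. It is not the Unicode Collation Algorithm and uses no collation library. The two key functions are hand-written assumptions that mimic a single well-known difference: German dictionary order files ä with a, while the Swedish alphabet places å, ä and ö after z. The word list is made up.

```python
import unicodedata

# Normalize so that "Ä" is a single precomposed character in every word.
words = [unicodedata.normalize("NFC", w) for w in ["Zebra", "Äpfel", "Apfel"]]

def german_like(word):
    """Toy key: drop accent marks so that ä files with a."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, word)  # the raw word breaks ties between Apfel and Äpfel

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_like(word):
    """Toy key: rank each letter by its position in the Swedish alphabet."""
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower() if ch in SWEDISH_ALPHABET]

german_order = sorted(words, key=german_like)
swedish_order = sorted(words, key=swedish_like)

print(german_order)
print(swedish_order)
```

Real software would normally rely on a proper collation implementation for a given locale rather than hand-built keys like these.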

Such is life in the globalized e-commerce world.

Have any of you jokers ever been teachers?

You teach the concept of alphabetizing schemes; you teach examples of different types of alphabetizing schemes; you assign lessons that involve using one of those examples of alphabetizing schemes.

What you do not do is say, here’s the way to alphabetize*.

To be fair, I have no way of knowing whether mhendo’s teacher ever did anything that wrong or whether that’s just what got retained out of that class. What teachers teach and what pupils hear are often two wildly disparate things.

* except to say that “you will use this method in this class and not the others”