How did this change in the Spanish alphabet affect software?

When I studied Spanish in middle school years ago, I was taught that CH, LL, and RR were considered distinct letters in the Spanish alphabet. Years later, when I did some internationalization work on a Java database product, I learned that each locale had its own alphabetical sort order, and that the Spanish locale treated CH, LL, and RR as distinct letters for the purposes of ordering. So, for instance, the sequence ABCHD would come after ABCD and before ABCE (because CH comes between C and D).
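
To make the ordering difference concrete, here is a minimal Python sketch of the two sort orders. It is hand-rolled for illustration and is not how Java or any real collation library implements locales. The names `traditional_key` and `modern_key` are made up, only ch and ll are modeled as letters of their own (rr is left out as a simplifying assumption), and accented vowels are not handled.

```python
# Traditional order with "ch" and "ll" as letters of their own (assumed
# placement: "ch" right after "c", "ll" right after "l").
TRADITIONAL = "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split()
MODERN = [s for s in TRADITIONAL if len(s) == 1]

TRAD_RANK = {s: i for i, s in enumerate(TRADITIONAL)}
MOD_RANK = {s: i for i, s in enumerate(MODERN)}


def traditional_key(word):
    """Sort key that reads "ch" and "ll" as single letters."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        step = 2 if w[i:i + 2] in ("ch", "ll") else 1
        # Characters outside the table (accents, digits) sort after "z".
        key.append(TRAD_RANK.get(w[i:i + step], len(TRAD_RANK)))
        i += step
    return key


def modern_key(word):
    """Sort key that compares one character at a time."""
    return [MOD_RANK.get(c, len(MOD_RANK)) for c in word.lower()]


words = ["dama", "chico", "cosa", "luz", "llama", "cena"]
print(sorted(words, key=traditional_key))
# ['cena', 'cosa', 'chico', 'dama', 'luz', 'llama']
print(sorted(words, key=modern_key))
# ['cena', 'chico', 'cosa', 'dama', 'llama', 'luz']
```

Under the traditional key every word starting with "ch" lands after every other word starting with "c", while the modern key files it between "ce" and "ci".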

Someone just told me that, as of 2010, CH, LL, and RR are no longer considered letters on their own by the Real Academia Española.

First, is that true?

Second, if so, has the change been adopted by software internationalization standards?

Third, if it has, what effect did this have on software and data? For instance, if a database has an index on a Spanish character column, and the sort order changes after the index is created, wouldn’t that mean the index is in the wrong order?
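
The index worry can be shown with a toy model. The sketch below treats a sorted Python list as the "index" and a binary search as the "lookup"; it is an assumption-laden illustration, not a description of how any particular database stores or searches its indexes. The helper names (`make_key`, `lookup`) are hypothetical.

```python
from bisect import bisect_left

LETTERS = "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split()


def make_key(digraphs):
    """Build a sort key that treats the given digraphs as single letters."""
    alphabet = [s for s in LETTERS if len(s) == 1 or s in digraphs]
    rank = {s: i for i, s in enumerate(alphabet)}

    def key(word):
        w, out, i = word.lower(), [], 0
        while i < len(w):
            step = 2 if w[i:i + 2] in digraphs else 1
            out.append(rank.get(w[i:i + step], len(rank)))
            i += step
        return out

    return key


old_key = make_key({"ch", "ll"})  # ordering in force when the index was built
new_key = make_key(set())         # ordering after the rule change


def lookup(index, word, key):
    """Binary search that assumes `index` is sorted according to `key`."""
    keys = [key(w) for w in index]
    pos = bisect_left(keys, key(word))
    return pos < len(index) and index[pos] == word


index = sorted(["cena", "chico", "cosa", "dama"], key=old_key)

print(lookup(index, "chico", old_key))    # True: order and comparator agree
print(lookup(index, "chico", new_key))    # False: the word is there but is missed
rebuilt = sorted(index, key=new_key)
print(lookup(rebuilt, "chico", new_key))  # True again after rebuilding
```

In this toy model the data is never lost; the search simply looks in the wrong place until the list is re-sorted under the new order.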

This sort of change used to set off alarm bells in my head when I was a software engineer. Can someone tell me whether this really was a problem, and if so, how the development world dealt with it?

I assume they made this change specifically to conform Spanish alphabetization to the rest of Western Latin alphabetization in this modern computer / database age, at the cost of making a mess of all existing software and data?

I don’t know specifically about Spanish, but a very similar problem exists for the umlauts in German. To my knowledge, there are at least three different conventions for sorting German words that contain umlauts:

  1. Converting the umlaut into the corresponding vowel followed by e (so that ä would become ae, etc.) and sorting accordingly;
  2. Treating the umlaut as a distinct letter sandwiched, in alphabetical order, between the corresponding vowel and the next letter (so that ä would be after a but before b);
  3. Ignoring the umlaut completely and treating the word as if it simply contained the plain vowel.

#1 is the most common convention in Germany itself. #3 is, I think, frequently used internationally. #2 is rather old-school.
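
The three conventions can be sketched as three sort keys in Python. This is only an illustration under simplifying assumptions: the function names are invented, ß and uppercase handling are ignored beyond lowercasing, and no official standard is being reproduced.

```python
EXPAND = {"ä": "ae", "ö": "oe", "ü": "ue"}
STRIP = {"ä": "a", "ö": "o", "ü": "u"}


def key_expand(word):
    """Convention 1: sort ä as if it were written ae."""
    return "".join(EXPAND.get(c, c) for c in word.lower())


def key_separate(word):
    """Convention 2: ä is its own letter, after a and before b."""
    # (base, 0) is a plain letter, (base, 1) the umlaut that follows it.
    return [(STRIP.get(c, c), int(c in STRIP)) for c in word.lower()]


def key_strip(word):
    """Convention 3: ignore the umlaut and sort ä as a."""
    return "".join(STRIP.get(c, c) for c in word.lower())


words = ["Arzt", "Ärger", "Affe", "Adler"]
print(sorted(words, key=key_expand))    # ['Adler', 'Ärger', 'Affe', 'Arzt']
print(sorted(words, key=key_separate))  # ['Adler', 'Affe', 'Arzt', 'Ärger']
print(sorted(words, key=key_strip))     # ['Adler', 'Affe', 'Ärger', 'Arzt']
```

The same four words come out in three different orders, which is the practical problem when two systems disagree on the convention.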

Then again, German doesn’t have a central prescriptive rule-setting body the way Spanish has the Real Academia, so usages may vary considerably.

I actually suspect that it was quite common for software not to implement the traditional ordering at all, likely more common than software that did. And the ones that did would be the bigger companies who would still be releasing new products, and so could update their behavior.

Here is the statement from the Academy:

Se excluyen definitivamente del abecedario los signos ch y ll, ya que, en realidad, no son letras, sino dígrafos, esto es, conjuntos de dos letras o grafemas que representan un solo fonema. El abecedario del español queda así reducido a las veintisiete letras siguientes: a, b, c, d, e, f, g, h, i, j, k, l, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z.
[…]
Al tratarse de combinaciones de dos letras, las palabras que comienzan por estos dígrafos o que los contienen no se alfabetizan aparte, sino en los lugares que les corresponden dentro de la c y de la l, respectivamente. La decisión de adoptar el orden alfabético latino universal se tomó en el X Congreso de la Asociación de Academias de la Lengua Española, celebrado en 1994, y viene aplicándose desde entonces en todas las obras académicas.

Deepl(dot)com translates this quite accurately as:

The signs ch and ll are definitively excluded from the alphabet, since they are not really letters, but digraphs, that is, sets of two letters or graphemes representing a single phoneme. The Spanish alphabet is thus reduced to the following twenty-seven letters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z.
[…]
Since these are two-letter combinations, words beginning with or containing these digraphs are not alphabetized separately, but in their corresponding places within c and l, respectively. The decision to adopt the universal Latin alphabetic order was taken at the X Congress of the Association of Spanish Language Academies, held in 1994, and has been applied since then in all academic works.

Just nitpicking on the OP’s original statement:

So, for instance, the sequence ABCHD would come after ABCD and before ABCE (because CH comes between C and D).

is not necessarily true, as it is not clear whether ABCHD means A - B - Ch - D in the old Spanish notation or if it means A - B - C - H - D. Not only Ch, but also C and H were letters. The new rules (well, new as of 1994) prevent this ambiguity, which should, I believe, suit a programmer fine.

The whole point IIRC is that CH, LL and RR have specific sounds/diphthongs/whatever and acted in writing essentially like things that would have their own letters, but just used combinations of existing letters.

So why bother reserving keyboard letters for a “LL” rather than just have the typist type two "L"s in succession and produce an identical printed word?

It’s sort of similar to what happened with the English letter þ being replaced with “th” over time.

The coolest thing to me from this is the new word (to me) abecedario.

I’m used to hearing the humdrum alfabeto in Portuguese (and apparently Spanish), but that doesn’t roll off the tongue nearly as neatly as abecedario. Kind of like talking about our ABC’s instead of the alphabet.

And it looks like Portuguese has it as well–abecedário, with an acute accent over the last “a”–though it is probably the ten-dollar word that most folks don’t use.

No, I don’t believe we are talking about keyboards at all. That was the discussion about the letter “ñ”, and that was embarrassing, so I won’t go into it. Want to read a bunch of lies? Here. Everything about this article is wrong. And it is just a couple of months old.

My pleasure :grinning:

Contemporary sources say otherwise.

Spanish Alphabet Loses Two Letters - Los Angeles Times (latimes.com)

Keyboards are the things where you hit a key and something appears on your screen, right? There was never a key for “Ch”, “Ll” or “Rr”. You seem to claim otherwise. That is not correct. You always had to hit two keys, even with the old mechanical devices with the ink ribbon. But there is a key for “ñ” (or “Ñ”) on Spanish keyboards.
I read your sources and don’t see the relevance to what I am saying in reply to what you wrote. You seem to confuse alphabet and keyboard.

No, it isn’t. Not at all. That is my point.

What’s the point of saying that CH, RR and LL are digraphs, and not actual letters/symbols, if not to simplify things for the purposes of keyboards, etc.? The article I read and a lot of other things I found say that it was done specifically to make things easier in the era of computers and keyboards.

Which is what happened with the þ; it was easier with the advent of the printing press (from Germany, where the þ isn’t used) to use existing T and H instead of an English-specific þ character.

A digraph is a combination of two graphs (single letters) that represents a sound. Like the English “sh” or “ch” (which can even represent two different sounds, or phonemes: as in church (twice!) or as in loch or Bach). You can make a digraph with a keyboard by pushing two keys, one after the other. You cannot make a “þ” no matter how many keys you push. You need an extra key for that (and an extra ASCII code, or whatever is used today instead). The same is true for the German “ß”, the Spanish “ñ/Ñ”, the French “ç”… No matter how many keys you hit, that symbol will never appear on your screen (there are workarounds, as I have just managed to type them here, of course, often involving the [Alt]-key).
It seems to me you are still confusing letter/alphabet and keyboard/symbol.

Apparently, the 1994 change had mostly to do with dictionaries. The LA Times article doesn’t mention keyboards at all and ends with

It was taken mainly to simplify dictionaries and make Spanish more computer-compatible with English.

I suspect the significance of deciding they were no longer letters was for the purpose of filing, putting things in alphabetical order and dictionary type listings. From what I understand, when “ch” was a letter, there was a section in dictionaries for “ch” as a separate letter that came after all the words that started with “cu” and now the “ch” words come between the words starting with “ce” and those that start with “ci”.

I am sure that the separate section for “ch”, etc., at least in some systems, caused the same problems as the variants of “Mc” cause in American filing systems: some have “Mc” right after the “Ma” names and others have them right before the “Na” names. Some computers treat McDonald, Mc Donald and Mac Donald differently.

But what do Spanish dictionaries have to do with English?

If you want another example, Swedish and German have different, incompatible, collation orders, but apparently that’s OK.

ETA the officially official statement linked to by @Pardel-Lux does not mention English at all.

As the OP says, it was about alphabetization. It was always easy to type ch, rr, and ll, using C, H, R, and L. But the rule originally was that each of those came afterwards.

As for ñ, that is still a separate letter, which comes after n. And since you can’t combine two characters to make it, it has a separate key on Spanish-language keyboards.

I was under the impression that, at some point anyway, Ñ = NN in Spanish. By what date was that officially no longer true?

Spanish dictionaries don’t have anything to do with English - I was describing problems that happen in American* filing systems where sometimes “Mc” is treated as a separate letter and sometimes “Mc” and even “M’” are treated as if they are all spelled “Mac”. I am sure that dictionaries were not the only situation where “ch” and “ll” were treated as separate letters - but since everything I’ve read specified dictionaries, I assume that there are many situations where they were not treated as separate letters and that someone looking for the file of someone named “Chavez” might have to look in a couple of places depending on how that particular system treated “Ch”.

* I don’t know for certain that it happens in all English speaking countries.

BTW, “rr” was not to my knowledge a standalone “letter” at any point in at least the 30 years before 1994.

I am reminded by this of the time I was in the Danish city of Aarhus, which I believe is the second largest city. I had a Jutland phone book (long time ago) which had the various towns listed alphabetically. But I couldn’t find Aarhus. Eventually, it occurred to me that an alternate spelling was Århus, although that isn’t much used, and Å is the last letter of the Danish alphabet. And there it was, practically the last town listed, at the very end of the directory. The point is that the Danes alphabetize any word with aa in it as though it were å. I think they do the same with ae, as though it were æ. I am not sure what, if any, digraph can replace ø, maybe oe. But those three letters end the Danish alphabet, with å = aa the very last.
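
The phone book behavior described here can be approximated with a short Python sort key. This is a sketch of what the post describes, not the official Danish collation rules: it assumes every "aa" is filed as å, leaves "ae" and "oe" alone, and uses an invented function name.

```python
# Danish alphabet order with æ, ø, å after z, as described above.
DANISH = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {c: i for i, c in enumerate(DANISH)}


def danish_key(name):
    """Sort key that files "aa" as the single letter å (assumed rule)."""
    folded = name.lower().replace("aa", "å")
    # Anything not in the table sorts after å.
    return [RANK.get(c, len(RANK)) for c in folded]


towns = ["Aarhus", "Esbjerg", "Aalborg", "Viborg", "Randers"]
print(sorted(towns))
# ['Aalborg', 'Aarhus', 'Esbjerg', 'Randers', 'Viborg']
print(sorted(towns, key=danish_key))
# ['Esbjerg', 'Randers', 'Viborg', 'Aalborg', 'Aarhus']
```

With the plain sort Aarhus sits near the top; with the folded key it drops to the very end, which matches where it turned up in the directory.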

The Dutch treat IJ as a single letter, even capitalizing both in proper names like IJsselmeer. I wonder how they alphabetize that. Handwritten it looks exactly like ÿ and they sometimes substitute that.

To be distinguished from I followed by J as in “bijou” and “bijectie”.

The standard/official/dictionary way is to sort it just as i followed by j. That does not mean you won’t encounter other schemes, like ij = y, in a Netherlands telephone directory or some other text.