Why do some places on the internet replace apostrophes with odd characters?

Collation is inherently more complicated in Unicode than it is in ASCII, but the Unicode version is closer to how complicated collation really is once you care about more than a subset of English-language text.

Another thing which becomes more complicated is letter case: For example, what is the upper-case version of ß? Ask any German speaker, and they’ll tell you it’s SS. Two letters. However, the letter pair ss does occur in German text, so there’s no way to always know what the lower-case version of SS is. If you’re trying to do it with a simple algorithm, you lose. The real world doesn’t work like that.
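A quick way to see the asymmetry is to ask a Unicode-aware runtime. The sketch below uses Python's built-in `str` methods, which apply Unicode's default, locale-independent case mappings; other languages and libraries may behave differently, and the word "straße" is just an illustrative choice.

```python
# Upper-casing ß yields two letters, as a German speaker would expect.
upper = "straße".upper()            # 'STRASSE'

# Lower-casing cannot know whether SS came from ß or from ss.
round_trip = upper.lower()          # 'strasse', not 'straße'

# Case folding is meant for comparison, not display: it maps ß to ss,
# so both spellings compare equal without pretending to round-trip.
same = "straße".casefold() == "STRASSE".casefold()

# The rare capital ẞ (U+1E9E) lower-cases to ß,
# but ß does not upper-case back to it by default.
capital_sharp_s_lower = "\u1e9e".lower()

print(upper, round_trip, same, capital_sharp_s_lower)
```

The point is only that upper-casing and lower-casing are not inverses of each other, so any code that assumes they are will eventually lose information.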

(Interestingly, there is a single-letter capital ß: ẞ. It was not invented for Unicode, and it does occur in German text, but it is vanishingly rare in modern German.)

The upshot is that nontrivial text processing needs to be done using specialized libraries. This has really always been the case; it’s just that now multilingual text is technically feasible, which makes it impossible to ignore some things the ASCII-only world let us gloss over.

The old roundtrip casefolding myth. It has led to much heartbreak. And did you know Unicode also has titlecase?
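For anyone who has not met titlecase, here is a small sketch in Python. The character ǆ (U+01C6) is my own choice of example; it is one of a handful of digraph characters that have three distinct case forms rather than two.

```python
import unicodedata

lower = "\u01c6"          # ǆ  LATIN SMALL LETTER DZ WITH CARON
upper = lower.upper()     # Ǆ  U+01C4, both halves capitalized
title = lower.title()     # ǅ  U+01C5, only the first half capitalized

# The titlecase form has its own general category, 'Lt'.
title_category = unicodedata.category(title)

print(lower, upper, title, title_category)
```

So "capitalize the first letter of a word" and "upper-case the first letter of a word" are not the same operation.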

Unicode is hard. Really hard. Yet people still think they can handle text with cheap hacks from the punchcard days.

The only round-trip Unicode cares about is Unicode to another character encoding and back to Unicode: the text must come out the back end the same as it went in the front end. This has led to Unicode preserving oddities and arguable mistakes from decades ago.
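A rough illustration of that kind of round trip, in Python. Latin-1 is just a convenient legacy encoding to demonstrate with, and ANGSTROM SIGN is my own example of a character that is commonly described as a compatibility duplicate; neither comes from the comment above.

```python
import unicodedata

# Every byte of a legacy single-byte encoding survives the trip
# legacy -> Unicode -> legacy unchanged.
legacy_bytes = bytes(range(256))
as_unicode = legacy_bytes.decode("latin-1")
back_again = as_unicode.encode("latin-1")
round_trips = back_again == legacy_bytes

# An example of a preserved oddity: ANGSTROM SIGN (U+212B) is a separate
# code point from Å (U+00C5), yet normalization treats them as the same letter.
angstrom = "\u212b"
a_ring = "\u00c5"
distinct_code_points = angstrom != a_ring
normalized_equal = unicodedata.normalize("NFC", angstrom) == a_ring

print(round_trips, distinct_code_points, normalized_equal)
```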

Yep. And the only way to justify such nonsense is the idiotic fact that real languages have it.

In a way, I blame ASCII for having been designed so well. So many hacks work in ASCII. Want to go from a numeral to a number? Subtract ‘0’. Want to sort alphabetically? Sort numerically by character code. Want to flip case? Flip a bit. Easy-peasy, and you get enough of English to convince people that characters not in ASCII are weird or special somehow.
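To make those three hacks concrete, here is a minimal Python sketch showing each one working on ASCII and then going wrong one step outside it. The sample strings are arbitrary picks of mine.

```python
import unicodedata

# The hacks, working as advertised on ASCII.
ascii_digit = ord("7") - ord("0")               # 7
ascii_flip = chr(ord("a") ^ 0x20)               # 'A'
ascii_sorted = sorted(["pear", "apple", "fig"]) # alphabetical

# Numeral to number: ARABIC-INDIC DIGIT THREE is a perfectly good digit.
arabic_three = "\u0663"
bad_digit = ord(arabic_three) - ord("0")        # 1587, nonsense
good_digit = unicodedata.digit(arabic_three)    # 3

# Flip a bit: ß (U+00DF) turns into ÿ (U+00FF), which is not its upper case.
bad_flip = chr(ord("\u00df") ^ 0x20)

# Sort by character code: Ä (U+00C4) lands after z.
bad_sorted = sorted(["zebra", "\u00c4pfel", "apple"])

print(ascii_digit, ascii_flip, ascii_sorted)
print(bad_digit, good_digit, bad_flip, bad_sorted)
```

Fixing the sort properly needs a real collation library, since the right order depends on the language.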

Now that world is gone, and good riddance. The habits die hard, though.

Found another one in the wild today.

From a Yahoo! news article:

The next sentence had properly displayed double quote marks. But the (apparently) single quotes and apostrophe failed to convert.
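I can't tell what actually happened in that article's pipeline, but one common way to get this sort of garbage is for UTF-8 bytes to be decoded as Windows-1252 somewhere along the line. A small Python sketch of that assumed mechanism, using a curly apostrophe:

```python
# A typographic apostrophe, RIGHT SINGLE QUOTATION MARK (U+2019).
apostrophe = "\u2019"

# In UTF-8 it takes three bytes.
utf8_bytes = apostrophe.encode("utf-8")      # b'\xe2\x80\x99'

# Decode those bytes with the wrong codec and each byte becomes its own character.
garbled = utf8_bytes.decode("cp1252")        # 'â€™'

# If nothing else has touched the text, the damage can be reversed.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(utf8_bytes), garbled, repaired == apostrophe)
```

That is why a single apostrophe so often shows up as three odd characters.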

I always enjoyed the Turkish (dotted vs. undotted) * (upper vs. lowercase) I. Case folding is, as you’ve said, very far from trivial.
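For the curious, this is what Python's locale-independent case methods do with the four Turkish I's. It is only a sketch of the default behavior; getting Turkish right needs locale-aware tailoring, which `str.lower()` and `str.upper()` do not provide.

```python
dotted_capital = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I

# İ lower-cases to 'i' plus COMBINING DOT ABOVE: one code point becomes two.
lowered = dotted_capital.lower()

# ı upper-cases to plain I, and plain I lower-cases to dotted i,
# so the round trip does not get back to ı.
raised = dotless_small.upper()
round_trip = raised.lower()

print(len(lowered), raised, round_trip, round_trip == dotless_small)
```

In Turkish the expected pairs are I/ı and İ/i, so the default mappings are wrong in both directions for Turkish text.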