Why do apostrophes so often get screwed up on the WWW?

I’ve noticed this phenomenon for many years. Apostrophes get replaced with a string of random characters, and it seems to be a completely different set of weird characters every time. What causes this, and how has this problem not been fixed? (I’ve seen it with even more frequency in recent months.) I also have to wonder how it’s not caught before it’s published.

Here are a few examples: Here, here, here.

Encoding. It looks good in whatever tool you’re using, and possibly even in your browser when you view it locally, but turns into junk when viewed in whatever encoding the actual website tells the browser to use. It can also genuinely have been turned into junk along the way, if some step between your editor and the finished file can’t handle the original encoding.
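To make that concrete, here is a minimal Python sketch of the mismatch. It assumes the text was saved as UTF-8 and then read as Windows-1252 (and the reverse); those are two common pairings, not necessarily what happened on any particular site, and the variable names are just for illustration.

```python
# The right single quotation mark (U+2019) that many editors insert
# in place of a plain apostrophe.
text = "it\u2019s"

# Saved as UTF-8, that one character becomes three bytes: E2 80 99.
utf8_bytes = text.encode("utf-8")

# A reader that assumes Windows-1252 treats each byte as its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # itâ€™s

# The opposite mismatch: saved as Windows-1252 (one byte, 0x92),
# then read as UTF-8, where a lone 0x92 is not valid.
cp1252_bytes = text.encode("cp1252")
replaced = cp1252_bytes.decode("utf-8", errors="replace")
print(replaced)  # it\ufffds, shown as a replacement-character box or question mark
```

Different pairs of encodings give different junk, which would explain why it looks like a different set of weird characters every time.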

You can press Alt and 0145 on your numeric keypad to get a ‘ which looks like an apostrophe and doesn’t cause those problems…

There was a thread with detailed techie information on this question a couple years ago.

The same encoding problems sometimes crop up when you go between applications, operating systems or character sets of different languages. Whenever we take text written on a PC in English Microsoft Word and drop it into a layout designed on a Mac in Japanese Illustrator, characters like smart quotes, bullet points and long dashes often come out mis-coded.

I think what motivates these problems is that some software tries to use only a basic character set, more specifically the ASCII characters represented by the binary 0000000 to 1111111 (7 bits), and some other software tries to use fancier characters with longer representations. The ASCII character set includes a singlequote and a doublequote, and these are the same whether it’s the beginning or the end of the quoted phrase. Some software tries to figure out which end of the quoted phrase each of these is on and substitute different singlequotes or doublequotes accordingly.

This is also weird if you deal with programming text or work with the customary symbols for feet and inches, because in this context they aren’t for quoting and the distinction between starting and stopping a quotation doesn’t apply.
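A rough Python sketch of the difference, for anyone curious. It just lists the plain ASCII quotes next to the "fancier" curly ones, with their code points and UTF-8 bytes, and then flattens curly quotes back to ASCII. The helper name `flatten_quotes` is made up for this example and is not from any particular program.

```python
quotes = {
    "ASCII apostrophe": "'",
    "ASCII double quote": '"',
    "left single quote": "\u2018",
    "right single quote": "\u2019",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
}

for name, ch in quotes.items():
    fits_in_7_bits = ord(ch) < 128  # 1111111 in binary is 127
    utf8_hex = ch.encode("utf-8").hex(" ")
    print(f"{name}: U+{ord(ch):04X}, fits in 7 bits: {fits_in_7_bits}, UTF-8: {utf8_hex}")

# Hypothetical helper: map curly quotes back to their ASCII lookalikes.
FLATTEN = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

def flatten_quotes(s):
    return s.translate(FLATTEN)

print(flatten_quotes("\u201cit\u2019s 6\u2019 2\u201d tall\u201d"))
```

Flattening is lossy, of course: once the curly quotes are gone, nothing records which end of the phrase each one was on, which is fine for code and for feet and inches but not for typography.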

Blame it on Microsoft Word. People copy from Word and paste into a web browser form. Word doesn’t play nicely with much of anything (other applications, other versions of itself, its own users, etc.) and wreaks havoc with upper-ASCII characters.

OK, in all fairness to MS, it appears that they are finally going to support open standards for file format in upcoming versions of Word. Be nice if it’s true. Overdue but nice. Let’s hope they mean the open standard file format will be the default format for saving Word files and not just an obscure “Save As” option.

The first example is missing the required doctype, so the browser flounders attempting to render the page. More importantly, the actual code is poorly written, contributing to the morass.

The second one is a coding error, either with the CMS used or human error.

The third appears to be human error caused by the contributor.

In all of the above the pre- and post-quality checks appear to be missing.

For the specific problem of an apostrophe being replaced by something random, I think the other posters have it. You can also end up with text missing, though, since the coding for some software assumes that an apostrophe is a single-quote, and may treat text in single-quotes in some special manner. Likewise for double-quotes. For instance, a few versions ago, vBulletin couldn’t handle double-quotes in thread titles. The thread title was stored in double-quotes in the database, and if there was a " somewhere in the title, it would assume that meant that was the end of the title. So, for instance, if I posted a thread asking

Why is " the symbol for inches?

The database would contain

"Why is " the symbol for inches?"

and when the rendering software would try to render that, it would think that the title ended at the second ", and you’d see

Why is

as the title of the thread. Similar things can happen with apostrophes, if that’s what the software is using.
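I don’t know what vBulletin’s code actually looked like, but the failure is easy to reproduce. Here is a hypothetical Python sketch: `store` and `naive_read` mimic the behavior described above, and `store_escaped` / `read_escaped` show one common fix (backslash-escaping the delimiter). All four names are invented for this example.

```python
def store(title):
    # Naive storage: wrap the title in double quotes, no escaping.
    return '"' + title + '"'

def naive_read(stored):
    # Read from just after the opening quote up to the NEXT double quote.
    end = stored.index('"', 1)
    return stored[1:end]

def store_escaped(title):
    # Escape backslashes first, then double quotes, before wrapping.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def read_escaped(stored):
    # Walk the string; a backslash means "take the next character literally".
    out = []
    i = 1  # skip the opening quote
    while i < len(stored):
        ch = stored[i]
        if ch == "\\":
            out.append(stored[i + 1])
            i += 2
        elif ch == '"':
            break  # an unescaped quote really is the end
        else:
            out.append(ch)
            i += 1
    return "".join(out)

title = 'Why is " the symbol for inches?'
print(repr(naive_read(store(title))))            # 'Why is '
print(repr(read_escaped(store_escaped(title))))  # the full title
```

The same sketch works with single quotes swapped in, which is how an apostrophe in a name or a contraction can cut a field short.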

To follow up on that tangent… I once submitted a joke poem to a poetry contest at poetry.com (yeah, the scammers) with the title <no title>. It was accepted into the database, but broke poetry.com’s index.

One thing I’ve noticed on the Internet ,is that on many message boards and blogs ,and occasionally email ,people often leave a space before a comma ,not after. I’ve never encountered it in handwriting ,typed and printed correspondence ,or any other media. Why is this practice so prevalent online?