I believe this happens when you use smart quotes, i.e., the curly or slanted quote marks and apostrophes that Word inserts automatically, and the font rendering the page you see does not support them.
Yes, the problem arises when the application you’re writing in replaces the neutral ASCII quotes or apostrophes with directional ones. Then, when someone opens the text in an application or browser that doesn’t support the directional quote, it displays a mess instead.
This is why directional apostrophes are pure evil. Stop creating them, for crying out loud.
If you’re using something shitty like MS Word to write web pages, the software is fond of auto-replacing the ' character with curly quotes that are encoded as a different value. The ASCII apostrophe has a value of 39. Curly apostrophes do not appear in the 7-bit ASCII character set, nor in the 8-bit ISO-8859-1 character set, which is a superset of ASCII. They do appear in the 8-bit Windows-1252 character set, which perversely mirrors ISO-8859-1 except that it replaces a bunch of control codes with things like the curly quotes and the Euro sign.
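Here is a rough Python sketch of those character-set claims, for anyone who wants to check them. The helper name `encodable` is made up for illustration; the codec names are the ones Python ships with.

```python
# Which encodings have a slot for each kind of apostrophe?
ascii_apostrophe = "'"        # U+0027
curly_apostrophe = "\u2019"   # U+2019, right single quotation mark

def encodable(ch, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent ch.

    Hypothetical helper, only for this illustration.
    """
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None

print(ord(ascii_apostrophe))  # 39

for enc in ("ascii", "iso-8859-1", "cp1252"):
    # Only cp1252 (Windows-1252) has the curly apostrophe, at byte 0x92.
    print(enc, encodable(curly_apostrophe, enc))
```

The first two encodings should come back as `None`, and only Windows-1252 yields a single byte.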
So, why the string â€™ specifically?
If you take the right curly single quotation-mark character, which in Unicode exists at the codepoint U+2019, and encode it as UTF-8, you get the bytes 226, 128, and 153. If those bytes are then mistakenly read as Windows-1252, they correspond to the Windows-1252 characters â€™.
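A minimal Python sketch that reproduces the garbling: encode the curly apostrophe as UTF-8, then read the bytes back as Windows-1252. The variable names are only for illustration. The last step assumes no bytes were lost along the way, in which case the damage can be reversed.

```python
curly = "\u2019"                      # right single quotation mark

utf8_bytes = curly.encode("utf-8")    # what a UTF-8 page actually contains
print(list(utf8_bytes))               # [226, 128, 153]

# The mistake: the reader treats those three bytes as Windows-1252.
garbled = utf8_bytes.decode("cp1252")
print(garbled)                        # â€™

# Undo the mistake by reversing the two steps.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == curly)              # True
```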
There are a few ways to encode an apostrophe in Unicode. The traditional one is code point U+0027, directly equivalent to the ASCII apostrophe.
What you’re seeing is U+2019, Right single quotation mark. If you look at that page, you can see that its encoding in UTF-8 is 0xE2 0x80 0x99.
It so happens that, when each of those bytes is read as a single Windows-1252 character:
0xE2 = â
0x80 = €
0x99 = ™
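The byte-by-byte mapping above can be checked with a short Python sketch. It also shows a wrinkle: the € and ™ come from reading the bytes as Windows-1252, whereas strict ISO-8859-1 would map 0x80 and 0x99 to invisible control characters. The `rows` list is just a convenience for this example.

```python
utf8_bytes = "\u2019".encode("utf-8")   # b'\xe2\x80\x99'

rows = []
for b in utf8_bytes:
    as_cp1252 = bytes([b]).decode("cp1252")      # Windows-1252 reading
    as_latin1 = bytes([b]).decode("iso-8859-1")  # strict Latin-1 reading
    rows.append((b, as_cp1252, ord(as_latin1)))
    print(f"0x{b:02X}  cp1252: {as_cp1252} (U+{ord(as_cp1252):04X})  "
          f"latin-1: U+{ord(as_latin1):04X}")
```

Only the first byte gives the same character under both readings.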
As for where the error came from, it’s probably the author of the Web page who didn’t encode things correctly. Possibly they forgot to check the encoding when exporting the text from their text editor to HTML.
No. Typography shouldn’t bow to temporary technical limitations. Character encoding standards are here, they’re widely-implemented, and they’re not going away.
You’re naïve if you imagine that a small number of 20ᵗʰ Century reactionaries will be able to keep computerized text in the typewriter age. ¡Viva el estándar Unicode!
It depends. The problem might be caused by the web server sending the wrong charset in its Content-Type header (in which case overriding that in the HTML might work, but the better solution is to fix the server config). But it can also be caused by something earlier in the chain server-side, like a misconfigured database that is sending out stuff in the wrong encoding, or badly written server-side code that is failing to properly decode stuff from the filesystem or database.
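To make the server-header case concrete, here is a rough Python sketch of a client that trusts whatever charset the `Content-Type` header declares. The function names and the Windows-1252 fallback are assumptions made for the illustration, not a description of how any particular browser works.

```python
from email.message import Message

def charset_from_content_type(header_value, default="windows-1252"):
    """Pull the charset parameter out of a Content-Type header value.

    Hypothetical helper; the default is an assumption for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default

def decode_body(body, header_value):
    """Decode the response bytes using whatever the header claims."""
    return body.decode(charset_from_content_type(header_value))

# The page itself is stored as UTF-8 in all three cases.
body = "It\u2019s fine".encode("utf-8")

print(decode_body(body, "text/html; charset=utf-8"))         # correct header
print(decode_body(body, "text/html; charset=windows-1252"))  # wrong header
print(decode_body(body, "text/html"))                        # no charset at all
```

The same bytes come out clean or garbled depending only on what the header says, which is why fixing the server config is the better cure.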
Sometimes encoding problems can be notoriously difficult to track down.
My question in all this is why have we been seeing this on the web for like 15 years? Can’t the browsers or someone make it so that it works right?
Is this an unfixable problem?
Guessing at character encodings is far more problematic than just rendering the crap that you’re given. Every shitty browser bug in history has been caused by trying to use flawed heuristics to guess what the author “really means.”
Gus: What do you mean by “It looks normal when I type in while composing the post but displays incorrectly in the actual post.”?
As far as I can see the symbol in your post is an ampersand. There are many shapes of ampersand, just like there are many shapes of the letter “A”. Each font draws the letters and symbols as slightly different shapes.
The font used inside the text input box and the font used to display posts are different on the SDMB. Which is a pretty common thing to see in web forms.
The fact the ampersand is a different shape has nothing to do with the issue that the OP is asking about.
& is in ASCII. Every Internet-connected Web-browsing computer on Earth agrees what byte to interpret as the & character. This is down to either a font difference or incipient insanity on your part, and right now, your description of what’s confusing you isn’t enough to rule out either.
I can explain fonts. Nobody can fully explain insanity.