Why do some places on the internet replace apostrophes with odd characters?

Just_Asking_Questions · May 26, 2015, 4:23pm

I have seen this on and off, but I’m not sure what is going on.

Someone will attempt to post a word such as “railroad’s” and it comes out “railroadâ€™s”, consistently replacing the apostrophe with “â€™”.

What causes this? Why are some writings affected and not others? And why these characters?

Is this on the writer’s end, or the internet site hosting the text, or my end? I’ve seen it on message boards as well as “regular” websites.

Really_Not_All_That_Bright · May 26, 2015, 4:34pm

Oooh. Good question!

kayaker · May 26, 2015, 4:40pm

Yep. I haven’t a clue.

OldGuy · May 26, 2015, 4:40pm

I believe this happens when you use smart quotes, i.e., the curly or slanted quote marks and apostrophes that Word automatically substitutes, and the font that’s rendering the page you see does not support them.

Lemur866 · May 26, 2015, 4:54pm

Yes, the problem is when the application you’re creating the words on replaces the neutral ASCII quotes or apostrophes with directional quotes or apostrophes. And then when someone opens the text in an application or browser that doesn’t support the directional quote, it displays a mess instead.

This is why directional apostrophes are pure evil. Stop creating them, for crying out loud.

friedo · May 26, 2015, 5:00pm

It’s not about the font, it’s about encoding.

If you’re using something shitty like MS Word to write web pages, the software is fond of auto-replacing the ' character with curly quotes, which are encoded as a different value. The ASCII apostrophe has a value of 39. Curly apostrophes do not appear in the 7-bit ASCII character set, nor in the 8-bit ISO-8859-1 character set, which is a superset of ASCII. They do appear in the 8-bit Windows-1252 character set, which perversely mirrors ISO-8859-1 except that it replaces a bunch of control codes with things like the curly quotes and the Euro sign.
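A quick way to check which character sets contain which apostrophe, sketched in Python. The helper name `encodable` is made up for illustration, and `latin-1` and `cp1252` are Python's codec names for ISO-8859-1 and Windows-1252.

```python
# Sketch: which character sets can represent each apostrophe.
STRAIGHT = "'"       # U+0027, the ASCII apostrophe
CURLY = "\u2019"     # U+2019, right single quotation mark

def encodable(ch: str, encoding: str) -> bool:
    """Return True if `ch` can be represented in `encoding` (hypothetical helper)."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(ord(STRAIGHT))  # 39
for name in ("ascii", "latin-1", "cp1252"):
    print(name, encodable(CURLY, name))

# In Windows-1252 the curly apostrophe is a single byte.
print(CURLY.encode("cp1252"))  # b'\x92'
```

Only the last of the three character sets accepts the curly apostrophe, and it stores it as byte 146 (0x92), one of the slots that ISO-8859-1 reserves for control codes.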

So, why the string â€™ specifically?

If you take the right curly single quotation-mark character, which in Unicode exists at the codepoint U+2019, and encode it as UTF-8, you get the bytes 226, 128, and 153. If those bytes are then mistakenly read as Windows-1252, they correspond to the Windows-1252 characters â€™.
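The garbling can be reproduced in a few lines of Python. This is only a sketch, assuming the text was written out as UTF-8 and read back as Windows-1252 (`cp1252` in Python); the function name `misread_as_cp1252` is invented for the example.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

original = "railroad\u2019s"  # "railroad" + curly apostrophe + "s"

# The curly apostrophe becomes three bytes in UTF-8.
print(list("\u2019".encode("utf-8")))  # [226, 128, 153]

# Read with the wrong encoding, each byte turns into its own character.
print(misread_as_cp1252(original))
```

Only the curly apostrophe is affected. The plain letters are ASCII, which both encodings store identically, so the rest of the word comes through untouched.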

Heracles · May 26, 2015, 5:01pm

There are a few ways to encode an apostrophe in Unicode. The traditional one is code point U+0027, directly equivalent to the ASCII apostrophe.

What you’re seeing is U+2019, Right single quotation mark. If you look at that page, you can see that its encoding in UTF-8 is 0xE2 0x80 0x99.

It so happens that, when those three bytes are read as Windows-1252:
0xE2 = â
0x80 = €
0x99 = ™
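For anyone who wants to see the byte-by-byte mapping, here is a small Python sketch. It assumes the bytes are interpreted as Windows-1252 (`cp1252`), and `cp1252_view` is a made-up helper name. Note that the characters Windows-1252 puts at 0x80 and 0x99 have the Unicode code points U+20AC and U+2122.

```python
import unicodedata

def cp1252_view(data: bytes) -> list:
    """Pair each byte with the character Windows-1252 assigns to it."""
    return [(b, bytes([b]).decode("cp1252")) for b in data]

utf8_bytes = "\u2019".encode("utf-8")  # UTF-8 form of the curly apostrophe

for byte, ch in cp1252_view(utf8_bytes):
    print(f"0x{byte:02X} -> {ch} (U+{ord(ch):04X}, {unicodedata.name(ch)})")
```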

As for where the error came from, it’s probably the author of the Web page who didn’t encode things correctly. Possibly they forgot to check the encoding when exporting the text from their text editor to HTML.

Derleth · May 26, 2015, 5:15pm

No. Typography shouldn’t bow to temporary technical limitations. Character encoding standards are here, they’re widely-implemented, and they’re not going away.

You’re naïve if you imagine that a small number of 20ᵗʰ Century reactionaries will be able to keep computerized text in the typewriter age. ¡Viva el estándar Unicode!

Just_Asking_Questions · May 26, 2015, 6:51pm

Thanks for the quick explanation.

So it would seem that when you see this it is because the original text was composed in MS Word and moved over without checking the encoding?

friedo · May 26, 2015, 6:53pm

This particular mis-encoding happens when the curly-quote is encoded as UTF-8, but served by something that thinks it’s in Windows-1252.
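Because this specific mix-up is mechanical, it can often be reversed by running the steps backwards. A hedged sketch in Python follows; `repair_mojibake` is a hypothetical name, and it assumes the damage is exactly "UTF-8 bytes decoded as Windows-1252" with nothing lost along the way.

```python
def repair_mojibake(garbled: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up when the text round-trips.

    If the text cannot be mapped back to bytes, or those bytes are not
    valid UTF-8, it is returned unchanged.
    """
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled

garbled = "railroad\u00e2\u20ac\u2122s"  # the garbled form seen on the page
print(repair_mojibake(garbled))
```

This will not rescue every case. Text that was garbled more than once, or that passed through a step that dropped bytes Windows-1252 leaves undefined, needs more careful handling.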

Saint_Cad · May 26, 2015, 7:15pm

I concur if you are sending me stuff to put on the website and I ask for a text file and you decide to “help me out” by sending me a Word doc.

To answer the OP, this is probably what happened and the web manager didn’t catch and change it.

leahcim · May 26, 2015, 7:43pm

Is this the kind of thing that can be fixed by declaring an encoding in your HTML like this article describes?

friedo · May 26, 2015, 7:55pm

It depends. The problem might be caused by the web server sending the wrong charset in its Content-Type header, in which case overriding that in the HTML might work, though the better solution is to fix the server config. But it can also be caused by something earlier in the chain server-side, like a misconfigured database, which is sending out stuff in the wrong encoding, or badly-written server-side code, which is failing to properly decode stuff from the filesystem or database.

Sometimes encoding problems can be notoriously difficult to track down.
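To illustrate the header case, here is a rough Python sketch of a client decoding the same bytes under two different declared charsets. The helper `decode_body` and the fallback to UTF-8 are assumptions for the example, not how any particular browser behaves.

```python
from email.message import Message

def decode_body(content_type: str, body: bytes) -> str:
    """Decode `body` using the charset named in a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Falling back to UTF-8 when no charset is declared is an assumption.
    charset = msg.get_content_charset() or "utf-8"
    return body.decode(charset)

# The page bytes are UTF-8 in both cases; only the declaration differs.
body = "railroad\u2019s".encode("utf-8")

print(decode_body("text/html; charset=utf-8", body))         # declared correctly
print(decode_body("text/html; charset=windows-1252", body))  # declared wrongly
```

The bytes on the wire are identical in both calls. What changes is the label, which is why fixing the declaration at the server is usually cleaner than patching the text.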

Hermitian · May 26, 2015, 7:58pm

My question in all this is why have we been seeing this on the web for like 15 years? Can’t the browsers or someone make it so that it works right?
Is this an unfixable problem?

Acsenray · May 26, 2015, 8:04pm

Hear! Hear! Bravo.

friedo · May 26, 2015, 8:11pm

Guessing at character encodings is far more problematic than just rendering the crap that you’re given. Every shitty browser bug in history has been caused by trying to use flawed heuristics to guess what the author “really means.”
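One reason guessing is unreliable: the same bytes are often perfectly valid in several encodings at once, so there is no error to tip the guesser off. A small Python sketch, with `candidate_decodings` as an invented helper and the list of encodings chosen just for illustration:

```python
def candidate_decodings(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")) -> dict:
    """Return every listed encoding under which `raw` decodes without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # not valid in this encoding
    return results

data = "\u2019".encode("utf-8")  # bytes E2 80 99

for enc, text in candidate_decodings(data).items():
    print(enc, repr(text))
```

All three encodings accept these bytes, and each gives a different string, so a program can only pick between them by guessing what the author meant.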

GusNSpot · May 26, 2015, 9:23pm

So, this is why when I use a shift 7 I get this & in the posted post but shows as a normal and symbol in the post reply window?

Or is it because I don’t use IE or Chrome or ______ ???

It looks normal when I type in while composing the post but displays incorrectly in the actual post.

Huh???

Chronos · May 26, 2015, 9:27pm

That is a normal “and” symbol.

LSLGuy · May 26, 2015, 9:31pm

Gus: What do you mean by “It looks normal when I type in while composing the post but displays incorrectly in the actual post.”?

As far as I can see the symbol in your post is an ampersand. There are many shapes of ampersand, just like there are many shapes of the letter “A”. Each font draws the letters and symbols as slightly different shapes.

The font used inside the text input box and the font used to display posts are different on the SDMB. Which is a pretty common thing to see in web forms.

The fact the ampersand is a different shape has nothing to do with the issue that the OP is asking about.

Derleth · May 26, 2015, 9:38pm

& is in ASCII. Every Internet-connected Web-browsing computer on Earth agrees what byte to interpret as the & character. This is down to either a font difference or incipient insanity on your part, and right now, your description of what’s confusing you isn’t enough to rule out either.

I can explain fonts. Nobody can fully explain insanity.
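For the skeptical, the claim about & is easy to check in Python. This is a sketch using a handful of common encodings picked for illustration, not an exhaustive survey.

```python
AMP = "&"
ENCODINGS = ("ascii", "latin-1", "cp1252", "utf-8")

# Encode the ampersand under each encoding and collect the resulting bytes.
encoded = {enc: AMP.encode(enc) for enc in ENCODINGS}

print(ord(AMP))  # 38
for enc, raw in encoded.items():
    print(enc, list(raw))
```

Every one of them produces the single byte 38, so an ampersand cannot get garbled the way a curly apostrophe can.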
