I was reading something online where â€™ appeared where there clearly should have been an apostrophe. I have a vague understanding that this might happen due to some sort of format mismatch, but I have two questions:
Why not just use an apostrophe? If they can come up with those characters, surely an apostrophe is possible, isn’t it?
How/why did that particular odd collection of symbols get selected to substitute for an apostrophe? It just seems quite bizarre.
As you mentioned, it’s probably a character set issue. There isn’t an apostrophe character stored in the page, just a code that tells the browser to display an apostrophe, and which code means “apostrophe” differs from one character set to another. If the browser expects the text in one character set and it arrives in another, it displays whatever character that code corresponds to in the set it’s actually using. It could be an issue with Unicode - an encoding system that is meant to be universal and is very widely used, but not by everything.
To sum up:
They did use an apostrophe in the character set they were using
That’s just what that code corresponds to in the character set being used to display it.
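To make that summary concrete, here’s a rough sketch in Python (the language and the sample text are just my choice for illustration, nothing to do with whatever the original site runs): encode text containing a curly apostrophe one way, decode it another way, and the mismatch appears.

    # Text containing a curly apostrophe (U+2019), as word processors usually produce.
    text = "it\u2019s"

    # Stored as UTF-8 bytes...
    raw = text.encode("utf-8")
    print(raw)                   # b'it\xe2\x80\x99s'

    # ...but decoded as if it were Windows-1252: the letters survive,
    # the apostrophe turns into three garbage characters.
    print(raw.decode("cp1252"))  # itâ€™s

The letters come through untouched because their codes are the same in both character sets; only the apostrophe’s code differs.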
So is it correct to see this as a parallel to alt codes?
Is it possible that an apostrophe would appear if I used a different browser to view the item, or does it depend on the browser used by the originator?
Are there similar codes for every character, including letters? If so, why don’t I ever seem to see codes instead of letters? If not, why isn’t there just an apostrophe instead of a code for an apostrophe?
What you’re seeing isn’t the code for the apostrophe; it’s the character that the original text’s code for the apostrophe maps to in the encoding the page ended up being displayed in. Exactly what you see depends on which encoding mistake caused the problem.
And it most likely happened because it was supposed to be a curly quote rather than a plain straight apostrophe (') - the latter is generally the same across encodings (as letters are, which is why they came out correctly), but the curly quotes aren’t, and the encoding used for display either doesn’t have curly quotes or puts them at a different code point.
This, BTW, is called mojibake, Buchstabensalat (German for “letter salad”), and a bunch of other things. (The Wikipedia article on mojibake also goes into more detail about how it happens.)
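To illustrate that (a quick Python sketch; the encodings listed are just common examples): a plain straight apostrophe gets the same single byte in every one of them, while the curly one gets different bytes depending on the encoding - and plain ASCII or Latin-1 can’t represent it at all.

    # ASCII characters get the same bytes in all of these encodings,
    # so they always display correctly.
    for enc in ("ascii", "latin-1", "cp1252", "utf-8"):
        print(enc, "it's".encode(enc))       # b"it's" every time

    # The curly apostrophe is a different story: each encoding gives it
    # different bytes (and ASCII/Latin-1 would raise an error here).
    for enc in ("cp1252", "mac-roman", "utf-8"):
        print(enc, "\u2019".encode(enc))     # b'\x92', b'\xd5', b'\xe2\x80\x99'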
Alt codes are essentially exactly this: you’re entering the numeric code of a character in the character table you’re using. Generally the same symbols sit in the same places in different sets, but not always.
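For example (another Python sketch, with the code pages picked arbitrarily), the same numeric code can land on completely different characters depending on which table is in effect - which is also why Alt+146 and Alt+0146 give different characters on Windows, since the leading zero switches from the old OEM code page to the Windows one.

    # Code 146 (0x92) means different things in different code pages.
    code = 146
    print(bytes([code]).decode("cp437"))    # Æ -- the old OEM/DOS code page (Alt+146)
    print(bytes([code]).decode("cp1252"))   # ’ -- the Windows code page (Alt+0146)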
It could be a browser issue. It could also just be a font issue. Sometimes fonts don’t put common symbols in the standard places for some reason, or a similar-looking symbol elsewhere in the table gets used instead and screws it up.
As an example, I was recently doing some work directly manipulating character sets in some code I was writing, and I ran across an error when attempting to parse on a hyphen. As it turned out, some of the hyphens really were hyphens and some were actually dashes (often shown as a double hyphen), but in the particular character set I was using both looked the same, so the parsing had me really confused until I looked at the actual code for the character.
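Something like that hyphen problem is easy to reproduce - this is just an illustrative Python sketch with a made-up string, not the code I was actually working on:

    import unicodedata

    # One of these separators is an ASCII hyphen-minus, the other is a dash;
    # in some fonts the two can look practically identical.
    line = "2020-01\u201315"

    # Splitting on the ASCII hyphen silently misses the dash...
    print(line.split("-"))          # ['2020', '01–15']

    # ...and only inspecting the actual code points shows why.
    for ch in line:
        print(hex(ord(ch)), unicodedata.name(ch))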
For ASCII, there’s a standard for the basic set of characters, so all the digits, lower case, upper case, and common symbols are supposed to be in the same place. Issues like this usually come up in the extended portion: one browser or character set might put a symbol at the standard position, and another might put it somewhere non-standard for some reason. I’ve seen these issues particularly when jumping between browsers, word processors, character sets, etc.
That said, this typically only happens when the designer didn’t test the page in common browsers, when you’re using an uncommon browser, or when the page relies on some obscure font or formatting that you don’t have.
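You can see that “same in the basic range, different in the extended range” behaviour directly (a small Python sketch; the encodings are just examples):

    # In the basic ASCII range the common code pages all agree...
    for enc in ("ascii", "latin-1", "cp1252", "cp437"):
        print(enc, bytes(range(0x41, 0x45)).decode(enc))   # ABCD in every one

    # ...but the same byte in the extended range decodes differently.
    for enc in ("latin-1", "cp1252", "cp437"):
        print(enc, bytes([0xE9]).decode(enc))               # é, é, Θ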
The sequence of characters is the UTF-8 encoding of the “right single quotation mark”, Unicode code point U+2019, as seen by a browser that isn’t treating the page as UTF-8.
In UTF-8, code points larger than 127 are encoded as multiple bytes, which in this case are (in hex) E2 80 99. If those bytes are viewed by a non-Unicode-supporting browser (or just one that hasn’t identified that the page is in UTF-8), they get read as a single-byte character set instead: in Windows-1252 they display as “â€™”, which is what you saw, while other character sets give other garbage - Mac Roman, for instance, shows “‚Äô”.
The reason this weird encoding scheme is used is that UTF-8 strings correspond exactly to ASCII strings whenever only code points below 128 appear. That is why the page renders mostly correctly, even though the encoding is wrong.
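In Python terms, roughly (this just restates the byte arithmetic above; nothing here is specific to the page you were reading):

    quote = "\u2019"                    # RIGHT SINGLE QUOTATION MARK
    raw = quote.encode("utf-8")
    print(raw)                          # b'\xe2\x80\x99' -- three bytes, all above 127

    print(raw.decode("cp1252"))         # â€™   (Windows-1252)
    print(raw.decode("mac-roman"))      # ‚Äô   (Mac Roman)

    # Pure-ASCII text is unaffected: its UTF-8 bytes are its ASCII bytes.
    print("plain text".encode("utf-8")) # b'plain text'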
And some fonts are missing a lot of symbols: it’s relatively common to encounter fonts that don’t have any diacritics, the crossed o, double vowels and so on, so if you’re writing in a language that uses those, you get most of the text in the chosen font but those particular letters come up in a different one. It’s still the right letter, but the whole thing ends up looking more or less like this.