Would Someone Please Explain What This Symbol Means

:arrow_right: ’ :arrow_left:

I have never encountered this before. I checked the certificate (valid), I used Brave Browser this time, 3rd party cookies are blocked, HTTPS is enabled, encryption is enabled and VPN was enabled.

It’s driving me nuts. When trying to read the article, I have to figure out what word is supposed to be used instead of the ’ symbol.

Anyway, does anyone know?

That’s a symbol that shows up a lot if the code page of a website does not match up with the one expected by the web browser. There are different ways to encode the symbols beyond typical letters, numbers, and such, and the wrong one can mess up a page.

I tend to remember these showing up mostly when pages use “curved quotes” instead of the standard straight ones.

To figure out the exact problem, we’ll need more information. Is it just on one site, or nearly all sites when you see this?

Every time I’ve seen it, it’s either an apostrophe or an allegedly smart ellipsis. I generally figure it out via context.

As a young child, one who was forced to use experimental I.T.A. to encourage reading, I believe that symbolizes the long “A” sound. Granted, I.T.A. was discontinued long ago, so I’m sure that different graphics are now used to showcase the long “A” sound.

It’s a Unicode right single quotation mark, printed in UTF-8 but interpreted as ASCII. Note from the link that the UTF-8 encoding of the character is 0xe2 0x80 0x99. You can look these up in an extended ASCII table and find that they correspond to “Latin small letter a with circumflex”, “Euro sign”, and “Trade mark sign”–exactly what you see displayed.
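For anyone who wants to see this happen, here is a minimal Python sketch. It assumes the "extended ASCII" table in play is Windows-1252 (Python's `cp1252` codec), which is only one of several possibilities, and the variable names are just for illustration.

```python
# U+2019 is the right single quotation mark (the "curly apostrophe").
curly = "\u2019"

# Encode it the way the website stored it: as UTF-8.
utf8_bytes = curly.encode("utf-8")

# Decode those bytes with the wrong table (assumed here: Windows-1252).
garbled = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(" "))  # e2 80 99
print(garbled)              # â€™
```

Each of the three bytes gets turned into its own character, which is why one apostrophe becomes three symbols.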

Incidentally, it’s almost always a bug with the website. Nothing to do with your browser. Someone pasted Unicode text into an input box, and somewhere in the process of storing in the database, retrieving the text, and displaying as a web page, there was an implicit treatment of UTF-8 as ASCII. It comes up most frequently with curly quotes and the like because word processors tend to auto-insert them, and they don’t exist in ASCII. Common English-language alphanumeric/punctuation characters have the same encoding in UTF-8 as ASCII, and therefore don’t have the same problem.
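If the damage really is UTF-8 read as a single-byte code page, it can often be reversed by running the mistake backwards. This is a rough sketch, not a general repair tool: `repair_mojibake` is a made-up helper name, and Windows-1252 as the wrong codec is an assumption.

```python
def repair_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded with `wrong_codec`."""
    try:
        # Turn the garbled characters back into the original bytes,
        # then decode those bytes the way they were meant to be read.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit the assumed damage pattern; leave it alone.
        return text


# "It" + a-circumflex + euro sign + trade mark sign + "s driving me nuts"
broken = "It\u00e2\u20ac\u2122s driving me nuts"
fixed = repair_mojibake(broken)
print(fixed)  # It’s driving me nuts
```

Text that was never garbled in this way is returned unchanged, so the helper is safe to try on mixed input.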

OK! But why? It just about renders the article unreadable, not worth the freaking hassle!!
Every sentence has it at least once, sometimes more. Did they do it on purpose? Just wondering…

Ok… posted at same time… I sure am glad there’s a reasonable explanation!!! Thank you!!

There’s no excuse today. It’s because some web developers are bad at their jobs.

However, historically there is a bit of an excuse. Transitioning from ASCII (the character set that computers used for a very long time, but is Western-centric) to Unicode (the new character set that supports Asian languages, emojis, better punctuation, etc.) was very painful, and developers often had to put in workarounds for bugs in other software, and these workarounds often caused bugs of their own, and so on. A decade or so ago, bugs like this were sorta excusable. Just not now.

Really cool site, so…ASCII obsolete now? Unicode is the world norm now?

I appreciate the information.

For anything user-facing, yes, ASCII is obsolete and Unicode is the standard. If your website doesn’t support emojis or non-Latin characters, it’s probably not going to be very popular. For other things, like programming languages, ASCII is still common.

Though it’s sorta academic because as I mentioned, the UTF-8 encoding for Unicode is equivalent to ASCII for all of the common characters. A simple text file in English and with no special punctuation will be exactly the same in both cases.
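A quick Python check of that claim, as a sketch with example strings of my own choosing:

```python
plain = "A simple text file in English."

# Plain English text produces identical bytes in ASCII and UTF-8.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")
print(same_bytes)  # True

# Add one curly apostrophe and ASCII can no longer represent the text.
fancy = "It\u2019s"
try:
    fancy.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# UTF-8 handles it, spending three bytes on the apostrophe.
print(len(fancy.encode("utf-8")))  # 6
```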

Hmmmmm. I think I should seriously study coding. It’s very interesting. From what little I do know it appears to be language. It’s gotta be! Isn’t it?

Not sure what you mean by “it appears to be language”, but yes, I find programming/coding very interesting (I’ve been doing it for over 80% of my life now).

I think many people would benefit from learning some coding skills. It helps cultivate logical thinking and if nothing else, it’s often like solving a puzzle. I would recommend starting with Python, and using an online tutorial such as this one.

I will, thank you! Are you familiar with https://inrupt.com/solid ? I stumbled across them a couple months ago. I don’t have a clue what I’m doing. Tim Berners-Lee is changing the internet as we know it. Python is something I need to know at the very least to go forward.

I agree if it’s only one website, but, the way the OP said it, I thought they were saying they were seeing it on many different websites. So I thought they may have accidentally enabled an option to force a specific character encoding.

In my experience, the culprit is not usually ASCII but Windows-1252, though UTF-16 or any other non-UTF-8 Unicode can also cause these issues.

To this day I still tend to use HTML entities to avoid this issue altogether for any characters outside of the original ASCII set. (Well, except emoji, because no emoji keyboard I know uses them.)
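One way to do that conversion automatically, sketched in Python. It relies on the standard `xmlcharrefreplace` error handler, which swaps every non-ASCII character for a numeric character reference; the sample sentence is made up.

```python
import html

text = "It\u2019s a \u201ccurly\u201d quote"

# Replace every character outside ASCII with a numeric HTML entity.
entity_safe = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entity_safe)  # It&#8217;s a &#8220;curly&#8221; quote

# A browser (or html.unescape) turns the entities back into the original text.
round_trip = html.unescape(entity_safe)
print(round_trip == text)  # True
```

This only deals with non-ASCII characters. Markup characters such as `<` and `&` would still need `html.escape` applied first.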

Technically, you’re correct (the best kind of correct), but that and most other “code pages” were just supersets of ASCII and only affected the characters 128-255. I’m lumping them all together under the ASCII banner. There are a couple of others that it could conceivably be, but I agree that 1252 is the most likely.

However, I am extremely confident that we are seeing a right single quotation mark (generally used as a “curly apostrophe”), encoded as UTF-8, but displayed as ASCII with some extended code page. It would be one where the hex bytes 0xe2 0x80 and 0x99 render as ’. It would be too coincidental for it to be anything else.
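To narrow down which code page fits, one could simply try a few. The list of candidate codecs below is my own arbitrary pick of encodings Python ships with, not an exhaustive survey.

```python
raw = "\u2019".encode("utf-8")   # the bytes e2 80 99
target = "\u00e2\u20ac\u2122"    # the three symbols the OP is seeing

# Hypothetical shortlist of encodings to test.
candidates = ["ascii", "latin-1", "cp437", "cp1250", "cp1251", "cp1252", "mac_roman"]

matches = []
for codec in candidates:
    try:
        if raw.decode(codec) == target:
            matches.append(codec)
    except UnicodeDecodeError:
        # This codec has no character at all for one of the bytes.
        pass

print(matches)
```

Windows-1252 shows up in the result, while strict ASCII and Latin-1 do not, since neither has a Euro sign at 0x80.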

I see this bug so often that it barely registers anymore. I suppose it’s possible the OP enabled some strange setting. I don’t know anything about the Brave browser.

I hadn’t seen that before. Tim Berners-Lee is certainly a good guy, but I can’t really figure out what the system is for. The main data harvesters like Facebook aren’t going to outsource their crown jewels to some open source system outside of their control. I guess it’s possible that organizations that are legally mandated to ensure privacy (hospitals, say) might want to use a service like this.

Since ASCII only goes from 0 to 127, and includes neither a “Euro sign” nor any “Latin small letters with circumflex accents”, yet you are seeing such garbage characters, the problem definitely has nothing to do with ASCII.

If you want such characters, you can use some form of Unicode. However, in an encoding like UTF-8 (by far the most common today, ~97-100% of web pages), codes 0–127 are exactly the same as ASCII, so this is a distinction that makes no difference as long as you stick to those characters.

Your Euro Sign comes out encoded as 3 bytes: E2 82 AC (base 16), however it should appear as a single glyph: €. It is literally impossible to interpret those as ASCII characters. According to @Dr.Strangelove’s table, E2 is a “Latin small letter a with circumflex” in Windows-1252 encoding, so I guess you are using some sort of Windows browser with the encoding manually overridden to Windows-1252? I have encountered such issues, but only with Microsoft browsers and only in the 20th century. (Again, >95% of web pages use UTF-8, so there is no way that is not the default.)
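Here is a small Python sketch of that point. A strict ASCII decoder rejects these bytes outright, so garbage characters can only come from some 8-bit code page (Windows-1252 is assumed below).

```python
euro_bytes = "\u20ac".encode("utf-8")
print(euro_bytes.hex(" "))  # e2 82 ac

# Strict ASCII has nothing above 127, so decoding fails instead of garbling.
try:
    euro_bytes.decode("ascii")
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
print(decoded_as_ascii)  # False

# An 8-bit code page accepts every byte and shows three unrelated symbols.
as_cp1252 = euro_bytes.decode("cp1252")
print(as_cp1252)  # â‚¬
```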

If I do not use any non-ASCII characters, how would you know which I was using? :slight_smile: And if I use non-ASCII characters, well, then ASCII was never an option in the first place.

ETA Unicode has all sorts of problematic garbage in it— this was already being debated in Version 1 (just to give you a taste, there are loads of duplicate characters, but not consistently)— but by now you more or less have to deal with it.

ETA2

There are, naturally, libraries already available so that your Perl (or whatever) program can do Unicode text normalization.
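In Python, for instance, normalization is in the standard library's `unicodedata` module. A short sketch, using two well-known duplicate cases as examples:

```python
import unicodedata

composed = "\u00e9"      # e with acute accent as a single code point
decomposed = "e\u0301"   # plain e followed by a combining acute accent

# They look identical on screen but compare as different strings.
print(composed == decomposed)  # False

# NFC normalization folds the two-code-point form into the single one.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == composed)  # True

# A true duplicate: the ANGSTROM SIGN normalizes to
# LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom = "\u212b"
normalized_angstrom = unicodedata.normalize("NFC", angstrom)
print(normalized_angstrom == "\u00c5")  # True
```

Normalizing both sides before comparing or storing text avoids most surprises from these duplicates.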

Yeah, the beauty of UTF-8 is that, for actual ASCII characters, the encoding is exactly the same, unlike in other attempts at encoding Unicode.
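For comparison, a tiny Python sketch of how the same ASCII letter looks in a few Unicode encodings (little-endian variants chosen so that no byte-order mark is added):

```python
letter = "A"

utf8 = letter.encode("utf-8")
utf16 = letter.encode("utf-16-le")
utf32 = letter.encode("utf-32-le")

print(utf8)   # b'A'
print(utf16)  # b'A\x00'
print(utf32)  # b'A\x00\x00\x00'

# Only the UTF-8 bytes are identical to the plain ASCII encoding.
print(utf8 == letter.encode("ascii"))  # True
```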

But I do get lumping all the 8-bit extended ASCII codes into a single group, separate from Unicode.

Oh, and you’re probably right that there’s no way to accidentally force the wrong character encoding on modern browsers. I had to use an extension to do that for a site.

Come to think of it, the OP might want to look into this extension:

It’ll give a list of encodings, so they could just try different ones to see if it makes the site work properly. I’d definitely try Windows-1252 first.

@DPRK

I am using a Moto g STYLUS. The website was located in Australia. The site is bookmarked somewhere amongst hundreds I refer to. That was my first visit. I’ll look for it later.

I use Startpage, Opera, Brave, Epic, and Google alternating back and forth until I decide which one I like.

With the changing innovation in technology I feel like the internet is going to be more monetized and possibly even taxed in the future. With GDPR there will be more legal liabilities for the user. What do you think?