What the â€?

Little_Brother_1 · April 24, 2013, 8:25pm

Why don’t computers automatically “translate” text so that punctuation and other characters in web pages and e-mails aren’t replaced by weird symbols, e.g. “â€”.

Although I lack technical expertise, I am dimly aware that this arises from the incompatibility between (among?) numerous “character sets” and network protocols. Paradoxically, even something called “unicode” comes in several flavors that do not mix well.

This annoyance affects not only personal e-mails, but business e-mails, web pages, etc. I’m sure that most end-users don’t know what to make of the geeky error messages asking whether a reply should be sent “as is” or “Unicode”, etc.

So again: why isn’t there a programming/software fix for this by now? Even simply deleting incompatible characters and/or leaving a blank space would be an improvement. (While they’re at it, they also need to fix HTML so that certain punctuation, e.g. the numeral “8” next to a right parenthesis “)”, doesn’t automatically morph into an emoticon.)

Keeve · April 24, 2013, 8:29pm

Note to moderators: This OP sounds like a rant, but I really hope someone can shed some factual light on it, so please leave this thread here in GQ!

Telemark · April 24, 2013, 8:38pm

Computers don’t do anything, software does. There are lots of programs out there that would need this functionality, and with display issues there are many characters that wouldn’t look right anyways.

That’s the advantage of standards, there are so many to choose from.

There are many solutions, all with advantages and disadvantages. You can throw away anything you don’t understand but that means you’re destructive to documents that pass through you and that’s typically a bad thing. You can fail to display anything you don’t recognize, but that can cause all sorts of confusion by itself. You can attempt to translate but without knowing what the source was or the destination is it’s pretty hard to guess.
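To make those three options concrete, here is a minimal Python sketch. The sample text is made up for illustration, and the "receiver" is assumed to understand only ASCII (or to guess Windows-1252); real software has more choices than this.

```python
# Sample bytes: "naïve café" encoded as UTF-8 (illustrative text only).
data = "na\u00efve caf\u00e9".encode("utf-8")

# Option 1: throw away anything you don't understand (destructive).
dropped = data.decode("ascii", errors="ignore")

# Option 2: show a placeholder (U+FFFD) for every byte you don't recognize.
replaced = data.decode("ascii", errors="replace")

# Option 3: guess an encoding. A wrong guess produces the familiar garbage.
guessed = data.decode("cp1252")

print(repr(dropped))
print(repr(replaced))
print(repr(guessed))
```

Each option loses something: the first silently changes the words, the second litters the text with placeholders, and the third looks plausible only when the guess happens to be right.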

There are dozens of character sets and encodings because the problem is hard. No one designed languages; they don’t all line up together in a simple format. Throw in languages like the various Chinese/Japanese character representations, Arabic, and (shudder) Thai, and any solution you choose as a one-size-fits-all ends up as a mess.

Polycarp · April 24, 2013, 8:45pm

Unicode is a system in which every symbol used by humans for communication is assigned its own number: the Latin alphabet, but also the Greek, Cyrillic, and Arabic ones, the ideographs of Chinese, the syllabaries of Japanese, the yen, Euro, pound sterling, and other money symbols, the accented, tilde-ified, and umlauted letters of various European languages, and punctuation. MS Word seems to default to using Unicode. The accented letters, smart quotes, single symbol for ellipsis (…), and similar Unicode symbols are often not displayed well by browsers which use ASCII or ANSI coding. The â€ is probably your computer’s best guess as to how to render the encoded bytes of Unicode symbol #8230, which is actually the ellipsis. And so on.
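For anyone who wants to see where the â€ comes from, here is a small Python sketch. It assumes the text was saved as UTF-8 and then read as Windows-1252 (what Windows often calls "ANSI"); the helper name `mojibake` is made up for this example.

```python
# A few typographic characters that word processors insert automatically.
samples = {
    "ellipsis": "\u2026",
    "em dash": "\u2014",
    "right single quote": "\u2019",
}

def mojibake(ch: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return ch.encode("utf-8").decode("cp1252")

for name, ch in samples.items():
    garbled = mojibake(ch)
    # Reversing the mistake recovers the original character.
    repaired = garbled.encode("cp1252").decode("utf-8")
    print(name, ch.encode("utf-8").hex(" "), repr(garbled), repaired == ch)
```

All three characters start with the UTF-8 bytes E2 80, which Windows-1252 shows as "â€"; only the third byte differs, which is why the same two garbage characters keep turning up.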

Musicat · April 24, 2013, 8:50pm

Then we have oddities like the interpretation of the Milwaukee Doubletree Hotel page, which Chrome thinks is in Japanese, and asks if I want to translate it. If I do, nothing changes, so it’s a strange sort of Japanese in the American Heartland.

leahcim · April 24, 2013, 9:03pm

Au contraire, Unicode symbols (in the UTF-8 encoding) are very well displayed by browsers which assume ASCII and ANSI, which is why you can see most of the text just fine, with only a few spots where it goes weird. Try interpreting a JPEG stream as a PNG and you wouldn’t get ten bytes in without failing.

UTF-8 is designed to be the same as ASCII for the low characters (0-127). Maybe it would have been better to crash out entirely to drive home the point that you’re doing something non-sensible.
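A quick way to check this for yourself, as a rough Python sketch (the sample strings are arbitrary):

```python
# Pure ASCII text produces identical bytes in ASCII and UTF-8.
ascii_text = "Hello, world!"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# With one non-ASCII character, only that character uses bytes >= 0x80.
mixed = "caf\u00e9"  # "café"
utf8 = mixed.encode("utf-8")
high_bytes = [b for b in utf8 if b >= 0x80]

print(same_bytes)
print(utf8)
print([hex(b) for b in high_bytes])
```

The ASCII letters survive any ASCII-compatible misreading untouched; only the high bytes are at risk of being shown as something else.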

Blakeyrat · April 24, 2013, 9:52pm

There are well-established ways of using the correct character set in webpages; this has been perfected for decades. The sites you’re seeing this problem on were coded by people who don’t know what they’re doing.

In short: it’s not a computer problem, it’s a people problem.
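As an illustration of one such way: a page can state its encoding in the HTTP `Content-Type` header (or in a `<meta charset="utf-8">` tag), and the receiver decodes with whatever was declared. The Python sketch below uses the standard library's header parser; the function name `declared_charset` and the UTF-8 fallback are assumptions made for this example, not how any particular browser behaves.

```python
from email.message import Message

def declared_charset(content_type: str, default: str = "utf-8") -> str:
    """Return the charset named in a Content-Type value, or a fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lowercase, or None.
    return msg.get_content_charset() or default

# Bytes as they might arrive from a server that declares UTF-8.
body = "\u201csmart quotes\u201d \u2026".encode("utf-8")
text = body.decode(declared_charset("text/html; charset=UTF-8"))
print(text)
```

When the declaration is missing or wrong, the receiver is back to guessing, which is where the weird symbols come from.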

Telemark · April 24, 2013, 11:09pm

It’s not quite that simple. There are hundreds of legacy programs and systems that would be difficult and expensive to upgrade, and most for little gain. Things are probably better in countries with more complex native character sets. Even if you upgrade your personal machine there may be weak links along the way that make perfect transmission problematic.

njtt · April 24, 2013, 11:57pm

My version of Chrome doesn’t do this. (Neither does Firefox.)

Sunspace · April 25, 2013, 12:04am

Hmm. Looks normal to me. さよなら？

leahcim · April 25, 2013, 12:45am

Since this part of the rant doesn’t seem to have been addressed – this has nothing to do with HTML at all, just a program that believes that users would prefer to have a simple way to write an emoticon than a simple way to write an equation.

Emoticons have never been part of any HTML standard (although if the browser wars in the 90s had gone on any longer, IE would probably have supported them at some point.)

BorgHunter · April 25, 2013, 1:17am

And this is the root of the problem. UTF-8 is the clear solution for encoding text, and it’s well-supported in all modern software. The problem is all the software that isn’t modern, and all the webpages using some crazy character encoding because they were written before UTF-8, and all the programming languages which are still stuck on 8-bit characters and Latin-1, or just to confuse things, languages using UTF-16 (lookin’ at you, Java).
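A rough Python sketch of why those encodings don't mix: the same short string (arbitrary sample text) takes a different number of bytes in UTF-8 and UTF-16, and cannot be written in Latin-1 at all.

```python
text = "na\u00efve \u65e5\u672c\u8a9e"  # "naïve 日本語": 9 characters

sizes = {}
for enc in ("utf-8", "utf-16-le", "latin-1"):
    try:
        sizes[enc] = len(text.encode(enc))
    except UnicodeEncodeError:
        # Latin-1 has no code for the Japanese characters.
        sizes[enc] = None

print(sizes)
```

A program that counts "characters" by counting bytes, or that assumes every character fits in 8 bits, will get this string wrong in one way or another.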

TimeWinder · April 25, 2013, 1:52am

A system badly enough designed that it is even possible to be used incorrectly by the majority of people isn’t a people problem, it’s a computer problem.

However, I’m glad that this was perfected for “decades” (HTML was created in 1990, so they must have been quick about it) – that means that the billion or so dollars a year my company spends on it can now be reclaimed so that we can blow it all on hats.

Musicat · April 25, 2013, 3:28am

One man’s “normal” is another’s Japanese.

Firefox doesn’t have translation links to Google, IMHO, so that’s not surprising. Chrome is highly customized, so I’m not surprised there, either. It’s likely that something in my system is misleading the browser engine.

But in case you don’t believe me, here is a screen shot, with only the extreme top and bottom (showing tabs) parts removed. Nothing else has been edited.

I haven’t been browsing Japanese sites or doing anything else that might trigger such a function in Chrome, but this Milwaukee hotel site (and only this site) brings up the “translate?” message every time. Weird.

eschereal · April 25, 2013, 3:36am

Except, so far, systems are designed by people, not computers, so at the beginning, it really is a people problem.

scr4 · April 25, 2013, 3:47am

Yes, they’ve been working on the problem for over 2 decades.

And your company spends a “billion or so dollars a year” dealing with character set compatibility issues?? Even if you work for Google or Apple, I doubt they spend 2% of their revenue on this issue alone.

Senegoid · April 25, 2013, 5:34am

Perfectly intelligible Romulan.

And the problem is . . .?

Senegoid · April 25, 2013, 5:37am

Here’s a good primer on Unicode character codes:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.

AaronX · April 25, 2013, 5:55am

Wà̴̵̶̷̸̡̢̧̨̛̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̯̰̱̲̳̹̺̻̼͇͈͉͍͎́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̽̾̿̀́͂̓̈́͆͊͋͌̕̚͠͡ͅit till you deà̴̵̶̷̸̡̢̧̨̛̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̯̰̱̲̳̹̺̻̼͇͈͉͍͎́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̽̾̿̀́͂̓̈́͆͊͋͌̕̚͠͡ͅl with stà̴̵̶̷̸̡̢̧̨̛̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̯̰̱̲̳̹̺̻̼͇͈͉͍͎́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̽̾̿̀́͂̓̈́͆͊͋͌̕̚͠͡ͅcked dià̴̵̶̷̸̡̢̧̨̛̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̯̰̱̲̳̹̺̻̼͇͈͉͍͎́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̽̾̿̀́͂̓̈́͆͊͋͌̕̚͠͡ͅcritics.

si_blakely · April 25, 2013, 7:23am

AaronX, that does something very weird (Firefox 20.0.1, Windows 7).

About 15-20 years ago my father-in-law took a stab at comprehensive internationalization for computing systems (he was a lecturer in computer science at a university in NZ) - something more than just Unicode. To really get a grip on the scale of the problem he started studying Mandarin, to the point where he could read and speak the language. I don’t think the interface library developed very far, but he enjoyed the journey (which eventually included a trip to China).
