I’d like to expand on my claim that there’s frequently no easy way to solve these sorts of problems. However, since we don’t know much beyond the vague details in the OP, I’m going to construct a fictional example of the problem posed by encodings, so I can better illustrate why these problems don’t have a straightforward “just make it work” fix. The details here are simplified and not meant to reflect any specific problem someone is having, although the example is meant to be analogous (and the technical details of how encodings work should be accurate).
Let’s say I write a simple web page. In fact, it’s not even a proper web page at all; there’s no HTML or anything. It’s just this:
hello everyone!
Except I’m not satisfied with that; I want a copyright notice in there. So I look and, sure enough, there’s a handy character I can use for the © symbol. So I put that in:
hello everyone! © copyright by me
(If you don’t see the © that’s ok! It’s kind of the point I’m trying to make…)
Now at this point, I’ve made a small mistake. I used a character that looked ok to me, but here’s how that copyright symbol looks to the computer (in binary):
10101001
See that very first “1”, all the way on the left? That’s a problem… It means that there’s not enough information in this data alone for another computer to know how to display it. It’s an “extended” character (anything with that first bit set falls outside plain ASCII), and that means this data is encoded in some particular way and has to be decoded the same way by whatever is going to display it. I didn’t notice this when I wrote it, because my computer naturally decodes it the same way it encoded it, so it looks the same. In this case, my computer is set to use the “latin-1” encoding, so as long as everyone else’s computer assumes everything is in latin-1, they will see the same thing.
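To see those bits for yourself, here is a minimal sketch in Python (my choice of language; nothing in the story depends on it). It encodes the copyright symbol as latin-1, prints the resulting byte in binary, and checks the leftmost bit. Plain ASCII only ever uses the lower 7 bits, so any byte with the top bit set is one where encodings start to disagree.

```python
# Encode the copyright sign using latin-1 (ISO-8859-1).
encoded = "©".encode("latin-1")
print(encoded)                    # b'\xa9'
print(format(encoded[0], "08b"))  # 10101001

# Plain ASCII characters only use 7 bits, so their leftmost bit is 0.
for ch in "hello":
    print(ch, format(ord(ch), "08b"))  # every line starts with 0


def is_ascii_byte(b: int) -> bool:
    """Return True if the byte's leftmost (eighth) bit is 0."""
    return (b & 0b10000000) == 0


print(is_ascii_byte(encoded[0]))  # False
```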
Now throw the browser into the mix. Let’s say the popular open source browser FireWombat 1.0 started out by assuming everything is latin-1 unless there’s an indication otherwise. This worked well at first, because FireWombat was developed in the US and was mainly used by people in the US. And my site always looked just like I wrote it, even if you loaded it up somewhere like Japan: even though people there don’t really use latin-1, FireWombat uses it anyway, so they still get my copyright symbol as I intended, and all is well.
But FireWombat gets popular, and people in other parts of the world start writing web pages. They of course use annoying* non-latin-1 encodings with other characters (the ones in their actual language). They are annoyed because when they write their version of the “hello everyone!” page in their own language, it comes out as garbage, since FireWombat always assumes everything is in latin-1. Just to get FireWombat to display their pages correctly, they have to set special things on their web server that they don’t understand, or put mysterious code in their pages that they keep forgetting to add.
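Here is a rough Python sketch of what FireWombat 1.0 does to one of those pages. The page text and the choice of Shift_JIS as the author’s encoding are my own assumptions for illustration. Note that decoding as latin-1 never raises an error, because every one of the 256 possible byte values maps to some character, so the browser has no way of noticing that it guessed wrong.

```python
# A Japanese "hello" page, saved by an author whose system uses Shift_JIS
# (assumed for this example; other Japanese encodings exist too).
original = "こんにちは"
page_bytes = original.encode("shift_jis")

# FireWombat 1.0 ignores all that and assumes latin-1.
shown = page_bytes.decode("latin-1")
print(repr(shown))        # garbage, not Japanese
print(shown == original)  # False

# Nothing was actually lost: the bytes are intact, just misread.
print(shown.encode("latin-1").decode("shift_jis"))  # こんにちは
```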
So FireWombat 2.0 changes. Now it uses the default encoding of the user’s computer to display everything. People in Japan now see their own pages the way they intended, although here in the US we just see garbage whenever we look at one of their pages. That’s ok though, because who understands Japanese anyway, right?* Unfortunately, when they view my page, that one copyright character is garbage. I get complaints from my users in other countries, who wonder why the site broke (these people take their copyright notices very seriously).
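The same trick works in reverse for my page. This small Python sketch decodes the single byte my editor saved for © under a few different defaults; the particular encodings are just examples I picked. Latin-1 gives back ©, ISO-8859-5 (a Cyrillic encoding) gives Љ, and Shift_JIS gives a half-width katakana character.

```python
my_byte = b"\xa9"  # how my latin-1 editor saved the copyright sign

# The same byte, read under a few different default encodings.
for encoding in ["latin-1", "iso8859_5", "shift_jis"]:
    ch = my_byte.decode(encoding)
    print(f"{encoding:>10}: {ch!r} (U+{ord(ch):04X})")
```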
Even worse, the developers at FireWombat keep getting complaints. Users want their browser to default to a “new” (I should say newer) standard called Unicode, which is meant to cover every language, usually stored using an encoding called UTF-8. This way all pages will look right to all people no matter where they are. Except when FireWombat 3.0 comes out and starts defaulting to UTF-8 for everything, it breaks my old site. Since that copyright byte (with the first “1” set) isn’t valid on its own in UTF-8, FireWombat replaces it with the funny little diamond with a question mark in it, because it doesn’t know what else to do with it and figures you should know that something is wrong.
I am outraged, and I complain. The FireWombat developers insist I can fix this by setting something on my web server to tell the browser to display my page using latin-1, but I’m not very technical. Besides, I didn’t change anything, they did; it works just fine under FireWombat 1.0, so they should just fix it. But FireWombat can’t go back to the 1.0 behavior, because then they’ll break all of the sites that rely on the new behavior. We are at an impasse, and someone is going to lose.
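The “something on my web server” is usually the charset parameter of the HTTP Content-Type header. Below is a rough Python model of the browser’s side of that deal, using a hypothetical decode_page function as a stand-in for FireWombat: it honors a declared charset when there is one and otherwise falls back to a default. That fallback is exactly the setting each FireWombat version changed.

```python
from email.message import Message


def decode_page(body: bytes, content_type: str, fallback: str) -> str:
    """Hypothetical browser logic: honor a declared charset, else guess.

    `fallback` models a given FireWombat version's default encoding.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    charset = headers.get_content_charset() or fallback
    return body.decode(charset, errors="replace")


page = b"hello everyone! \xa9 copyright by me"

# No charset declared: the result depends entirely on the browser's guess.
print(decode_page(page, "text/plain", fallback="latin-1"))  # FireWombat 1.0
print(decode_page(page, "text/plain", fallback="utf-8"))    # FireWombat 3.0

# With the header set, every version agrees.
fixed = "text/plain; charset=iso-8859-1"
print(decode_page(page, fixed, fallback="utf-8"))
```

Real browsers also consider things like a charset declared inside the page itself and may try to sniff the content, so treat this as a simplification.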
Again, this is mostly a mocked-up situation and is not meant to reflect anything exactly (the real situations and arguments are much more mind-numbingly technical). The short version is that there is lots of existing content, lots of new content, and constantly changing browsers from different companies that have to decide between appeasing new users and preserving the functionality of millions of existing pages. Something is going to break somewhere.
* I’m not really this obtuse; I’m mocking the mentality that got us into this mess in the first place.