I run several websites in different languages. Just heard that the Danish one is displaying Æ, Ø and Å incorrectly in IE6.
I checked the CMS code (I have access to the masterpage) and it has a default Doctype definition of:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
I’ve searched around, and I can’t find anything official or definitive from the W3C about whether substituting “DA” for “EN” would make a difference.
However, I get the impression that using a content-type meta tag with “UTF-8” would fix this completely, for all languages, by correctly rendering the characters outside the ASCII range (code points 128 and above).
Can content-type and DOCTYPE be used together? What’s the purpose of DOCTYPE anyway? Am I on the right lines?
As far as I know, the EN in the doctype is an archaic remnant of the SGML standard and is not interpreted by any browser. It names the language of the DTD itself, not of your document, so changing it to “DA” would not help. In fact, the proposed doctype for HTML 5 doesn’t even have a language attribute.
If your document is UTF-8 encoded, then adding that content-type meta tag should fix your problem. See also HTML Document Representation.
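To see why a missing charset declaration garbles Æ, Ø and Å, here is a small sketch in Python (used purely for illustration; the function name `misread` is made up). It assumes the page bytes are UTF-8 and that the browser falls back to Windows-1252, which is what “Western European” usually amounts to in practice; that fallback is an assumption about the setup described in the question.

```python
def misread(text: str, actual: str = "utf-8", assumed: str = "cp1252") -> str:
    """Encode text the way the server stores it, then decode it the way
    a browser guessing the wrong charset would."""
    raw = text.encode(actual)  # the bytes that actually travel over the wire
    # errors="replace" keeps the sketch from raising on undefined bytes
    return raw.decode(assumed, errors="replace")


if __name__ == "__main__":
    danish = "ÆØÅ"
    print(misread(danish))                   # garbled: each letter turns into two characters
    print(misread(danish, assumed="utf-8"))  # correct once the right charset is used
```

Plain ASCII text survives the wrong guess unchanged, which is why English pages often hide the problem.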
On the web, you’re ALWAYS using some content-type, since the web server should send at least a “Content-Type: text/html” header for html resources. If you’re always using utf-8 for html, you can probably configure your web server to send the correct “Content-Type: text/html; charset=utf-8” header automatically. The meta tag is just a “trick” to specify headers in the HTML file directly.
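As a rough illustration of sending the charset in the HTTP header rather than in a meta tag, here is a minimal WSGI sketch using only the Python standard library. The names `PAGE`, `app` and `call_app` are hypothetical, and a real site would set this in its web server or CMS configuration instead.

```python
from wsgiref.util import setup_testing_defaults

PAGE = "<p>Æ, Ø og Å</p>"


def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the HTTP header."""
    body = PAGE.encode("utf-8")  # the bytes must really be UTF-8
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app():
    """Invoke the app once without a real server; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a fake request environment
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

To try it in a browser, `app` can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.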
The DOCTYPE specifies the kind of SGML file you’re sending. For HTML this usually names HTML or XHTML, plus a version number and a variant (strict, loose, frameset); XHTML documents may also start with an XML prolog, which is separate from the doctype. There are subtle and not so subtle differences between the common doctypes, but a reasonable intro can be found at CSS - Quirks mode and strict mode.
In any case, yes, you can use both. There is some overlap between doctype and content-type, but the content-type is normally not part of the document itself: it tells the browser what kind of file it’s receiving (HTML, GIF, PDF, whatever) and, via its charset parameter, how the bytes are encoded. A doctype, by contrast, names which SGML-based language the document is written in: HTML is an application of SGML, XML is a subset of SGML, and XHTML is an application of XML.
Additional bonus material: you can specify the language of the document via another meta tag, or via the lang attribute of most HTML elements. See Putting language attributes in HTML. Note that this has no effect on the charset/encoding you’re using, so it won’t help your problem (though it might help automatic translation, search engines and screen readers, so it’s usually worth adding that information if the language isn’t English).
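Putting the pieces from this answer together, a page can carry the doctype, the content-type meta tag and a lang attribute at the same time. The sketch below builds such a page in Python; `build_page` is a hypothetical helper rather than anything from the CMS in question, and it does no HTML escaping of its arguments.

```python
DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">'
)


def build_page(title: str, body_html: str, lang: str = "da") -> bytes:
    """Return a complete HTML 4.01 page as UTF-8 bytes (no escaping is done)."""
    html = "\n".join([
        DOCTYPE,                  # selects the HTML flavour; the "EN" stays as it is
        f'<html lang="{lang}">',  # language of the content, independent of the encoding
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        f"<title>{title}</title>",
        "</head>",
        f"<body>{body_html}</body>",
        "</html>",
    ])
    return html.encode("utf-8")  # must match the charset named in the meta tag
```

The important detail is the last line: the declaration only helps if the file really is saved or served as UTF-8.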
Marvellous! That worked. The problem seems to be that some browsers over here, in particular IE6, are set by default to “Western European”, a now-defunct ISO standard. I put UTF-8 in as my content-type and bada bing, my Danish colleagues’ browsers now adapt to the correct charset.
And lots of useful bonus info in addition. Thanks, Parentheses, you’re not superfluous at all!
I have run into this problem when English pages did not display correctly on foreign computers. I found the reason is that their default charset is set to something else, so if the page itself does not specify a character set it will be interpreted incorrectly. The solution is to specify the correct charset in the page. The relevant code is a content-type meta tag naming the charset, for example gb2312 or whatever set you want to use. In my experience that solves the problem for all computers used to view the page, no matter what default code they have set.
sailor, regrettably your answer is entirely wrong: the first half of your answer is the root of my problem, and the second half is irrelevant.
1. The “western europe” charset I described is in fact the defunct ISO 8859-1.
2. The gb2312 charset is a Chinese one.
UTF-8, a Unicode encoding, covers all the character sets usually required, and simultaneously gets around the issues I described in the OP.
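A quick way to check that claim is to try encoding sample text in each charset mentioned in this thread. This is only a sketch in Python; `can_encode` is a made-up helper and the sample strings are arbitrary.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    samples = {"Danish": "ÆØÅ", "Chinese": "中文"}
    for encoding in ("iso-8859-1", "gb2312", "utf-8"):
        for name, text in samples.items():
            print(f"{encoding:10} {name:8} {can_encode(text, encoding)}")
```

ISO 8859-1 handles the Danish letters but not the Chinese characters, while UTF-8 handles both in the same page.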
I am not sure I understand your post. I do not know what you mean by “entirely wrong” because that tag has worked for me. And I do not know what part you consider irrelevant.
I think you misunderstand my post. I know full well “The gb2312 charset is a Chinese one” because I use it. I am not telling you to use Chinese code. Those two were just examples of the use of the meta tag which I just pulled from a couple of my web pages.
If you want to use UTF-8, just put utf-8 as the charset in that same meta tag, but I have run into problems with people telling me pages in UTF-8 were not rendered correctly. While I favor use of UTF-8, it seems it is not the universal solution some think it is, so I still use specific codes for specific pages.
IIRC there’s been a certain amount of reluctance in East Asian countries to accept Unicode, and in many countries the default is to use some “local” encoding scheme, so it’s possible that a fair number of users over there just don’t have the fonts/software installed to correctly render a Chinese Unicode text.
Case in point: the pretty popular programming language Ruby, which was developed in Japan but is currently used worldwide for all kinds of tasks, especially web programming, has only really bare-bones support for character encodings of any kind, while Perl 5, developed and designed mostly in the US and Europe, has excellent support for pretty much any useful character encoding, with automatic conversions built in.