URLs in other alphabets

I was wondering about this; I often think of all the webpages I run across in languages I can’t read.

Then I was thinking about places like Russia, India, China, and Israel, not to mention Arabic-speaking lands; they all use different alphabets.

So I am assuming that the URLs they type in are in those alphabets and that I can’t get to those sites, because my keyboard can only type Roman letters.

Is this true? Or do I have to Google them in advanced settings, pick the country, and hope they are indexed?

How do URLs work in other alphabets?

[Disclaimer]: NOT computer/internet expert. [/D]
I can only give anecdotal data, but all the internet places I have ever been to in the Middle East, Pakistan, India, Russia, etc. have had Latin alphabet keyboards with an option for changing the layout from English to, say, Hindi or Cyrillic. For the URL you have to use the Latin alphabet, and you can then change to a local one for emails etc.
(OTOH I know that it has just become possible to use the special Danish characters æ, ø, and å in URLs, but those are still variants of the Latin alphabet. Obviously YMMV).

What Panurge said. The vast vast majority of URLs use only the basic ASCII character set.

However, there is a system called IDNA (see Wikipedia) which encodes non-ASCII domain names typed into a browser and yields a string of acceptable characters which forms the “real” domain name. So for instance a user in Russia can type in a URL in Cyrillic, which will lead to a domain registered as a string of apparently meaningless ASCII characters. I think that the suffix, e.g. “.ru”, still has to be entered as is, though.
I am not sure how widely this is used as yet.
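
For anyone who wants to see what that encoding looks like, here is a rough sketch in Python. It uses the "idna" codec from Python's standard library, which follows the older IDNA 2003 rules, so it may not match exactly what a current browser does. The two domain names are made-up examples chosen for illustration, not sites mentioned in this thread.

```python
# Sketch: convert an internationalized domain name to its ASCII form and back
# using Python's built-in "idna" codec (IDNA 2003 rules).
# The domain names below are illustrative assumptions, not real recommendations.

def to_ascii(domain: str) -> str:
    """Return the ASCII ("xn--...") form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Return the human-readable form of an ASCII-encoded domain name."""
    return ascii_domain.encode("ascii").decode("idna")


german = to_ascii("bücher.de")    # Latin alphabet with an umlaut
russian = to_ascii("пример.ru")   # Cyrillic label with an ASCII suffix

print(german)                # the label with the umlaut becomes an "xn--" string
print(russian)               # likewise for the Cyrillic label; ".ru" is unchanged
print(to_unicode(russian))   # decoding gives the original name back
```

Each dot-separated label is encoded separately, which is why a plain ASCII suffix passes through untouched.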

As with everything, Wikipedia also has an article on this: Internationalized domain name

I remember that a few years ago, ICANN made it possible to use umlauts (ä, ö, ü) in domain names. Practically nobody uses them; even German companies which have an umlaut in their names prefer to use the ae, oe, ue transliteration (although they will usually also register the variant with the umlaut, for the sake of completeness and to prevent phishing attacks). So the results here are the same as those mentioned by Panurge.

As for the part of the URL after the domain, browsers will convert non-ASCII strings to a percent-encoded representation of their UTF-8 bytes. E.g. you can type in


http://ru.wikipedia.org/wiki/Чикаго

but the browser will then retrieve the ‘real’ URL which is


http://ru.wikipedia.org/wiki/%D0%A7%D0%B8%D0%BA%D0%B0%D0%B3%D0%BE
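
The same conversion can be reproduced with a short Python sketch. It relies on `quote` from the standard `urllib.parse` module, which percent-encodes the UTF-8 bytes of a string by default; the helper name `encode_path` is just a label picked for this example.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(url: str) -> str:
    """Percent-encode the non-ASCII characters in the path part of a URL."""
    parts = urlsplit(url)
    # quote() encodes the string as UTF-8 and leaves "/" untouched by default.
    return urlunsplit(parts._replace(path=quote(parts.path)))


typed = "http://ru.wikipedia.org/wiki/Чикаго"
print(encode_path(typed))
# → http://ru.wikipedia.org/wiki/%D0%A7%D0%B8%D0%BA%D0%B0%D0%B3%D0%BE
```

Each Cyrillic letter takes two bytes in UTF-8, so the six letters of the word turn into twelve `%XX` groups.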

There is a Bulgarian version of Google, and you can indeed search in Bulgarian, but I’ve never seen a URL in Cyrillic. They just transliterate the word into Latin letters.

One benefit of this is that all of my students know what “w” is. They have trouble with other letters (I say “e” and they write “i”, I say “s” and they write “c”), but they never get confused about what a w is. Which is convenient because it has the longest (and potentially most confusing) name of any letter.

Similarly, the Japanese Wikipedia page for <もののけ姫> (Mononoke Hime, i.e., Princess Mononoke in English) has the URL:


http://ja.wikipedia.org/wiki/%E3%82%82%E3%81%AE%E3%81%AE%E3%81%91%E5%A7%AB
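
Going the other way, a percent-encoded address like this one can be turned back into readable text. A minimal Python sketch, assuming the bytes are UTF-8 (which is what `unquote` from the standard `urllib.parse` module assumes by default):

```python
from urllib.parse import unquote

encoded = "http://ja.wikipedia.org/wiki/%E3%82%82%E3%81%AE%E3%81%AE%E3%81%91%E5%A7%AB"

# unquote() turns each %XX group back into a byte and decodes the result as UTF-8.
readable = unquote(encoded)
print(readable)
# → http://ja.wikipedia.org/wiki/もののけ姫
```

Here each character takes three bytes in UTF-8, which is why five characters need fifteen `%XX` groups.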