Google Results Pages URLs

Okay, I’ve been trying to “crack the code” of the Google results page URL. So far, I’ve gotten this:


Results for lyrics
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=lyrics

Results for Google
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=google

Results for Shinedown
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=shinedown

Results for Memento
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=memento

Results for “Three Days Grace”
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=%22three+days+grace%22

Results for Closure+lyrics
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=closure%2Blyrics

Results for “chocolate chip cookie”
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=%22chocolate+chip+cookies%22

Results for blue -yellow
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=blue+-yellow


So, my first question: why do some searches use the format http://www.google.com/search?hl=en&ie=ISO-8859-1&q=query while others use http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=query?
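To see exactly what differs between two of these URLs, it can help to split each one into its parameters. Here's a minimal sketch using Python's standard urllib.parse; query_params is just a hypothetical helper name, and the URLs are copied from the list above:

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url):
    """Return the query-string parameters of a URL as a dict of lists."""
    # keep_blank_values=True keeps an empty parameter such as "lr="
    # instead of silently dropping it.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)

with_lr = "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=memento"
without_lr = "http://www.google.com/search?hl=en&ie=ISO-8859-1&q=lyrics"

print(query_params(with_lr))     # "lr" is present, with an empty value
print(query_params(without_lr))  # "lr" is simply absent
```

Either way the query itself (q) and the other parameters are the same; the only difference is whether "lr" shows up with an empty value.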

My second question may or may not be a little more complicated, or at least require a longer answer. In the URLs, the quotation mark is represented by %22 and the plus sign by %2B. So, is there a list of “codes” that represent symbols like this on Google? And are they used anywhere else, or just on Google? I experimented some and found that 10 followed by any alphanumeric character turned up that character in the search, but I figured it’d be easier to take advantage of other people’s knowledge than to spend hours trying different numbers after the percent sign.

I’ll leave part one for somebody else, but for part two, here’s more about URL encoding than you ever wanted to know:

http://www.blooberry.com/indexdot/html/topics/urlencoding.htm

Regarding %22 = " and %2B = +, the 22 and 2B are hexadecimal representations of the ASCII code for those characters.
ASCII Character Map (Adobe format)
The percent sign is the escape character that tells the 'puter that the next two characters are a hexadecimal value.
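For anyone who wants to check the mapping themselves, here's a minimal Python sketch using the standard library's urllib.parse. percent_escape is a hypothetical helper, and it assumes single-byte ASCII characters:

```python
from urllib.parse import quote_plus, unquote_plus

def percent_escape(ch):
    """Escape one ASCII character as '%' followed by its two-digit hex code."""
    return "%{:02X}".format(ord(ch))

print(percent_escape('"'))  # %22
print(percent_escape("+"))  # %2B

# quote_plus does the same for a whole query: spaces become '+',
# and characters that aren't URL-safe become %XX escapes.
print(quote_plus('"three days grace"'))  # %22three+days+grace%22
print(quote_plus("closure+lyrics"))      # closure%2Blyrics
print(unquote_plus("closure%2Blyrics"))  # closure+lyrics
```

That's also why the literal plus in "closure+lyrics" has to be escaped: an unescaped + in the query string would be read back as a space.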

The linked table shows four columns. The first is the decimal value of the character. The second is the image of the character (assuming a “standard” Latin typeface and without getting into alternative character sets). The third column is the binary value of the character, and the fourth column is the hexadecimal representation.
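The same four columns can be worked out for any character. A small Python sketch, where char_row is a hypothetical helper and plain 7-bit ASCII is assumed:

```python
def char_row(ch):
    """Return (decimal, character, binary, hexadecimal) for one ASCII character."""
    code = ord(ch)
    return (code, ch, format(code, "08b"), format(code, "02X"))

# The quote, the plus sign, and the space from the URLs above.
for ch in '"+ ':
    print(char_row(ch))
# (34, '"', '00100010', '22')
# (43, '+', '00101011', '2B')
# (32, ' ', '00100000', '20')
```

The last column is exactly what follows the percent sign in the URL.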

This code may be tough to crack. When I searched for “lyrics” with Google, I got something rather different. Here’s your result, followed by my result:



http://www.google.com/search?hl=en&ie=ISO-8859-1&q=lyrics
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=lyrics&btnG=Google+Search
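A quick way to compare the two URLs parameter by parameter, sketched in Python with the standard urllib.parse module (params is a hypothetical helper that keeps only the first value of each parameter):

```python
from urllib.parse import urlsplit, parse_qs

def params(url):
    # Flatten the query string into a dict, keeping empty values like "lr=".
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}

yours = params("http://www.google.com/search?hl=en&ie=ISO-8859-1&q=lyrics")
mine = params("http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=lyrics&btnG=Google+Search")

only_mine = sorted(mine.keys() - yours.keys())
changed = sorted(k for k in yours.keys() & mine.keys() if yours[k] != mine[k])
print(only_mine)     # ['btnG', 'oe']
print(changed)       # ['ie']
print(mine["btnG"])  # Google Search
```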


The “btnG” is simple enough: “Google Search” is the name of the button I clicked. I can’t explain why the encodings are different, though: “ISO-8859-1” vs. “UTF-8”.
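One thing that may matter here, assuming (not confirmed in this thread) that ie= names the character encoding the query was sent in: for plain ASCII searches the two encodings produce identical URLs, and they only diverge for characters outside ASCII. A Python sketch using the standard quote_plus, with "café" as a made-up example query:

```python
from urllib.parse import quote_plus

query = "café"
# The same text gets different percent-escapes depending on which
# character encoding turns it into bytes first.
latin1 = quote_plus(query, encoding="iso-8859-1")
utf8 = quote_plus(query, encoding="utf-8")
print(latin1)  # caf%E9
print(utf8)    # caf%C3%A9

# Plain ASCII queries such as "lyrics" come out the same either way.
print(quote_plus("lyrics", encoding="iso-8859-1") == quote_plus("lyrics", encoding="utf-8"))  # True
```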

Also, none of the searches I ran included that little “lr=”, and I don’t know what it could be.

The “lr” parameter is used to restrict the language of pages. With “lr=lang_de”, Google will only search German pages, for example. “lr=” probably gets added due to a preference setting that does not restrict the language.
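To illustrate, here's a Python sketch that builds these URLs with the standard urlencode. search_url is a hypothetical helper; the parameter names are copied from the URLs in this thread, and the lang_de behavior is as described above, not something the code itself checks:

```python
from urllib.parse import urlencode

def search_url(query, lang=""):
    """Build a search URL; the default lang="" reproduces the empty "lr=" seen above."""
    params = {"hl": "en", "lr": lang, "ie": "ISO-8859-1", "q": query}
    return "http://www.google.com/search?" + urlencode(params)

print(search_url("memento"))
# http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=memento
print(search_url("memento", lang="lang_de"))
# http://www.google.com/search?hl=en&lr=lang_de&ie=ISO-8859-1&q=memento
```

An empty "lr=" and no "lr" at all would then both mean “no language restriction,” which fits the guess that it comes from a preference setting.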

This is just a WAG, but maybe that’s browser-dependent. In IE (at least in 6.0), under Tools > Internet Options > Advanced, the Browser section has an option “Always send URLs as UTF-8”. Maybe the encoding changes depending on whether or not that option is checked?