What does adding a URL to a host file do?

IIRC, adding a URL to a host file means that whenever my computer tries to go to that address, it will hit an internal dead-end. How absurd of an oversimplification is that?

The question arises because I have the Firefox extension NoScript set to block by default, and whenever I get to a page I routinely have to allow two or three addresses to get it working. I infrequently use “temporarily allow all” even on trusted pages, because I’d rather not let doubleclick or facebook or other advertising sites that I never use run anything on my machine. I don’t care if it’s harmless or not, I don’t want it.

It would simplify things, though, if there was some way to blanket block Doubleclick (or others I want to add). Would adding it to my hosts file work, or is the visited page bypassing it to trigger their code?

I hope this wasn’t a completely incomprehensible question…

Thanks,

Rhythm

No, not quite - it won’t hit a dead end unless you tell it to hit a dead end.

In addition to the hostname, you must specify the IP address to associate with it. Instead of reaching out to a DNS server to resolve the hostname, your computer will refer to the hosts file first and use whatever IP address is specified there.

If you put in your hosts file that ads.doubleclick.net (for example) resolves to 127.0.0.1, then when your browser is called on to connect to the ads.doubleclick.net server - say, to retrieve http://ads.doubleclick.net/tracker.gif - the request goes nowhere, because nothing on your own machine is serving http://127.0.0.1/tracker.gif
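
For the record, a hosts file entry is just the IP address followed by the host name, one pair per line, so that example would be:

127.0.0.1 ads.doubleclick.net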

The way these ads work is that your browser requests the ad directly from the ad company’s server, not from the Web site you’re visiting - the page just tells your browser where to go, and the call to the ad URL comes from you. That’s the point, from the ad company’s perspective: otherwise the IP address doubleclick saw would always be the IP address of the Web site the ad is on, rather than yours.
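
In other words, the page embeds the ad as a reference your browser fetches on its own, something like this (using the made-up tracker.gif example from above):

&lt;img src="http://ads.doubleclick.net/tracker.gif"&gt;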

That’s not how ALL ads work, but a good chunk of them. So you won’t be stopping all ads…and you will probably also see broken image placeholders and/or get javascript errors. So blocking ads via the hosts file might not be the most aesthetic way of doing it.

It’s important to remember that you can’t add a full URL to the hosts file. You can only add hostnames to it, which means everything on that host is effectively blocked for you. There is no selectivity and there is no real intelligence: It’s a blacklist that’s as dumb as a box of rocks.

Privoxy is a much more intelligent way to accomplish the same thing. It’s configurable to a huge degree, but the defaults are sane and it’s generally easy to live with.
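
As a rough sketch (the exact syntax varies a bit between Privoxy versions, so check its docs), a block rule in Privoxy’s user.action file looks something like this - the leading dot matches the domain and every subdomain under it, which is exactly the wildcarding a hosts file can’t do:

{ +block }
.doubleclick.net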

Hosts files can’t handle URLs; they can only handle host names. So if you want to block casalmedia, you need to put this in your hosts file:

127.0.0.1 ads.casalmedia.com

not:

127.0.0.1 http://ads.casalmedia.com/annoyingad.gif

That said, yes, this works. You can find precompiled hosts files out there with lots of ad servers in them. Adding just ads.doubleclick.com won’t help much, though, as their ads may be delivered from ads2494.doubleclick.com or some other subdomain. And no, you can’t do wildcards like *.doubleclick.com either.
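
So in practice a blocking hosts file ends up with a pile of entries, one per subdomain that’s actually been spotted serving ads - something like this (the subdomain names here are made up for illustration):

127.0.0.1 ads.doubleclick.com
127.0.0.1 ads2494.doubleclick.com
127.0.0.1 ad2.doubleclick.com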

FWIW, I wouldn’t bother with this approach. Firefox’s Adblock Plus extension works fine for me.

Why not install Adblock Plus?

ETA: oops, missed last line of HorseloverFat’s post.

Ad blocker plugins are one way to go, but I prefer the HOSTS file method. I use the one at:

which is updated often and I don’t have to worry about extraneous plugins bogging down my system resources.

The way host names get resolved into addresses goes like this.

First, the computer looks in its local hosts file. If it finds an entry for that host, it uses it.

If there’s no entry there, it sends a request to whatever DNS server you have configured (usually supplied by your ISP during the automatic configuration of your IP, DNS, gateway, etc.). If that DNS server has an entry for that host, it gives it to you. If not, the request works its way up the chain of DNS servers, and if none of them have the address cached anywhere, it eventually reaches one of the root DNS servers of the internet. If it ain’t there, it don’t exist, at least not as far as the internet is concerned.
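
One quick way to see what your machine actually resolves a name to, hosts file included, is to ping it and look at the address in the first line of output:

ping ads.doubleclick.net

If a hosts file entry is in effect, you’ll see 127.0.0.1 there. (Note that nslookup asks the DNS server directly and skips the hosts file, so it isn’t a good test for this.)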

There are a couple of ways that the DNS lookup can get screwed up. One is if you intentionally put a bogus entry in your local hosts file. This can be used to block evil web sites. You put an entry in for the evil site and give it an IP of 127.0.0.1, which is the loopback address (it always points to your own computer).

If you put a loopback entry for evilspawnofsatan.com in your hosts file, then when you try to access evilspawnofsatan.com, your computer goes “oh, I know where that is, it’s at 127.0.0.1,” and sends a request for data to your own computer. Your own computer is probably not configured to do anything with it, so you end up with a “resource not found.” It’s an effective way to block evil computers, because it doesn’t rely on any specific protocol. It blocks web pages, ftp access, you name it.
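
The entry for that example would just be:

127.0.0.1 evilspawnofsatan.com

One caveat: on Windows the resolver caches lookups, so after editing the hosts file you may need to run ipconfig /flushdns (or just reboot) before the change takes effect.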

That’s not really what the hosts file is intended for, though. I happen to have a bunch of computers in my basement, and these computers are all on a local network and they don’t publish their names out to the internet. So if I do a DNS lookup on them, I’m never going to find them. Instead, I can create entries in my hosts file that tell me where each computer is. If I had a computer named BUBBA and its IP address was 192.168.0.13, I could just put that into my hosts file, and then typing http://BUBBA would take me to that computer. In the early days of the internet, this is how the whole thing worked. After they had to manually keep track of hundreds of hosts, they realized this was never going to be practical and implemented the DNS lookup method instead. They left the old method in place, though, in case you ever need to use it for your own purposes.
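
That BUBBA entry would look just like the blocking entries above, only pointing at a real address:

192.168.0.13 BUBBA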

While you can take advantage of the hosts file to block evil sites, evil sites can also infect your hosts file (via trojans or viruses) and point valid addresses to their own evil systems.

Intentionally blocking or re-routing a site in your hosts file isn’t the only way a DNS lookup can go wonky, though. As I mentioned, when your computer can’t find an address, the request goes up the chain of the DNS hierarchy. You don’t want every DNS request to go all the way to the root servers - they would get overloaded darn quickly and the entire internet would grind to a halt. You want your ISP to cache as many hosts as it can, so that most requests only have to go as far as your ISP’s DNS server. Your ISP’s DNS server caches the answers to lookups it has done recently, so it can answer most requests without going any further. Now obviously, if something changes it takes a while for the new answer to propagate to all of the DNS servers all over the internet, but you can also have a thing called DNS cache poisoning. This is when some evil site manages to hack bogus entries into the ISP’s DNS cache. You try to access boards.straightdope.com, and your ISP accidentally ends up sending you to evilbobscomputer.com instead.

So, yeah, adding a loopback entry in your hosts file can deny access to certain evil sites, but that’s not all your hosts file does. It’s really part of the whole system of how names that a human can read are translated into numbers that a computer can use.