How do I make my webpage invisible to search engines?

When the page first went up (it’s buried several subdirectories below a main, visible page, and nothing links to our secret one), the search engines didn’t find it. Now I see that the search engines are hitting it, but I don’t want them to (it’s for family use only). Any thoughts?

Someone who’s an active XML or HTML programmer or an active VBB administrator will have to expand on this, but there’s code you can insert into the page that tells search engine spiders hitting it to go away. It’s a robots meta tag, something like <meta name="robots" content="noindex, nofollow">, placed in the page’s <head>.
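Here’s a small Python sketch that shows that tag in a page and reads its directives back out, e.g. to check that the copy actually uploaded to the server still has it. The helper names (RobotsMetaFinder, robots_directives) are my own, purely illustrative. Keep in mind the tag is only a request: well-behaved crawlers honour it, others may not.

```python
from html.parser import HTMLParser

# The tag goes inside the page's <head>; "noindex" asks crawlers not to list
# the page, "nofollow" asks them not to follow its links.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so default to "".
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


page = f"<html><head><title>Family page</title>{ROBOTS_META}</head><body>Hi!</body></html>"
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```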

A couple of these Q&As may apply:

http://www.google.com/intl/en/webmasters/faq.html

Besides cloaking and meta tags, you could password-protect it.
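If you run the server yourself, password protection can be as simple as HTTP Basic authentication. Below is a minimal sketch using only Python’s standard library; the username, password, port, and the names FamilyPageHandler and serve are all made up for illustration. On an ordinary hosted site you’d more likely switch on the web server’s own password feature instead. Note that Basic auth only base64-encodes the password, so it should be served over HTTPS.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials; a real site would keep these out of the source.
USERNAME = "family"
PASSWORD = "change-me"


def is_authorized(auth_header, username=USERNAME, password=PASSWORD):
    """Check an HTTP Basic 'Authorization' header against one user/password pair."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        supplied = base64.b64decode(auth_header[len("Basic "):], validate=True)
    except ValueError:  # binascii.Error (malformed base64) is a ValueError
        return False
    expected = f"{username}:{password}".encode("utf-8")
    # compare_digest avoids leaking how many bytes matched via timing.
    return hmac.compare_digest(supplied, expected)


class FamilyPageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # 401 plus WWW-Authenticate makes the browser prompt for a login.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Family only"')
            self.end_headers()
            return
        body = b"<html><body>Family news goes here.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the protected page on localhost until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), FamilyPageHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/ should bring up the browser’s login prompt; a crawler without the password just gets the 401.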

As a practical matter, if your worry is strangers finding the page through a search engine, it’s unlikely many will stumble on it unless they are specifically looking for it, or that any harm will come from such accidental viewing.

Well, here’s some info on the robots.txt file.

Any spider is supposed to check this file first and obey the requests in it, but nothing forces it to.
Robots are also supposed to send a User-Agent string identifying themselves (Google’s is “Googlebot”), so you could block that, but my spider can call itself “Mozilla 1.1/IE Honestly I Am” if it so desires.
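Blocking by User-Agent looks roughly like this. The block list is a hypothetical example; “Googlebot” and “bingbot” are the tokens Google’s and Bing’s crawlers actually send, but nothing stops a crawler from sending something else.

```python
# Hypothetical block list: lowercase substrings of crawler User-Agent strings.
# This only stops crawlers that identify themselves honestly.
BLOCKED_AGENT_TOKENS = ("googlebot", "bingbot")


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header contains a blocked crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


print(is_blocked_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(is_blocked_agent("Mozilla 1.1/IE Honestly I Am"))
```

The first call reports a block and the second doesn’t: the spoofed “Mozilla 1.1/IE Honestly I Am” string sails straight through, which is exactly the weakness.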

The only two other ways I can think of to prevent all automated access would be:

  1. Something that detects the many serial requests a spider would probably make, and bans that IP
  2. Pre-emptively banning a list of all IPs you expect spiders to be coming from.
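Both of those ideas can be sketched in a few lines of Python. The class name and the default thresholds (30 requests per 60 seconds) are arbitrary assumptions, and real crawlers often spread their requests over many IPs and long intervals, so treat this as an illustration rather than a defence.

```python
import time
from collections import defaultdict, deque


class CrawlerGuard:
    """Sketch of both ideas: a pre-emptive ban list plus a rate-based ban.

    The thresholds are hypothetical; a real site would tune them to how
    its human visitors actually browse.
    """

    def __init__(self, max_requests=30, window_seconds=60.0, preban=()):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.banned = set(preban)          # idea 2: IPs banned up front
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Record a request from ip and return False if it should be refused."""
        if now is None:
            now = time.monotonic()
        if ip in self.banned:
            return False
        recent = self.history[ip]
        recent.append(now)
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) > self.max_requests:  # idea 1: too many serial requests
            self.banned.add(ip)
            del self.history[ip]
            return False
        return True
```

Once an IP trips the limit it stays banned; a real version would probably want bans to expire.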

Really though, if you’ve put something on the net for public access, eventually someone’s crawler will come by to harvest emails or resumes or whatever the heck else it’s meant to do anyway.

robots.txt is the name of the file. It has to sit at the root of your site and follow a specific format. More info here:

http://www.robotstxt.org/wc/exclusion.html#robotstxt
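For the original question, a robots.txt that keeps well-behaved crawlers out of the family pages while leaving the main page alone might look like the one below; “/family/” and www.example.com are hypothetical. Python’s standard urllib.robotparser module can check how a crawler would read it. One catch: robots.txt is itself public, so anyone who reads it learns where the “secret” directory is.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the public home page lives at "/", the family-only
# pages under "/family/". The file must be served from the site root,
# e.g. http://www.example.com/robots.txt, or crawlers won't look at it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /family/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/", "/index.html", "/family/", "/family/photos.html"):
    url = "http://www.example.com" + path
    print(path, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```

Rules match by prefix, so Disallow: /family/ covers everything under that directory, while the main page stays crawlable.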

As far as I know, all search engines honour the robots file, even though it’s true they don’t have to; kind of a politeness thing.

You could simply not link to this page from any website and distribute the link by email instead. You might also need to rename the page, so that anyone who follows the old address from Google gets a “page not found” error.
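If you do rename the page, pick a name nobody can guess. One way, sketched here with Python’s secrets module, is a long random token; the function name unguessable_name and the 16-byte length are just illustrative choices.

```python
import secrets


def unguessable_name(extension=".html", nbytes=16):
    """Return a random, URL-safe file name ending in the given extension.

    16 random bytes give 128 bits of randomness, far too many to guess.
    """
    return secrets.token_urlsafe(nbytes) + extension


print(unguessable_name())
```

Each call gives a different name made of letters, digits, “-” and “_”, so it is safe to drop straight into a URL.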

abby

Thanks, one and all, your responses are really helpful.

Sorry if I’m late to the party, but yeah, robots.txt is what you want.

I have my résumé on my website in DOC, PDF, TXT, and HTM formats. It used to be that if you entered my name into Google, all of my personal information (name, address, phone) showed up as the second or third link on the “results page”. I modified my robots.txt file to exclude that page from spidering, and it disappeared within a few weeks, once Google had re-indexed my site.
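A robots.txt entry along those lines might look like this, assuming (hypothetically) that the four files are called resume.doc, resume.pdf, resume.txt and resume.htm at the root of www.example.com. Each format needs its own rule, or the crawler will still index the copies you missed; because rules match by prefix, /resume.htm would also cover a resume.html.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file names for the four resume formats; each gets its own
# Disallow line so the rest of the site stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /resume.doc
Disallow: /resume.pdf
Disallow: /resume.txt
Disallow: /resume.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for name in ("resume.doc", "resume.pdf", "resume.txt", "resume.htm", "index.html"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com/" + name)
    print(name, "allowed" if allowed else "blocked")
```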