How do I make my webpage invisible to search engines?

drpepper · January 13, 2005, 12:56am

When the page first went up (it’s located in a buried list of subdirectories to a main, visible page with no links to our secret one), the search engines didn’t find it. Now I see that the search engines hit it, but I don’t want them to (it’s for family use only). Any thoughts?

Polycarp · January 13, 2005, 1:00am

Someone who’s an active XML or HTML programmer or an active VBB administrator will have to expand on this, but there’s coding you can insert that effectively tells search engine spiders hitting it to go away. Something like < robots = No Robots > but I’m not sure of the exact coding.
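For reference, the coding being described is most likely the robots meta tag, which goes in the page's head section. Below is a minimal Python sketch, using only the standard library, that checks whether a page carries that tag. The sample page and the `has_noindex` helper are made up for illustration, and like robots.txt the tag is only a request that well-behaved spiders choose to honor.

```python
from html.parser import HTMLParser

# Hypothetical page: the <meta name="robots"> tag asks compliant
# crawlers not to index the page and not to follow its links.
PAGE = """<html>
<head>
<title>Family page</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>Family photos</body>
</html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html_text):
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


print(has_noindex(PAGE))  # True
```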

aahala · January 13, 2005, 1:11am

A couple of these Q&As may apply:

http://www.google.com/intl/en/webmasters/faq.html

Besides cloaking and meta tags, you could password protect it.

As a practical matter, if you are trying to prevent others from viewing by finding it through a search engine, unless someone is specifically looking for your page, it’s unlikely many will accidentally find it or any harm will come from such accidental viewing.

Nanoda · January 13, 2005, 1:14am

Well, here’s some info on the Robots.txt file.

Any spider engine is supposed to check this first and obey the requests in it, but it can do whatever it wants.
Robots are also supposed to have a userAgent string like “GoogleCrawler” or something, so you could block that, but my spider can call itself “Mozilla 1.1/IE Honestly I Am” if it so desires.
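To illustrate how a polite spider uses the file, here is a rough Python sketch built on the standard library's `urllib.robotparser`. The robots.txt contents, the `/family/` path and the `ExampleBot` name are assumptions made up for the example, not anything from the original poster's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of /family/,
# and keep one named crawler out of the whole site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /family/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path):
    """Ask the parsed rules whether this user agent may fetch this path."""
    return rules.can_fetch(user_agent, path)


print(allowed("Googlebot", "/family/photos.html"))  # False
print(allowed("Googlebot", "/index.html"))          # True
print(allowed("ExampleBot", "/index.html"))         # False
```

Note that the check happens entirely on the crawler's side. A spider that never runs this kind of check, or that reports a different user agent, is not stopped by anything in the file.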

The only other two ways I can think of preventing all automated access would be:

Something that detects the many serial requests a spider would probably make and bans that IP.
Pre-emptively ban a list of all IPs you expect spiders to be coming from.
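Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BurstDetector` class, its thresholds and the `known_crawler_ips` parameter are hypothetical, and a real server would hook something like this into its request handling rather than calling it by hand.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Bans an IP that makes more than max_requests within `window` seconds."""

    def __init__(self, max_requests=20, window=10.0, known_crawler_ips=()):
        self.max_requests = max_requests
        self.window = window
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests
        # Pre-emptive bans: addresses you already expect spiders to use.
        self.banned = set(known_crawler_ips)

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if banned."""
        if ip in self.banned:
            return False
        stamps = self.recent[ip]
        stamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) > self.max_requests:
            self.banned.add(ip)
            del self.recent[ip]
            return False
        return True


detector = BurstDetector(max_requests=3, window=1.0)
# Four requests within a second trip the limit; the IP then stays banned.
burst = [detector.allow("203.0.113.5", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)]
print(burst)  # [True, True, True, False, False]
```

A slow crawler that spaces out its requests would slip under any threshold like this, so it raises the bar without guaranteeing anything.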

Really though, if you’ve put something on the net for public access, eventually someone’s crawler will come by to harvest emails or resumes or whatever the heck else it’s meant to do anyway.

abby · January 13, 2005, 5:48am

robots.txt is the name of the file. It does need to have a specific format. More info here:

http://www.robotstxt.org/wc/exclusion.html#robotstxt

As far as I know, all search engines honour the robots file, even though it’s true they don’t have to; kind of a politeness thing.

You could simply not link to this page from any web site (distribute the link by email). You might also need to rename it, so that a Google user who finds the old address gets a “page not found” error.

abby

drpepper · January 14, 2005, 3:43am

Thanks, one and all, your responses are really helpful.

Rex_Fenestrarum · January 14, 2005, 5:22am

Sorry if I’m late to the party, but yeah, robots.txt is what you want.

I have my résumé on my website in DOC, PDF, TXT, and HTM formats. It used to be that if you entered my name into Google, all of my personal information (name, address, phone) showed up as the second or third link on the “results page”. I modified my robots.txt file to NOT include that page in any spidering and it disappeared in a few weeks, once Google had re-indexed my site.
