Search Engine / HTML question

Is there a command I can place (perhaps in the header) that would prevent a search engine from cataloging my page (i.e. adding it to their search engine)? Thanks!

I don’t believe there is; however, most (if not all) search engines will remove you if you specifically request it.

In the actual HTML, no.

However, you can put a robots.txt file in the root directory of the website (such as http://www.websitename.com/robots.txt), and that should do it. Some spiders, though, will ignore a robots.txt file.

For how to construct your robots.txt file, try this: http://www.robotstxt.org/wc/norobots.html
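For illustration, a robots.txt that asks every compliant robot to stay out of the whole site is just two lines, "User-agent: *" and "Disallow: /". The JavaScript below is a simplified, hypothetical sketch of how a well-behaved spider might check such a file before fetching a page. The names parseRobots and isAllowed are made up for this example, and it covers only the basics of the original standard (records keyed by User-agent, Disallow values matched as path prefixes, an empty Disallow meaning nothing is blocked); real crawlers handle more than this.

```javascript
// Simplified robots.txt check, roughly following the original robots
// exclusion standard. parseRobots/isAllowed are hypothetical helpers,
// not part of any library.
function parseRobots(text) {
  const groups = [];            // [{ agents: [...], disallows: [...] }]
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim();   // strip comments
    const colon = line.indexOf(':');
    if (colon === -1) continue;                        // blank or junk line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same record.
      if (!lastWasAgent) {
        current = { agents: [], disallows: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value blocks nothing, so it is not recorded.
      if (field === 'disallow' && current && value !== '') {
        current.disallows.push(value);
      }
    }
  }
  return groups;
}

function isAllowed(robotsTxt, userAgent, path) {
  const groups = parseRobots(robotsTxt);
  const ua = userAgent.toLowerCase();
  // Prefer a record that names this robot; otherwise fall back to '*'.
  const group =
    groups.find(g => g.agents.some(a => a && a !== '*' && ua.includes(a))) ||
    groups.find(g => g.agents.includes('*'));
  if (!group) return true;                             // no applicable record
  return !group.disallows.some(prefix => path.startsWith(prefix));
}

// "Keep every compliant robot out of the whole site":
const robotsTxt = ['User-agent: *', 'Disallow: /'].join('\n');

console.log(isAllowed(robotsTxt, 'Googlebot/2.1', '/index.html')); // false
```

Note that nothing here enforces anything: the check only happens if the spider chooses to run it, which is why some spiders can simply ignore the file.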

http://www.google.com/webmasters/3.html
http://www.robotstxt.org/wc/norobots.html

Great minds think alike, eh, Cugel?

Well, here’s the problem. My website uses frames, and for some bizarre reason Google is cataloging every page but the main one. I really don’t mind that it’s indexing them, but it can be troublesome since visitors to those pages are effectively missing half of the site (because of the frames). I’ve remedied the problem somewhat by adding a link to the index on every document, but it still bothers me that Google neglects the home page.

Well damn, in the time it took me to write my last message, I got 3 f’ing replies, haha.

Anyways, thanks guys, I’ll look into those to see if they help.

Not strictly true. As well as robots.txt, there is a ROBOTS meta tag that works on a page-by-page basis; however, many search engines ignore it. Google does not.
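The tag goes in the page's head and looks like <meta name="robots" content="noindex, nofollow">, where noindex asks engines not to list the page and nofollow asks them not to follow its links. As a rough sketch of what an engine that honours the tag does with it, the hypothetical robotsDirectives function below pulls the directives out of a page's HTML. It uses regular expressions for brevity, so it only copes with straightforward markup; a real crawler would use a proper HTML parser.

```javascript
// Hypothetical crawler-side helper: return the set of ROBOTS meta
// directives declared in a page, e.g. {"noindex", "nofollow"}.
// Regex-based sketch, not a real HTML parser.
function robotsDirectives(html) {
  const directives = new Set();
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const name = /\bname\s*=\s*["']?([^"'\s>]+)/i.exec(tag);
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (name && name[1].toLowerCase() === 'robots' && content) {
      // Directives are comma-separated and case-insensitive.
      for (const d of content[1].split(',')) {
        const value = d.trim().toLowerCase();
        if (value) directives.add(value);
      }
    }
  }
  return directives;
}

const page = `<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>`;

const found = robotsDirectives(page);
console.log(found.has('noindex')); // true: a compliant engine leaves the page out
```

As with robots.txt, the tag is only a request; an engine that never runs a check like this will index the page anyway.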

But that wouldn’t solve your problem, Duderdude2. The reason Google is doing what it does is discussed in the Google webmasters section. It’s because you are using frames. Google is not ignoring your home page - it is returning the URL of the page that actually has the information requested by the search. It doesn’t know that it should be part of a frameset. It will only return your homepage if that is the page that has the data or if it thinks your entire website matches the search query. If you tell the search engine not to look at your sub pages, then you will get no hits at all, because they will never be looked at.

DancingFool

Info on the ROBOTS meta tag.

Robots.txt is your best bet, though; if you find any search engines that aren’t adhering to it, ban them.

As for your specific problem: I don’t think blocking Google from accessing those pages is the answer, especially since it’s the single largest referrer on the 'net. Most users will enter your web site by finding an innermost page on a search engine; this is the nature of the internet and you should design accordingly instead of resisting it.

You could also try using JavaScript to force frames.

Possible ways to address your problem:

  1. Don’t use frames [best, as frames are evil. I know of no commercially successful web site that uses frames.]
  2. Include easily visible links to the frameset page on the framed page, so people who find the framed page in a search engine will be able to navigate to the frameset
  3. Include Javascript code that, when the framed page is opened outside the frameset, opens the frameset instead (in this case the frameset’s navigation must lead users easily to the desired content).

That, indeed, is the problem.

One way to help with (but not solve) this problem would be to put appropriate keywords in a meta tag on the index page. This might mean that Google will pick up the index page rather than just the pages that actually hold the data.

Another thing you can do is use Javascript in the framed pages to make the page load correctly. If you feel like mucking around with things like query strings, you can even make the specific page come up (e.g. the content page Google indexed) when you put the frames back in. But I’d make sure the frames are necessary first; they’re almost always more trouble than they’re worth.
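A sketch of the query-string idea, assuming (hypothetically) that the frameset lives at /index.html, names its content frame "main", and shows /home.html by default. An orphaned page sends the browser to /index.html?page=<its own path>, and the frameset reads that parameter and loads the requested page into the content frame. The helper names framesetUrlFor and pageFromQuery are made up; they are kept as pure functions so they can be tried outside a browser, with the browser wiring shown in comments. pageFromQuery only accepts same-site paths, so the frameset can't be tricked into framing another site.

```javascript
// Hypothetical helpers for a frameset at /index.html with a content
// frame named "main". Names and paths are assumptions, not a standard.

// In a framed page opened on its own: where should the browser go?
function framesetUrlFor(pathname) {
  return '/index.html?page=' + encodeURIComponent(pathname);
}

// In the frameset: which page belongs in the content frame?
// Accepts only same-site absolute paths ("/foo.html", not "//host"
// or "http://..."); anything else falls back to the default page.
function pageFromQuery(search, defaultPage) {
  const match = /[?&]page=([^&]*)/.exec(search);
  if (!match) return defaultPage;
  let page;
  try {
    page = decodeURIComponent(match[1]);
  } catch (e) {
    return defaultPage;                    // malformed %-escape
  }
  return /^\/(?![\/\\])/.test(page) ? page : defaultPage;
}

// Browser wiring (not run here):
//   framed page:  if (window.top === window.self)
//                   window.location.replace(framesetUrlFor(window.location.pathname));
//   frameset, e.g. in its onload handler:
//                 window.frames['main'].location.replace(
//                   pageFromQuery(window.location.search, '/home.html'));

console.log(framesetUrlFor('/articles/frames.html'));
// /index.html?page=%2Farticles%2Fframes.html
console.log(pageFromQuery('?page=%2Farticles%2Fframes.html', '/home.html'));
// /articles/frames.html
```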

Anyway, this should work. Haven’t tested it though.



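<script type="text/javascript">
// Put this in each framed content page, NOT in the frameset page itself
// (there top === self as well, so it would keep reloading itself).
// If the page was opened on its own, e.g. from a search result,
// rather than inside the frameset, load the frameset instead.
if (window.top === window.self) {
    // replace() rather than setting href, so the orphaned page is not kept
    // in the history and the Back button doesn't bounce straight back here.
    window.location.replace('/index.html');
}
</script>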

If all search engines followed the rules of the game, yes. However, many do not, and their spiders will crawl every area of your website even if you use a robots.txt file and META tags to disallow indexing. It only takes one search engine finding things you do not want seen, and the game is over. It may take time, but it will happen.

If you don’t want something searched, don’t put it on the web.

I agree with the other posters that frames are the source of your problem. Dump them as soon as you can. This search engine difficulty is only one of the problems they cause. Another example: if an impressed visitor to your website wants to send an enthusiastic email to their friends saying “you’ve got to look at this great website!”, the address bar shows only the frameset’s URL, so it’s really hard for them to cut and paste a link to the page they’re actually looking at. So you lose that word-of-mouth recommendation, which is one of the most useful parts of the web.
Real professional web designers mostly stopped using frames circa 1999.