SDMB not crawled by search engines?

Ever notice that no matter how many times you Google something, it never picks up any pages from SDMB? Is there some kind of firewall here to keep search engines out?

I HAVE seen SDMB pages in Google occasionally. Not often, I'll admit.

There is a recognized standard for precluding crawling by web robots:

http://www.searchengineworld.com/robots/robots_tutorial.htm

http://boards.straightdope.com/robots.txt does not exist, but http://www.straightdope.com/robots.txt does, and contains:


User-agent: *
Disallow: /bonus/

This suffices to keep well-behaved robots from crawling past the published straight dope “front door”, which might be how they would normally reach the message boards.

There could be some blocks placed against known search engines at other levels, of course, either at a firewall or simply by IP blocking in vBulletin.

BTW, that “Disallow:” line advertises another path under www.straightdope.com, which produces the Straight Dope banner and footer with the text “Hey! You’re not supposed to be rooting around in here!” as its content. Does any admin care to comment on what’s in /bonus/?

DUH!

Excuse me. I’m an idiot. I stated that backwards. The www.straightdope.com/robots.txt file lets all robots in, except that it disallows them from crawling /bonus/. So it DOESN’T stop robots from crawling the links from the front page, only the ones down that mysterious /bonus/ path.
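To double-check that reading, here’s a minimal sketch using Python’s standard-library robots.txt parser, fed the exact two rules quoted above. It confirms that well-behaved robots are blocked only under /bonus/, not from the rest of the site:

```python
# Verify the semantics of "User-agent: * / Disallow: /bonus/" with the
# standard-library parser, feeding the rules directly instead of fetching.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /bonus/",
])

print(rp.can_fetch("*", "http://www.straightdope.com/"))         # -> True (allowed)
print(rp.can_fetch("*", "http://www.straightdope.com/bonus/x"))  # -> False (blocked)
```

So the front-page links are fair game for crawlers as far as robots.txt is concerned.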

More on this. There is also a robots <META> tag which search engines are supposed to honor:

http://searchengineworld.com/metatag/robots.htm

The tag doesn’t seem to be present in the SDMB pages. The IPs of known search engines could still be blocked by other mechanisms, as I said.
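For anyone who wants to check a page themselves, here’s a hedged sketch that scans an HTML page’s markup for a robots <META> tag using only the standard library. The sample page below is hypothetical, not actual SDMB output:

```python
# Scan HTML for a <meta name="robots" content="..."> tag.
# The sample markup is hypothetical, not taken from the SDMB.
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = None  # content of the robots meta tag, if found

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives = a.get("content")

sample = """
<html><head>
<title>Hypothetical page</title>
<meta name="robots" content="noindex, nofollow">
</head><body></body></html>
"""

finder = RobotsMetaFinder()
finder.feed(sample)
print(finder.directives)  # -> noindex, nofollow
```

If a page carried `content="noindex, nofollow"`, a well-behaved engine would neither index it nor follow its links; the absence of the tag on SDMB pages means they aren’t opting out this way either.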

Hehe, I found that, too. But you dare to ask? :eek:

This thread may be interesting to you: Why isn’t the SDMB indexed on Google?

No limiting on our side, it seems. Google does restrict its crawling of dynamically generated sites, but we know by now of at least two archiving sites (www.archive.org and www.boardreader.com) that didn’t encounter any form of resistance when spidering our site.

I’m still eager to hear from our ‘board officials’ whether they think losing some bandwidth to crawler traffic is an issue.

And about the /bonus/ thing. :slight_smile: