It doesn’t appear that way any more. 
jdavis: There have been other threads and posts about sites besides Google that are also indexing the content and making the pages searchable. Is there any plan to try to stop these other sites?
I currently don’t have plans to do anything about other sites ignoring what is a voluntary agreement between all web sites. We are already using the accepted mechanism, robots.txt, to specifically disallow them from doing what they are doing, and have been for some time. It looks like they are not that interested in voluntary agreements. Can anything be done about a party not abiding by a voluntary agreement? I don’t know. Would even a successful result of getting them to stop making the pages searchable be worth the time and cost involved? I’ll speculate and say it would not.
No reason to throw yourself at the mercy of the social contract of the web. It’s fairly straightforward to protect your content at as high a level as Apache. (And I see you’re not even blocking BoardReader in the robots.txt! Who knows if they’d respect it?)
SetEnvIf User-Agent "BoardReader" NoSDMBForYou
SetEnvIf User-Agent "Googlebot\/2\.1" NoSDMBForYou
SetEnvIf User-Agent "^Slurp" NoSDMBForYou
SetEnvIf User-Agent "^wget" NoSDMBForYou
<location /var/www/sdmb/>
Order Allow,Deny
Allow from all
Deny from env=NoSDMBForYou
</location>
WebmasterWorld has a good robots.txt for locating perps (too much Law & Order), and here’s Apache’s mod_setenvif doc.
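For what it’s worth, a robots.txt block for them would just be a sketch along these lines. The “BoardReader” token is an assumption on my part; check the User-Agent they actually send in your logs before relying on it.

# robots.txt at the site root; bot token assumed from their User-Agent
User-agent: BoardReader
Disallow: /

Of course, that only helps if they bother to read it, which is exactly what’s in doubt here.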
Oh, sorry, they don’t respect a wildcard block. That’s not very nice… and I forgot a couple of circumflexes:
SetEnvIf User-Agent "^BoardReader" NoSDMBForYou
SetEnvIf User-Agent "^Googlebot\/2\.1" NoSDMBForYou
SetEnvIf User-Agent "^Slurp" NoSDMBForYou
SetEnvIf User-Agent "^wget" NoSDMBForYou
<location /var/www/sdmb/>
Order Allow,Deny
Allow from all
Deny from env=NoSDMBForYou
</location>
Although “BoardReader” appears to be the entire user agent from what I’ve seen, so the anchor may not be necessary there… it depends on how they identify themselves to you.
alterego, if the given spider is malicious enough to not obey robots.txt, what makes you think it won’t just generate a user agent string on the fly and bypass those kinds of checks, too?
Diogenes, you may now dampen the lantern. We have found jdavis. (If anyone doesn’t get that, GOOGLEIT!) 
Just for shits and giggles I came back tonight in IE. Four ads for dating services. Must be all the love for jdavis. Fuck, those crawlers are accurate!
And as soon as I posted that it went to 1 ad for robots and 2 for web-crawler services.
Hmmm…
Turtle-feltching wax figurines.
I’ll let you know the latest deal in car care products. 
And because you mentioned them in your post, there they were again.
I wonder if the bots crawl the quote boxes, or only the original text?
Lemme see here…
Robots lead to Asimov leads to science fiction leads to Star Trek leads to Star Wars…
What do we get now? Mechanical goats?
Four ads for trademark services (as of the response to the quoted post). I shit you not. This is almost more fun than blocking them. I think I’ve found another angle on this whole ad deal: the mods are so sick of breaking up schoolyard tussles that they’ve come up with a way to bring all but about 10 of us together in harmony.
The bastards! 
No worries, I’m sure we can all get back on track once the novelty wears off.
BUSH SUCKS!
BoardReader does identify itself, but it doesn’t obey robots.txt. This could be expanded to a whitelist: it’s not difficult to check your usage logs and grab a list of the legitimate user agents your visitors are using. Someone could spoof one, but it’s pretty obvious from the access pattern when someone is spidering rather than surfing. You can easily set up a spider trap, employ rate limiting, and so on. There are easy ways to do all of this. The goal shouldn’t be 100%; locks keep honest people honest. These are reasonable measures to take if you have valuable content.
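As a rough sketch of the whitelist idea, you can invert the config from earlier: tag the agents you’ve verified in your own logs as legitimate and deny everything else by default. The two patterns below are placeholders, not a vetted list, and plenty of bots claim “Mozilla” too, so pair this with a trap rather than trusting it alone.

# Whitelist sketch (Apache 2.2 syntax): tag agents seen in your real traffic
SetEnvIf User-Agent "Mozilla" GoodVisitor
SetEnvIf User-Agent "^Opera" GoodVisitor
<Directory /var/www/sdmb/>
# With Order Deny,Allow, an Allow match overrides the Deny,
# so only tagged agents get through
Order Deny,Allow
Deny from all
Allow from env=GoodVisitor
</Directory>

And a minimal spider trap, assuming a bait path /bait/ of my own invention: disallow it in robots.txt and never link to it visibly. Anything that requests it anyway has ignored robots.txt, and the conditional CustomLog hands you a list of offenders to feed into a Deny list.

# robots.txt: tell well-behaved crawlers to stay out of the bait path
User-agent: *
Disallow: /bait/

# httpd.conf: log anyone who fetches it anyway
SetEnvIf Request_URI "^/bait/" IgnoredRobotsTxt
CustomLog logs/bad_bots.log combined env=IgnoredRobotsTxt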