Mods: This thread is not intended as a GD. However, feel free to move it if the Doper responses are more attuned to debate than answering the question.
There is nothing nefarious in the use of a robots.txt file on a web site. On the contrary, a robots.txt file tells search engine crawlers which parts of a site to skip, so that they do not have to collect information that is irrelevant to users.
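For anyone who hasn't seen one, here is a minimal sketch of what such a file looks like. The directives are the standard ones, but the paths are made up for illustration:

    User-agent: *          # these rules apply to all crawlers
    Disallow: /cgi-bin/    # don't crawl script directories
    Disallow: /tmp/        # don't crawl temporary files

Each Disallow line names a path prefix that compliant crawlers agree not to fetch or index.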
However, in viewing the White House robots.txt file, one notes that all references to Iraq, and only references to Iraq, are blocked from indexing by search engines.
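To be concrete about what that means, entries of the sort being described would look something like this (an illustrative path, not copied from the actual file):

    Disallow: /infocus/iraq

Compliant search engines skip every page under such a path, so none of it ever shows up in their results.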
No, no. The robots.txt file only restricts external search engines: it tells their crawlers which pages not to fetch and index. An internal search engine can be manipulated by the site owner to return only what the owner wants you to see.
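As a side note, honoring those rules is entirely up to the crawler. Here is a sketch of how a well-behaved crawler checks robots.txt before fetching a page, using Python's standard urllib.robotparser module (the page URL is just an example):

    from urllib import robotparser

    # Download and parse the site's robots.txt rules.
    rp = robotparser.RobotFileParser()
    rp.set_url("http://www.whitehouse.gov/robots.txt")
    rp.read()

    # A compliant crawler asks before fetching each page.
    page = "http://www.whitehouse.gov/infocus/iraq/"
    if rp.can_fetch("*", page):
        print("OK to fetch and index:", page)
    else:
        print("Disallowed by robots.txt:", page)

Nothing technically stops a crawler from ignoring the file; the major search engines simply choose to respect it.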
Just a WAG, but www.whitehouse.gov is a popular site for people looking for information on Iraq. It might make sense to block indexing of those pages which don't actually contain the term Iraq, so that searches on Iraq return only relevant results. Even Mr. Bush's hamsters have their breaking point.
I did a bit of looking at the robots.txt files of various government organizations.
The CIA, FBI, Senate, DOE, Air Force, NASA, Secret Service, Supreme Court, Federal Election Commission, Federal Reserve, Homeland Security, and FirstGov sites have no robots.txt files at all.
The House, FDA, NSA, DOJ, USDA, Army, Joint Chiefs, FDIC, and OSHA sites have small robots.txt files, ranging from a few lines to ~25 in length.
The only other site that approaches whitehouse.gov in the size of its robots.txt file is the EPA's.