Why Is The White House Hiding All Search References To Iraq on Its Web Site?

Mods: This thread is not intended as a GD. However, feel free to move it if the Doper responses are more attuned to debate than answering the question.

Background:

Source: http://www.searchengineworld.com/robots/robots_tutorial.htm

There is nothing nefarious in the use of a robots.txt file on a web site. On the contrary, a robots.txt file is used to assist search engines so that they do not have to collect information that is irrelevant to users.
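For example, a site owner who didn't want crawlers wasting time on a scripts directory might publish something like this (the paths here are made up for illustration):

User-agent: *
Disallow: /cgi-bin/
Disallow: /scratch/

Any crawler that honors the convention reads this file first and skips the listed paths.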

However, in viewing the White House robots.txt file, one notes that all references to Iraq, and only references to Iraq, are unavailable to search engines.

Why is this?

I don’t know how the robots.txt file works, but a search on the White House site just now for “Iraq” turned up 1,923 results.

No, no. The robots.txt file keeps external search engines from crawling and indexing pages. An internal search engine is run by the site owner, who can set it up to return only what the owner wants you to see.

I don’t think it’s disallowing all references to Iraq; it looks more like a list of specific files and/or subdirectories.

Google found 19,800

A Google search confined to the whitehouse.gov domain found about 19,900 hits for “Iraq”

http://www.google.com/search?as_q=iraq&num=10&hl=en&ie=UTF-8&oe=UTF-8&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=whitehouse.gov&safe=off

Just a WAG, but www.whitehouse.gov is a popular site for people looking for information on Iraq. It might make sense to keep Iraq searches away from pages that don’t actually contain the term. Even Mr. Bush’s hamsters have their breaking point.

E.g., the page http://www.whitehouse.gov/firstlady/recipes can safely be excluded from an Iraq-based search.

This became a topic of public discussion in October of last year.

robots.txt disallows crawling by URL path, not by page content. The robots.txt change excludes many file paths that obviously don’t exist:

Disallow: /infocus/everglades/iraq

Disallow: /infocus/rx-medicare/iraq

Disallow: /infocus/teacherquality/iraq

What the person who ordered this intended to do is anyone’s guess. My bet is managerial stupidity.
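For anyone who wants to poke at the matching themselves, here is a rough Python 3 sketch using the standard urllib.robotparser module; the results depend on whatever robots.txt whitehouse.gov is serving when you run it, and the paths are the ones quoted above plus the recipes page mentioned earlier:

# Ask the live robots.txt whether a crawler may fetch a handful of paths.
# Matching is done purely against the URL path, never against page content.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.whitehouse.gov/robots.txt")
rp.read()  # download and parse the file

for path in ("/infocus/everglades/iraq",
             "/infocus/everglades",
             "/firstlady/recipes"):
    allowed = rp.can_fetch("*", "http://www.whitehouse.gov" + path)
    print(path, "allowed" if allowed else "disallowed")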

The question is why the White House site would let you search with the internal search engine but not with an external one.

Because the external ones cache pages.

Whoever is managing the whitehouse.gov web page could be doing a better job. For instance:

http://www.whitehouse.gov/index2.html

is an old page from June 2003. If I were running their web page I would clean this stuff up fairly regularly.

The external ones also obey robots.txt as a matter of convention. The file doesn’t “force” any search engine to do anything.
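To make that concrete, here is a minimal Python 3 sketch: the can_fetch() check below is something the client chooses to perform, and a crawler that skips it and calls urlopen() directly will be handed the page like any other request (the path is purely illustrative):

# Honoring robots.txt is the crawler's decision, not the server's.
import urllib.request
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser("http://www.whitehouse.gov/robots.txt")
rp.read()

url = "http://www.whitehouse.gov/infocus/iraq"  # illustrative path only
if rp.can_fetch("PoliteBot", url):
    page = urllib.request.urlopen(url).read()   # a polite crawler fetches only what is allowed
else:
    print("Skipping", url, "- disallowed, but only because we chose to ask.")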

I did a bit of looking at the robots.txt files of various government organizations.
The CIA, FBI, Senate, DOE, Air Force, NASA, Secret Service, Supreme Court, Federal Election Commission, Federal Reserve, Homeland Security, and FirstGov sites have no robots.txt files at all.
The House, FDA, NSA, DOJ, USDA, Army, Joint Chiefs, FDIC, and OSHA sites have small restriction files, from a few lines to roughly 25 lines long.
The only other site that approaches whitehouse.gov in the size of its robots.txt file is the EPA.
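If anyone wants to repeat that survey, a rough Python 3 script along these lines would do it; the host list is just an illustrative handful, and sites with no robots.txt simply show up as errors:

# Fetch robots.txt from a few .gov hosts and count their Disallow lines.
import urllib.request

hosts = ["www.whitehouse.gov", "www.epa.gov", "www.fbi.gov",
         "www.nasa.gov", "www.osha.gov", "www.senate.gov"]

for host in hosts:
    try:
        raw = urllib.request.urlopen(f"http://{host}/robots.txt", timeout=10).read()
        rules = [line for line in raw.decode("utf-8", "replace").splitlines()
                 if line.lower().startswith("disallow")]
        print(f"{host}: {len(rules)} Disallow lines")
    except Exception as err:  # no file, timeout, redirect trouble, etc.
        print(f"{host}: could not retrieve robots.txt ({err})")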