Web Site Statistics Question

I run a website for a non-profit organization and get weekly statistics on activity within the domain. I have two questions concerning those statistics. I have searched the web and found information on all of the statistics I get except Failed Requests and 404 Document Not Found. Basically, my goal is to do some causal analysis and try to prevent as many of the errors / failures as possible.

  1. I am guessing a failed request is a request for a page / image that was not successfully processed / served by the server. My question on Failed Requests is: how can I determine what the failures are and what might be causing them? Are Failed Requests caused by my bad coding, or is this more an issue of my web host’s hardware / software, etc…? Would there be more detail on this in my server log?

    Is there some Failed Request ratio (failed requests / total requests) that has been deemed acceptable by the internet community? Right now I am running about 10% Failed Requests.

  2. 404 Document Not Found: right, I know what a 404 is, but can I try to determine why / where they are occurring? Is there a way to determine what pages are being requested and failing? If the not-founds are a result of someone else’s bad link, I would like to contact them to get things corrected. Again, would there be more detail on this in my server log, or do 404’s not show up in the log? About 10% of my total hits are 404’s.
    I use FrontPage 2000 and have run all of the reports for bad links and don’t really have any in my website (the report returns zero dead links except a few on the links page that go outside the site).

Any guidance or direction in either of these areas is appreciated.

WAG, but I’d bet that a lot of the bad requests are due to various bots that are sent out by everybody from Google to evil merchants of spam. I get a lot on my site, too, and nobody’s ever complained about anything (well, except for the time my forum was sending out corrupt cookies, which gave everybody 500 errors.)

FrontPage probably isn’t helping the situation much.

Let’s see here, it is about 4 AM, so I will do my best. Forgive me if the spelling is horrid; I am only lit by the LCD screen :)

404 errors are simply “page not found.” There are basically 2 ways this can happen. Let’s say your site is called www.site.com and someone goes to it, which loads up your home page or something of that nature. If they then click on a link and there is no such page on the server, either because you did not FTP it there or because you made a wrong link, they get a 404 error page.

If you can say for sure that you have none of this going on, then the next area to look at is failed image links: same scenario as above, just with images.

But more than likely, here is what is happening. You had the site, it was all built and pretty and worked, and it stayed that way for some time. Then all the little search engines out there sent little programs out to “crawl” your website. They made an “index,” or record, of the location of all the files that make up your website. Now, when someone searches for your site, they can click on a link and go to it. All is fine and well up to this point. Then you decide to redo part of the site, or change the names of a few links or files. This is where the search engines barf: they still hold all the links to the old pages, which no longer exist. Not much you can do but wait for the search engines to update the index. Try it out: search for the site in Google and test the links to see if you get any 404’s. I bet you do.

If your stats show “referrers,” you can look in there. It could also be that others are linking to your website and have forgotten to update their links after you changed the pages. You could also see which search engines are sending traffic to you, locate the missing pages they point to, and perhaps put pages in place that “redirect” to the main page. Again, this depends on the level of reports you are getting.
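To make the referrer idea a bit more concrete, here is a rough Python sketch that sorts the referrers of your 404 hits into buckets: your own pages, search engines, other people’s sites, or no referrer at all. Everything specific in it is an assumption on my part: `MY_HOST` is the made-up www.site.com from above, `SEARCH_HINTS` is just a hypothetical list of engine names, and the referrer strings are whatever your stats report or log gives you.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical values: substitute your own domain and whichever
# search engines actually show up in your reports.
MY_HOST = "www.site.com"
SEARCH_HINTS = ("google.", "yahoo.", "bing.")


def classify_referrer(referrer):
    """Bucket a referrer URL as 'none', 'internal', 'search', or 'external'."""
    # Logs usually record a missing referrer as "-" (typed URL, bookmark, many bots).
    if not referrer or referrer == "-":
        return "none"
    host = (urlparse(referrer).hostname or "").lower()
    if host == MY_HOST:
        return "internal"   # a broken link on your own site
    if any(hint in host for hint in SEARCH_HINTS):
        return "search"     # probably a stale search engine index entry
    return "external"       # someone else's page links to a missing file


def summarize(referrers):
    """Count how many 404 referrers fall into each bucket."""
    return Counter(classify_referrer(r) for r in referrers)
```

Lots of “internal” hits would point at your own links, “external” ones are the people you could contact, and “none” is often bots or old bookmarks.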

If you can get your ISP to give you the “raw” logs, you are in complete luck: you can find out all sorts of info from them. They are not for the newbie, though. You will need to know a little about internet protocols, as well as the type of web server your ISP uses, to “decode” the log.
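For what it’s worth, here is a minimal sketch of what “decoding” can look like in Python. It assumes your host writes the common NCSA Combined Log Format (the Apache-style one with the referrer and user agent in quotes at the end); I don’t know what server your ISP runs, so the pattern may need adjusting. The sample lines, addresses, and file names are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (NCSA Common/Combined Log Format):
# host ident user [time] "METHOD path protocol" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Made-up sample lines; the last one has no referrer/agent fields.
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2004:13:55:36 -0700] "GET /index.htm HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
    '192.0.2.7 - - [10/Oct/2004:13:56:01 -0700] "GET /old_page.htm HTTP/1.0" 404 209 "http://example.org/links.html" "Mozilla/4.0"',
    '192.0.2.9 - - [10/Oct/2004:13:57:12 -0700] "GET /images/logo.gif HTTP/1.0" 404 209',
]


def not_found_report(lines):
    """Return (number of parsed requests, Counter of (path, referrer) for 404s)."""
    total = 0
    misses = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        total += 1
        if m.group("status") == "404":
            misses[(m.group("path"), m.group("referrer") or "-")] += 1
    return total, misses


if __name__ == "__main__":
    total, misses = not_found_report(SAMPLE)
    for (path, referrer), count in misses.most_common():
        print(count, path, "linked from", referrer)
```

With a real log you would pass the open file instead of `SAMPLE`, e.g. `not_found_report(open("access.log"))`, where `access.log` is a placeholder for whatever file your ISP hands you. The output tells you which missing files are being asked for and which page sent the visitor there.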

Finally, ask your ISP if you are allowed a custom 404 page. If so, make one, and either put a redirect on it or a link to the main page. This will not stop the logs from reporting the 404, but it is nicer for the user.