A non-technical friend of mine recently asked me if there was some way to find out when a page on the Web was last updated. My initial response was to check “File->Properties” in IE, but I quickly realized that this is not at all helpful. I tried checking the source code of a few pages, but most didn’t have the update date embedded anywhere.
Is this actually a retrievable piece of information, or is there no way to tell when a web page was last updated without having access to the server on which it is stored?
This is assuming, of course, that the page's author does not go out of their way to post the date or display it using some fancy tag/Java/etc.
Unfortunately, you are at the mercy of the developer of the web site in question. It’s considered good netiquette to post that information at the top or bottom of your page, but that’s all voluntary. Some HTML editors may timestamp the work, but once again, not all.
You can try to look up the directory listing of the page, but most sites are configured to block you from the directory contents.
So I’d say, no.
(Though a lot of search engines use ‘spiders’ to catalogue websites, and they need to know if a webpage has changed recently. Maybe if you search through a site on Google, they can figure out when a page was last updated.)
The browser must have a way to find out, since it has ways to “synchronize” your “offline” content.
I’m sure you can get the browser to give you the info.
It may only give you the info if you ask for the download, though, in which case you no longer care, since you’ve bought the overhead.
When you request a web page, extra hidden HTTP headers are sent to your browser along with it. They contain all kinds of interesting information, such as the “Last-Modified” date. Here is a list: http://www.w3.org/Protocols/HTTP/Object_Headers.html Developers can also add their own custom headers according to the specifications. Unless the web page developer displays this info on the page, you’ll have to use a bit of code to read the Last-Modified date. I’m assuming this is beyond the scope of the OP, so I won’t post any code here. If it’s really important, email me, and I’ll whip up a code snippet. The headers that your browser receives are entirely dependent on the web server hosting the files. For example, pages that produce dynamic content, like Active Server Pages, usually do not send the date stamp to the client browser, because that information is (mostly) irrelevant for a page generated fresh on every request.
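For anyone curious what reading that header might look like, here is a minimal Python sketch. It assumes the server actually sends a Last-Modified header; the function names `parse_last_modified` and `fetch_last_modified` are made up for illustration, and the URL in the comment is only a placeholder. It uses a HEAD request, which asks for the headers without downloading the page body.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
import urllib.request


def parse_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified header as a datetime, or None if it is
    missing or cannot be parsed. Header names are matched case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "last-modified":
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):
                # Malformed date string: treat it as "unknown".
                return None
    return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request (headers only, no body) and read Last-Modified."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(dict(response.headers.items()))


# Example (needs network access; the URL is a placeholder):
# print(fetch_last_modified("http://example.com/"))
```

A result of None means the server did not say, which is common for dynamically generated pages. Even when a date comes back, it is only what the server chose to report.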
Try View>Page Info in Netscape. The bottom pane will show you the value of the Last Modified field that the server gave you (as described by evilhanz). This only works on pages whose server actually sends that field. On the rest, there is no way to find out the last modified date if it’s not explicitly given.
FYI, Soul of the Machine, the way the browser synchronizes is to send a “conditional GET” request to the server. This just says “get this page if it’s been modified since this date”. The browser knows the date on the stored copy of the page that it has, and just asks whether it’s changed since then. It doesn’t need to know the date the page changed.
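A rough Python sketch of that conditional GET, for illustration only. The function names `conditional_headers` and `changed_since` are hypothetical, and this is not how any particular browser implements it. The idea is to send the date of the stored copy in an If-Modified-Since header; a server that honors it answers 304 (Not Modified) with no body if nothing has changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.error
import urllib.request


def conditional_headers(cached_at: datetime) -> dict:
    """Build the request header for a conditional GET.
    `cached_at` is the date of the stored copy and must be timezone-aware."""
    if cached_at.tzinfo is None:
        raise ValueError("cached_at must be timezone-aware")
    # HTTP dates are always expressed in GMT.
    stamp = format_datetime(cached_at.astimezone(timezone.utc), usegmt=True)
    return {"If-Modified-Since": stamp}


def changed_since(url: str, cached_at: datetime, timeout: float = 10.0) -> bool:
    """Return True if the server sends a fresh copy, False if it answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cached_at))
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True  # 2xx response: the page was sent again
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return False  # Not Modified: the stored copy is still current
        raise
```

Note that the answer is only yes or no. The client never learns the actual modification date from a 304, and servers that ignore the header will simply send the page again.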
And of course, you have no idea if the date is correct. Since the date can carry the equivalent of a 32-bit number (a count of seconds), one interesting proposal I read suggested using it to track website visitors (the browser sends the document date back to the site to check for new versions).
Each visitor is sent a different date, uniquely identifying them even if they block cookies and are using a proxy.
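To show why that proposal is at least plausible, here is a small Python sketch of the encoding step only. It assumes the visitor number is simply treated as a count of seconds since 1970; the function names `id_to_http_date` and `http_date_to_id` are invented for this example and the original proposal may have worked differently.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def id_to_http_date(visitor_id: int) -> str:
    """Disguise a 32-bit visitor number as an HTTP date (seconds since 1970)."""
    if not 0 <= visitor_id < 2**32:
        raise ValueError("visitor_id must fit in 32 bits")
    moment = datetime.fromtimestamp(visitor_id, tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def http_date_to_id(value: str) -> int:
    """Recover the visitor number from the date the browser sends back."""
    return int(parsedate_to_datetime(value).timestamp())
```

The server would hand out `id_to_http_date(n)` as the Last-Modified value and decode whatever comes back in If-Modified-Since. Either way, the date a page reports need not have anything to do with when it was really edited.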
If the site has a visitor counter, will it register as being updated every time someone visits?