How to capture output from a web page

I hope I can make my two related questions understandable.

a. I want to write a script or program that can use the search function on a web page and parse the results into a form of my own choosing (an onscreen browser interface or a file, e.g. an Excel or HTML file).

Is there an easy way to do this? I don’t mind using a Microsoft product like Access, etc.

Is there a good website out there which will explain how to do this? I learn best by example.

I use a site that runs over HTTPS (I have to log in) where I can run queries to retrieve visual information from a database as a hyperlinked list (a list with hyperlinks to images). The site uses JavaScript or Java (you have a choice), and I have to go through several screens of choices before I get my results, which are listed only 10 per page. Sometimes I have hundreds of results. The process of doing these queries is arduous, not least because the site is SLOW, and I sometimes have to sit there for several minutes waiting for the next screen to come up. It would be nice if I could automate the whole thing so I can go do something else while I’m waiting. In other words, I want to enter my parameters on one screen and get the results on the next, instead of clicking through several web pages.

b. I imagine I could, in theory, write a script of some sort, but I wouldn’t know how to deal with the HTTPS. Is there a site that explains this as well?

In general it’s a pain in the butt to write code to interact with a website. You have to capture samples of the interaction on each page and reverse-engineer any scripting on the page to figure out exactly what messages and parameters you really need to send to the host to trigger the results you need. Then you have to filter out all the extraneous human-interface BS that comes back to locate the meat you’re looking for. And you have to do that for each page cycle until you get to the final results page.
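As a rough illustration of the "filter out the extras" and "repeat for each page" parts, here is a minimal Python sketch using only the standard library. Everything specific in it is made up for the example: the `fetch_page` callable, the page numbering, and the assumption that the results you want are links ending in an image extension. A real site will need its own rules, worked out by looking at what actually comes back.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_image_links(html, extensions=(".jpg", ".jpeg", ".gif", ".png")):
    """Keep only links that look like images; drop navigation and the rest."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if href.lower().endswith(extensions)]


def collect_all_results(fetch_page, max_pages=100):
    """Walk the result pages one at a time until a page yields nothing.

    fetch_page is a hypothetical callable: given a 1-based page number, it
    returns that page's HTML as a string, however the real site requires.
    """
    results = []
    for page in range(1, max_pages + 1):
        found = extract_image_links(fetch_page(page))
        if not found:
            break
        results.extend(found)
    return results
```

Keeping the fetching behind a plain callable means the parsing can be tried out on saved copies of the pages before pointing it at the slow live site.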

And just when you get it working, somebody at the website changes their code a little bit to add one more link or button or whatever, and suddenly your program hiccups and you have to reverse-engineer it all over again.
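One cheap defence is to have the script look for something it expects on each page and stop with a clear message when that is missing, rather than quietly returning garbage. A small Python sketch of the idea; the marker text and page name are placeholders you would pick yourself.

```python
class PageLayoutChanged(Exception):
    """Raised when a fetched page no longer looks the way the script expects."""


def require_marker(html, marker, page_name):
    """Return html unchanged if marker is present, otherwise fail loudly."""
    if marker not in html:
        raise PageLayoutChanged(
            f"{page_name}: expected to find {marker!r}; the site may have changed")
    return html
```

This does not save you from redoing the reverse-engineering, but it does tell you which page broke.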

BUT, assuming you’re a programmer, it’s not impossible.

IE doesn’t have a scripting engine that works the way you want; its Java and similar engines have the pages driving the browser, whereas you want the browser to drive the pages.

But Windows does expose an HTTP object model, so you can get at it from VBScript or JScript under the Windows Script Host (WSH). Here’s the first page of the relevant Microsoft SDK, and here’s some sample code which downloads a file from a site.
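For comparison, here is roughly the same idea sketched in Python rather than WSH (a swap on my part, not what the SDK pages show). The HTTPS part generally needs no special handling: the library negotiates it whenever the URL starts with https://. What you do need after logging in is a cookie jar, so the session carries over to later requests. The form field names and URLs in the usage note are placeholders; the real ones have to be read off the site's login form.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return an opener that remembers cookies between requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))


def build_form_post(url, fields):
    """Build a POST request carrying form fields, as a browser submit would."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def fetch(session, request, timeout=300):
    """Send the request and return the response body as text.

    The long timeout is deliberate, on the assumption that the site is slow.
    """
    with session.open(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Typical use would be to call `make_session()` once, pass a login request built with `build_form_post("https://example.com/login", {"username": "...", "password": "..."})` to `fetch`, and then reuse the same session for the search pages. This assumes the site uses an ordinary form-and-cookie login, which may not be the case.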

If that’s all gibberish to you, then unfortunately I don’t have a better suggestion. But depending on your scripting skills, you may find enough there to run with.

Good Luck.