Problem: I run a search at a website that gives me pages and pages of links leading to separate data records (none of this is a secure site). I'd like a quick and easy way of automating the download of all of this data into a spreadsheet, database, or comma-delimited file.
Can anyone point me to a quick solution? I know there are countless ways of achieving my goal. The data is all text.
Do you have Excel? If you do, you might try, under the 'Data' menu, Import External Data | New Web Query. It will let you go to the web page and select a table (it will be marked with a yellow box with an arrow in it). It will import the table, which you can then update periodically.
That won’t work because I don’t want to scroll through hundreds of pages, each with ten links that contain one record each.
I want to create a table of all the final records, not a table of links, and I don't want to spend all day clicking the "more" button for the next page of ten links. There must be some program out there.
I have found a pattern in how the numbers change for each page; I just need an easy way of going through the iterations of the records to grab them. In other words, I could probably generate a list of page addresses; how do I grab the contents to put into a table?
You’re joking, right? The search isn’t on Google, it’s on some public website, a gov’t site that gives you back data links in pages of 10. It doesn’t give you ten records on a page, it gives you links to ten records on a page.
I'll give you the chance to be rude again, but it's only because I know what I am doing. Do you feel like posting the link to a page? A fully developed solution would probably need a programming language like Python, but there are lots of intermediate levels of automation for that sort of thing. There's no single one that will work for everything, however.
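To give a feel for what the "programming language" end of the spectrum looks like: here is a minimal Python sketch. Everything site-specific in it is an assumption for illustration, since you haven't posted the link. The URL pattern (`start=` advancing by 10) and the record-link shape (`/record/123`) are made up and would have to be replaced with whatever your gov't site actually uses.

```python
import csv
import re
from urllib.request import urlopen  # only used by scrape(), for live fetching

# Hypothetical URL pattern -- substitute the real site's pattern here.
BASE = "http://example.gov/search?start={}"

def page_urls(pages, per_page=10):
    """One URL per results page, assuming the site pages through results 10 at a time."""
    return [BASE.format(n * per_page) for n in range(pages)]

# Hypothetical record-link shape -- adjust the regex to match the real links.
LINK_RE = re.compile(r'href="(/record/\d+)"')

def extract_record_links(html):
    """Pull the record links out of one search-results page."""
    return LINK_RE.findall(html)

def scrape(pages, outfile="records.csv"):
    """Walk every results page, follow each record link, dump text to CSV."""
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        for url in page_urls(pages):
            html = urlopen(url).read().decode("utf-8", "replace")
            for link in extract_record_links(html):
                record = urlopen("http://example.gov" + link).read()
                # Parsing the record page depends entirely on its layout;
                # here we just strip tags and save the raw text.
                text = re.sub(r"<[^>]+>", " ", record.decode("utf-8", "replace"))
                writer.writerow([link, " ".join(text.split())])
```

Once the two site-specific pieces (the page-address pattern and the link regex) match the real site, `scrape(200)` would grind through 200 pages of ten links each while you do something else.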
I use AutoHotkey for most of my quick-and-dirty Web data-mining needs. It's free, fast, and easy to learn. You should be able to get something running with just a few minutes' work, then refine it from there. When your program is stable, you can compile it into a stand-alone .EXE for distribution to computers that don't have AutoHotkey installed.
Sometimes it’s trivial to mine pages for data; in more complex cases, it’s simpler to launch the page in the browser, launch the page source code viewer (e.g. Ctrl-U in Firefox), and parse the source for telltale tags and text strings. Autohotkey macros can do all of this, while you go off and do something more productive with your time.
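The "telltale strings" trick is language-agnostic; in Python terms (AutoHotkey's string functions do the same job) it just means slicing the page source between two known markers. The marker strings below are invented for illustration; you'd swap in whatever actually brackets the data on your site:

```python
def between(source, start_marker, end_marker):
    """Return the text between two telltale markers, or None if either is missing."""
    i = source.find(start_marker)
    if i == -1:
        return None
    i += len(start_marker)
    j = source.find(end_marker, i)
    return source[i:j] if j != -1 else None

# Hypothetical page-source snippet, as seen via the browser's source viewer:
page = '<td class="name">Jane Doe</td><td class="date">1999-01-02</td>'
name = between(page, '<td class="name">', '</td>')
```

This is cruder than a real HTML parser, but for one fixed site whose pages all share a template, it's usually all you need.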
I’ll echo Shagnasty’s request for a link to a typical page that you want to mine.
[Disclaimer: I have no connection with the developers of AutoHotkey, except as a satisfied user.]