Saw this on another message board I frequent:
By “use a reader on a web page and transfer the data”, he means an automated process that gathers baseball statistics from sites like ESPN and MLB.com and transfers them to a data file. Is this guy just talking out his ass or what?
Not at all. I have a script (written by a generous SDMB fellow) that “reads” a website every night at 7PM and saves it to my PC. To get baseball stats, all you would need to do is program more “page parsing” logic.
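' Fetch a web page and save a dated copy to disk.
Dim oXMLHTTP, sHTML, sNewHTML, sURL
Dim oFS, oTS, sFileName, sFilePath
' The page address was not included in the post; set sURL before running.
sURL = ""
Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP.3.0")
oXMLHTTP.Open "GET", sURL, False   ' False = synchronous request
oXMLHTTP.Send
sHTML = oXMLHTTP.responseText
Set oXMLHTTP = Nothing
' Turn relative "/q" links into absolute Yahoo Finance links.
sNewHTML = Replace(CStr(sHTML), "/q", "http://finance.yahoo.com/q")
sFileName = "up" & CStr(Year(Date)) & "-" & CStr(Month(Date)) & "-" & CStr(Day(Date)) & ".htm"
sFilePath = "C:\Documents and Settings\Administrator\Desktop\StkStuff\"
Set oFS = CreateObject("Scripting.FileSystemObject")
Set oTS = oFS.CreateTextFile(sFilePath & sFileName, True, False)
oTS.Write sNewHTML
oTS.Close
Set oTS = Nothing
Set oFS = Nothing
' Run manually the first time to get Norton's blessing.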
I’ve written many such “screen-scraping” programs in Perl. Pretty trivial stuff, as long as they don’t radically change the layout of the page. (But if they do, all you have to do is change your parsing logic to match.)
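For anyone wondering what that "parsing logic" looks like, here is a minimal sketch in Python (not the Perl I actually use), standard library only. It assumes the stats sit in a plain HTML table with a header row. The sample page, the column names, and the names TableRows and parse_stats are all made up for illustration; a real site's markup will differ, and that is exactly the part you end up rewriting when the layout changes.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_stats(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; hypothetical players and numbers.
SAMPLE = """
<table>
  <tr><th>Player</th><th>AB</th><th>H</th></tr>
  <tr><td>Smith</td><td>410</td><td>123</td></tr>
  <tr><td>Jones</td><td>385</td><td>101</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_stats(SAMPLE):
        average = int(row["H"]) / int(row["AB"])
        print(f'{row["Player"]}: {average:.3f}')
```

Every cell comes back as a string, so the conversion to numbers happens at the point of use. In practice you would feed parse_stats the HTML your nightly fetch saved instead of SAMPLE.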
Note that “scraping” is falling out of vogue, though, because it requires maintaining the reader to follow changes in the format of the parent page. Any site that’s going to present updated information on a regular basis really should have an RSS feed instead, which can be parsed easily.
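To show why a feed is easier to deal with, here is a rough sketch in Python using only the standard library. It assumes an RSS 2.0 feed whose items carry title, link and pubDate elements; the sample feed, its URLs, and the function name parse_rss are invented for the example.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return (title, link, pubDate) tuples, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return items


# Made-up feed for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Scores</title>
    <item>
      <title>Home 5, Away 3</title>
      <link>http://example.com/games/1</link>
      <pubDate>Mon, 01 Jan 2024 20:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Home 2, Away 7</title>
      <link>http://example.com/games/2</link>
      <pubDate>Tue, 02 Jan 2024 20:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, link, published in parse_rss(SAMPLE_FEED):
        print(published, title, link)
```

The code looks elements up by name rather than by where they sit on the page, so a site redesign does not break it as long as the feed format stays the same.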
Generally scraping is only used as a last resort; sites which want to distribute their information usually provide an XML feed of some sort (usually RSS). But if you need to extract information from a site that doesn’t want to bother, then scraping is the way to go.
Thanks, I knew I could count on Dopers to fill me in.
I did this for an online game I play. There’s an economic portion to it and I wrote a script that would regularly collect the prices, supply and demand on the different items; this way I could plot graphs of said item over time in an effort to gain an edge on the competition.
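The bookkeeping side of that is simple. Below is a sketch in Python of one way to do it, not the script I actually ran: each collection run appends timestamped rows to a CSV file, and a second function pulls out the price history for one item so it can be graphed. The column names, item names, and the function names append_snapshot and price_history are assumptions made up for the example.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "item", "price", "supply", "demand"]


def append_snapshot(path, rows, when=None):
    """Append one collection run to the CSV, writing a header if the file is new."""
    when = when or datetime.now(timezone.utc)
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        for row in rows:
            writer.writerow({"timestamp": when.isoformat(), **row})


def price_history(path, item):
    """Return (timestamp, price) pairs for one item, in the order collected."""
    with Path(path).open(newline="") as f:
        return [
            (record["timestamp"], float(record["price"]))
            for record in csv.DictReader(f)
            if record["item"] == item
        ]


if __name__ == "__main__":
    # Demo with made-up numbers, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as folder:
        log = Path(folder) / "prices.csv"
        append_snapshot(log, [{"item": "iron", "price": 10.5, "supply": 100, "demand": 80}])
        append_snapshot(log, [{"item": "iron", "price": 11.0, "supply": 90, "demand": 85}])
        for stamp, price in price_history(log, "iron"):
            print(stamp, price)
```

An append-only file keeps each run independent: if one night's collection fails, you are missing a row and nothing else is affected.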