I need a way to copy webpage->MSWord

Part of my job is to copy pieces of data from different web pages into MS Word documents. The pages are always in the same exact format. At least ten of the data fields are always in the same place and always get copied to the same location. I’m tired of doing this manually.

Example: Let’s say I process order forms from customers for buckets of paint. For each order, I have a page with about 100 items on it. For the Word document I produce, I need to pull only 15 of those items (e.g. amount of paint, color and brand) off the web and into Word in specific locations of a template I already have built. The other 85 data points can be discarded. I then need to manually add in other data points from other sources, though I don’t see how that affects things.

What I’d love would be a macro that says “Word, go to http://…com/85645786, go to line 36 of the page, copy characters 2 and 3, and paste it in line 56 of the .doc after character 8.” What’s the best way to build a program or macro that will do that? Please note that I’m pretty good with learning code, but I don’t have much knowledge of it already. I’d like the easiest method possible. If this is doable, it’ll save 30 man-hours a day and a lot of monotony.
Thanks!

Perhaps a program like this will work for you. http://www.iopus.com/

I can’t install a program on these computers, and we certainly can’t spend that kind of money.

You want some way to automatically perform a very complex task, that would save $300 per day (conservatively figured at $10 per hour for 30 hours per day), without installing any software and for free?

Good luck with that.

You’re probably not going to find a better solution than the one already offered. Unless you’re an expert programmer with the right development software, it’s highly unlikely that you’re going to manage to build your own.
The other alternative is to find a better source/method for data acquisition than c&p from the web.

Is the data you’re pulling from your webpages or someone else’s?

If it’s yours, I’d recommend going to your IT dept with a data request. That information is in a database somewhere, and it’s a hella lot easier to get data from a database than copy & paste from the web.

If it’s someone else’s data you’re using, then my best recommendation would be to set up a spreadsheet and store the data there for use, rather than c&p each time from the web. You’d have to figure out how often to check for updates, but if you’ve accurately represented the type of data, it wouldn’t be changing that often. Alternatively, if you’re pulling from manufacturer websites, contact them and ask for a datafile of their product data. They might laugh at you, but it’s worth a try.

I agree with redtail23.

If you really need something free, look around for something called a “Web Scraper” … something like this. Of course, you have to download it so I’m not sure how it would work for you…but you might also find some code out there so you can build your own Web Scraper and run it from your machine.

It’s the government’s database. I have to go in and pull up records. I copy out parts of that record and stick it in Word. Then I do some other stuff to the Word document and send it out to be edited. I never revisit the same record, so putting it into a spreadsheet wouldn’t do me any good.

Could someone explain why this is a complex task? I plan to manually feed the program the correct URL. I just want it to return character 3486 and stick it in position 7248. If I can hyperlink with ease, I would have thought this would be pretty easy. No simple batch file or similar script could do this, you say?

ETA: The reason I can’t buy a program or install it is that the computers we use are classified. They don’t touch the NIPRNET and you can’t install anything without it going through a billion levels of bureaucracy first.

You can do this in VBScript without installing anything. Below is an example which scrapes a page’s raw HTML to a Word document.

Paste it into a text file with a .vbs extension and customize the value of the strURL variable with the page’s URL.

Getting the exact content you want and inserting it into the right spot in an existing document will take some fiddling.

Please don’t use this in any way which is contrary to the website’s terms of service or your IT department’s guidelines.


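Option Explicit

Dim strURL, objXmlHttp, strText, objWord, objDoc

' Page to fetch; change this to the URL of the record you want
strURL = "http://www.example.com"

' Fetch the page with a plain GET. The third argument (False) makes the
' call synchronous, so Send only returns once the response has arrived.
Set objXmlHttp = CreateObject("Msxml2.XMLHTTP")
objXmlHttp.Open "GET", strURL, False
objXmlHttp.Send

If objXmlHttp.Status = 200 Then
    strText = objXmlHttp.responseText

    ' Start Word, create a blank document and type the raw HTML into it
    Set objWord = CreateObject("Word.Application")
    objWord.Visible = True
    Set objDoc = objWord.Documents.Add()
    objWord.Selection.TypeText strText
Else
    ' Report failures instead of silently swallowing them
    WScript.Echo "Request failed with HTTP status " & objXmlHttp.Status
End If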
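
As for the fiddling, here is a rough sketch of the two halves: pulling a fixed run of characters out of the page text with Mid, and dropping it into an existing template. The sample text, the path C:\Temp\template.doc and the bookmark name Amount are all made up for illustration. A Word bookmark is swapped in here for the "line 56, after character 8" idea, on the assumption that you can add bookmarks to your template.

```vbscript
Option Explicit

Dim strPage, strField, objWord, objDoc

' Invented sample text standing in for the page text fetched above
strPage = "Order 85645786" & vbCrLf & "Amount: 12 buckets" & vbCrLf & "Color: Blue"

' Mid(text, start, length) counts from 1: take 2 characters starting at position 25
strField = Mid(strPage, 25, 2)   ' "12" for the sample text above

' Open the existing template (hypothetical path) and insert at a bookmark
Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("C:\Temp\template.doc")

If objDoc.Bookmarks.Exists("Amount") Then
    ' InsertAfter adds the text at the end of the bookmarked range
    objDoc.Bookmarks("Amount").Range.InsertAfter strField
Else
    WScript.Echo "Bookmark 'Amount' not found in the template."
End If
```

Fixed character offsets only hold up if the page never shifts by even one character, so treat the position 25 above as something to re-check against the real page source.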

It’s that fiddling that I need help with. On the page, I have the option of exporting to an HTML view, ICML, TSV, PDF, KMZ, or saving as a file.

What about this: I can copy and paste the code into Word or Notepad or something. Then have a macro scrape through that and pull out the stuff I want, placing it in the Word document? If possible, it would suffice to have one that says “Go find this keyword, copy the word just after it, and place it here.” Then I would just build 10 or 15 of these macros and execute them in order.

Pretty pretty please?

Well, that’s the rub: how many data fields are you pulling? Because even if it is eleven such fields, where ten are in the same place all the time and the eleventh field’s place varies, I don’t see how this macro will save an appreciable amount of time (I take it that the marginal effort of copying ten more fields is negligible).

You say it’s a government database. You probably could request under FOIA and pay the government a small fee to run the query on their database (rather than copy it through the public-facing webpage) and send it to you on CD-ROM. I did this very thing with the DOL’s pension supervision records years ago. It cost like $35 or $350 or something, as I recall.

It looks like you might be able to do DOM/XPath stuff in VBScript:

http://techdos.com/content/view/77/69/

Note: I only skimmed the article, very briefly. But it might help.
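
As a rough illustration of the DOM route (not XPath), the htmlfile object that ships with Windows can parse HTML from VBScript with nothing extra installed. This sketch assumes the field label and its value sit in neighbouring table cells. The sample markup is invented, so the real page will almost certainly need a different lookup.

```vbscript
Option Explicit

Dim objHtml, colCells, i

' Invented sample markup standing in for the real page source
Const SAMPLE_HTML = "<table><tr><td>Brand</td><td>Acme</td></tr><tr><td>Color</td><td>Blue</td></tr></table>"

' Parse the HTML with the built-in htmlfile COM object
Set objHtml = CreateObject("htmlfile")
objHtml.write SAMPLE_HTML
objHtml.close

' Walk the table cells; when a cell holds the label, the next cell holds the value
Set colCells = objHtml.getElementsByTagName("td")
For i = 0 To colCells.length - 2
    If Trim(colCells.item(i).innerText) = "Color" Then
        WScript.Echo colCells.item(i + 1).innerText
        Exit For
    End If
Next
```

The label-then-next-cell rule is only a guess at the layout; if the page gives its fields id attributes, getElementById would be a sturdier way in.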

I’d like to look more into this, but it’s hard without access to the data in question (or more importantly, the format that data is in). In some cases XPath would work better, in other cases regular expressions, and in yet other cases a simple substring.

I know the stuff you’re working on is classified, but could you modify the data (make it all burgers and milkshakes) while retaining the HTML layout and post the source somewhere? A declassified version of your Word template would be helpful too, though the HTML is more important.
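
In the meantime, here is a minimal sketch of the regular-expression option, covering the "find this keyword, copy the word just after it" idea from earlier in the thread. WordAfter is a hypothetical helper name. It assumes plain text rather than raw HTML (tags would get captured as part of the "word"), and it assumes the keyword contains no regex special characters.

```vbscript
Option Explicit

' Return the first run of non-space characters that follows strKeyword,
' or an empty string if the keyword is not found.
Function WordAfter(strText, strKeyword)
    Dim objRegEx, colMatches
    Set objRegEx = New RegExp
    objRegEx.IgnoreCase = True
    objRegEx.Global = False
    ' keyword, optional whitespace and colon, then capture the next word
    objRegEx.Pattern = strKeyword & "\s*:?\s*(\S+)"

    Set colMatches = objRegEx.Execute(strText)
    If colMatches.Count > 0 Then
        WordAfter = colMatches(0).SubMatches(0)
    Else
        WordAfter = ""
    End If
End Function

' Invented sample text for illustration
Dim strSample
strSample = "Brand: Acme" & vbCrLf & "Color: Blue" & vbCrLf & "Amount: 12"

WScript.Echo WordAfter(strSample, "Color")   ' shows Blue
```

Ten or fifteen calls to a helper like this, one per field, would stand in for the ten or fifteen separate macros.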

If you aren’t allowed to buy or install anything on a classified computer, your bosses prolly aren’t going to be happy about you programming it either.

I do similar things at work all the time with Excel. I have reports where the formatted report is one worksheet and the data for the report is pasted straight into other worksheets. So I export the data from one source as a text file, dump it in a raw data worksheet and the correct cell of the report worksheets says, for example =left(rawdata1!,2,6). I sometimes use vlookups for the purpose and have reports that are made up of 6 different sources. The end user only gets the formatted report and the template is saved.
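
The same export-then-extract idea can be sketched in VBScript, using the TSV export mentioned earlier in the thread. This assumes, purely for illustration, that each line of the export is a field name and a value separated by a tab. The path C:\Temp\record.tsv is hypothetical, and the real export may be laid out quite differently.

```vbscript
Option Explicit

Const ForReading = 1

Dim objFSO, objFile, strLine, arrFields

' Open the exported file (hypothetical path)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Temp\record.tsv", ForReading)

' Read line by line and split each line on tab characters
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    arrFields = Split(strLine, vbTab)

    ' Only use lines that have at least a name and a value
    If UBound(arrFields) >= 1 Then
        WScript.Echo arrFields(0) & " = " & arrFields(1)
    End If
Loop

objFile.Close
```

Splitting on tabs avoids counting characters at all, which is why a delimited export is usually easier to work with than the page's HTML.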

I think you’re overlooking the spirit of it. Any Word macro can jeopardize your security, big time. (Used to be that half of viruses were in this form.) But if you don’t care, and you just don’t want it in your Add/Remove Programs, then you can have any old stand-alone executable made for you (or vbs macro, if you prefer). Programmers on a site like RentACoder will be happy to create one to your requirements for under (probably well under) $100. I’ve had the same done myself. It’s trivial for them, just be precise in what you want.

I think – or at least hope – that he’s trying to build a macro himself (with help from people here) so that he can see and understand the entirety of the source before running it.

I guess that could apply to purchased macros too, but not .exes.

Since you say you’re using classified computers, perhaps you need to go through those levels of bureaucracy.

Closed.

samclem Moderator, General Questions