…to allow VB-programmed string/text manipulation. Is there a way to do this without involving expensive OCXs?
If you’re using VB.NET, check out the Microsoft Office interop. It allows .NET programs to programmatically call Office for certain functions. If you just want to extract the text from a Word doc, you can open the file in the background, select all the text, and copy it into VB.
However, Office is required to be installed on all machines running that code. That may be a problem.
You don’t say which versions of VB or Word you’re using, but generally speaking the first step would be to add the Word object library (if it’s available) to your project:[ul]
[li]Open the Project menu[/li][li]Click References[/li][li]Scroll down to the Word library (in VS6 it’s “Microsoft Word 8.0 Object Library”)[/li][li]Check the box and click OK[/li][/ul]This exposes the Word programming interface to your project. I’m not too familiar with Word, though I’ve done quite a bit with Excel; but googling “word visual basic automation” should point you to something with more detail. (I’ve had a quick look at MSDN, but they’re concentrating on .NET at the expense of the legacy products. Understandable, I guess. . . .)
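For what it’s worth, here is a minimal sketch of what the code might look like once that reference is in place. It assumes classic VB6 with the Word object library checked and Word installed on the machine; the function name ReadWordText is just a made-up name for illustration, and there is no error handling, so a failure part-way through could leave a hidden Word instance running.

```vb
' Sketch only: assumes a project reference to the Microsoft Word Object Library.
' ReadWordText is a hypothetical helper name, not part of any library.
Public Function ReadWordText(ByVal docPath As String) As String
    Dim wordApp As Word.Application
    Dim wordDoc As Word.Document

    ' Start a hidden instance of Word
    Set wordApp = New Word.Application
    wordApp.Visible = False

    ' Open read-only so the original file is not modified
    Set wordDoc = wordApp.Documents.Open(FileName:=docPath, ReadOnly:=True)

    ' Content is a Range covering the main body; .Text returns it as a string
    ReadWordText = wordDoc.Content.Text

    ' Close without saving and shut Word down again
    wordDoc.Close SaveChanges:=wdDoNotSaveChanges
    wordApp.Quit

    Set wordDoc = Nothing
    Set wordApp = Nothing
End Function
```

Once the text is in an ordinary string you can use the usual VB string functions (Len, InStr, Mid$, UCase$ and so on) on it.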
If you can’t add the library, I would suggest using a macro within the document. If you’re going to be working with multiple documents, you could create a template—paste the text in, manipulate it, and copy it back out.
I’m not a VB programmer, but if nobody else answers, here are two things you could try:
Use VB for Applications (VBA) – Word’s built-in macro language – instead of standalone VB.
Use Word or the free Word Viewer to open the files (in the background or visible to the user), then either save them as another document type or use the clipboard to copy and change text. You might have to fake some keyboard or mouse input for this and I don’t think it’s a very good way…
Maybe this page will help a bit: http://www.vbdotnetheaven.com/Code/Jul2003/2123.asp
The best solution: Wait for somebody who actually knows what they’re talking about
(Oops, that was obviously a late simulpost.)
As others have mentioned, probably the best way to approach this is to write the code directly in Word using VBA.
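As a rough illustration of the VBA route, a macro along these lines could be pasted into a module in Word’s Visual Basic Editor. The macro name and the phrases being searched for are placeholders I’ve made up; only ActiveDocument, Content, and Find.Execute come from Word’s own object model.

```vb
' Sketch only: runs inside Word's VBA editor against the active document.
' ManipulateText and the search phrases are hypothetical placeholders.
Sub ManipulateText()
    Dim bodyText As String

    ' Pull the main body text into an ordinary VB string
    bodyText = ActiveDocument.Content.Text

    ' Any standard string function can now be used on it
    Debug.Print "Characters: " & Len(bodyText)
    Debug.Print "First 'Word' at position: " & _
        InStr(1, bodyText, "Word", vbTextCompare)

    ' Replace every occurrence of a phrase in the document itself
    ActiveDocument.Content.Find.Execute _
        FindText:="old phrase", ReplaceWith:="new phrase", _
        Replace:=wdReplaceAll
End Sub
```

Reading Content.Text gives you a copy to inspect; changes made to that string don’t affect the document, which is why the replacement goes through Find.Execute instead.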
If the machine on which the code needs to run does not have Word installed then you’re into a more complex situation. You really don’t want to get into the whole arena of parsing the native Word file format.
Options I would suggest in this case are either saving the Word documents out in RTF (Rich Text Format) or, if you have Word 2003, in XML format.
In the first case you should be able to efficiently process the simpler RTF file using regular expressions (reference the Microsoft VBScript Regular Expressions 5.5 object). If you use XML then the MSXML parser should be handy, though I’d recommend a SAX approach if you’re dealing with large documents.
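To give an idea of the RTF route, here is a small sketch using the VBScript RegExp object. It is late-bound through CreateObject, so it should work without adding the project reference; ReadTextFile and ReplaceInRtf are names I’ve invented for the example. Bear in mind it treats the RTF as plain text, so a careless pattern can also hit RTF control words, and a phrase that is split across formatting runs won’t match.

```vb
' Sketch only: ReadTextFile and ReplaceInRtf are hypothetical helper names.

' Read a whole file into a string (binary mode avoids end-of-file surprises)
Public Function ReadTextFile(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim contents As String

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    contents = Space$(LOF(fileNum))
    Get #fileNum, , contents
    Close #fileNum

    ReadTextFile = contents
End Function

' Apply a regular-expression replacement to the raw RTF text
Public Function ReplaceInRtf(ByVal rtfText As String, _
                             ByVal findPattern As String, _
                             ByVal replacement As String) As String
    Dim re As Object

    ' Same object you get from "Microsoft VBScript Regular Expressions 5.5"
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = findPattern
    re.Global = True        ' replace all matches, not just the first
    re.IgnoreCase = True

    ReplaceInRtf = re.Replace(rtfText, replacement)
End Function
```

A call might look like ReplaceInRtf(ReadTextFile(somePath), "\bcolour\b", "color"), where somePath is whatever RTF file you saved out of Word; you would then write the result back to disk yourself.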
Problem solved using VBA.