I’ve got a small number of pdfs that I would like to have in Word format. Can anyone recommend a good conversion utility? I want freeware or modestly priced shareware, I don’t want to pay $70 for something I’ll use only once. A demo version that expires after a few days would be fine, as long as it’s fully functional.
I’ve searched and found a few but none I’ve tried are suitable, having one or more of the following problems:
[ul][li]limited to 4 pages[/li][li]output is text only, does not include the graphics from the pdf[/li][li]outputs a series of forms text objects, rather than editable word processor text[/li][li]demo version inserts random characters into the output[/li][li]$70 registration fee required[/li][/ul]
I haven’t tried it, but I did use cuteFTP a while back and was very pleased with it.
FWIW, I usually just cut and paste text when I need to quote it from a .pdf. Only I just tried to C&P a page with graphics on it into Word and the graphics did not transfer. (I did get all of the text by using Select all, and I did not try copying and pasting the image itself.)
I used something called, I think, Able2Extract, which supposedly could be set to copy text only, enhanced text (I think that was bold, italics, underlining, etc.), or graphics. But if it did graphics, it did the whole page as a graphic, so the text could not then be edited. With the other options the graphics came out as either blobs or boxes.
It wasn’t free, but it wasn’t terribly expensive, either. The trial version, which we used first, would only do four pages at a time and IIRC had some kind of watermark that had to be dealt with.
I believe that a new version of OpenOffice should be able to do it. We use the StarOffice version, and I’ve been told the new release will allow you to edit pdfs, which should allow export to Word. I haven’t had time to try the beta yet, though.
I just downloaded the converter from hellopdf.com. The free version only converts 3 pages and allows a maximum of 3 conversions per document. So I’m stuck. The pay version is $39.95. I suspect the other “free” converters offer the same deal…
Acrobat was intended to keep the formatting of a document (pagination, layout, font) constant, no matter where it might be viewed or printed. It is possible to inhibit copying of text out of a PDF, but that wasn’t the main purpose.
The reason that there are very few tools to do this is that it is very, very difficult due to the nature of a PDF. It is not another document format (like Word or WordPerfect or OpenOffice). It is a set of instructions for the printing (or display) of a page. And the generation of the PostScript is geared towards optimal printing, not editing. This means that individual characters may be individually located on the page, thus breaking up words. Graphics may be PostScript vectors, compressed bitmaps, and other things. Trying to work backwards to convert page output to something useful can be nearly impossible, as word processor primitives do not map to the PostScript ones.
Thus there are plenty of restrictions and few comprehensive solutions, and they cost money. If you want the tools, you will probably have to pay - or wait for OpenOffice 3.0.
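To make the point about individually placed characters concrete, here is a rough Python sketch. The page fragment is made up for illustration (real page streams are usually compressed and use more operators), and the regex is a deliberately naive assumption that ignores escaped or nested parentheses. `Tj` shows one string; `TJ` shows an array of string pieces with positioning adjustments between them.

```python
import re

# Hypothetical, uncompressed page content fragment (illustration only).
CONTENT = rb"""
BT
/F1 12 Tf
72 700 Td
(Hello) Tj
[(W) 80 (or) -15 (ld)] TJ
ET
"""

# Naive pattern for literal strings: (text) with no escapes or nesting.
STRING = re.compile(rb"\(([^()\\]*)\)")

def shown_strings(stream):
    """Return the string operands in the order they appear in the stream."""
    return [m.group(1).decode("latin-1") for m in STRING.finditer(stream)]

print(shown_strings(CONTENT))  # ['Hello', 'W', 'or', 'ld']
```

Note what comes back: "World" arrives in three fragments, and there is no space character between it and "Hello" at all. The text is in there, but the words are not.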
Have you tried the Adobe full Acrobat package? Although I don’t use it, I understand it has many features and I think this is one of them. Could cost a few bucks, or whatever you call money down there, but might be useful for other tasks, too.
PostScript stores text as text internally, and PDF is very close to PostScript. It shouldn’t be hard for a programmer conversant in the language to extract it, and it certainly is not impossible.
Not necessarily. Sometimes text may be converted to a PostScript path if the driver decides to. But the main problem is that the text elements in the PDF are not necessarily in word order. And spaces don’t have to exist if the word spacing has been modified by justification. So just finding the letters does not help; you need to build up the whole page and scan along the lines to figure out the words and spaces, a bit like OCR.
I am not saying that the problem is insoluble, but it really is hard to solve. So few people have done the job, and those who have expect some money for their efforts.
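For anyone curious what "build up the whole page and scan along the lines" might look like, here is a minimal Python sketch. It assumes the glyphs have already been pulled out with their positions; the `Glyph` record, the `rebuild_lines` function and both thresholds (`line_tol`, `gap_ratio`) are my own hypothetical names and guesses, not taken from any real converter. Real documents also need fonts, rotation, columns and kerning handled, which is where the hard work is.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph
    y: float      # baseline (PDF y grows upward)
    width: float

def rebuild_lines(glyphs, line_tol=2.0, gap_ratio=0.3):
    """Group glyphs into lines by baseline, then insert spaces at wide gaps."""
    lines = []  # list of (baseline, [glyphs on that line])
    for g in sorted(glyphs, key=lambda g: -g.y):  # top of the page first
        if lines and abs(lines[-1][0] - g.y) <= line_tol:
            lines[-1][1].append(g)
        else:
            lines.append((g.y, [g]))
    out = []
    for _, row in lines:
        row.sort(key=lambda g: g.x)  # left to right
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:  # wide gap: assume a word break
                text += " "
            text += cur.char
        out.append(text)
    return out

# Glyphs deliberately out of reading order, with no space characters stored.
page = [
    Glyph("u", 100, 700, 6), Glyph("o", 72, 686, 6), Glyph("H", 72, 700, 8),
    Glyph("y", 88, 700, 6), Glyph("k", 78, 686, 6), Glyph("i", 80, 700, 3),
    Glyph("o", 94, 700, 6),
]
print(rebuild_lines(page))  # ['Hi you', 'ok']
```

The space in "Hi you" is never stored anywhere; it is inferred from the gap between "i" and "y". Picking that threshold so it works on justified text is exactly the fiddly part.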