Third-party utility for converting PDFs to Word files?

Every once in a while I need to get large amounts of text from a PDF. A few paragraphs or pages aren’t a big deal, but 40, 50 or 100+ pages are a nightmare. Acrobat has an export and save as utility, but it’s fairly weak. I know nothing is perfect, but I was wondering if a third-party app would do a better job. Any suggestions?

Thanks,

Rhythm

Oh, if it makes a difference, I occasionally have access to the Quark or InDesign files (and both programs are installed on this machine).

PDF files can be restricted by their creator as to what the recipient can and cannot do with the data they contain. If the files are protected, no simple software can override those protections. That’s why in some PDF files you can highlight text and copy and paste it, but in others you cannot.

No protection on the files–in many cases we’re the ones who created them. We often have the original Word files, but they rarely match the PDF (revisions during layout get made to the InDesign/Quark file). But after layout, Acrobat saves the Word file with pictures, pull-quotes, etc. inside the document, and there are lots of spacing issues.

I assume it’s a small feature of Acrobat so they don’t spend time improving it (plus, why would they want to make it easy to move from Acrobat to Word?). I know there are Reader substitutes, and I believe non-Adobe programs to create PDFs–have any of them paid more attention to this?

Alternatively, would starting from the original Quark/InDesign (if available) be a better route to go? Again, the programs are installed on both Macs and PCs here.

For what it’s worth, the export to Word with Acrobat 10 (X) is supposed to be significantly improved. You can download a trial version to test it out.

If it’s just the text you’re after (no formatting), then pdftotext will probably do the job. It’s part of Xpdf, and is Free Software.
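For a batch of long documents, the pdftotext call can be wrapped in a small script. This is only a rough sketch, assuming the Xpdf (or Poppler) `pdftotext` binary is on your PATH; the function names `build_pdftotext_command` and `pdf_to_text` are made up for this example. The `-layout`, `-enc`, `-f` and `-l` options are standard pdftotext flags.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, first_page=None,
                            last_page=None, keep_layout=False):
    """Build the argument list for a pdftotext run (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]      # write UTF-8 text
    if keep_layout:
        cmd.append("-layout")                 # try to keep the physical page layout
    if first_page is not None:
        cmd += ["-f", str(first_page)]        # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]         # last page to convert
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, **options):
    """Run pdftotext on one file and return the path of the text file."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    pdf_path = Path(pdf_path)
    # Default to the same name with a .txt extension, next to the PDF.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    cmd = build_pdftotext_command(pdf_path, txt_path, **options)
    subprocess.run(cmd, check=True)           # raises if pdftotext fails
    return txt_path
```

Something like `pdf_to_text("report.pdf", first_page=1, last_page=100)` would then write `report.txt` next to the PDF (the file name is just a placeholder). Whether `-layout` helps or hurts depends on the document; for multi-column layouts it is worth trying both ways.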

Yes. Too much information gets lost when rendering the page markup of a DTP package to PDF. You are always better off using the source file/application rather than the PDF output.

For example, a DTP package may kern text in very specific ways. In the PDF output, this may result in words being broken into fragments that are individually placed on the output page (and maybe not in order). This makes it difficult for any PDF-to-text tool to reassemble them into a text stream. However, the DTP package does know that there was a complete word in the original text. Similar factors apply to styles, fonts and other formatting features.
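To make the fragment problem concrete, here is a toy sketch of the kind of guesswork an extraction tool has to do. It is not how any particular product works; the `Fragment` type, the `reassemble` function and both thresholds are assumptions invented for illustration. Fragments are grouped into lines by vertical position, sorted left to right, and a space is inserted only where the horizontal gap looks like a word break.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float       # left edge of the fragment
    y: float       # vertical position, measured from the top of the page
    width: float   # rendered width of the fragment
    text: str


def reassemble(fragments, line_tolerance=2.0, word_gap=1.5):
    """Guess the reading order of positioned fragments (hypothetical heuristic)."""
    # Top-to-bottom, then left-to-right.
    ordered = sorted(fragments, key=lambda f: (f.y, f.x))

    # Group fragments whose vertical positions are close enough to be one line.
    lines = []
    for frag in ordered:
        if lines and abs(frag.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    result = []
    for line in lines:
        line.sort(key=lambda f: f.x)
        pieces = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A small or negative gap is treated as kerning inside one word.
            pieces.append((" " if gap > word_gap else "") + cur.text)
        result.append("".join(pieces))
    return "\n".join(result)


# Fragments as they might appear in a PDF: out of order, with "World" split by kerning.
sample = [
    Fragment(37.5, 10.0, 18.0, "orld"),
    Fragment(0.0, 20.0, 22.0, "again"),
    Fragment(0.0, 10.0, 25.0, "Hello"),
    Fragment(30.0, 10.0, 8.0, "W"),
]
print(reassemble(sample))  # prints "Hello World" then "again" on the next line
```

The thresholds are exactly where this goes wrong in practice: tight letterspacing, justified text or multiple columns can all make a fixed gap give the wrong answer, which is why the source file is the safer starting point.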

Many PDF generators do use a full text stream, and text can be extracted. But not all, and the more complex the original tool used, the less likely it is that you can extract useful text information from the PDF.

Si

I have had good results with ABBYY, but of course your choice will depend on many factors. Check out this list - you should be able to find something you like.