Recommend me a PDF-to-Word document converter

I’ve been asked to find an application that will convert a PDF file to a Word-compatible document file and vice versa. The program should retain the formatting of text as well as graphical content during the conversion. I should also mention that the application will most likely be running on a Windows machine.

Google turns up quite a few applications that say that they will do this, but I want someone with first-hand experience to give me a quick review on what they use.

So, tell me about your PDF converter program and how well it works. Thanks in advance for your assistance!

Word-to-PDF: Adobe Acrobat. Not the Reader, the full Acrobat. I’ve seen other shareware or open-source versions, but some of these have, IME, compatibility problems on NT or XP.

PDF-to-Word: Your options are very limited. PDF is chosen by companies for publishing Web documents in part because others cannot easily edit PDF documents. The PDF file is basically a wrapper around the PCL that your printer understands. Essentially, PDF allows the Reader to interpret the printer commands and display an image of the document. I haven’t seen any satisfactory solutions for a true file conversion.

Word to PDF:
Ghostscript. This works well, but is strictly command-line stuff. You’ll want to add Ghostview to make life easier; Ghostview is a GUI for Ghostscript. Install a driver for a PostScript printer (ones from HP work fine; you can also download the Adobe generic PostScript driver for free) and set it up to print to a file. Print your Word document using the fake printer, then open the “printed” file with Ghostview and export it as PDF.
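For anyone who would rather script the last step than click through Ghostview, here is a minimal Python sketch of the usual Ghostscript invocation for turning a printed-to-file PostScript document into a PDF. The function names and file names are made up for illustration. The executable name is also an assumption: it is typically `gs` on Linux and `gswin32c` or `gswin64c` for the Windows console build, so adjust it to match your install.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(ps_path, pdf_path, gs_exe="gs"):
    """Return the Ghostscript argument list that converts PostScript to PDF."""
    return [
        gs_exe,
        "-dBATCH",                   # exit once the input has been processed
        "-dNOPAUSE",                 # do not wait for a keypress between pages
        "-sDEVICE=pdfwrite",         # use Ghostscript's PDF output device
        f"-sOutputFile={pdf_path}",  # where the finished PDF is written
        str(ps_path),                # the file produced by the fake printer
    ]


def convert_ps_to_pdf(ps_path, pdf_path, gs_exe="gs"):
    """Run Ghostscript; raises if the executable is missing or the run fails."""
    if shutil.which(gs_exe) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_exe}")
    subprocess.run(build_gs_command(ps_path, pdf_path, gs_exe), check=True)
    return Path(pdf_path)
```

Calling `convert_ps_to_pdf("report.ps", "report.pdf", gs_exe="gswin32c")` would then do roughly what the Ghostview export does, without the GUI.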

OpenOffice.org V1.1 can import Word and export PDF. This works well, but check to see that OOo correctly imported your Word document. Sometimes it doesn’t quite get it right.

Get your local computer guy to install a fake printer on your Linux server and have it route through Ghostscript and dump the finished PDF into your Home directory.

PDF to Word:
Nothing that works worth a damn. PDF describes where to put stuff on the page so that it looks right no matter where you print it. It does not preserve the logical structure of the document. The conversion programs have to guess what is what, and they screw up more often than not. The situation is almost as bad as trying to scan a document and then convert it to a Word doc. You don’t have the OCR to bother with, but the program still has a mountain of work to do figuring out the page layout.
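To make the guessing concrete, here is a toy Python sketch of the problem a converter faces. It assumes the page has already been reduced to text fragments with positions, as `(x, y, text)` tuples with y growing downward; that representation, the function name, and the tolerance value are all invented for illustration and are not how any particular product works. The naive rule is "fragments at about the same height belong to the same line."

```python
def guess_lines(fragments, line_tolerance=2.0):
    """Group (x, y, text) fragments into lines, top to bottom, left to right."""
    lines = []  # each entry is [y_of_first_fragment, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: f[1]):
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append((x, text))  # close enough: same line
        else:
            lines.append([y, [(x, text)]])  # start a new line
    # Within each line, order the fragments left to right and join them.
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


# A two-column page: the left column starts at x=50, the right at x=300.
two_columns = [
    (50, 100, "PDF stores"), (300, 100, "Word stores"),
    (50, 112, "positions."), (300, 112, "structure."),
]
# guess_lines(two_columns)
# -> ["PDF stores Word stores", "positions. structure."]
```

The rule handles a single column fine, but on the two-column page it reads straight across the gutter and interleaves the columns. Nothing in the positions says where one column ends and the next begins, so a real converter has to pile on more heuristics, and every one of them can guess wrong.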

Thanks for the input! I knew that conversion from Word to PDF is relatively easy, but I appreciate the recommendations and information. I’ll relay your opinions to our Publishing department and let them know that the conversion process will probably end up being hit-or-miss no matter what package is chosen.

Any other opinions out there?

PDFCreator is a great open source conversion utility for Windows. It creates a virtual printer which you can use to make a PDF from any program. I haven’t had any significant problems with it under Win2K or XP.

I don’t know of any good way to convert a PDF to Word format.

There is a fairly laborious way to convert some PDFs to Word. However, the text will absolutely have to be reformatted (and possibly even re-entered). You also have to deal with matching the fonts of the original PDF’s text, if that is a priority. Lastly, if the PDF has text running across photos or other inconsistent backgrounds, or if the text runs in any direction but straight up-and-down like a newspaper, then you are basically out of luck. If the PDF text runs only over white or solid-colored background, and is oriented straight up-and-down, then you are in business.

First off, a PDF can be opened in Adobe Photoshop as a TIFF file (or opened in Adobe Illustrator as a native Illustrator file). In Photoshop, you’ll choose the resolution before the file opens. If this is something that has to be printed later (whether on a printing press or an office printer), you need to choose a fairly high dpi (at minimum 300 dpi).

The PDF text will be a part of the TIFF file, and will be completely uneditable. If you need to modify the text in any way, you will have to erase all the areas with text from within Photoshop (assuming consistent background behind the text). With text over a white background, you would “paint over” the text in white. With text over a solid colored background, you would use the Eyedropper tool to select the background color, then “paint over” the text in the background color.

What you are left with is a page-size TIFF containing all of the PDF’s graphic elements, and none of the PDF’s text. You can bring this TIFF into Word, and then enter all of the text over the TIFF. This would be worlds easier to perform in QuarkXPress, PageMaker, or maybe even MS Publisher … but if Word is what you’ve got, it’s what you’ve got.

Often, you can copy and paste text from a PDF. This is not invariably true, IIRC, but true often enough. So it’s likely you won’t have to re-type all the text from the PDF. You will have to hope that the kind of formatting you need to do to the text can be performed in Word. If it’s straight up-and-down text as you’d see in a newspaper or textbook, then you can probably manage. Word can do columns … not sure about columns of different widths. Also not sure if Word has any kind of a text-box feature, which would be needed for things like photo captions inset into the main body of text. The original PDF really has to have simple formatting for this to work using Word.

Guy Incognito, if I may ask: why do your Publishing folks want to convert from PDF to Word? If they need a PDF modified, it would be easier to ask the party that created the PDF to re-create the PDF with the changes you guys need. A PDF is meant to essentially be an immutable end product. Perhaps you guys need to ask for different kinds of files from the other end. You really don’t want to get too involved in re-creating docs from PDFs (especially in Word!) if there is any reasonable way around it.

Basically, PDF is a form of electronic paper, in the sense that it is how some people output information from their software. You should not expect it to be all that much easier to convert a PDF to a Word file than it is to convert a page out of a magazine.

The format was just not made to be the middle step in any workflow. It is only supposed to be the end product. More precisely, it is supposed to be the step before display or printing, whichever is the actual last step.

I’ll have to agree that if you want to edit the entire content of a PDF in the original format, you need to get the original file that was made into a PDF. If the original is not available, recreate an original in the appropriate application so you’ll have one around next time you need it.

How exactly are you people saving docs in OpenOffice as PDFs? I have version 1.01 (downloaded quite some time back) and that option is not listed for me at all, and there is no “export” choice on any of the menus that I can find…??? I find no mention of “PDF” in the help files at all, only a way to print a PostScript file by printing to file… [not that I need it, but I have been wondering this since the last PDF thread]

Anyway, I will agree that trying to convert a PDF into a Word doc is a crap shoot, because Word is not intended to support the (more numerous) types of document formatting that PDFs are. I have never heard of a “good” program for this.
~

It’s not that Word lacks the support for PDFs. Neither ‘supports’ other types of documents. They’re two different formats, intended for different purposes. PDF was designed to be ‘creatable’ from third-party systems and software. It wasn’t designed to provide the information necessary to make an editable document.

PDFs lack the formatting information that a word processing approach needs. There’s no “good” program to create a word-processable document from a PDF because it would require an astonishing amount of computation, bordering on AI, to interpret things which can only make sense in the context of language.

I ask the same question as another respondent: Why do you want to do this? Perhaps I can recommend a better solution than PDF to Word. Since you say “I’ve been asked to find”, perhaps I can give you advice that will prove that the requestor is more than a few keys short of a typewriter…

What I meant was that what appears to be a very simple document in a PDF (a single-layer doc with 3 text columns across, with a couple images set into the margins between text columns, intruding partially into the text columns on both sides) can be impossible to duplicate in Word. This layout is fairly common in magazine print, yet Word basically can’t do it at all. Word simply does not have the formatting capabilities that PostScript allows. For one example, in Word you can only embed an image inside a text object; you cannot have a free-floating image. That is why Word doesn’t let you put an image anywhere you want, and PostScript/PDF does.

And you are not-necessarily-correct, as far as parsing PDFs [that contain text] goes: there are multiple text frames in a layout/PDF/PostScript doc, but (in English-language and various other versions) normally the text is automatically “threaded”. It begins at the top-left of the first frame, runs downwards line-by-line to the end of the first frame, and then automatically continues at the top-left line of the next text frame, until the end. You can create and fill free-standing text frames, but there are practical reasons why this is not done, particularly with big documents that are normally created in a word-processing program and then copy-and-pasted into the first frame of the layout program’s document. Then either the user manually adds threaded pages, or the layout program keeps adding threaded pages onto the document until all the text is visible. The fact that the text automatically threads is what makes the format so easy to use in the first place, and attempting to use the text frames out of order defeats that.
~

While it’s simple to create a PDF document from an MSWord document, it’s almost impossible to convert a PDF back to MSWord.

A lot has to do with the way you create your PDF document. If you convert to PDF, you can cut and paste into MSWord, but you’ll have to do all the formatting over again. If, however, you SCAN to PDF, you can’t even do that much. You might be able to view the document in another editor and then cut and paste but all I’ve ever gotten was gibberish with the editors I have. Either way, you have a lot of work ahead of you when it comes to reformatting a difficult document. As has already been mentioned, you need the full version of Adobe Acrobat to do anything but look.

OOo V 1.1 and later have this option; earlier versions didn’t.

Word itself does conversions in both directions.

In Word there is an Export function.

You can open a PDF document in Word. Simple documents that were originally developed in Word work best. Documents with more complex formatting will not be converted properly for much editing. This is particularly true of things like fancy résumés or newsletters that were made with layout software.

These things were not true 16 years ago, when the question was asked. :roll_eyes:

Whoops! :dizzy_face: Missed that. I usually don’t contribute to zombies.

I wonder if we had a mass necromancy episode caused by a spammer hitting PDF-related threads. I’ve seen several refreshed recently with no actual new post, a hallmark of a spammed thread that’s been cleaned up by a mod.

Flagged

Yeah, I didn’t go rooting around for old threads, I was looking at ones that had recent activity.