saving .pdf file as text?

Is it possible to save a .pdf file as either a text or Word file? If so, how?

Not directly, but you can hit Ctrl+A to select all, and then copy the text to the clipboard. It can then be pasted into a document.

You can also use the Adobe website to do a free PDF-to-HTML conversion and cut and paste from there if you want.
How well it works depends on the complexity of the doc. Just search the web for pdftohtml.
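
If you go the HTML route and would rather not cut and paste by hand, something like the following could pull the plain text out of the converted page. This is only a rough sketch using Python's standard html.parser module; the names TextExtractor and html_to_text are made up for illustration, and it assumes the converter produced ordinary HTML with the text in the markup rather than in images.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}
    # Tags that normally start a new line when rendered (an illustrative subset).
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<html><body><p>First &amp; <b>second</b></p><p>Third</p></body></html>"
    print(html_to_text(sample))
```

As with copy and paste, columns and indents will not survive this; you only get the words.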

You can download a program called GhostView (you’ll also need GhostScript) that will perform the conversion.
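
For anyone who prefers to skip the viewer, Ghostscript can also be driven from a script. Here is a minimal sketch, assuming a reasonably recent Ghostscript build that includes the txtwrite output device and that the executable is on your PATH as gs (on Windows the console executable is usually gswin64c or gswin32c). The function names build_gs_command and pdf_to_text are just illustrative.

```python
import subprocess


def build_gs_command(pdf_path, txt_path, gs_executable="gs"):
    """Build the Ghostscript command line that writes a PDF's text to txt_path."""
    return [
        gs_executable,
        "-q",                        # suppress the startup banner
        "-dNOPAUSE",                 # do not pause between pages
        "-dBATCH",                   # exit when the input has been processed
        "-sDEVICE=txtwrite",         # text extraction device (assumed to be available)
        f"-sOutputFile={txt_path}",
        pdf_path,
    ]


def pdf_to_text(pdf_path, txt_path, gs_executable="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(pdf_path, txt_path, gs_executable), check=True)


if __name__ == "__main__":
    # Only prints the command; call pdf_to_text() to actually run the conversion.
    print(" ".join(build_gs_command("input.pdf", "output.txt")))
```

Building the command separately from running it makes it easy to check what would be executed before pointing it at a real file.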

The only problem with the copy and paste is that you lose the formatting of columns, indents, etc.

It’s not without its benefits, though.


I use a software package called PaperPort, which serves as a sort of driver for my Visioneer brand scanner. I first convert the PDF document into a PaperPort document, and then convert again into Word. It’s very convenient, and IIRC it preserves the column formatting.

Bear in mind a potential pitfall accompanying all of the methods mentioned in this thread: if the author of the PDF document did not perform a “paper capture” on the document, then you won’t be able to copy or convert any text at all. As far as your computer is concerned, it’s just a great big (nontext) image.
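
A quick way to spot this case is to run whichever extraction method you like and look at what comes back: an image-only PDF typically yields nothing but blank lines and page breaks. Below is a rough heuristic sketch; the function name has_extractable_text and the 20-character threshold are arbitrary assumptions, not a rule from any tool.

```python
def has_extractable_text(extracted, min_chars=20):
    """Return True if the extractor output contains at least min_chars letters or digits.

    min_chars is an arbitrary threshold chosen for illustration; whitespace,
    form feeds, and punctuation are not counted.
    """
    meaningful = sum(1 for ch in extracted if ch.isalnum())
    return meaningful >= min_chars


if __name__ == "__main__":
    print(has_extractable_text("\n\n \x0c \n"))
    print(has_extractable_text("Is it possible to save a .pdf file as text?"))
```

If this comes back False for a document that obviously has words on the page, you are probably looking at a scanned image that was never captured.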

And of course the best way to get to the text in a PDF document is to use Adobe Acrobat, if you’re willing to spend around $300-$400 for it. You can capture your own documents and do a format-preserving copy and paste.

PaperPort may be using OCR (optical character recognition), which it does use in converting scanned images to editable text. Actually, I’m 99% certain that’s how PaperPort does it, because PaperPort stores image files, not text. So your PDF file would be subject to OCR error.

No, OCR wouldn’t enter into it because the characters have already been “recognized”, assuming that the PDF file has already been through the capture routine.