Work problem here. I’ve got an approximately 160-page document that was scanned as a PDF. I need to update it and edit the format. Is there any way I can copy the text into a Word or text file so I don’t have to have the entire thing retyped?
It’s a good-quality scan, so it seems to me that there should be some way to select and copy the text and then paste it into a document.
::whimpers at the thought of retyping the whole thing::
PDF is more or less a read-only file type. If you own the full version of Acrobat (not Reader), you can make limited text edits and such, but I’ve never heard of a way (bar jumping through hoops with cracking software and such) of copying and pasting the text from a PDF into an editable format - that’s not what PDF is intended for, so apparently they decided it was unnecessary.
Hopefully someone will prove me wrong, but that’s been my experience in three years working with the damn things.
PDFs are intended to be un-copy-and-paste-able. They are intended to be “secure”, un-editable documents so you can give them to your customer and the customers cannot foobie them up.
Either take **bordelond’s** advice to find the author or hire a temp. You have my deepest sympathies.
To do this, you want to research Optical Character Recognition (OCR) software. There are a lot of products that do this, but OmniPage is one of the better ones out there. As I recall, it can take scans you have already made and stored on your PC and convert them to text, but you have to have them saved in some picture format (TIFF, as I recall).
The first thing you should do is try to get the original document the PDF was made from. Then ignore the PDF and edit that file.
If the text is in the PDF as text, use the Text Select tool to select it. The icon for the Text Select tool is a capital T. That should work in Reader as well as the full version.
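If you'd rather check this from a script than click around in Reader, here is a minimal sketch. It assumes the third-party pypdf package is installed; the function names and the 20-character threshold are made up for illustration and are not anything Acrobat itself uses.

```python
def looks_like_scan(page_texts, min_chars=20):
    """Return True if every page yielded almost no extractable text.

    min_chars is an arbitrary illustrative threshold, not a standard value.
    """
    return all(len((text or "").strip()) < min_chars for text in page_texts)


def extract_page_texts(pdf_path):
    """Pull the text layer from each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(pdf_path)
    # extract_text() comes back empty for pages that are only a picture.
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `looks_like_scan(extract_page_texts("document.pdf"))` (the filename is a placeholder) should give a rough answer: if it returns True, the pages are most likely pictures of text and selecting text won't get you anywhere.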
However, if the original was really scanned to make a PDF, what you’ve got is a big graphic. It isn’t text, it’s a picture of some text. In that case, your only hope is to OCR the picture. This will probably involve using the Graphic Select tool (next to the Text Select tool) to select each full-page graphic, then copy-paste it into a program that can do Optical Character Recognition. If the original scans were good enough, and they didn’t JPEG-compress them too much when they made the PDF, you may get the text back.
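For what it's worth, the render-then-OCR route can also be scripted instead of copy-pasting each page by hand. This is only a sketch under some assumptions: it uses the open-source Tesseract engine (via the third-party pytesseract and pdf2image packages, which in turn need the Tesseract and Poppler programs installed) rather than any particular commercial OCR product, and the function names, 300 dpi setting, and page separator are my own choices.

```python
def ocr_scanned_pdf(pdf_path, dpi=300):
    """Render each page to an image and OCR it; returns one string per page."""
    from pdf2image import convert_from_path  # third-party, needs Poppler
    import pytesseract  # third-party, needs the Tesseract binary

    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(page) for page in pages]


def join_pages(page_texts, separator="\n\n----- page break -----\n\n"):
    """Combine per-page OCR output into one document, skipping blank pages."""
    cleaned = [text.strip() for text in page_texts]
    return separator.join(text for text in cleaned if text)
```

Something like `join_pages(ocr_scanned_pdf("document.pdf"))` would then give you one long string to paste into an editor. Note that rendering 160 pages at once can use a lot of memory, so you may want to work through the file in smaller chunks.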
If neither of these works, then you are definitely going to have to retype.
I wouldn’t hold much hope of getting the layout of the page right with either of these methods. You’ll just have to start over on that part.
Not quite the case. PDFs can certainly be secured against copy-paste, if that’s what the originator wants, but the format can also be freely copied from if the originator allows it. That’s why there are selection tools in Reader.
If it is a bunch of page scans in a PDF, then you need to extract the images (the full version of Acrobat is easiest, but Illustrator can do it too, one page at a time) and then run them through OCR software. What will come out is a Word file that should mostly be correct as far as the text content, but the formatting might need help. How big is the file size now?
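On the "formatting might need help" part: OCR output usually keeps the hard line breaks of the printed page, including words hyphenated across lines. Here is a rough sketch of a cleanup pass, assuming the OCR result has been saved as plain text; the function name and the rules it applies are illustrative, not something any OCR package does for you.

```python
import re


def tidy_ocr_text(raw):
    """Rejoin words hyphenated across lines and unwrap hard line breaks.

    Blank lines are kept as paragraph breaks.
    """
    # "retyp-\ning" -> "retyping"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse remaining single newlines and runs of spaces in each paragraph.
    unwrapped = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in unwrapped if p)
```

One caveat: this also glues together words that are legitimately hyphenated and happen to break at the end of a line, so the result still needs a proofread.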
...or someone who already has the software might offer to do it for you, since a clean scan shouldn’t give the OCR software many errors. But they wouldn’t be able to do that if your email was disabled, would they?