That was my thought too; what are you doing that requires you to edit a PDF, instead of just getting the original word-processing document?
There is no simple, pushbutton way to successfully convert a PDF document to a properly formatted, useful Word document.
I know Acrobat DC has an “Export to Word” feature, but it’s not that useful. You’ll end up with a Word document with really screwy formatting, with section breaks all over the place and stuff in columns that should be in a table and all kinds of nonsense.
I know that’s not the answer you’re hoping for, sorry. If your PDF is only a few paragraphs of simply formatted text, you’ll probably be OK, but if you have, say, a prospectus or something like that, with hundreds of pages of text and tables, you’re out of luck, unless you have a real word processing department at your disposal.
It is. PDF is meant to be the output of a process, the thing you “print” to when you’re done with whatever editing. (The exception is Adobe Illustrator – the .ai format and .pdf are almost identical internally; there’s actually an option to save .ai files as “PDF compatible” during the save process). PDF is what you send to the printer or provide to the reader so that they don’t need whatever your document editing program is. It’s “baked,” to use the cooking term, just like printing to paper.
So the “right” way to edit a PDF is to go back to the source application, make changes there, and re-export/print to PDF.
Of course, that neat model doesn’t always match reality, and sometimes all you’ve got to start from is the PDF or the paper. In that event, you need either very specialized software (various elements of the Acrobat suite or Illustrator) or some converter. But the converter is doing something very similar to what OCR does when starting with a paper document – trying to interpret what it’s got in order to recreate a context that is likely no longer there.
What you get will depend on the quality and breadth of the converter, sure, but also on the specific internals of the document. It’s possible (and sometimes desirable) to create a PDF that’s all but impossible to “take apart,” and for most PDF creation tools, the ability to later edit isn’t a priority – they’re often optimizing for PDF size or rendering speed instead (especially for PDFs meant to be read online).
For a specific example: while the PDF of a text document contains all the glyphs present in that text, it doesn’t necessarily contain all the letters, nor have them in any particular order or in the same numbers as the original text (and if part of the original is clipped in such a way that something isn’t drawn in the PDF, it might not be there at all). PDF has no concept at all of things like sentences, sections, paragraphs, or blocks of text – just the positioning of glyphs and glyph sequences. Some or all of the text may have been converted into rasters (pixel images), particularly if it overlaps other image sources. Of course, all that can be undone by sophisticated enough software, but that’s actually a very, very hard computational problem, and your average freeware converter isn’t likely to spend a lot of effort on the “hard” cases.
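To make the glyph point concrete, here is a rough Python sketch of the guessing a converter has to do. It assumes the glyphs have already been pulled out of the PDF as positioned fragments; the Glyph class, the rebuild_lines function, and the two tolerance values are all made up for illustration and are not taken from any real converter.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float      # left edge of the fragment
    y: float      # baseline; PDF-style, larger y is higher on the page
    width: float  # horizontal extent of the fragment
    text: str     # one letter or a short run of letters


def rebuild_lines(glyphs, line_tolerance=2.0, space_gap=3.0):
    """Guess reading order from positions alone (hypothetical thresholds)."""
    lines = []  # each entry is [baseline_y, [glyphs on that line]]
    # Walk from the top of the page down, grouping similar baselines.
    for g in sorted(glyphs, key=lambda g: -g.y):
        if lines and abs(lines[-1][0] - g.y) <= line_tolerance:
            lines[-1][1].append(g)
        else:
            lines.append([g.y, [g]])

    result = []
    for _, row in lines:
        row.sort(key=lambda g: g.x)  # left to right within a line
        parts = [row[0].text]
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A wide enough gap is guessed to be a word break.
            parts.append((" " if gap > space_gap else "") + cur.text)
        result.append("".join(parts))
    return result


# Fragments in the order they happen to be drawn, not reading order.
sample = [
    Glyph(20, 680, 5, "e"),
    Glyph(30, 700, 25, "there"),
    Glyph(10, 680, 5, "B"),
    Glyph(10, 700, 12, "Hi"),
    Glyph(15, 680, 5, "y"),
]
print(rebuild_lines(sample))  # ['Hi there', 'Bye']
```

Even this toy version has to guess at line grouping and word spacing. Add columns, tables, rotated text, or tight kerning and those guesses start going wrong, which is roughly where the screwy formatting comes from.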
I have tried PDFelement, Nitro Pro, and Foxit. For my purposes, which involved doing OCR, PDFelement worked the best. However, I bought it, thought it would last forever, and after two years it stopped working. I went to technical help, which said to load the program and then get the updates, but I couldn’t even load the program, and it seemed to me they wanted me to buy something else. I got all kinds of messages about “upgrading to the next better version.”
Foxit worked great for some things but its OCR was sadly lacking.
Nitro Pro hosed my system and I had to call in my tech support (aka my son) to uninstall that fucker.
Years ago at a job we had Nuance, which worked with something else for OCR purposes. I don’t remember how it was at editing PDFs.
But I have had good luck converting PDF to Word using my Word 2013. Load Word, then go to “open,” find the file, and it will convert it*, and it has always done a great job of converting it. Now once in Word, the images are kind of hard for me to edit, but that isn’t what I’m usually working on anyway. Sometimes the text comes out in text boxes, but again, it’s easy enough to convert them to plain Word formatting.
*It seems like sometimes there is an extra step required, that involves going to some kind of dropdown box, but I just tried it and I didn’t need to do that, so that may have been something that used to be necessary but isn’t any more.
I/my small business is being sued. My lawyer emailed me a pdf of ‘interrogatories’ or questions from opposing counsel, as well as requests for production etc. He received hard copy from opposing counsel, which he scanned in & they ended up as pdf files. I have to answer the questions & also make comments all over the document. For the questions, I could just answer in a word doc since they are numbered, but I need to write comments/notes all over the document so the lawyer will have a better understanding of the case.
And a huge thank you to you and everyone else who responded. Very informative.
Your lawyer’s office is probably better equipped to convert the pdf to a Word document than you are. Law offices have to deal with this all the time.
Just a quick note to add that if the hardcopy document was scanned into a PDF, chances are that it was scanned as page images, which is not the same as actual extractable, editable, and searchable text, and probably has significant limitations on editing like inserting your comments between the various questions. A PDF document can be either of these formats, the former really being just a set of page images. Acrobat Pro does have an OCR feature that will scan the image pages internally and produce real text, but it probably also introduces numerous errors and would need manual proofreading to clean up.
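If you want to check which kind of PDF you have before deciding what to do with it, a rough Python sketch like the following can help. It assumes the third-party pypdf package is installed; the function names and the 20-character threshold are arbitrary choices for illustration. A page that yields almost no extractable text is probably a bare page image, though a scan that has already been through OCR will show up as text here.

```python
def extract_page_texts(path):
    """Return the extractable text of each page (needs the pypdf package)."""
    from pypdf import PdfReader  # assumption: installed via "pip install pypdf"

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def classify_pages(page_texts, min_chars=20):
    """Label each page 'text' or 'image-only' by how much text came out.

    min_chars is an arbitrary cutoff chosen for this sketch.
    """
    labels = []
    for text in page_texts:
        visible = "".join(text.split())  # drop all whitespace before counting
        labels.append("text" if len(visible) >= min_chars else "image-only")
    return labels


# Usage, with a hypothetical file name:
#   print(classify_pages(extract_page_texts("interrogatories.pdf")))
```

If every page comes back "image-only", commenting tools will still work, but search, copy/paste, and conversion to Word will need an OCR pass first.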
As a side note, when the DOJ released the Mueller report it was provided in the page-image scanned format I referred to above. The major drawback was that it was impossible to search to find things that people were talking about. Someone – possibly CNN, who were at any rate the first to post it – did the OCR conversion into real text and provided a more useful document that was searchable and from which you could extract the text.
You don’t need a PDF editor for that. Acrobat Reader DC will let you add comments and save it.
Lawyer said he could not convert the document, plus I’m trying to limit billable charges so I’m doing everything possible myself, rather than have them do it & send me a bill.
Very helpful info. Thanks!
When I open the PDF, it opens in ‘Preview’ (Mac) and I don’t see any option to let me add comments.
ETA: If there is a way to make it open in Acrobat Reader DC, please lmk!
Is that link not working for you?
It is and it worked, sort of (some of the options are grayed out on my computer). But I was able to download PDF Reader, and those options were available in that and worked. Thanks!!
It’s likely because your Mac is set up to automatically open any unknown PDF in ‘Preview’. In order to open it in Adobe Reader, drag the file onto the Adobe Reader icon.
(You can also set your Mac up to always open PDF files in Reader by selecting the PDF, going to ‘File > Get Info’ and changing the ‘Open with’ setting to Adobe Reader. Choose ‘Change All’ before closing the ‘Get Info’ window.)
I thought .ai files were PostScript files (or you can select the PDF option nowadays) with the special Illustrator data crammed in there via some trick?
But, yes, PDF was marketed as “electronic paper” that would be portable across platforms. Not sure how well that worked out, having encountered all sorts of messed up PDF files, but at least Acrobat Pro does have an option to verify that a document conforms to a specified PDF/A standard.
In any case, I agree that to merely scribble on the pages, add your own text, highlight, etc., any number of the “reader” apps mentioned can insert those in the file, including Adobe’s own. You can make it so your own annotations appear in bright red, for example.
I learned this one the painful way: several years ago I was working on a project where we were trying to index tens of thousands of PDF documents that had scientific data and summaries and such in them. It took me quite some time to figure out why everything was so mangled when I tried to copy/paste text from them. It was because of the glyph problem you describe.
I eventually ended up buying a license for a command-line version of a popular OCR engine and running that as part of my application, with not-so-bad results.
Another gotcha with PDFs is that the format was designed so that edits can be appended to the end of the file as deltas, leaving the original content in place. In other words, given an old-school edited PDF document, if you are clever you can find the “end of original content” marker in the binary file and truncate the rest, and then you will have the original PDF, possibly from before someone tried censoring out sensitive information.
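For the curious, the marker in question is the %%EOF line that closes each revision of the file. Here is a minimal Python sketch of the idea, using only the standard library; the function names are made up for illustration. Treat it as a rough probe rather than a forensic tool: linearized ("fast web view") PDFs can contain more than one %%EOF without ever having been edited, and the same bytes could in principle turn up inside embedded data.

```python
import re

# %%EOF, optional trailing spaces or tabs, then an optional line ending.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*(\r\n|\r|\n)?")


def revision_ends(data: bytes) -> list[int]:
    """Return the byte offset just past each %%EOF marker in the file."""
    return [m.end() for m in EOF_MARKER.finditer(data)]


def first_revision(data: bytes) -> bytes:
    """Return everything up to and including the first %%EOF marker."""
    ends = revision_ends(data)
    if not ends:
        raise ValueError("no %%EOF marker found; is this a PDF?")
    return data[: ends[0]]


# A stand-in for a file with one appended update (not a real PDF).
sample = b"%PDF-1.4\noriginal body\n%%EOF\nappended update\n%%EOF\n"
print(len(revision_ends(sample)))  # 2
print(first_revision(sample))      # b'%PDF-1.4\noriginal body\n%%EOF\n'
```

More than one marker is only a hint that earlier versions may still be sitting in the file, which is exactly why a black box drawn over text in an ordinary editor is not real redaction.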
It’s the slightly higher tech version of removing the black rectangle someone pasted over the secret stuff in a Word doc.
Folks who know better use proper redaction tools for this, or just scan in the printed doc after manual redaction.