If the unwanted marks on your PDF pages are consistently in the same place on each page (in the left or top margin, for example), you can use Acrobat Pro’s crop tool to crop all the pages at once and trim the edges of every page. That might get you cleaner documents very quickly.
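For anyone who would rather script a fixed-margin crop outside Acrobat, the arithmetic is small. The sketch below only computes the rectangle. `Box` and `crop_box` are made-up names for illustration, and writing the result into each page's /CropBox is left to whatever PDF library you use. Keep in mind that a PDF crop box only hides what lies outside it; the hidden marks are still in the file.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A PDF rectangle in points (1/72 inch), origin at the bottom-left."""
    left: float
    bottom: float
    right: float
    top: float


def crop_box(media: Box, left=0.0, bottom=0.0, right=0.0, top=0.0) -> Box:
    """Return the media box shrunk by the given margins (all in points)."""
    box = Box(media.left + left, media.bottom + bottom,
              media.right - right, media.top - top)
    if box.left >= box.right or box.bottom >= box.top:
        raise ValueError("margins remove the whole page")
    return box


# US Letter page, trimming 36 pt (half an inch) from the left and top edges.
letter = Box(0, 0, 612, 792)
print(crop_box(letter, left=36, top=36))
```

The same margins would be applied to every page, which is exactly why this only works when the marks sit in the same place throughout.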
FWIW, I just used Acrobat 11 to redact the same area on each page of a 1,023-page document composed of scanned images, and it worked just fine. You can define a rectangular area to be redacted on the first (or any other) page and automatically copy it to other pages. The process can also be scripted.
The problem is that the pages are never copied exactly the same way, and not leaving black marks on the pages is one of the distinguishing commodities we sell. So it might help, but it is not the entire solution.
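Since the marks move from page to page, a per-page measurement is one way to go instead of a fixed rectangle. What follows is a rough sketch, not a tested product: it assumes each page is already available as rows of grayscale pixels (0 is black, 255 is white) from whatever image library you use, it only looks at the left edge, and the function names and both thresholds are guesses for illustration.

```python
def dark_band_width(rows, threshold=64, min_dark_fraction=0.9):
    """Width in pixels of the dark band along the left edge of a grayscale page.

    rows: list of equal-length rows, each a list of values 0 (black) to 255 (white).
    A column counts as shadow when at least min_dark_fraction of its pixels
    are darker than threshold; the band ends at the first column that is not.
    """
    if not rows:
        return 0
    height = len(rows)
    width = len(rows[0])
    for x in range(width):
        dark = sum(1 for row in rows if row[x] < threshold)
        if dark < min_dark_fraction * height:
            return x
    return width


def whiten_band(rows, band):
    """Return a copy of the page with the first `band` columns set to white."""
    return [[255] * band + row[band:] for row in rows]


# Tiny fake page: two dark columns on the left, then paper with one dark pixel.
page = [[0, 10, 250, 255, 40, 255] for _ in range(4)]
band = dark_band_width(page)
cleaned = whiten_band(page, band)
```

Because the band is measured separately for every page, a shadow that is 20 pixels wide on one page and 60 on the next gets a matching white strip each time. Real copier shadows are often graded or skewed, so expect to tune the thresholds.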
I thought I read that Acrobat Pro could remove anything it didn’t recognize as text. Has anyone else heard that?
That depends on what you mean by text. Images of text, which seem to be what you have, are not considered text in a PDF.
If you have actual text objects in a document, you can use Preflight in Acrobat Pro to separate text and other content (images, vector graphics) onto different layers, and turn off the layers you don’t want to display.
I took one of our scans of copies from books into Pro XI, and ran Text Recognition on it, saved that, then used Preflight to create different layers for text, images, and vector objects. I thought this was exactly what I wanted, but the copied text is still in the Images layer along with seemingly everything else. Turning off the Images layer makes the page blank.
So… do you make much money selling commodities you have no good idea how to provide? As Acrobat is the workbench for document scanning, storage and retrieval, it sounds like you’re a little deep in the game to still be asking how to use its basic features.
I thought the way this worked was you had an image of the page, including the human-readable text. The text layer was transparent text whose purpose was to be selectable and copyable, but would never be visible. So I’d expect turning off the images layer would leave a blank page.
And I’m not asking about scanning, storage, or retrieval. I’m asking about removal of unwanted portions of the page. Thanks for the snark.
We do have the Quite Imposing Plus 2 plugin for Acrobat, which does this very well, but we’d like a cheaper piece of software that we could use just for the shadow removal. We could give this program to remote workers and save a lot of time for people who know how to do a lot more than cover up shadows.
You might be right. I had never heard that was the point of the OCR function. I thought I would be getting the new PDF as text only. I copied and pasted the new text into a word processor, and every line break came through as a paragraph break, while the real paragraph breaks are gone. Unusable.
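The pasted text can often be rescued with a small cleanup pass. The sketch below is a heuristic and has nothing to do with how Acrobat works internally: it assumes a line closes a paragraph when it is blank, or when it is clearly shorter than the longest line and ends with sentence punctuation. The function name `reflow` and the `short_ratio` value are made up for illustration.

```python
def reflow(text, short_ratio=0.8):
    """Join hard-wrapped lines into paragraphs.

    Heuristic (an assumption, not Acrobat behaviour): a line closes a
    paragraph when it is blank, or when it is clearly shorter than the
    longest line and ends with sentence punctuation.
    """
    lines = [line.strip() for line in text.splitlines()]
    full_width = max((len(line) for line in lines), default=0)
    paragraphs, current = [], []
    for line in lines:
        if not line:
            # A blank line always ends the paragraph collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        is_short = len(line) < short_ratio * full_width
        if is_short and line.endswith((".", "!", "?", '"', "”")):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


sample = (
    "The quick brown fox jumps over the\n"
    "lazy dog and keeps running far.\n"
    "It stops.\n"
    "A new paragraph starts here and it\n"
    "runs on to a second line as well."
)
print(reflow(sample))
```

It will guess wrong when a paragraph's last line happens to fill the full width, and it does not rejoin words hyphenated across lines, so the output still needs a read-through.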