Distorting PDFs to prevent character recognition

Maybe use one of these on the cover sheet or in the body of the paper?

No, but OCR requires a scan, which is the same starting point as all copiers since about 1978, and the red-on-red or pale blue printing is resistant to clean scanning. (Tricks like filter sheets aside.)

You’re talking about two different eras in imaging. Yes, today’s scanners can produce cleaned up text from almost any color combination.

OP is talking about a PDF which is already electronic and hence is already a scan in effect. There’s no need to fiddle about with photocopier filters when the file is already there on your computer.

Possibly, but the idea was to make it relatively difficult to copy the floppy disk and copy the key sheet to pass along. I only remember this from one game, and now many years later, don’t remember which one.

Pee… Dee… Eff. Is that something new from them XeroX boys?

A true PDF file is *already* characters and should not need any kind of optical recognition. What the OP is vaguely considering is making a blurry scan of a printed page and embedding that scan in a PDF file. The quaint historical discussion is about the various ways to put printed letters in front of someone but make them uncopyable using reasonable technology. Historically - back when we rode our dinosaurs down to the grocery store to put “coins” in an optical copier to duplicate things - there were some optical tricks that let us do that. In this day, when every mope has a digital camera and image manipulation software, it’s nearly impossible to protect printed material no matter what form, format or platform it uses.

The protection is to subvert piracy at the start by not trying to sell printed material at a price point where a ripoff is a better deal than an original. ETA: Unless you can sell it under an enforceable NDA, signed by each purchaser.

Do you need me to show you the onion in my belt, grandpa?

My point is that there’s no need to resort to photocopier tricks when you can do it electronically. Equally, the end user can use electronic means to reverse or subvert whatever you put there in the first place.

I can type about 50 wpm. And there are those who can easily do 70 wpm.

If the document is still readable by a human being, how do you (the OP) intend to foil simple transcription that would be faster than any sort of digital re-enhancement and OCRing?
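To put rough numbers on the retyping argument, here is a back-of-the-envelope sketch. The 5,000-word document length is purely an assumption for illustration (the OP never said how long the document is), and `transcription_minutes` is just a hypothetical helper name.

```python
def transcription_minutes(word_count: int, words_per_minute: float) -> float:
    """Estimate how long it takes to retype a document by hand."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


if __name__ == "__main__":
    DOC_WORDS = 5000  # hypothetical document length, not from the thread
    for wpm in (50, 70):  # the two typing speeds mentioned above
        minutes = transcription_minutes(DOC_WORDS, wpm)
        print(f"{wpm} wpm: about {minutes:.0f} minutes")
```

Under that assumption, retyping takes 100 minutes at 50 wpm and roughly 71 minutes at 70 wpm, and it only has to be done by one person.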

Perhaps print black on black?

How about writing it out by hand, preferably in cursive, and preferably written by someone whose handwriting is not too tidy, and then scanning that into a PDF. I bet OCR would not do at all well with that, and people ought to be able to read it (though if the handwriting is poor, they might find it difficult and annoying still).

May I point out to some of the others here that the OP shows that he is aware that there is no way to make copying impossible altogether. He only hopes to make it a difficult, non-trivial task.

What is it worth to you? If you plan to market this information, the degree of difficulty for an honest user must be weighed against any possible loss to a dishonest user. If you set the bar too high, your honest users will just walk away. Your dishonest users will crack your system and redistribute it to others, probably for free (and deliberately so, just to annoy you). Your reputation will suffer. What is that worth to you?

NitroPress’s comment up thread about an appropriate “price point” should be heeded. Strike a balance between ease of use by honest users and not worth the effort by potential pirates. It doesn’t matter even if you give away the information, either.

Just have the guy who wrote the check in this thread transcribe your data.

Apparently, even BofA can’t OCR his handwriting…

Are you looking to present this on the web, or sending people copies of PDFs?

If it’s just the Web, you can use FlexPaper or any number of Web apps that display your PDF as a Flash file that doesn’t allow printing or downloading or copy/paste.

What is the format of the presentation? Are you giving folks handouts of some kind? Perhaps it would be helpful if we knew a little more of the details of the project. Maybe there is a different way to provide access to the info that is more conducive to your needs, like subscription access to a web site built for the content: the handouts have a simple sample, but the website has more detailed samples and methods.

Obviously, but this discussion is apples and oranges and onions. If the text is in electronic character form, absolutely no amount of trickery with the formatting or colors will prevent someone from extracting the content, using freeware tools.
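To illustrate why color tricks don't help when the text is stored as characters, here is a minimal sketch. It works on a made-up, uncompressed PDF content stream, not a real file; `CONTENT_STREAM`, `SHOW_TEXT` and `extract_text` are hypothetical names. Real PDFs usually compress their streams and use more string forms than this toy pattern handles, which is what the freeware extraction tools take care of.

```python
import re

# Hypothetical, uncompressed PDF content stream (an assumption for this demo).
# "1 0.9 0.9 rg" sets a pale fill color; the text is still stored as a plain
# string operand in front of each "Tj" (show text) operator.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
1 0.9 0.9 rg
72 720 Td
(Pale text is still text) Tj
0 -14 Td
(and so is this line) Tj
ET
"""

# Toy pattern: a literal string followed by Tj. It deliberately ignores
# escapes, nested parentheses, hex strings and TJ arrays.
SHOW_TEXT = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def extract_text(stream: bytes) -> list[str]:
    """Return the string operands of every Tj operator in the stream."""
    return [raw.decode("latin-1") for raw in SHOW_TEXT.findall(stream)]


if __name__ == "__main__":
    for line in extract_text(CONTENT_STREAM):
        print(line)
```

The color operator never touches the string itself, so the extraction ignores it entirely.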

If the content is images that are so distorted that OCR won’t work, humans will likely have trouble with them too. At which point we started discussing old optical tricks to prevent copying, which could work under limited conditions even in a PDF, if the content is images rather than characters.

In the end, though, even the most unbreakable imaging-based protection can be subverted by… simply copying the PDF file itself.

There is absolutely no way to present info on the web, or any equivalent, and protect it against duplication. You can make it increasingly difficult at the expense of reader convenience and capability, which becomes the tradeoff.

I agree, if the OP would share the specifics of what he’s trying to accomplish, a reasonable solution might present itself.

This is an interesting idea. I will look into this. Thanks.

There is no price. As I said before, I’m not selling anything.

This is another interesting idea. I appreciate it.

Again, this would be going a bit too far. I want to just dissuade casual OCR.

For better or for worse, my readers are in their late twenties.

This is correct.

Again, I’m not trying to make copying impossible. I believe, however, that having to retype the whole thing would be enough of a pain that most people wouldn’t bother.

Images don’t have to be all that distorted for OCR to not work. Ten levels of copies of copies is enough for the OCR that comes with Acrobat to garble about a third of the text, which is more than enough to make the result useless. Copying the file itself is fine.

I am aware of this. What I want is a level of protection equivalent to making copies of copies that doesn’t waste paper.
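One paperless way to approximate copies of copies might be to degrade the page image in software. The following is only a rough sketch under my own assumptions: each "generation" is modeled as a 3x3 box blur, a little speckle noise, and a re-threshold to black and white. The function names, the noise rate and the threshold are all made up for illustration, and nothing here has been checked against Acrobat's OCR. The image is a plain list of rows of 0/1 pixels to keep the example self-contained; a real page would go through an image library instead.

```python
import random


def photocopy_once(img, rng, noise=0.02, threshold=0.5):
    """One simulated copy generation: box blur, speckle noise, re-threshold."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Average the pixel with its in-bounds neighbours (3x3 box blur).
            total = 0
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            value = total / count
            # Invert a small fraction of pixels to imitate toner speckle.
            if rng.random() < noise:
                value = 1.0 - value
            # Snap back to pure black (1) or white (0), like a copier does.
            row.append(1 if value >= threshold else 0)
        out.append(row)
    return out


def copy_of_copies(img, generations=10, seed=0):
    """Apply several simulated copy generations; the input is not modified."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    for _ in range(generations):
        img = photocopy_once(img, rng)
    return img


if __name__ == "__main__":
    # Tiny test pattern: a 4x4 black square on an 8x8 white page.
    page = [[1 if 2 <= x < 6 and 2 <= y < 6 else 0 for x in range(8)]
            for y in range(8)]
    for row in copy_of_copies(page, generations=10):
        print("".join("#" if p else "." for p in row))
```

At this toy resolution, thin strokes erode quickly, so the blur size, noise rate and number of generations would need tuning against real pages to stay readable for humans while still tripping up OCR.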

Can you state exactly what your purpose and goals are? You’re giving it away free and they can copy the file all they want, but you don’t want them creating an editable text file of it. I’m having trouble coming up with a scenario where that’s necessary. Fill us in and we might have better suggestions for you. (I work daily with advanced publishing techniques and technology, and I’m not the only one here.)

Check out this thread over on stackoverflow.com. It talks about OCR-resistant fonts.
Is there a font that can't be recognised by an OCR? - Stack Overflow

OCR can be trained, if necessary. If the fonts are consistently presented, even Jokerman or Giddyup or Ouch can be OCRed with high accuracy. Taking something like Kreepy and blurring it might be successful… but it would be successful in being unreadable by hoo-mans, too.

Ah, but it only has to be done once and the result ‘published’ somewhere. If what you’re doing has any sort of popularity to it and enough people grouse over not having an editable version, they’ll transcribe it once and redistribute freely.

They don’t even need to OCR it. (Or retype.) Just scan it and distribute the scanned image files. It’s not like disks are tiny or internet connections are slow.