My scanner+OCR reads p. 105 as jibberish. ONLY p. 105! What gives?

I’m in the process of converting a 400-page typewritten manuscript into a Word doc. I’m using a scanner equipped with OCR software. It’s a slow, brain-numbing process, because the OCR program makes a lot of errors that must be corrected by hand, but I’m grateful that it can do the heavy lifting so that I don’t have to retype the thing myself. (I’m a miserable and slow one-fingered typist.)

Anyway, things are chugging along splendidly until I get to page 105. In a batch of about six sequential pages that all render fine, page 105 comes out as utter jibberish.

“Huh!” I say. Maybe I placed the page on the scanner upside down. So I try again. Jibberish.

One more time. Again jibberish!

So I try a different page (108). It comes out fine!

What is it about page 105 that makes my scanner and OCR program choke?

Hold on while I warm up my super clairvoyant powers, and I’ll tell you. :wink:

Maybe the paper is different. Try putting a blank piece of paper behind that one.

Or maybe fingerprints? Coffee stains? Smudges? Ribbon ran low on ink? In other words, there are a lot of possibilities. Might help if we can get a good scan of the page in question, and perhaps of a good page, for purposes of comparison.

Okay, I think I’ve got it somewhat figured out.

First of all, page 105 is stain-free and the paper is identical to all the other pages, so those ideas, I figured, were non-starters.

But I examined the text up close and noticed that there was a mistyped character in one of the words that rendered like some weird foreign letterform on the page. I thought that maybe this one misshapen letter might be fooling the software into thinking that I had a page filled with non-English text. So, I covered the word containing the bad letter and tried again. This time it came out perfect!

Just to double-check my theory, I uncovered the faulty word and scanned it one more time. Gibberish!*

*Yes, I finally figured out how to spell “gibberish.” Sorry I got it wrong in my first post.

Just out of curiosity, what kind of OCR software are you using that’s so easily thrown off?

ABBYY FineReader.

I suspect you have left the default value set to ‘automatically detect the language’. If you are generally scanning only English text, you can set it to ‘English’ to avoid situations like this.

Oddly, no. The program was set for English. Curious, no?