I’m in the process of converting a 400-page typewritten manuscript into a Word doc. I’m using a scanner equipped with OCR software. It’s a slow, brain-numbing process, because the OCR program makes a lot of errors that must be corrected by hand, but I’m grateful that it can do the heavy lifting so that I don’t have to retype the thing myself. (I’m a miserable and slow one-fingered typist.)
Anyway, things are chugging along splendidly until I get to page 105. In a batch of about six sequential pages, every page renders fine except page 105, which comes out as utter jibberish.
“Huh!” I say. Maybe I placed the page on the scanner upside down. So I try again. Jibberish.
One more time. Again jibberish!
So I try a different page (108). It comes out fine!
What is it about page 105 that makes my scanner and OCR program choke?
Or maybe fingerprints? Coffee stains? Smudges? Ribbon ran low on ink? In other words, there are a lot of possibilities. Might help if we can get a good scan of the page in question, and perhaps of a good page, for purposes of comparison.
First of all, page 105 is stain-free and the paper is identical to all the other pages, so those ideas, I figured, were non-starters.
But I examined the text up close and noticed a mistyped character in one of the words that looked like some weird foreign letterform on the page. I thought this one misshapen letter might be fooling the software into thinking the page was filled with non-English text. So I covered the word containing the bad letter and tried again. This time it came out perfect!
Just to double-check my theory, I uncovered the faulty word and scanned it one more time. Gibberish!*
*Yes, I finally figured out how to spell “gibberish.” Sorry I got it wrong in my first post.
I suspect you have left the language setting at its default value, ‘automatically detect the language’. If you are generally scanning only English text, you can set it to ‘English’ to avoid situations like this.