With modern OCR & digital publishing tools, how much work is it to convert a print book to a digital version?

Let’s say you have a print copy of a very rare, semi-obscure book. You have acquired the legal rights to it, or it’s so old you don’t need rights, and now you wish to make a digital copy available for sale on Amazon and other portals. How much work and expense is involved in taking it from the printed book you have in your hand to a decent, salable digital version?

What has to be done? Is it a possible “do it yourself” project? Do you need professional help? How do you get from printed book in your hand to Kindle version for sale on Amazon? How does Amazon pay you for sales of the digital book?

If the book is valuable, you could get a flatbed scanner that scans to the edge (I have this one.) If the book isn’t particularly valuable and only rare, you can ship it to these folks who will destructively scan it cheaply. Modern OCR is actually pretty good if you have a good quality scan of clean text (I’ve used a program called ABBYY FineReader.)

The OCR program can attempt to match the original formatting as exactly as possible or export it as raw text. Formatting and copyediting the book is the most difficult and time-consuming stage (especially if the book is illustrated), but none of it is beyond someone with 2 brain cells to rub together.


It’s all simple and can be done with off the shelf tech, much of it free… except for the step of meticulously proofing the e-text against the original. That’s many hours of eye-breaking work and is best done by at least two people.
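One way to get value out of having two proofreaders is to let each correct their own copy of the OCR text independently and then diff the results: any line where the two copies disagree is a line at least one person got wrong or missed. Here is a minimal sketch of that idea using Python's standard difflib module. The function name and the sample sentences are made up for illustration, and it only finds disagreements; an error both readers missed will still slip through.

```python
import difflib


def disagreements(proof_a, proof_b):
    """Return (line number in A, text in A, text in B) wherever two proofs differ."""
    a_lines = proof_a.splitlines()
    b_lines = proof_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    found = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # both proofreaders agree on this run of lines
        found.append((a1 + 1, "\n".join(a_lines[a1:a2]), "\n".join(b_lines[b1:b2])))
    return found


if __name__ == "__main__":
    # Hypothetical sample: reader A missed an "rn" that should be "m".
    reader_a = "It was a dark night.\nThe rnan walked home.\nHe slept."
    reader_b = "It was a dark night.\nThe man walked home.\nHe slept."
    for line_no, text_a, text_b in disagreements(reader_a, reader_b):
        print(f"line {line_no}: A has {text_a!r}, B has {text_b!r}")
```

Each flagged line still has to be checked against the printed page by a human; the script only narrows down where to look.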

And, unfortunately, it seems to most often be done by one person in a hurry who doesn’t have the best language skills in the first place.

Jobs like this are all in the details, not the tech. One glitch every few pages is enough to make the result junk.

I’ve done this, for a friend. He’s a published author, with a paperback (or two. Or more.) He had lost his original ms, and only had the paperback to work with.

So, in return for pizza, I sat in front of a scanner and fed the pages in, one by one, watching TV and bored stiff. In total time spent, less than four hours. Some scanners are faster than others.

(I know people who do this for a living!)

Publishing on Kindle is extremely easy these days, especially if the book doesn’t have very complex formatting. (I just did a pop science book with all kinds of charts, illustrations, sidebars, footnotes AND endnotes, and more. Very tricky work. But novels and long-text books are pretty easy.)

When you have a completely clean Word (or other e-text) version of it, ping me and I can help you get through the rest of the process.

There is a forum dedicated to digitally copying books if you google “DIY Book Scanner.”

There is a shareware Word clone called Atlantis that has a nice export to epub feature (you could then convert that to other formats through Calibre.) You wouldn’t necessarily consider it a replacement for your current word processor, but it is cheap enough to use it just for the export filter (if you have a habit of making epubs.)

Calibre will also start from .docx, HTML and text files… There’s not much need for another tool to get partway there.
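For anyone who prefers scripting this step, Calibre installs a command-line tool called ebook-convert that takes an input file and an output file and picks the output format from the file extension. Below is a rough Python wrapper around it. The helper names and the file names in the comments are placeholders, and it assumes Calibre is installed with ebook-convert on your PATH.

```python
import shutil
import subprocess


def build_convert_command(source, target, title=None, authors=None):
    """Build the argument list for Calibre's ebook-convert command-line tool."""
    cmd = ["ebook-convert", source, target]
    if title:
        cmd += ["--title", title]
    if authors:
        cmd += ["--authors", authors]
    return cmd


def convert(source, target, title=None, authors=None):
    """Run the conversion; raises if Calibre is missing or the conversion fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    subprocess.run(build_convert_command(source, target, title, authors), check=True)


# Example call (placeholder file names, not run here):
# convert("book.docx", "book.epub", title="My Rare Book", authors="A. N. Author")
```

The same call with a different target extension produces a different output format, so one clean source file can feed several stores.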

Put more broadly, there are dozens of tools that do some part of the ebook process, but a word processor you’re comfortable with and Calibre are about all you really need. An HTML/CSS editor can be useful if you grasp those formats.

Atlantis does it better.

It’s a sufficiently between-the-cracks niche that I’ve never found any one best tool, including InDesign. It’s a lot like the first ten years or so of HTML editors, where you chose the one that did the most while having the fewest drawbacks for your needs.

I find it very odd that a quirky shareware program pretty much rules the roost. In 2017, at least.

I downloaded a free Nook edition of Jules Verne’s obscure novel North and South. The book was obviously digitized using software intended for that purpose, with minimal (if any) operator help.

It’s hilarious. I can only assume that the software encountered pages partly torn out, water damage, and the like, and did its level best to try to accurately reproduce what it was presented with. At random intervals the pretty clear text would dissolve into a wall of badly-done leetspeak, blank spaces, and utter gibberish. Sometimes you could make out what lay behind the hieroglyphics, but most of the time you just had to skip that page and try to pick up further down the text. This was over five years ago.

So inexpensive-to-free digitizing software was certainly available for scanned books back then, and I can only assume (hope) it’s gotten better.
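Walls of leetspeak like that are at least easy to find mechanically before anyone publishes the file. As a rough sketch (my own heuristic, not something any OCR package is known to do), you can flag every line where fewer than half of the words are plain alphabetic words and send those pages back for a rescan or manual retyping. The 0.5 threshold and the sample text are arbitrary assumptions.

```python
import string

# Punctuation that may legitimately surround a word, including curly quotes.
EDGE_CHARS = string.punctuation + "“”‘’"


def clean_token_ratio(line):
    """Fraction of whitespace-separated tokens that look like ordinary words."""
    tokens = line.split()
    if not tokens:
        return 1.0  # blank lines are never flagged
    clean = 0
    for token in tokens:
        core = token.strip(EDGE_CHARS)
        # Allow internal apostrophes and hyphens ("don't", "well-known").
        core = core.replace("'", "").replace("’", "").replace("-", "")
        if core.isalpha():
            clean += 1
    return clean / len(tokens)


def suspicious_lines(text, threshold=0.5):
    """Return (line number, line) for lines whose clean-word ratio is below threshold."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if clean_token_ratio(line) < threshold
    ]


if __name__ == "__main__":
    sample = (
        "The captain went on deck.\n"
        "Th3 c4pt41n w3nt 0n d3ck ~~ |||\n"
        "It was a calm, clear morning."
    )
    for number, line in suspicious_lines(sample):
        print(f"check line {number}: {line}")
```

This only catches obvious garbage. A misread that happens to produce letters only (say, "rn" for "m") sails straight through, so it is no substitute for proofing against the original.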

Free book. You get what you pay for. Unfortunately, it was one of Verne’s worst to start out with, and screwing with the text really didn’t help.

I’m putting this one on my list to read at some point.

After all, Verne’s worst is still better than a lot of people’s best.

I’m a huge Verne fan, but I honestly can’t recommend it. It’s Jules Verne Tackles the American Civil War, and it’s incredibly unconvincing. It’s sometimes printed as two volumes, Texar the Southerner and Burbank the Northerner, or in one volume as Texar’s Revenge.


I used to love playing that on my Atari 2600.