I know there are quite a few Kindle/Nook/iPad/Kobo e-reader people out here like myself. I have a Kobo, and after going through a bunch of Project Gutenberg stuff, I decided to PAY for a copy of Sir Arthur Conan Doyle’s “Sherlock Holmes and the Valley of Fear.” Two dollars and change. Big whoop.
What I’m astounded at is the number of TYPOS in the first chapter alone! “Soour” in place of “So our,” “iie” in place of “lie,” “iength” in place of “length.”
My first point in making this thread is to bitch about being hosed. Has anyone else had similar e-buyer’s e-remorse?
Second, I’m also curious as to the mechanics of this. How do people make ebooks from public domain novels, or novels that otherwise predate an electronic manuscript? Is there someone in a dark and dingy basement typing this up furiously, presumably never having heard of spell check?
They’re not typed; they’re scanned using software that recognizes the characters (OCR). I’ve just been reading from a book of Connie Willis short stories I purchased on Amazon, and obviously no one proofed it after scanning it in. The word “burn” is constantly rendered as “bum,” open quotes frequently appear as “Cc,” periods are missed, and other scanning errors appear on pretty much every page. Really distracting.
I always preview before I download on my Kindle. I have had very good luck with MobileReference versions of public domain books (though I’ve only had the Kindle for a few months). Typos are rare, and the books are well organized and easy to navigate, with intros, footnotes, bios, and illustrations along with the text. It feels as if someone put a professional effort into pulling it together; it’s not just some dope with a scanner and a server.
I have found it is worth spending $5 for the complete works of So-and-so, rather than 99 cents. You get what you pay for, even with public domain books.
Each page of a paper book is scanned in, and the text is automatically recreated by optical character recognition (OCR) software.
A volunteer is then given the text for a specific page, together with a photograph of that page from the original book. He has to compare the two and correct any errors.
Once all of the pages have been corrected, the text for the whole book gets put together, and someone else reads through the whole book, checking again for any errors.
And that’s just for a volunteer service that provides free e-books. I would expect e-books for commercial release to have an even higher standard of quality!
Yeah, I actually bought Stephen King’s “It” for the Kindle because I wanted to reread it and the library seems to have lost all their copies. Good god, the errors! Misplaced italics end tags, odd paragraph breaks, weird compound words, the works. Very frustrating.
It’s not just old books. I bought a copy of “Googled” for the iPad, and it had random italics, missing punctuation, misspelled words, missing spaces, added spaces…
Actually, that’s the Distributed Proofreaders site, which is separate from PG (but very closely associated). And the current practice is to have 3 separate rounds of proofing, with 3 different volunteers working on each page.
Your expectation would go unfulfilled.
Commercial e-books are done the way that makes the most profit, which usually means only automated scans, with no proofing at all. (Actually, running the scans through 2 or more different OCR programs, and comparing the resulting texts can find the great majority of scannos. But commercial e-book makers don’t even bother to do that.) Apparently customers are willing to accept that low quality in e-books.
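Here’s a rough Python sketch of the two-OCR comparison idea, assuming you already have each program’s output for a page as a plain string (the function name and the sample text are made up for illustration, not taken from any real e-book workflow). It lines the two outputs up word by word with the standard library’s difflib and lists every spot where they disagree, so a human can check those spots against the page image. Errors that both programs make identically will slip through.

```python
import difflib


def find_disagreements(ocr_a: str, ocr_b: str) -> list[tuple[str, str]]:
    """Return (text_from_a, text_from_b) pairs wherever two OCR outputs differ.

    Comparison is word by word, so a merged or split word ("Soour" vs
    "So our") shows up as a single disagreement.
    """
    words_a = ocr_a.split()
    words_b = ocr_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete', or 'insert': the two engines read this span differently
            disagreements.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return disagreements


if __name__ == "__main__":
    # Hypothetical outputs of two different OCR programs for the same line.
    engine_a = "Soour friend was right to iie low for a while"
    engine_b = "So our friend was right to lie low for a while"
    for a, b in find_disagreements(engine_a, engine_b):
        print(f"check: {a!r} vs {b!r}")
```

Neither program has to be right on its own; the point is that two different engines rarely garble the same word the same way, so their disagreements make a short list of places worth a second look.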
Has anyone ever tried to return an e-book for a refund because of too many errors? What happens?
[The ones that confuse me are recently-written e-books, where you know the author submitted the original in an electronic file, and that all the editing, galleys, and author proofing was done electronically also. And the e-book comes from the same publisher that produced the hard copy book. So why don’t they just take the final electronic file and use that to produce the e-book, rather than the error-prone scanning/OCR process?]
t-bonham, that’s the way most e-books are created these days in my experience. It’s only the old ones (or, of course, the pirated copies) that are still created by OCRing a hard copy. And even then there’s supposed to be a proofreading, although in some cases, like Amazon’s creation of Topaz files, they may not bother because the actual OCRed text was only for searching–the page displayed for reading was an image of the scanned words, much like a “text under the image” PDF file. Now that Topaz files can be converted to regular e-book formats, though, those OCR errors are much more apparent…
In any case, e-book and OCR errors are the bane of my existence. I wish there was a way to get paid for correcting those errors. So far I’ve submitted four e-book “corrected versions” to publishers. Only one small independent publisher has used my corrections.
This exact book is what I came in to mention. It’s not bad enough to make me regret buying the book, but it is distracting. The book was obviously scanned in by some software that couldn’t always tell the difference between a slash /, the letter l, and the number 1, or between a vv (admittedly, a rare letter combination) and a w.
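For the curious, here’s a rough Python sketch of how a proofreader might hunt for the specific confusions people have mentioned in this thread (“rn” read as “m,” “l” read as “i” or “1” or a slash, “w” read as “vv”). The confusion table, the function names, and the tiny word list are all made up for illustration; this isn’t how any particular publisher or Distributed Proofreaders actually does it. Note that it flags “bum” even though “bum” is a real word, which is exactly why a plain spell check doesn’t catch these.

```python
# Hypothetical confusion table built from errors quoted in this thread.
# Each pair is (what the OCR produced, what the text probably said).
CONFUSIONS = [
    ("m", "rn"),   # "burn" scanned as "bum"
    ("i", "l"),    # "lie" scanned as "iie", "length" as "iength"
    ("1", "l"),    # the number 1 mistaken for the letter l
    ("/", "l"),    # a slash mistaken for the letter l
    ("vv", "w"),   # "w" scanned as "vv"
]


def candidate_fixes(word: str, dictionary: set[str]) -> set[str]:
    """Return dictionary words reachable by undoing one known confusion in `word`.

    The word itself is deliberately not checked against the dictionary:
    "bum" is a real word, but it is still worth flagging when "burn" is
    one confusion away.
    """
    candidates = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            fixed = word[:start] + right + word[start + len(wrong):]
            if fixed.lower() in dictionary:
                candidates.add(fixed)
            start = word.find(wrong, start + 1)
    return candidates


def flag_scannos(text: str, dictionary: set[str]) -> dict[str, set[str]]:
    """Map each suspicious word in `text` to its possible corrections."""
    flagged = {}
    for raw in text.split():
        word = raw.strip('.,;:!?"\'')  # drop surrounding punctuation (not slashes)
        fixes = candidate_fixes(word, dictionary)
        if fixes:
            flagged[word] = fixes
    return flagged


if __name__ == "__main__":
    # A tiny stand-in word list; a real checker would load a full dictionary.
    words = {"burn", "bum", "lie", "length", "will", "the", "fire"}
    sample = "The fire vvill bum for the iength of the night."
    print(flag_scannos(sample, words))
```

It only suggests; a person still has to look at the page and decide whether the author really meant “bum.”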
Dammit, this is not an obscure book. I would have expected the publisher to have an electronic copy of the text, or at least to have proofread it. There’s no excuse for the e-book to be more error-ridden than the printed book. Don’t publishers nowadays print from an electronic copy of the text, rather than typesetting the old-fashioned way?
On the other hand, I’ve gotten some free, public-domain books that look great on my Kindle; so I don’t agree that “you get what you pay for” necessarily.
Yeah, I’ve actually never had a problem with a free book! You know, I think I AM going to complain. Do it with me, Thudlow! (Er, complain. Complain with me.)
That was my thought! The epub file from Project Gutenberg was a bit twitchy, which prompted me to get the for-pay one. But going back and looking at P.G.’s “Valley of Fear” shows that those same mistakes aren’t there. So if anything, the pay one was worse than the free one…