There are so many books out there that you could never read them all. What is stopping someone from trying to publish a lesser-known book that they didn’t write as their own? Maybe simply changing the title and a few character names?
For example, do publishers search parts of books against a database like Kindle’s, or is that even possible? Or is there a similar kind of database where that can be checked?
Are there too many readers out there for this ever to work? Would someone always figure it out after a while?
As Barbara Cartland got old and (probably) affected by senile dementia, at least one of her books appeared to be copied from Georgette Heyer. But I don’t know of any direct copies of anything (just the author changed) in print publishing.
In sub-genres, though, it’s rife. I assume that’s because the big e-book publishers do automatic detection, so it’s only the niche publishers that get caught out. For example, it is/was possible to get paid to submit electronic circuit ideas and descriptions. So some people chronically, serially, steal already published material and submit it elsewhere. The same goes for advertisement-supported websites: before post-truth fiction became such a money spinner, the easiest way to drive traffic to your website was to steal and re-publish anything interesting you saw.
Publishers of course do some checking. It does cost a lot to edit, publish and market a book, and they don’t want to spend all that money and then have to retract it. Is it perfect? No, but it’s at least better than the checks when someone does “all the work” themselves and publishes through Amazon: how-eilis-ohanlon-found-out-her-crime-novels-were-swiped-by-a-stranger
There are cases where an author has simply stolen large amounts from little-known earlier works. A bunch of them (in this case, in science fiction novels) are given in the link below. There are even cases where someone simply retyped an obscure, long-out-of-print novel and submitted it to publishers:
It’s occasionally possible these days to check if something is stolen by typing a phrase from it into Google, as long as the phrase is sufficiently different from phrases you would find in any book. So I just grabbed a random book from my shelf (The Game Players of Titan by Philip K. Dick), grabbed a sufficiently distinctive phrase (“the Crofts-Harrison machine tagging along”), and put it into Google. Google immediately found where it came from. I grabbed another book - Art Theory for Beginners by Richard Osborne and Dan Sturgis. I put in the random phrase “classicism returned, in Enlightenment Europe”. Google gave me nothing. I grabbed a third book - A Short History of Myth by Karen Armstrong. I put in the random phrase “henceforth, he will concentrate on building the walls of Uruk”. Google found it. Then the phrase “not to parade myself in front of Robert and Yvonne” from The Life Guard by John Wain. Google didn’t find it. Then “we are not quite done with the Bernoullis” from e: The Story of a Number by Eli Maor. Google found it. Then “the onion and ginger and cook, stirring, until the onion softens, about” from How to Cook Everything by Mark Bittman. Nope. So sometimes this works and sometimes it doesn’t.
Publishing contracts typically include a passage saying you attest that the work is your own, completely your own, not sort of somebody else’s, and hasn’t been published already in any form. That’s followed by a clause in which the author agrees to indemnify the publisher for any financial losses incurred as a result of that being untrue. It doesn’t prevent the publisher from putting out something already published, but they do try to cover themselves financially if it all blows up. But of course, most authors are not wealthy, so the publisher wouldn’t actually be able to recover much in most cases.
I actually have personal experience of this. I was reading a short story and it rapidly began to feel familiar. A quick check and it was a copy of a Dragonlance short story with the setting and the names changed. I emailed the original author to alert him and got a thank you.
For a while, apparently, it was a real problem on self-publishing sites like Amazon. I read several authors’ commentary that it happened with science fiction books: they would find their obscure back-catalog titles (or even current best-sellers not in the Kindle store) for sale under a different name by someone else. Funny thing is, it was a serious incentive to get an author’s back catalog into Kindle or similar publishers, so plagiarists could be caught electronically. Someone remarked it was a lot more difficult with porn - apparently there are popular sites with pornographic collections and some people took other people’s work and self-published it.
I recall reading too about one fellow in the UK whose wife became a minor celebrity best-seller in the recorded piano music world, until analysis showed he had actually republished many other obscure piano recordings, and fiddled with speed and such to make it harder to compare electronically.
I wonder, unless there’s a central repository or collaboration between competing businesses, how would one business check with another to verify a lack of plagiarism? I suppose centralizing all that in Amazon has an advantage.
More importantly, it provides a disincentive for unscrupulous authors who might be thinking about submitting plagiarized work. An author may not be wealthy when a publisher seeks damages - but afterward, they will be even less wealthy.
Sounds like it’s finding distinctive names, not phrases. “Uruk”, “Crofts-Harrison”. The whole point of copyright is that the phrases from recent (i.e. last half of 20th century) publications will not generally be online. Otherwise, a book could maybe be reconstructed from its online data.
This is what made Kindle self-publishing such a wild west. People could sign up anonymously, commit plagiarism, make a moderate amount, and probably get a decent amount of money transferred out without having any face to face connection or even being in the same country.
That’s the reason that Google Books and Amazon’s “Look Inside” only give limited portions of books. If the phrase is in that portion you can find it through Google, because it has scanned millions of in-copyright books and will find phrases from them even though it won’t return the full book. The phrases that Wendell Wagner found are from in-copyright books. I assume that phrases not found are from the non-available portions of books. One book was published in England, though, and Google’s rules for those books may be different.
As a general statement, however, “phrases from recent publications will not generally be online” is not true. It’s a big “perhaps.”
So what percentage of a typical relatively current book will be online in Google Books?
(By comparison, IIRC, the plagiarism detection systems used by colleges rely on professors submitting copies of essays they have received for checking, which also get added to the database. As the database expands, the odds of getting caught get better - depending on the comparison algorithm.)
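To make the "comparison algorithm" part concrete, here is a minimal sketch of one common approach: break each text into overlapping runs of n words ("shingles") and measure how many of the submission's shingles also turn up in a known work. This is an assumption for illustration only; I don't know what the college systems or the e-book stores actually run, and the function names `shingles` and `containment` are made up here.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in text (hypothetical helper)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(submission, known, n=5):
    """Fraction of the submission's n-word shingles that also appear in the known text."""
    sub = shingles(submission, n)
    if not sub:
        # Text shorter than n words: nothing to compare.
        return 0.0
    return len(sub & shingles(known, n)) / len(sub)

original = "Anna walked slowly down the long road to the old mill"
renamed = "Mary walked slowly down the long road to the old mill"
print(containment(renamed, original))
```

Swapping a character name only breaks the shingles that contain the name, so a copy with "the title and a few character names" changed would still score very high against the original. It only works, of course, if the original is in the database being compared against.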
Another case: Little, Brown published Quentin Rowan’s Assassin of Secrets in 2011, and only then found out that the book was constructed of sentences lifted from other thrillers, often with minimal or no change.
This is a huge “it depends.” Some publishers allow 10%, some 20%, some 0%. I couldn’t even say that a publisher is consistent in what it allows to be shown. It probably differs by book, by author, by subject, by phase of moon. Just spend a lot of time on Google Books and you’ll see the variations.
Before ‘A Million Little Pieces’ was published as a memoir, it was being hawked in the publishing world as a work of fiction. When it didn’t find a taker, it was reworked into a memoir, and was then picked up and published.
When Oprah broke the scandal that some of it WAS fiction, it was scandalous because the publisher had a signed contract wherein the author attested it was NOT fiction. Publishing contracts do indeed require the authors to attest the work has NOT been previously published.
And if caught out, the author will be pursued through legal channels to return any advances etc. And while they mightn’t be rich, and the publisher mightn’t see all its money returned, that author won’t get published again. So there is an incentive not to screw over your publisher.
BTW, in US Patent Law, this issue the OP raises is known as “interference”. Suppose a Patent Examiner is about to allow a patent on a patent application. The last step of the Examiner’s search is a search of cases (applications) pending within the same class/subclass (i.e., akin to the Dewey decimal system of patents). I would say it is rare the Examiner finds anything, but if an inventor ever finds something, it can always wind up in the courts.
Specific to the OP, I should say that if your post suggests you know something as your book is about to release, it is best to come clean with your agent and publisher now rather than later. It can save you (and them) a lot of embarrassment. An agent once told me of such an incident where the publisher caught and rejected a book in time. I can’t recall the details, but the author should have disclosed his “inside information” up-front on a national story already spun into a very similar novel. It could have damaged the publisher’s reputation…if not the agent’s!