New Amazon.com feature text searches every word in every book they sell! Wow!

astro · October 23, 2003, 2:49pm

Go to amazon.com for specs on the new feature

davidm · October 23, 2003, 3:31pm

It’s not quite every book they sell, but WOW! It is very cool. I wonder if dopers will be using it a lot to research and footnote their debates.

hawthorne · October 23, 2003, 3:52pm

Interesting. I don’t know whether they’ve got it quite right yet, but it’s powerful stuff.

[Ego-search test] It found references to records I’m on, but failed to find a reference to me in a book they have that I wrote a chapter of. It did find an out-of-print biography of an ancestor and numerous references to him in appropriate places.

Well worth keeping an eye on.

randwill · October 23, 2003, 4:02pm

How could this be? It would require a huge team of people transcribing every book in their inventory, from Ansel Adams photobooks to every version of the Bible.

If I search “Dad, your poll numbers are down.” would I get hits for all the Calvin and Hobbes anthologies?

MovieMogul · October 23, 2003, 4:02pm

Wow, I’m impressed. I put in the search words (Norman McLaren Begone Dull Care) and came up with 22 entries! Very cool.

There seems to be a ceiling of 120 on the search results. Plus, you can download the page in question if you have an account (a little screen comes up that says Authorizing Copyright, so maybe not all pages are eligible for this feature).

MovieMogul · October 23, 2003, 4:10pm

Well, I put in that quote along with search terms: Calvin Hobbes

And the first 10 items were all by Bill Watterson (without the C&H, none of the first 10 were; probably too many generic words).

Chronos · October 23, 2003, 4:29pm

Well, most of the books probably didn’t need to be transcribed. Nearly all books anymore are written, submitted to the publisher, and/or typeset electronically, so the electronic versions already exist. They just had to deal with the publishers to get ahold of those electronic versions.

As for other books (older books, or things like comic strip anthologies where the text isn’t typeset), if they list them at all, it’s probably just unproofread OCR. You’ll get a significant number of errors that way (somewhere in the vicinity of 1 in 100?), bad enough that you probably wouldn’t enjoy reading the book through that way, but still good enough that you’re likely to hit your search results.
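A rough back-of-the-envelope sketch of why that error rate can still be good enough for search. It assumes the "1 in 100" figure is a per-character rate and that errors are independent, neither of which is stated above; the function names are made up for illustration.

```python
def word_intact_probability(word_length: int, char_error_rate: float = 0.01) -> float:
    """Chance that every character of one word was recognized correctly.

    Assumes independent per-character errors (an illustrative assumption).
    """
    return (1.0 - char_error_rate) ** word_length


def hit_probability(word_length: int, occurrences: int,
                    char_error_rate: float = 0.01) -> float:
    """Chance that at least one occurrence of the word survives OCR intact."""
    p_intact = word_intact_probability(word_length, char_error_rate)
    return 1.0 - (1.0 - p_intact) ** occurrences


if __name__ == "__main__":
    # An 8-letter search term that appears 3 times in a book.
    print(f"single occurrence intact: {word_intact_probability(8):.3f}")
    print(f"at least one of 3 intact: {hit_probability(8, 3):.4f}")
```

Under those assumptions a single 8-letter word comes through intact roughly 92% of the time, and a term that appears a few times in a book is very likely to be findable at least once.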

Archive Guy, that seems like a mightily poorly controlled experiment, to me. It seems like putting any search terms together with Calvin Hobbes would give you a good representation of Bill Watterson books. And the generic words in “Dad, your poll numbers are down” should be OK, since they’re in an exact phrase, and that phrase isn’t particularly generic.

tanstaafl · October 23, 2003, 6:06pm

“Dad, your poll numbers are down” (with quotes as shown) produces zero results.

Dad, your poll numbers are down (without the quote) produces 120 results but none of them are Calvin and Hobbes. (At least, not on the first few pages.) Without quotes it seems to find results where all the words are more or less close to each other.
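To illustrate the difference between the two behaviors, here is a minimal sketch of exact-phrase matching versus "all the words near each other" matching. This is only a guess at the general idea, not Amazon's actual search logic; the function names and the 20-word window are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_match(query: str, text: str) -> bool:
    """True if the query words appear contiguously and in order (quoted search)."""
    q, t = tokenize(query), tokenize(text)
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))


def proximity_match(query: str, text: str, window: int = 20) -> bool:
    """True if all query words occur, in any order, within `window` words."""
    q = set(tokenize(query))
    t = tokenize(text)
    if not q:
        return False
    # Slide a fixed-size window over the text; short texts get one window.
    for i in range(max(1, len(t) - window + 1)):
        if q <= set(t[i:i + window]):
            return True
    return False


if __name__ == "__main__":
    page = "Your poll numbers are down again, Dad said the advisor."
    query = "Dad, your poll numbers are down"
    print(phrase_match(query, page))     # exact phrase is not on the page
    print(proximity_match(query, page))  # but all the words are close together
```

A page can easily pass the proximity test while failing the phrase test, which would fit the pattern of zero quoted results and 120 unquoted ones.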

AHunter3 · October 24, 2003, 12:49am

It’s a cool idea but they sorely need a boolean search engine to do useful searches on that much data.

pokey · October 24, 2003, 1:02am

It’s hard for me to admit that the first phrase that came into my head to search for was “nail in bum,” and yet I am forced to admit it because it’s so cool that when I searched for it, The Straight Dope: A Compendium of Human Knowledge came in at number 7.

Cardinal · October 24, 2003, 7:26am

If you read the “Wired” article, you find that they’re not just getting the text from the publishers, they’re actually scanning the books and using OCR. Sometimes they ship the books to low-wage companies, and sometimes they chop the bindings and use automated scanners.

Shalmanese · October 24, 2003, 1:05pm

Hmm… I can’t imagine the damage if someone manages to crack into the server and steal everything. Massive IP abuse. On the plus side, does this mean that Amazon could potentially open an ebook store with nearly every book they sell? That could be pretty cool.

SmackFu · October 24, 2003, 2:17pm

The beauty of the system is that they don’t have to have perfect copies. Just good enough OCR, and full page scans to show as the results. So you have pretty good accuracy for such things as searches, without the insane work of hand-checking that real e-book making takes (see Project Gutenberg).

Musicat · October 24, 2003, 2:29pm

We are getting close to the ideal data world – where everything ever written, past or present – is online and searchable.

I can easily see how publishers could provide computer data of each new book as it is published (but think of the security risk!) so Amazon can index it, but just how would entering older books work efficiently? I just can’t picture someone turning each page in a bound copy and putting it on the scanner; this would be too slow to be cost-effective. Or are there automated machines that can be fed a bound publication and they will turn and scan each page unattended? Or slice the binding off and feed single pages?

SmackFu · October 24, 2003, 3:10pm

More info:

Publisher FAQ, which says they need a physical copy and can’t use electronic ones yet.

Wired article, that Cardinal referenced.

Chefguy · October 24, 2003, 4:36pm

Something similar to this was started a few years ago at (I think) ancestry.com. They have a genealogy library that has been growing at a rate of about three books a day and which includes books dating way back to the earliest writings in this country. Some of it is photocopied, but some is also transcribed. A remarkable undertaking.

cmkeller · October 24, 2003, 6:04pm

I imagine that a quote from a cartoon might not be in there, because the words of the cartoon probably have not been transcribed to computer, only words of text-only books.
