Has anyone ever heard of Zoom Books?

It seems to be a Canadian secondhand bookshop: they buy books online, mostly nonfiction. They buy old books that seem to be of very little use at first sight, then cut off the binding, scan them, and pulp them. Here is a link in Spanish about them:

They link to another article from the Washington Post from January this year, probably paywalled:
https://www.washingtonpost.com/technology/2026/01/27/anthropic-ai-scan-destroy-books/
El Diario(dot)es is a sister newspaper of the British Guardian, but in Spanish, so you get the general ideological direction. They write:

Secondhand booksellers have been grappling with a profound contradiction for weeks: on the one hand, they have never sold so many books. On the other, they have serious doubts about the fate of these copies. So much so that they have alerted the Ministry of Culture to what is happening.

“We are not just merchants; we also have a role in preserving, conserving, and restoring our bibliographic heritage,” notes Miguel Ángel Ortega, bookseller and president of the Professional Association of Antique Books and Collectibles (UNILIBER). “It would be highly contradictory for us to be selling books with the intention of destroying them.”

“We are facing a form of literary plundering,” says Font, the bookseller from Badalona, who, while speaking with this newspaper, receives another order from the Canadian company. “We are seeing the tsunami coming; I believe the institutions must intervene.”

On the other hand, they admit:

Carlos Hernández, owner of the Mautalos used bookstore in Madrid, has sold this company about 200 books in the last month and isn’t all that pessimistic. “A large part of our inventory consists of books that people want to throw away,” he explains over the phone. “In fact, many of the books I sell are ones that people pick up from the recycling bins.”

What does the Dope think about this? Do the AI companies stand to learn anything from all these old books? As the article puts it:

The selected books range from a publication on the world of castellers [traditional human towers] in Granollers (Barcelona) in the 1970s to a technical manual on winemaking, proceedings from conferences held 50 years ago, and dietary recommendations from the [Spanish] Civil War.

Is this dangerous? Or are they already scraping the bottom of the barrel and showing the limits of intellectual property theft?

BTW: Zoom Books seems to be paying more for the stamps than for the books.

Disclaimer: All translations are based on DeepL(dot)com, but I had to correct and explain some details this time, which is becoming unusual.

What do they do with the scanned book?

According to the Spanish article (after translation):
"The sector is bewildered. For whom is this company buying tens of thousands of books and having them sent to a logistics center in the US? What does it want with these copies that have been gathering dust for years?

This Friday, another AI-linked Silicon Valley company (USA) contacted a Spanish second-hand bookstore, which prefers not to reveal its name, and placed an order for more than 3,000 books.

The objective, according to various sources, is to train Artificial Intelligence (AI) models before destroying these volumes and recycling their paper."

I know it’s sad to us bibliophiles, but a vast number of secondhand books end up thrown away because nobody wants them at any price. If this company is throwing them away, too, that’s no different from what would happen to them anyway.

What is different is that they’re scanning them first. What gives a book value, after all, is the information contained in it, and they’re saving that and putting it to some use. That sounds commendable to me.

I just read the article (in Spanish, not a translation), but I still couldn’t find an answer to my question, which is why they destroy the books afterwards. There’s mention of something called a “data wall” problem. I don’t know enough about AI to understand exactly what that is, and the article didn’t provide an explanation, but from what I could gather, they destroy the books afterwards as a way of dealing with that problem.

The only other reason I can think of that makes any sense is simply a matter of storage. Maybe they just don’t want to pay for a warehouse to store the purchased books after they’ve been scanned.

Yes, that is one way to see it. But they are not putting it to public use: they are keeping the content to themselves, even the content that is not in the public domain yet. Isn’t that illegal?
Remember the crypto fools who bought Jodorowsky and Giraud’s book for their adaptation of Herbert’s Dune for €2,660,000, believing they were getting the rights to make the film? Turns out they could not: the rights to the movie do not come with the book that describes the movie.
Isn’t taking the data from a book, and not giving it away for free in searchable form, the same thing? Hell, even giving it away for free would be copyright infringement IMO, even if it is only the minutes of a conference held 50 years ago.

I understand they cut the backs of the books off to get loose pages they can scan automatically. Afterwards they have thousands of loose sheets of paper and no books, so they throw it all away because it is cheaper and easier than everything else they can think of.

That’s what I gathered as well. I suppose from context it should be obvious that the expense of restoring the books and then maintaining a warehouse to store them all is why they simply destroy them. What threw me off is that it comes right after a discussion of a “data wall” and why their AI is stuck because of it, and that this is the reason they embarked on the project. My lack of understanding on that matter is technical, not because my Spanish isn’t up to snuff. I would have been just as baffled by that aspect of the article had it been written in English.

I understand the “data wall” to be the limit of the searchable data they have reached on the freely readable internet: from Wikipedia to all the free newspapers, blogs, comments on YouTube, all the webpages without login and so on. That was “easy” to harvest, and now they have it. Those books are not part of that: they are harvesting those data the hard way. Buy, ship, scan, OCR, store.
That is very slow and very expensive compared to an internet crawler. I believe it is also very error-prone. OCR is very good today, but still: old books have strange layouts and unusual typefaces, the pages may be old and feed irregularly into the scanner, and the books may even contain typos! Is it really worth it? Are they so desperate for additional data? I am not sure that is a good business model.

When I buy a book, I expect that I will be able to make use of the information in it, with no requirement that I use that information for the public good, and no expectation that I will share the information freely (in fact, usually an expectation that I will not). How is this any different?

You are a private person, and that is what books allow. Open any book and read the copyright page (the one with the ISBN), usually at the front of English books. I just took the first English book next to me, and it reads:

All rights reserved. No part of this publication shall be reproduced, transmitted or stored in a retrieval system, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior consent of the publishers.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out or otherwise circulated without the publisher’s prior consent in any form of binding or cover other than that in which it is published and without a similar condition being imposed on the subsequent purchaser.

What you do is legal. That is what books are for. What they do is different. Could it be illegal? I suspect so, but I don’t know.

Be that as it may: has nobody here on the Board ever heard of Zoom Books before?

I hadn’t heard of them, but it’s interesting.

Zoom Books is a reflection of Big Tech wanting to own, control and censor ALL FORMS of information.
Fahrenheit 451, really.

I don’t see how destroying one copy of a used book (because you want to scan it in a destructive way) is censoring it.

Yes this ^^

I don’t think very many books have been printed as one-offs.

Even the Gutenberg Bible had about 200 copies printed.

Paper books are not the perfect medium. Very fragile. This may be the only way to permanently save them.

As for Zoom Books, I’m sure they are not alone in this.

Not the best example, since the whole point of the Gutenberg Bible was that it was possible to make many copies of it. Almost all books from before that were one-offs (there probably were some mass print runs of bibles before that, but with each page individually carved, rather than movable type).

But of course, nobody’s going to be selling a pre-Gutenberg book, no matter what’s in it, for the prices this company would be paying. They’re likely paying by the pound, not even by volume.

Well, yeah.

Only thing that came to my head.

You might be surprised how well paper preserves in anaerobic conditions. If you ever go to the bottom of a landfill (which I don’t recommend) you will find perfectly readable newspapers from a century or more ago. Cite: some Bill Bryson book, forgot which one and which page.

Interesting premise. Do internet “clouds” ever dissipate/disappear and, if so, is the information recoverable?
Also, Zoom Books doesn’t preserve books online; they throw everything into a big pot and create a virtual “Mulligan Stew”, then destroy the original books.

This kind of thing has been done for years. Here’s a recent project I found while searching:

But I know I’ve heard of similar projects going back a number of years.

Here is a business that does it for you (and has been around for a while):

https://1dollarscan.com/work.html

In Vernor Vinge’s 2006 book Rainbows End (set in 2025), there was a high-speed mass-scanning project in which books were tossed into a shredder and the remains blown down a tunnel lined with high-speed digital cameras photographing the fragments, with computers reconstructing the books from the images. (The idea was inspired by how genomes are sequenced.)
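For the curious, the genome trick Vinge borrowed is roughly "shotgun" assembly: find fragments whose ends overlap and glue them together, biggest overlap first. Here is a toy Python sketch of a greedy version. It assumes clean, error-free text fragments, and the function names are made up for illustration; real sequence assemblers (let alone anything working from photos of paper shreds) are far more involved.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy shotgun assembly: repeatedly merge the pair with the largest overlap."""
    # Drop exact duplicates and fragments that sit entirely inside another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: return the separate pieces
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        # Anything now contained in the merged string adds no information.
        frags = [f for f in rest if f not in merged] + [merged]
    return frags


pieces = ["jumps over the", "the quick brown", "over the lazy dog", "brown fox jumps"]
print(greedy_assemble(pieces))
# ['the quick brown fox jumps over the lazy dog']
```

Note the repeated "the": it creates a misleading 3-character overlap between "jumps over the" and "the quick brown". Here the longer overlaps win first, but repeats like that are exactly where greedy assembly tends to go wrong.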

Complaints about it are a tempest in a teapot.

I mean, they probably do. You have to have it in a digital form to feed the “stew”, and once you have that, there’s no sense in throwing it away. Preservation isn’t their primary goal, and they probably don’t intend to share any of that text (without being paid well for it, at least), but hey, they might need it again later, and hard drive space is cheap.

@Czarcasm , well-maintained Internet clouds don’t disappear, but that “well-maintained” is key. You can’t just shrug and say “Eh, it’s stored now, we’re good”. You have to constantly keep backups, and copy data over to new media, and so on. Which is what people did before clouds: What’s different with clouds is mostly just that you’re now paying someone else to do all of that management for you.