Xerox copiers can't copy correctly

Machine_Elf · August 7, 2013, 7:10pm

It’s more like your hammer replacing your window with a wall when you’re not looking.

control-z · August 7, 2013, 7:41pm

I’m flabbergasted by a copier that would change numbers that it scans. That would make the copier worthless.

Even if it wasn’t the default setting, I can’t imagine anyone who would want to copy numbers and have them altered on the copy. That setting/algorithm should not exist in a copier.

friedo · August 7, 2013, 8:01pm

That is incorrect. OCR works just fine on documents with graphics.

aceplace57 · August 7, 2013, 8:27pm

How in the hell did these fools screw up the design of a copy machine? They worked accurately since the 60’s. They didn’t need any computer tech.

So now the new computerized copy machines are changing what’s on the original? That’s just Priceless. Leave it to this generation to screw up a workhorse machine that functioned correctly for decades.

I can see the lawsuits piling up fast. Imagine a Corporation’s quarterly financial statements changed by a copy machine.

Senegoid · August 7, 2013, 8:53pm

I was a Unix Sysadmin, circa 1984-1988. We found an error in the mag tape driver, such that data written to a magtape would be corrupted in some cases. (Specifically, the first byte of a tape block was omitted, and the entire remainder of the block was thus shifted one byte up on the tape.) When I discussed this with our customer support guy, we got a typical lame excuse.

This was in 1986, not long after the Challenger space shuttle went kablooie.

I asked this question: Would you fly on a space shuttle designed on this Unix system?

Now: Would you fly on a space shuttle built from engineering drawings e-mailed to you in a PDF file generated by a Xerox scanner? (Or possibly other manufacturers too, as JBIG2 is an industry standard algorithm.)

control-z says: “I’m flabbergasted by a copier that would change numbers that it scans. That would make the copier worthless.”

No, an error like this makes all copiers everywhere (at least Xerox ones?) seriously worse than useless – even if only due to the prospect that any copier, we now know, might do this.

It is also a serious error that the problem, though mentioned in the configuration screen of some models, is not only inconspicuous, but is also not in a place where it would likely be seen by most users, nor by viewers of the resulting documents. One wonders if Xerox, should they be sued, can successfully cover their corporate asses by pointing out that it’s mentioned somewhere deep in the user manual.

The lame apologism and hammer analogies are also flabbergastworthy.

aceplace57 · August 7, 2013, 9:10pm

These new copy machines that generate files on servers and email are a copyright nightmare.

It’s one thing for a teacher to make 15 copies of a Carl Sandburg poem to hand out in class. It’s a violation that the industry has grudgingly tolerated all these decades. Now that poem goes to a digital file and is stored on a server? Maybe distributed through email? That brings in the Digital Millennium Copyright Act and a bunch of trouble that no one wants.

I’m very thankful my dept has an older paper copy machine. It has network connections to act as a printer too but we never hooked that up. We have enough maintenance service calls from just copying. We don’t need to use it as a printer too.

Irishman · August 7, 2013, 10:25pm

To be fair, it was not intentional. It is an accidental artifact of the compression algorithm that is not looking for meaningful content (i.e. numbers), but rather looking at collections of pixels. Arguably it’s doing a bad job at that, given some of the substitutions, but I guess it depends on the image quality and print size.

The problem is that technology advances, and people want more things. A straight photocopier was not trying to save the content for later use, merely instantly reproduce it, and then forget it. Once you get into “retain for later reuse”, i.e. scanning, you get into the problem of the digital “paperless” office - data storage.

Everything is scanned and saved electronically, rather than paper files. This saves immense amounts of space from filing cabinets. However, data storage has its own similar space storage issues. How much data can you store? How many documents?

This leads to compression algorithms, techniques to take the data and remove non-content so you store a smaller file, then reconstitute your original upon demand.

There are well-known problems with any kind of compression algorithm - specifically, what do you exclude to save that space? How does the algorithm decide what to keep and what not to keep? How do you organize the data you save, and the rules for reconstituting it, so that the result matches the original? Think digital compression artifacts in your jpegs.

Someone tried to design a fancy compression algorithm that would look at the document, visually look for “similar bits”, and then instead of saving data for the entire page, save one copy of each set of bits and save only a reference marker for subsequent instances of those bits. Thus, you exclude a lot of repetition.

Instead of writing down a person’s full name and phone number, you keep a directory that lists their full name and phone number, and then you just write down the nickname. When you see the nickname, you look up the directory for the full dataset. Similarly, the file saves the bit reference and one directory for the bit.
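To make the directory idea concrete, here is a rough Python sketch. It assumes each piece of the page is a tiny bitmap written as a tuple of strings, and it only reuses a stored piece when the match is exact. The names `build_dictionary` and `rebuild` are made up for illustration; this is not how any real JBIG2 encoder is written.

```python
# Toy illustration of "store each distinct patch once, then refer to it".
# Patches are tiny bitmaps written as tuples of strings; all names are hypothetical.

def build_dictionary(patches):
    """Return (dictionary, references); identical patches share one entry."""
    dictionary = []   # one stored copy of each distinct patch
    index_of = {}     # patch -> its position in the dictionary
    references = []   # what the "page" actually stores
    for patch in patches:
        if patch not in index_of:
            index_of[patch] = len(dictionary)
            dictionary.append(patch)
        references.append(index_of[patch])
    return dictionary, references


def rebuild(dictionary, references):
    """Look every reference up in the dictionary to get the page back."""
    return [dictionary[i] for i in references]


SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")

page = [SIX, EIGHT, SIX, SIX, EIGHT]
dictionary, references = build_dictionary(page)
print(len(dictionary), references)  # 2 [0, 1, 0, 0, 1]
```

Five patches on the page become two stored bitmaps plus five small reference numbers, and because only exact matches are merged, `rebuild` returns the original page unchanged.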

The problem with this algorithm appears to be in the finesse of deciding what is a similar bit versus what is different enough to save as a separate bit. Depending upon the scan quality setting and the size of the print on the original, apparently some numbers are falling into the zone where the algorithm decides they are similar enough that the differences don’t matter.

That’s the problem. It sees pieces of the page as similar that we are visually parsing more finely and are not similar for our purposes. They actually contain semantic content in the differences, but the algorithm doesn’t look at semantic content.

The more finely you parse the algorithm to make smaller reference bits, the more data you have to store, the larger your file. Eventually, you are saving every pixel.

At least that doesn’t ruin the semantic content, but it defeats the purpose of compression.
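The trade-off described above can be sketched in Python. This is a toy under stated assumptions, not the real JBIG2 matching rule: patches are tiny bitmaps, "similar" means differing in at most `max_diff` pixels, and the names `pixel_diff`, `compress` and `decompress` are hypothetical.

```python
# Toy lossy variant: a patch reuses an existing dictionary entry whenever it
# differs from that entry in at most `max_diff` pixels. All names are hypothetical.

def pixel_diff(a, b):
    """Count the pixels that differ between two equally sized patches."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def compress(patches, max_diff):
    """Return (dictionary, references) using a 'close enough' match."""
    dictionary, references = [], []
    for patch in patches:
        for i, stored in enumerate(dictionary):
            if pixel_diff(patch, stored) <= max_diff:
                references.append(i)  # close enough: reuse the stored patch
                break
        else:
            dictionary.append(patch)  # nothing similar yet: store a new entry
            references.append(len(dictionary) - 1)
    return dictionary, references


def decompress(dictionary, references):
    """Rebuild the page from the dictionary and the references."""
    return [dictionary[i] for i in references]


# A crude "6" and "8" that differ in exactly one pixel.
SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")
page = [SIX, EIGHT, SIX, EIGHT]

for max_diff in (0, 1):
    dictionary, references = compress(page, max_diff)
    restored = decompress(dictionary, references)
    print(max_diff, len(dictionary), restored == page)
# 0 2 True
# 1 1 False
```

With `max_diff = 0` the page comes back intact but needs two dictionary entries. With `max_diff = 1` the file gets smaller, and every "8" silently comes back as a "6", because the only difference between them was below the threshold.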

Bosda_Di_Chi_of_Tricor · August 8, 2013, 2:26pm

Xerox machines are randomly altering numbers in copied documents.

In this article I present in which way scanners / copiers of the Xerox WorkCentre Line randomly alter written numbers in pages that are scanned. This is not an OCR problem (as we switched off OCR on purpose), it is a lot worse – patches of the pixel data are randomly replaced in a very subtle and dangerous way: The scanned images look correct at first glance, even though numbers may actually be incorrect. Without a fuss, this may cause scenarios like:
Incorrect invoices
Construction plans with incorrect numbers (as will be shown later in the article) even though they look right
Other incorrect construction plans, for example for bridges (danger of life may be the result!)
Incorrect metering of medicine, even worse, I think.
To make things even worse: The copiers in question are the common Xerox WorkCentres, and Xerox seemed to be unaware of the issue until we found out about it last Wednesday. What’s more, more than one WorkCentre model seems to be affected, as we tested at least two with this issue (Xerox WorkCentre 7535 and 7556). Additionally, the current software release, as installed by Xerox support, did not solve the issue; thus, the issue existed on the very old release we had installed, as well as on a very new one. The error has been confirmed by a Xerox rental firm in the meantime, and Xerox is investigating as well, so it does not seem to be some dumb handling error or something similar (if I was thinking this, I of course would not publish it here).

This could be a major tech glitch!

Telemark · August 8, 2013, 3:32pm

You probably misread the date on the other thread

Reported as duplicate.

ZipperJJ · August 8, 2013, 4:59pm

I think we just found the answer to a lot of questions about the current economic state of the world.

Idle_Thoughts · August 8, 2013, 5:07pm

Merging two threads about this.

AaronX · August 9, 2013, 2:51am

The odd thing is, this should be HUGE, but nobody seems to care. (Maybe Xerox wants it that way)

Irishman · August 9, 2013, 8:09pm

I’ve been reading more from David Kriesel, the guy who uncovered this anomaly.

JBIG2 apparently has two settings, a lossless version and a lossy version. The problem is with the lossy version, which provides a huge amount of compression and therefore storage savings, but is correspondingly a lot sloppier about what constitutes a unique image bit. The lossless version will prevent this problem, but has to be set in the code for the device.

Kriesel found that Xerox headquarters was aware of the problem but for some reason many of the call-center techs were not. Subsequently, he has had discussions with Xerox regarding the problem, and Xerox has now released a software patch to delete that compression algorithm.

If you have an older machine or are not getting software updates (who does?), then you can work around it by avoiding the “Normal” setting and using a higher-quality scan setting.

There are rumors of the same effect in other manufacturers’ machines and at other setting levels. Caveat emptor.
