Xerox copiers can't copy correctly

It’s more like your hammer replacing your window with a wall when you’re not looking.

I’m flabbergasted by a copier that would change numbers that it scans. That would make the copier worthless.

Even if it wasn’t the default setting, I can’t imagine anyone who would want to copy numbers and have them altered on the copy. That setting/algorithm should not exist in a copier.

That is incorrect. OCR works just fine on documents with graphics.

How in the hell did these fools screw up the design of a copy machine? They worked accurately since the 60’s. They didn’t need any computer tech.

So now the new computerized copy machines are changing what’s on the original? :stuck_out_tongue: That’s just priceless. Leave it to this generation to screw up a workhorse machine that functioned correctly for decades.

I can see the lawsuits piling up fast. Imagine a Corporation’s quarterly financial statements changed by a copy machine.

I was a Unix Sysadmin, circa 1984-1988. We found an error in the mag tape driver, such that data written to a magtape would be corrupted in some cases. (Specifically, the first byte of a tape block was omitted, and the entire remainder of the block was thus shifted one byte up on the tape.) When I discussed this with our customer support guy, we got a typical lame excuse.

This was in 1986, not long after the Challenger space shuttle went kablooie.

I asked this question: Would you fly on a space shuttle designed on this Unix system?

Now: Would you fly on a space shuttle built from engineering drawings e-mailed to you in a PDF file generated by a Xerox scanner? (Or possibly other manufacturers too, as JBIG2 is an industry standard algorithm.)

control-z says: “I’m flabbergasted by a copier that would change numbers that it scans. That would make the copier worthless.”

No, an error like this makes all copiers everywhere (at least Xerox ones?) seriously worse than useless – even if only due to the prospect that any copier, we now know, might do this.

It is also a serious error that the problem, though mentioned in the configuration screen of some models, is not only inconspicuous but also not in a place where it would likely be seen by most users, or by viewers of the resulting documents. One wonders if Xerox, should they be sued, can successfully cover their corporate asses by pointing out that it’s mentioned somewhere deep in the user manual.

The lame apologism and hammer analogies are also flabbergastworthy.

These new copy machines that generate files on servers and email are a copyright nightmare.

It’s one thing for a teacher to make 15 copies of a Carl Sandburg poem to hand out in class. It’s a violation that the industry has grudgingly tolerated all these decades. Now that poem goes to a digital file and is stored on a server? Maybe distributed through email? That brings in the Digital Millennium Copyright Act and a bunch of trouble that no one wants.

I’m very thankful my dept has an older paper copy machine. It has network connections to act as a printer too but we never hooked that up. We have enough maintenance service calls from just copying. We don’t need to use it as a printer too.

To be fair, it was not intentional. It is an accidental artifact of the compression algorithm that is not looking for meaningful content (i.e. numbers), but rather looking at collections of pixels. Arguably it’s doing a bad job at that, given some of the substitutions, but I guess it depends on the image quality and print size.

The problem is that technology advances, and people want more things. A straight photocopier was not trying to save the content for later use, merely instantly reproduce it, and then forget it. Once you get into “retain for later reuse”, i.e. scanning, you get into the problem of the digital “paperless” office - data storage.

Everything is scanned and saved electronically, rather than paper files. This saves immense amounts of space from filing cabinets. However, data storage has its own similar space storage issues. How much data can you store? How many documents?

This leads to compression algorithms, techniques to take the data and remove non-content so you store a smaller file, then reconstitute your original upon demand.

There are well-known problems with any kind of lossy compression algorithm - specifically, what do you exclude to save that space? How does the algorithm decide what to keep and what not to keep? How do you organize the data you save, and what rules do you use to reconstitute it so it matches the original? Think digital compression artifacts in your jpegs.

Someone tried to design a fancy compression algorithm that would look at the document, visually look for “similar bits”, and then instead of saving data for the entire page, save one copy of each set of bits and save only a reference marker for subsequent instances of those bits. Thus, you exclude a lot of repetition.

Instead of writing down a person’s full name and phone number, you keep a directory that lists their full name and phone number, and then you just write down the nickname. When you see the nickname, you look up the directory for the full dataset. Similarly, the file saves the bit reference and one directory for the bit.
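Here is a rough sketch of that directory idea in Python, for anyone who wants to see it spelled out. To be clear, this is my own toy illustration, not the actual JBIG2 format: the function names are made up, and single characters stand in for the small bitmaps a real scanner would compare. This version only reuses a tile when it is an exact match, so nothing is lost.

```python
# Toy illustration only; hypothetical names, not the real JBIG2 format.
def compress(tiles):
    """Store each distinct tile once; replace the page with index references."""
    directory = []  # one stored copy of each distinct tile
    refs = []       # one small index per tile position on the page
    for tile in tiles:
        if tile not in directory:
            directory.append(tile)
        refs.append(directory.index(tile))
    return directory, refs


def decompress(directory, refs):
    """Rebuild the page by looking every reference up in the directory."""
    return [directory[i] for i in refs]


page = ["1", "7", "1", "1", "7", "3"]  # stand-ins for small bitmaps
directory, refs = compress(page)
print(directory)  # ['1', '7', '3']
print(refs)       # [0, 1, 0, 0, 1, 2]
```

Six tiles on the page, but only three get stored in full; the rest are just index numbers. With exact matching, `decompress(directory, refs)` gives back the original page.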

The problem with this algorithm appears to be in the finesse of deciding what is a similar bit, and what is different enough to save as a separate bit. Depending upon the scan quality setting and the size of the print on the original, apparently some numbers are falling into the zone where the algorithm decides they are similar enough that the differences don’t matter.

That’s the problem. It sees pieces of the page as similar that we are visually parsing more finely and are not similar for our purposes. They actually contain semantic content in the differences, but the algorithm doesn’t look at semantic content.
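A minimal sketch of how a "close enough" match can swap digits, assuming a simple pixel-difference count as the similarity test. The 3x5 bitmaps, the `max_diff` parameter and the function names are all invented for illustration; I don't know what matching rule the Xerox firmware really uses.

```python
# Toy illustration only; tiny made-up bitmaps and a made-up similarity rule.
GLYPHS = {
    "6": ("###", "#..", "###", "#.#", "###"),
    "8": ("###", "#.#", "###", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
}


def distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def lossy_compress(tiles, max_diff):
    """Reuse the first stored tile within max_diff pixels, else store a new one."""
    directory, refs = [], []
    for tile in tiles:
        for i, stored in enumerate(directory):
            if distance(tile, stored) <= max_diff:
                refs.append(i)  # "close enough": reuse the stored tile
                break
        else:
            directory.append(tile)
            refs.append(len(directory) - 1)
    return directory, refs


def read_back(directory, refs):
    """Turn the reconstructed bitmaps back into digit characters."""
    names = {bitmap: name for name, bitmap in GLYPHS.items()}
    return "".join(names[directory[i]] for i in refs)


page = [GLYPHS[c] for c in "6818"]
print(read_back(*lossy_compress(page, max_diff=0)))  # 6818
print(read_back(*lossy_compress(page, max_diff=2)))  # 6616
```

In this toy font the 6 and the 8 differ by a single pixel. With a tolerance of two pixels the 8s are treated as repeats of the 6, and the output is a clean, sharp, wrong number. Nothing looks blurry or damaged, which is what makes this kind of error so hard to spot.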

The more finely you tune the algorithm to make smaller reference bits, the more data you have to store and the larger your file. Eventually, you are saving every pixel.

At least that doesn’t ruin the semantic content, but it defeats the purpose of compression.
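Some back-of-the-envelope arithmetic for that tradeoff, as a sketch. The numbers are assumptions I picked for illustration (2000 characters on a page, each a 20x30 pixel tile at one bit per pixel, fixed-width references), and `stored_bits` is a made-up helper, not anything from a real codec.

```python
import math


def stored_bits(num_tiles, distinct_tiles, tile_bits):
    """Bits for the directory plus one fixed-width reference per tile."""
    ref_bits = max(1, math.ceil(math.log2(distinct_tiles)))
    return distinct_tiles * tile_bits + num_tiles * ref_bits


# Assumed page: 2000 characters, each a 20x30 tile at 1 bit per pixel.
num_tiles, tile_bits = 2000, 20 * 30
raw = num_tiles * tile_bits  # 1,200,000 bits with no compression at all

for distinct in (10, 100, 1000, 2000):
    print(distinct, stored_bits(num_tiles, distinct, tile_bits), raw)
```

With only 10 distinct tiles the page needs 14,000 bits instead of 1,200,000. If every tile is treated as unique (2000 distinct), the total comes to 1,222,000 bits, which is more than the raw page because you are now paying for the references as well.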

Xerox machines are randomly altering numbers in copied documents.

This could be a major tech glitch!

You probably misread the date on the other thread :slight_smile:

Reported as duplicate.

I think we just found the answer to a lot of questions about the current economic state of the world.

Merging two threads about this.

The odd thing is, this should be HUGE, but nobody seems to care. (Maybe Xerox wants it that way)

I’ve been reading more from David Kriesel, the guy who uncovered this anomaly.

JBIG2 apparently has two settings, a lossless version and a lossy version. The problem is with the lossy version, which provides a huge amount of compression and therefore storage savings, but is correspondingly a lot sloppier with what constitutes a unique image bit. The lossless version will prevent this problem, but has to be set in the code for the device.

Kriesel found that Xerox headquarters was aware of the problem, but for some reason many of the call-center techs were not. He has since had discussions with Xerox regarding the problem, and Xerox has now released a software patch to delete that compression algorithm.

If you have an older machine or are not getting software updates (who does?), then you can work around it by avoiding the “Normal” setting and using a higher-level scan setting.

There are rumors of the same effect in other manufacturers’ machines and at other setting levels. Caveat emptor.