Do Computers Make Errors of Mathematics?

You’ve got to define 99.9% of what. FIT is the number of failing parts in a billion operating hours. Anything that is only 99.9% reliable over any kind of repetition is going to have a terrible FIT rate and would be considered junk, at least in civilian life.
I heard about a radar unit in the defense world that would last about 20 hours between repairs if they were lucky.

I’m late to this thread, so someone may have already covered this:

I wouldn’t expect the software for a cash register/point-of-sale system to use floating point arithmetic. It’s better to do all calculations for money in fixed point, keeping track of where the decimal point should be. Floating point is far more prone to problems like getting 1.99999999999 where you’d expect 2.0. This has been understood since the early days of computing. The COBOL language has always had support for fixed-point arithmetic with a set number of places after the decimal point.
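Just to make the point concrete, here’s a toy comparison (nothing to do with any real register’s code): keep the prices as integer cents and the sum is exact; do the same sum in binary floating point and the representation error shows up.

# A minimal illustration (not any particular POS system's code) of why money
# is done in fixed point: integer cents stay exact, binary floating point doesn't.

total_cents = sum([10] * 10)          # ten items at $0.10, kept as integer cents
print(total_cents)                    # 100 -> exactly $1.00

total_float = sum([0.10] * 10)        # the same prices in binary floating point
print(total_float)                    # 0.9999999999999999, not 1.0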

It’s always possible for someone to make a programming error, or for a hardware problem to lead to wrong results. I would be very surprised if a problem like this slipped through design and implementation into a production POS system. The arithmetic in these things is simple, and I’d expect them to be thoroughly tested. It’s a little more likely that wear and tear on an old system could lead to errors (e.g., worn keys leading to incorrect prices being keyed in). Errors in basic arithmetic are very unlikely, though.

Of course, this still runs into problems. If you compute the tax on two figures individually, you will get a different answer than if you computed it on both of them at once. That could be considered an “error of mathematics” since it violates the distributive property.

It’s also a violation of the law. That’s just another example of humans misusing a tool and then blaming the tool.

But then, the same is true of floating point. Expecting 0.1 to be exactly representable in floating point is violating the design.
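For anyone who hasn’t seen it, here’s what that looks like in practice (Python shown, but any language using IEEE 754 doubles behaves the same way):

# 0.1 can't be represented exactly in binary floating point; what actually gets
# stored is the nearest double, which is very slightly larger than one tenth.

from decimal import Decimal

print(Decimal(0.1))        # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)    # False; both sides carry their own rounding error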

To answer the OP, we have to decide what it even means to make an “error of mathematics”.

For memories you overdesign. Pretty much all large caches inside chips are built with extra rows or columns, and during testing, if some bits fail, you run an algorithm that swaps in those extra rows to replace the ones with failing bits. You can’t always repair, but you do increase the yield a lot.
You can also run a built-in self test during power-up to do the same thing in the field. This is called built-in self repair.
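If it helps to picture it, the bookkeeping part of the repair step is basically this (a made-up toy sketch; real parts do it with fuses or on-chip BISR logic, and repair_rows is just a name I invented here):

# A toy model of row repair: after a memory test, rows that contained failing
# bits are remapped onto spare rows. This only shows the bookkeeping, not the
# hardware mechanism.

def repair_rows(failing_rows, num_spares):
    """Map each failing row to a spare row; return the map, or None if the
    part can't be repaired (more bad rows than spares)."""
    if len(failing_rows) > num_spares:
        return None                      # not repairable -> yield loss
    return {bad: spare for spare, bad in enumerate(sorted(failing_rows))}

row_map = repair_rows(failing_rows={17, 342}, num_spares=4)
print(row_map)                           # {17: 0, 342: 1} -> those accesses get redirected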

I’m referring more to ECC here, where dynamic events (like cosmic rays) require the memory to figure out which bit got flipped and correct it. This requires more memory, of course, but that really just shows up in the final cost (i.e., you’re still buying a 1 GB chip, but it costs more since it has 1.125 GB of physical memory).

It’s possible to layer further protection on top. For a satellite project, we had concerns that the flash memory wouldn’t be reliable enough, despite it already having some internal error correction. We didn’t end up doing it, but I considered adding a Hamming(7,4)+parity code on top, which would halve the storage but add single-bit error correction for each byte, and double-bit error detection.
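For anyone curious, here’s roughly what that scheme looks like. This is the textbook construction, not the code we actually evaluated for the satellite:

# Each 4-bit nibble becomes one stored byte (Hamming(7,4) plus an overall
# parity bit), so capacity is halved but any single flipped bit per byte can
# be corrected and any two flipped bits detected.

def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]    # Hamming positions 1..7
    bits.append(bits[0] ^ bits[1] ^ bits[2] ^ bits[3] ^ bits[4] ^ bits[5] ^ bits[6])
    return sum(b << i for i, b in enumerate(bits))

def decode(byte):
    """Return (nibble, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    bits = [(byte >> i) & 1 for i in range(8)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)          # 1..7 = position of the bad bit
    parity_ok = (sum(bits) % 2) == 0               # overall parity check
    if syndrome and not parity_ok:                 # single-bit error: fix it
        bits[syndrome - 1] ^= 1
        status = 'corrected'
    elif syndrome and parity_ok:                   # two bits flipped
        return None, 'uncorrectable'
    else:
        status = 'ok' if parity_ok else 'corrected'  # lone flip hit the parity bit
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3), status

# Flip any one bit of an encoded nibble and the decoder recovers the data:
code = encode(0b1011)
print(decode(code ^ 0b00100000))                   # (11, 'corrected')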

Definitely. The opposite of silent data corruption is noisy data corruption. When we were having memory issues, we turned off the reports because the ECC was fixing the problem but the console log went crazy. Any highly reliable server has this. I suspect that with memory sizes the way they are these days, lots of systems have it, but we didn’t do consumer products.

My experience is that there are separate settings for correctable and uncorrectable errors. Maybe you want to log both if you’re in a quiet environment, but once you move your server to Colorado, you probably want to only leave the uncorrectable error log enabled.

Since our servers were used in places like Wall Street, customers wanted to see what was going on by default. I suspect they could turn off the reports also, but we had control over what they could see.
The meetings on this were fascinating, especially since they established early on that the processor was not to blame so I was off the hook.

Fair enough - no, it was a busy system with hundreds of users, performing tens of thousands of operations per day, but every day a few things would just not happen. In some cases the failure was identifiable to a specific line of code (easiest to spot when the line was one that displayed a prompt) that just got skipped for unknown reasons. Execution of the adjacent commands worked, just not that bit, and only sometimes - rarely, actually, as a percentage of the total.

Rounding errors are not unique to computers. They’ve been around as long as accounting.

Suppose you make three purchases at Best Buy, and sales tax in your state is 6.0%. Below are the purchases and the amount of tax on each. The tax doesn’t come out to an even cent, so it gets rounded to the nearest cent. If you add the rounded totals up, you will get a different number than if you add the purchases first and then calculate tax on the total. So you would have saved a penny if you bought them all at the same time. That’s rounding error.

Purchase     Tax       Total       Rounded
$123.45      $7.407    $130.857    $130.86
$23.25       $1.395    $24.645     $24.65
$10.46       $0.628    $11.088     $11.09
Sum                    $166.590    $166.60
Sum rounded            $166.59
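Same figures, run through exact decimal arithmetic to show the penny appearing (the half-up rounding here is an assumption about how the register rounds; the numbers are the ones in the table):

# Rounding the tax on each purchase gives a total one cent higher than
# summing the purchases and taxing once.

from decimal import Decimal, ROUND_HALF_UP

def round_cents(x):
    return x.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

purchases = [Decimal('123.45'), Decimal('23.25'), Decimal('10.46')]
rate = Decimal('0.06')

separately = sum(p + round_cents(p * rate) for p in purchases)
together = sum(purchases) + round_cents(sum(purchases) * rate)

print(separately)    # 166.60  (each tax rounded, then summed)
print(together)      # 166.59  (purchases summed, then taxed once)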

In forty years as a programmer, I’ve seen only one genuine hardware arithmetic error. Our two-year-old mainframe computer developed an intermittent problem. It took me two weeks to determine that a floating point divide instruction would fail if, and only if, 3.999999999999999 was divided by 2.000000000000. It failed only about once every 300,000 times it was executed. When the problem circuit board was replaced, the problem went away.