Error in binary representation (IEEE 754)

There might be a more specialized place to discuss this but I know there are some folks around here who know this.

I was answering a question on an Excel discussion board about why the following expression

(4/20) > (1-0.8)

evaluates to True in VBA, when intuition says they should be equal. I gave an explanation about how not all decimal numbers can be represented exactly in a binary format. The result of 4/20, for example, is a repeating binary 0.0011… Because division and subtraction are different operations, it is possible that these two operations above will result in different binary representations, even though decimal arithmetic will give 0.2 for both. I have a computer science background and have at least two books on the shelf that include discussions of IEEE binary representation. I understand it fairly well but was never an expert and it’s been many years.

But the OP asked a follow-up question that stumped me. OK, so 0.2 is not represented exactly. But if I print 4/20 and 1-0.8, I see 0.2 and 0.2. So if there is representation error, why does the conversion to decimal not reflect the error?

I can think of one answer, but then it contradicts the comparison result. One possibility is that the “advertised” precision of a decimal floating point number is a little bit less precision than what is actually stored as binary. For example, storing 0.2 as 64-bit double precision might give a representational error of something like 2[sup]-52[/sup], which is about 2 x 10[sup]-16[/sup]. But maybe for conversion to decimal for printing, it rounds to the nearest 4 x 10[sup]-16[/sup]. So these two numbers are stored as different numbers, but displayed as the same number. But then why would it compare exact numbers as stored, instead of rounding using the decimal error first before making the comparison?

Because nobody really wants to see the entire number as represented in memory. If 4/20 were always shown on-screen as 0.1999999999999999 , nobody would buy Excel.

The answer is that when numbers are printed to screen, they are printed with some precision less than the error between 0.2 and its approximate binary representation. If I print out 0.2 with 35 digits of precision, I get 0.20000000000000001110223024625156540.


Because the numbers are different, so reporting that they are equal would be an error. You can set up your own comparison, subtracting the two values and seeing if their difference is smaller in magnitude than some tolerance, but you’d have to specify what that tolerance is.

As far as I know, it likely doesn’t do any rounding itself. In C, try just doing

", .2)

It will only print it to a couple decimal places, it doesn’t round the number first. I’d actually be surprised if Excel actually formatted the decimals as strings themselves, they likely use sprintf (or some equivalent depending on what Excel is written in)*.

Keep in mind that even constants have error in floating point. If I do printf("%.20
",.2) (print .2 to 20 decimal places) I get 0.200000000000000011102230246252.

I do find it somewhat disconcerting that so many businesses use Excel for financial applications when a dedicated software that uses scaled integers/fixed point decimals for money would have much less error, but I’m probably just paranoid. At least I’m not aware of any catastrophic financial meltdowns that happened to an Excel rounding error. I guess scaled integers also have weird problems too, like the fact that you have to multiply before you divide.

  • Excel actually doesn’t directly conform to the IEEE floating point standard so I may be wrong, but I suspect that their implementation is actually on top of a real floating point and wrapper funcs around the printing functions, rather than them doing weird things with void*s to [4]chars or some nonsense like that.

Huh, I read farther in that link I gave

So Excel CAN do what you ask, you just have to dive into a few menus!

Oh, you don’t huh? :frowning:

OK, not actually a rounding error, but still.

Incidentally, in what universe is this (from the OP) even close to true? :confused:

Not even Excel’s, I think.

Recall that this question is about VBA, which strictly speaking is not Excel. It appears to treat real numbers differently than Excel does in its spreadsheets.

In fact, if you enter the two expressions from my example as Excel formulas, it will tell you that they are equal.

Any universe; do the math. Here’s one cite that uses the same number as an example..

Wikipedia also explains that

The number 0.2 is 1/5. There is no fractional equivalent of it that has 2 as its only prime factor. It can’t–because 5 is prime.

It is the same reason that 1/3 cannot be represented exactly as a decimal.

If I print the double’s exact binary representation I get


Which is the stuff in the last set of brackets times two to the power of the stuff in the second set of brackets (the {} in the second brackets is the exponent sign bit, so 10[sub]2[/sub] to the power of 1111111100[sub]2[/sub]). Which is indeed 100110011001100 forever. Albeit the pattern is offset, presumably because of the exponent value.

The answer to the question in the OP is, effectively, it doesn’t show error via lots and lots of work. It is not easy to print binary floating point numbers in decimal, and old methods did show too many digits, or too few digits, or had weird rounding errors. That paper standardized a lot about floating point pretty printing, and it’s still not perfect.

The extension to this answer is that in most cases, floating point representation error occurs in the really small ranges, when I showed you the 64-bit IEEE floating point of .2 to 30 places, it didn’t occur until 17 decimal places down. Most error simply doesn’t happen in the effective printing range, which is lucky.

This is not always true. When I was teaching a C class with a friend we printed pi=3.141592653 to 9 decimal places, but as a 32-bit floating point and it was represented as 3.141592662 (I think, something like that anyway). It just turns out that most error doesn’t happen in the visible ranges, unless you get unlucky.

As for why it doesn’t automagically round for you. There are many reasons. In some cases it may not be desirable, there’s no absolute good threshold or tolerance for rounding in all cases, it would be slow to convert to decimal error before you compared every time, and sometimes (rarely, but it happens) you really DO want bitwise float ==. The lack of a good tolerance range is a big one. Even if you can lop off some representation errors, all that will do is serve to make people use == more, when in reality it’s still Russian Roulette, it’s just there’s 5 empty chambers instead of only 3.

The fact that floats often differ from what they “should” be by some small amount isn’t usually a problem, in itself. The numbers will almost always be rounded to some fewer number of decimal places before being actually used for anything. Usually. For example, if you are computing the sales tax on a purchase, nobody will care if you come up with something like 35.2685736999999999999947385 because you will end up charging the customer 35.27 anyway.

The bigger problem comes when comparisons are made among such numbers, and a program takes the wrong branch instead of what it “should” have done.

Example: You’ve totaled up a purchase and it comes to (theoretically) $20.00 but internally, it only came to $19.999999999999999999999894 – Now suppose you have a promotion where the customer gets a $5.00 discount on any purchase >= $20.00 – So your logic would be:

if ( total >= 20.00 ) {
give 5.00 discount
else {

This code would do the “wrong” thing. The receipt, which shows everything in dollars and cents (that is, to two decimal places) would clearly show the purchase totaling $20.00 yet the cash register would fail to give the discount! Customer complains to cashier; cashier calls manager; manager is stumped, and probably rings up the discount manually. Then manager complains to cash register vendor; cash register vendor (who has long since given up on complaining to the company that wrote the cash register software) complains to his third-party consultant, who endeavors to write an add-on module for the register than can trap this kind of error and work around it.

Third-party consultant succeeds in doing this, and I collect $200.00 for my efforts!

This cash register software was written by self-taught (I think) programmers who understand exactly zilch about binary representation of floats, and the whole app is shot full of this kind of error. They have fixed a lot of the reported instances of this bug by this hack: Now, when they want to compare two floats, they convert both to decimal character strings with just as many decimal digits as they need to compare (typically two, for dollars and cents), and then compare those two strings. But nobody really knows how many more instances of this bug remain.

By the way, some of you will recall that we’ve discussed this topic several times before.

About two years ago, Chronos remarked that (in his line of work anyway) this might not matter so much: If two numbers were so close together that a comparison might go the wrong way, then it might actually not matter which way the branch went. (BTW, look two posts above that one to see the post he is quoting there.)

The example I showed above, however, demonstrates the sort of situation where it really does matter. There were plenty of similar situations where one amount (typically computed from a bunch of separate operands) must be compared with some other amount (typically given as a fixed constant), where that fixed constant is a “clean” amount – a whole number in dollars, or a “whole” number in cents (which means a multiple of .01 which has its own rounding error from the get-go). If the computed total happens to be theoretically exactly equal to this threshhold amount but is actually off by a small amount, a branch could go the wrong way and then incorrect actions happen.

– Computing sales taxes, which are sometimes based on a series of threshhold amounts of the base purchase.
– Computing promotional give-aways (like the example above) that are based on the customer spending some amount.
– Deciding whether to accept a customer’s check, when there are maximum check amounts that can be accepted.
– Deciding if the customer’s house account balance exceeds the customer’s in-house credit limit.
– Deciding if a gift certificate is fully spent.

There are lots and lots of cases where this sort of thing can happen.

There are a few times when you’re dealing with limited-domain functions like atan that the library will freak out or give you NaN if you’re just a tiiiiiiiiiiiiiiiny bit over/under the max/min input value. This is why most computer graphics packages come with clampf and clampd functions which are basically just the logic

if x > max {
     return max
} else if x < min {
     return min
} else {
     return x

I just noticed that our OP, CookingWithGas, was a participant in that earlier thread from February 3, 2012, in which Chronos make the remark I cited above.

That thread is worth a read, or a re-read for those who were there then.

Personally, if I were writing the cash-register software, I’d make it so any purchase over $19.995 triggered the “at least $20” test, since it’d be rare indeed for any rounding error to be larger than half a cent, and those are the values that would (for what the customer sees) appear to be correct.

Actually, if I were writing the cash-register software, I’d probably avoid the issue entirely by working in integer cents, instead of in floating-point dollars. But for a consultant fixing someone else’s software, that ship has already sailed.

And for context, the sorts of code I’m dealing with usually involve some physical situation which can be approximated in different ways in different regimes. The approximations will, of course, match (to within some acceptable tolerance) at the boundaries between regimes, and for some amount of overlap near the boundary. So if one is very near the boundary, it won’t actually matter which of the two approximations you use.

My thoughts exactly. You would have to be careful about when to do certain calculations, like sales tax, to avoid rounding errors (taxing the total rather than individual items, for example).

(Ada, for one, also supports fixed-point data types with programmer-specified precision. But I don’t think anybody uses Ada anymore.)

I came across the same problem with Oracle once - I think it was:

1 - (1 / 190 * 10 * 19) = -6 x 10^40

It has something to do with Oracle storing numbers with 38 decimal digits (exactly - it uses a “base-100” system rather than binary) of precision, and 1/19 has a 19-digit repeat cycle.

As Cronos points out, tests of floating-point equality (or insisting that ‘>’ and ‘>=’ be distinguishable) are almost always wrong. (Comparisons with a few special values, like zero, may be exceptions.)

A less-known “gotcha” is that equality tests may fail even when the exact same computation sequence is used to derive the comparands! This can arise on Pentium where, IIRC, floating-point values are stored as 8 bytes typically but registers are higher-precision, use 10 bytes, and thus may not compare as equal with their own stored values!

Exactly this! In the hundreds of pages of add-on scripts I’ve written in the past 15-or-so years (including when I worked at that company), I generally used an epsilon of 0.0000001 – that would work well for dollars-and-cents amounts, as well as for tax or other percent computations where we might want to work with, say, 4 or 5 decimal places.

I made a point of mentioning making a distinction between > and >= tests because, as in the examples I’ve shown, it matters that the (theoretically) exactly == case must take the correct branch. The general strategy is: Perform the “nearly-equal” test first, and if so, take the “equal” branch. Then test for > or < at your convenience.

I also suggested a way to accomplish the same thing without the need for the extra nearly-equal test. Chronos suggested that all you need to do is shift your threshhold by a small epsilon – in retrospect, my suggested strategy was exactly that, except I had never quite thought of it that way.

As I mentioned, I think the original programmers knew nothing of any of these issues. And I’ve seen plenty of programmers similarly benighted. Every thread we’ve had on this subject has had knowledgeable people posting similar observations. That is why I, and others here, have voluably insisted that this should be carefully taught in beginner’s programming classes!

In the earlier thread I cited, I made this post, suggesting a trivially simple program that every student programmer should write, run, and study at least once in his life – It demonstrates the problem with wrong-branching on “nearly-equal” computations, and can be readily done in just about every programming language.

As for the ship that’s already sailed – Hey, that’s just more consulting $$$ for me!

ETA: I still say training dolphins was more fun.

That program will actually work depending on your compiler or language spec. The program

one := 1.0
addedOne := .1 + .1 + .1 + .1 +… // Pretend this is 10 .1s

Will often work correctly because of some idiosyncracies of how certain compilers evaluate constants at compile-time. You can actually often tell the extent of constant compile-time variable anaysis depending on whether the program

tenth := .1
one := 1.0
addedOne := tenth + // added 10 times

Differs in result from the first version. You may or may not get it as == 1 depending on optimization flags, whether the variable “tenth” was declared as a const variable, and any other number of things. Using a for loop to add .1 10 times usually more reliably results in a wrong answer, though perhaps amusingly JIT-compiled languages that do optimization during runtime may unroll the loop after 2-3 executions and the function will end up returning a different value in later calls during an execution than it does the first few calls of an execution. (Java does this sort of thing often)