A similar thing that has bitten people many times in my industry: certain computational architectures support a fused-multiply-add instruction that computes A*B+C in one shot. Depending on exactly how the code got optimized, that sequence may end up getting fused or not, which alters the rounding behavior and thus the final result. Even within the same program there could be differences here.
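To see the effect in isolation, here’s a small C sketch (my own, not from any particular codebase): it compares x*x - 1.0 computed with the product rounded separately against an explicit fma() call from C99’s math.h, with x picked arbitrarily as 1 + 2^-30 so the low bits that fusion preserves actually show up in the output. Whether a compiler silently turns a plain a*b+c into an FMA depends on things like -ffp-contract, which is exactly the inconsistency being described. Link with -lm.
#include <math.h>
#include <stdio.h>
int main(void)
{
    double x = 1.0 + ldexp(1.0, -30);   /* x = 1 + 2^-30, exactly representable */
    /* Product rounded to double first, then the subtraction.  (An aggressive
     * compiler may contract even this into an FMA, e.g. under
     * -ffp-contract=fast -- which is the very inconsistency at issue.) */
    double separate = x * x - 1.0;
    /* fma(): a*b+c with a single rounding at the very end. */
    double fused = fma(x, x, -1.0);
    printf("separate: %.20e\n", separate);
    printf("fused:    %.20e\n", fused);
    return 0;
}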
Yeah, I wrote it the way I did in the hope of defeating, as much as possible, some attempts at optimization that some compilers might do.
Gaak! Whatever kind of optimization a compiler/interpreter may do, I’d at least hope it would be more consistent (and hence, predictable) than that. We’d like to hang onto our fantasy that computers are deterministic, but this calls even that into question!
Dude, that’s not even the half of it. This was a performance bug rather than a results bug, but the JIT once unrolled an O(n^2) loop for me that was taking forever. But it only unrolled it on Windows machines – and if it was a laptop, it would only do it if it was running on outlet power and not battery. Needless to say I was very confused when my groupmates on other systems yelled at me for making the game unplayable.
Edit:
And as for scaled integers: you absolutely have to remember the multiply-before-divide rule if you use them.
92 * (1/92) =/= (92*1)/92
in integer arithmetic.
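Spelled out as a compilable C snippet, in case anyone wants to watch it go wrong:
#include <stdio.h>
int main(void)
{
    /* Divide first: 1/92 is integer division and truncates to 0. */
    printf("92 * (1/92) = %d\n", 92 * (1 / 92));    /* 92 * 0  -> 0 */
    /* Multiply first: the intermediate product survives the division. */
    printf("(92*1)/92   = %d\n", (92 * 1) / 92);    /* 92 / 92 -> 1 */
    return 0;
}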
Can it? It can only show values in decimal, but the ‘>’ test is being done in binary.
Not necessarily. If you enable that option, it’s possible they use a decimal format or some other trick: either a tedious string compare, or a Rational type. Note that when you enable this option you have to pick a decimal level of precision. Consider
type Rat struct {
Numerator, Denominator int
}
If I have 2.53, I can express this as Rat{253, 100}, both components of which can be expressed in binary with perfect precision. While it’s a bit more costly in terms of memory and performance, you get exact equality and ordering (<, >) tests on this type, using the same methods you learned for adding fractions with unlike denominators in grade school. In fact, it should be relatively cheap, since all numbers should have the same Denominator component (in this case, 100); the only relevant bit is the ordering of numerators. Though whether the denominators stay 100 depends, I suppose, on how you handle division.
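For what it’s worth, here’s a minimal C sketch of the same idea (the Rat and rat_less names are mine, not from any library). The ordering test is just grade-school cross-multiplication, and when the denominators match it collapses to comparing numerators:
#include <stdio.h>
/* Same idea as the Go struct above, written out in C. */
struct Rat {
    long long num;   /* numerator */
    long long den;   /* denominator, assumed positive */
};
/* a < b  <=>  a.num * b.den < b.num * a.den
 * (cross-multiply, as with unlike denominators in grade school;
 * exact as long as the products don't overflow). */
int rat_less(struct Rat a, struct Rat b)
{
    return a.num * b.den < b.num * a.den;
}
int main(void)
{
    struct Rat x = { 253, 100 };   /* 2.53 */
    struct Rat y = { 260, 100 };   /* 2.60 */
    printf("2.53 < 2.60 ? %s\n", rat_less(x, y) ? "yes" : "no");
    return 0;
}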
What kind of internal criteria do you suppose the JIT was using to decide when to do that and when not to?
It’s not a big secret that the JVM is better optimized on Windows than on other platforms; I assume the logic simply didn’t exist in the OS X and Linux implementations. As for the plugged-in/on-battery thing, I assume it has something to do with clock speed. Windows laptops underclock their components when running off battery power so the battery lasts longer, and my guess is the JIT decides it can’t unroll loops or do other expensive optimizations when the clock is below some acceptable level, so while Windows has the machine underclocked the loop stays rolled up.
Gotta be careful! You could sum up:
int sum = 0 ;
for ( int i = 1 ; i < ∞ ; i++ ) {
sum = sum + i ;
}
and, if the compiler optimized stupidly, you could end up with sum = -1/12
(But I suppose no matter how the compiler does or doesn’t optimize it, it would take O(∞) time. )
BTW, the Kahan summation algorithm may be worth knowing, for programmers who lose precision during very long summations. I do very little floating-point work myself, so have seldom if ever needed it. (I use it sometimes anyway, just as a fetish :rolleyes: ) If there’s interest, I’ll post demonstration C code.
No … I’ll just post demonstration code despite total lack of interest.
/*
* This is a demonstration of floating-point addition.
* We sum 1 + 1/2 + 1/3 + ... + 1/100000000 four different ways.
* (a) sum ordinarily, largest addends first
* (b) sum ordinarily, smallest addends first
* (c) sum with error carries, per William Kahan
* (d) predict the correct sum, using Euler's formula
*
* To emphasize summation precision problems, we use 'float' sums
* rather than 'double' sums.
*/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
typedef float Real;
struct Floatsum {
    Real sum, carry;
};

/* Plain accumulation: every add rounds, and whatever is lost stays lost. */
void usual_add(double addend, struct Floatsum *kp)
{
    kp->sum += addend;
}

/* Kahan (compensated) summation: 'carry' remembers the rounding error
 * from the previous add and folds it back into the next addend. */
void kahan_add(double addend, struct Floatsum *kp)
{
    Real y = addend - kp->carry;     /* addend corrected by the carried error */
    Real tmp = kp->sum + y;          /* big + small: low bits of y may be lost */
    kp->carry = (tmp - kp->sum) - y; /* (what was added) - (what we meant to add) */
    kp->sum = tmp;
}

int main(void)
{
    struct Floatsum J;
    int k, Many = 100000000;

    /* Sum the first Many reciprocals. */

    /* (a) naive addition, largest addends first */
    J.sum = 0;
    for (k = 1; k <= Many; k++)
        usual_add(1.0 / k, &J);
    printf("%f\n", J.sum);

    /* (b) small-to-large addition */
    J.sum = 0;
    for (k = Many; k >= 1; k--)
        usual_add(1.0 / k, &J);
    printf("%f\n", J.sum);

    /* (c) Kahan adds */
    J.sum = J.carry = 0;
    for (k = 1; k <= Many; k++)
        kahan_add(1.0 / k, &J);
    printf("%f\n", J.sum);

    /* (d) Euler's formula: H_n ~ ln(n) + gamma + 1/(2n) */
    printf("%f\n", log(Many + 0.0) + 0.57721566 + 0.5 / Many);

    return 0;
}
I’m not familiar with the arrow operator used there. Is that the same thing as the dot used in structs?
Equivalent. If you’ve got a pointer to a struct, you use the -> operator to say ‘get me this member of the struct it points to.’ It’s just like using the dot for a member of an ordinary struct variable.
The operator ‘->’ is an unnecessary frill in C. The expression
P->B
is simply shorthand for
(*P).B
It provides a “smoother-looking” way to follow pointer chains. For example, here’s a line from the Linux kernel inode code:
bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
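A toy example of the equivalence, for anyone following along at home (the struct and variable names are invented for illustration):
#include <stdio.h>
struct Point { int x, y; };
int main(void)
{
    struct Point pt = { 3, 4 };
    struct Point *p = &pt;
    /* All three expressions read the same member. */
    printf("%d %d %d\n", pt.x, p->x, (*p).x);
    return 0;
}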
If you were writing cash-register software or any kind of business or accounting software you would probably do what the entire industry does and that is use decimal/numeric data types instead of floating point.
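Standard C has no portable decimal type, so when you’re stuck in plain C the usual stand-in is the scaled-integer idea from earlier in the thread: keep money as an integer count of cents and only format it as dollars at the edges. A rough sketch, with made-up helper names and a deliberately truncating tax step (a real system would pick an explicit rounding rule):
#include <stdio.h>
/* Money held as an integer count of cents -- no binary-fraction rounding. */
typedef long long cents_t;
/* Hypothetical helper: print a cent count as dollars and cents. */
static void print_money(const char *label, cents_t c)
{
    printf("%s$%lld.%02lld\n", label, c / 100, c % 100);
}
int main(void)
{
    cents_t price = 1999;        /* $19.99 */
    cents_t total = 3 * price;   /* exact: 5997 cents */
    /* 7% tax, multiply before divide (the scaled-integer rule above),
     * truncated toward zero for simplicity. */
    cents_t tax = (total * 7) / 100;
    print_money("subtotal: ", total);
    print_money("tax:      ", tax);
    print_money("total:    ", total + tax);
    return 0;
}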