I’m having some serious trouble in a C++ program I’m writing. I’m sure that this is a known issue in C/C++, but my question is, is there a known solution to it?
My problem is that on very, very big numbers, the numbers aren’t accurate. When I take the number 9999999999999.999 and store it as a double, it comes up as 9999999999999.998046875. When I take the number 9999999999999.99 and store it as a double, it comes up as 9999999999999.990234375. I understand that the computer stores floats in binary as a mantissa and an exponent, and so there’s some inaccuracy when it expands that into a number we can see. What I want to know is, is there any known workaround for this problem?
If it matters, I’m programming in a Windows environment.
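Here’s a small snippet that should reproduce what I’m describing on a typical compiler (printing extra digits exposes the nearest representable double, not the decimal literal):

```cpp
#include <cstdio>

int main() {
    double a = 9999999999999.999;
    double b = 9999999999999.99;
    // Asking for more digits than the literal had shows the closest
    // binary double the compiler/strtod actually stored.
    std::printf("%.9f\n", a);   // e.g. 9999999999999.998046875
    std::printf("%.9f\n", b);   // e.g. 9999999999999.990234375
    return 0;
}
```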
The issue is not with C++ or Windows, but with the limited precision you get from representing decimal numbers and fractions in binary. To fix this you need a software package that handles arbitrary precision. A quick Google search turned up one from GNU called GNU MP (the Bignum library).
I haven’t used this, but I have taken Numerical Analysis, and when I was in grad school we had a group working on interval arithmetic. There appear to be packages that cost money as well, but I’m betting GNU MP will do what you need.
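I haven’t tried it myself, but assuming that library is GMP, a rough sketch of exact arithmetic with its rational (mpq_t) type might look like this (compile and link with -lgmp):

```cpp
// Sketch only: exact rational arithmetic with GMP's mpq_t type.
#include <gmp.h>
#include <cstdio>

int main() {
    mpq_t price;
    mpq_init(price);
    // 9999999999999.99 held exactly as the fraction 999999999999999/100,
    // so there is no binary rounding at all.
    mpq_set_str(price, "999999999999999/100", 10);
    mpq_canonicalize(price);
    gmp_printf("%Qd\n", price);   // prints 999999999999999/100
    mpq_clear(price);
    return 0;
}
```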
The workaround depends upon what sort of functionality is lacking. For instance, if the issue is that your program needs to decide whether two floating-point numbers are the same, the workaround is to check instead whether the difference between the two numbers is within some delta. If the issue is that the numbers look ugly when you print them out, the workaround is to ensure that the print function prints no more than a certain number of significant digits (about 6 for floats, 15–16 for doubles). If the issue is that you require exact values for rational numbers (such as money), then a workaround is to separate a given value into a numerator and a denominator.
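As a rough sketch of the first two of those workarounds (the tolerance below is just an example, not a one-size-fits-all constant):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Workaround 1: compare with a tolerance instead of operator==.
// The right epsilon depends on the magnitudes you're working with.
bool nearly_equal(double a, double b, double eps = 1e-9) {
    return std::fabs(a - b) <= eps * std::max(std::fabs(a), std::fabs(b));
}

int main() {
    double x = 0.1 + 0.2;
    std::printf("%d\n", x == 0.3);             // likely 0: direct comparison fails
    std::printf("%d\n", nearly_equal(x, 0.3)); // 1: equal within tolerance

    // Workaround 2: limit the printed digits so the binary noise never shows.
    std::printf("%.2f\n", 9999999999999.99);   // prints 9999999999999.99
    return 0;
}
```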
It might help to know what it is you’re trying to do.
For most purposes, that’s plenty of accuracy. +/- 0.002 on a number in the trillions is really incredibly accurate. Do you actually need more accuracy than that, or are you just put off by the fact that the numbers don’t act like you expect them to?
If it’s the second case, there are probably known workarounds to whatever you’re trying to do. For example, it’s problematic to compare two floats directly. (float1 == float2) is rarely going to return true if you arrived at the numbers through different code paths. Instead, what you do is include a fudge factor, and see if the difference between the two values is large enough to be considered significant.
Obviously, if you’re working with, say, money, you’d want arbitrary precision. But bear in mind that such precision comes with a cost. If you’re doing lots of calculations, that precision is going to slow you down considerably. For most real-world problems you’ll be solving, float or double accuracy is more than enough.
The issue is that I need to ensure that the number the user enters does not contain more decimal places than the maximum allowed for that data category (which is configurable).
So if the user enters 9999999999999.99 (for a category in which two decimal places are allowed) and the computer sees it as 9999999999999.990234, that’s a problem.
If your numbers can be represented as large integers, you can use IEEE double-precision floating point to do exact arithmetic on integers up to the size of the mantissa, 53 bits (integers up to about 9 × 10^15). I’ve used this for things like representing time as the number of microseconds that have elapsed since an epoch. For financial calculations, some programmers use large integers to represent cents or fractions of cents. For example, $1,234,567.89 is stored as 123456789.
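A minimal sketch of that integer-cents idea (the formatting here is just illustrative):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // $1,234,567.89 represented exactly as a count of cents.
    std::int64_t cents = 123456789;

    // Arithmetic on cents stays exact as long as the values remain
    // well within the integer range (2^63 for this 64-bit type).
    std::int64_t doubled = cents * 2;

    std::printf("$%lld.%02lld\n",
                static_cast<long long>(doubled / 100),
                static_cast<long long>(doubled % 100));  // $2469135.78
    return 0;
}
```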
That was the way I was going to do this in the first place. The problem is that when I format it as a string ("%lf") it comes out the bad way, with .99 being formatted as .990234.
This implies that the number is being read in as a double, and that you’re converting it into a string. What people are saying is to read the input in as a string, verify that it follows the form you would like, and then convert that string into a double.
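For example, something along these lines (the helper name here is made up, and strtod is just one way to do the final conversion):

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: check the raw text before any conversion to double,
// so binary rounding can never add phantom decimal places.
bool within_decimal_limit(const std::string& text, std::size_t max_places) {
    std::string::size_type dot = text.find('.');
    if (dot == std::string::npos)
        return true;                          // no fractional part at all
    return text.size() - dot - 1 <= max_places;
}

// Usage sketch:
//   if (within_decimal_limit(input, 2)) {
//       double value = std::strtod(input.c_str(), nullptr);
//       ...
//   }
```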
The number is actually being read in as a COM Variant, with the type being VT_DECIMAL. And when I debug and see that value, I see the exact thing the user put in. However, in order to format that into a string, I need to cast it to a double.
The option to read it as a string in the first place is not available to me. I need to check the decimal value against the configured maximum number of decimal places.
bup:
If I truncate it, I’m forcing the value to be a valid value, not checking that the value the user put in is valid.
My COM knowledge is pretty much non-existent, but I believe VT_DECIMAL means the value is a DECIMAL. This is a 16-byte fixed-point format. From the docs on PROPVARIANT: