So, what exactly is the problem with MS Excel for complex mathematics and statistics?

Daylate · June 17, 2012, 5:52pm

OK, I Googled 77.1*850 and added Excel, and got answers. From what I read, the bug was introduced in Excel 2007. I knew I was smart to stay with my old antique 2003!

Chronos · June 17, 2012, 5:55pm

It’s easy to represent financial transactions with full precision using binary. All you have to do is use integers, and make the cent (or mill or whatever) your unit. Which is in fact what many financial programs do.

ZenBeam · June 17, 2012, 6:22pm

Shalmanese:

Using Wolfram Alpha which is a tool specially designed for numeric computation, I get:

1.001^1000 = 2.71692
1.000001^1000000 = 2.71828
1.000000001^1000000000 = 2.71828
1.000000000001^1000000000000 = 2.71828
1.000000000000001^1000000000000000 = 2.71828
1.000000000000000001^1000000000000000000 = 2.71828

Using Excel, I get:

1.001^1000 = 2.71692
1.000001^1000000 = 2.71828
1.000000001^1000000000 = 2.71828
1.000000000001^1000000000000 = 2.71852
1.000000000000001^1000000000000000 = 3.035
1.000000000000000001^1000000000000000000 = 1

Properly designed numeric computation algorithms like Wolfram Alpha carefully perform calculations in a way such that errors don’t compound exponentially like the example above. Certain common operations like operations involving matrix inversion are especially prone to such errors so great care needs to be taken in how they are coded. Usually, this is packaged into a library to use such as numpy so you don’t have to worry about the details.

This is why excel is unsuited to this sort of calculation.

I tried this with Gnu Octave. I got the same answers as your Excel answers. I’ll try Matlab next time I’m booted under Windows, but I suspect I’ll get the same answers there as well. I’d also expect to get the same if I used doubles in C++, or Real*8 in Fortran.

You said Wolfram Alpha was designed for numeric computation, but unless it’s using quadruple precision, it must have been using symbolic computation to get the answers above. In both single and double precision, 1.000000000000000001 == 1.

I almost never use Excel, except to open documents someone else sent me, but are you able to write readable, maintainable functions in Excel? I suspect not, and that would be the main issue I’d see with it (but fight my ignorance here, if I’m wrong). From what I’ve seen, you’re basically writing unmaintainable spaghetti code each time. OK for throw-away testing code, but not for something complex you want to maintain.

Oh, and speaking of complex, does Excel handle complex numbers?

Nava · June 17, 2012, 6:34pm

A problem I have with using Excel is simply its documentation. Nowadays at least it includes more than one statistical function, but still: if you want to know whether you’re getting p(>x) or p(<x) you need to test it.

So even if you only need the limited functions that Excel brings pre-programmed, you can’t “just do it”.

(My “undergrad thesis” consisted of programming what would nowadays be a small library of continuous stats functions, for someone else to use in her PhD; it was in '93, so I didn’t use cards with holes but it sometimes feels like it)

Senegoid · June 17, 2012, 7:07pm

Granted. But in my essays on the topic, I often add the complaint that too few programmers know this, and too few beginning-level programming classes teach this. Add to that, the vast number of “self-taught” programmers who picked up some teach-yourself books and are now working with JavaScript or whatever, who have no clue. As I’ve mentioned before, I myself didn’t know this (until I did), and I was utterly mortified and stupefied when I first found out!

And some of those people write programs that we might actually use, and might actually need to get right answers.

Modern-day case in point: I worked for a company that makes cash register software. It is a small company with a small customer base, but you might nonetheless see their software on cash registers here and there. The programmers who wrote the code are self-taught (AFAIK). The code is in a high-level language that doesn’t give you any flexibility in how numbers are stored. (I think it uses 64-bit floats, and always has.)

Money amounts are stored in dollar units, so they typically have fractional parts in the hundreths (cents). Typical sales involve a customer buying a bunch of items, the prices of which are added up (exactly as you would expect a cash register to do). Then sales tax is computed (being a % computation, this commonly gives results with many decimal places, which are rounded to the nearest cent). End users have incessantly reported errors in the behavior, many of which turned out to be related to this.

Typical example: Sales tax amounts commonly are not computed, but looked up in a sales tax table, the details of which are dictated by the state. For example, if your total sale is < $X.37 the tax is Y, but if >= $X.37, the tax is Z. If a sale totals up to “exactly” $X.37, the code commonly “took the wrong branch” and added the wrong tax. Similar errors were reported in discount computations, price break points, promotional pricing sales and whatnot.

The programmers knew of the error but didn’t understand it. Note that the actual internal errors occurred in the 14th (or so) decimal place. The number-to-string functions rounded to a specified number of decimal places, so these errors were rarely overtly visible. The programmers only understood one solution to the comparison problem: Convert all numbers to strings (to 2 decimal places or whatever), then compare the strings.

Chronos remarked (in one of the threads I cited above) that the problem isn’t serious for him in his line of work, because the exact comparison contingencies are somewhat flexible. In the case I described here, we didn’t have that luxury – comparisons commonly had to be done to the “exact” cent, and they frequently went wrong.

RaftPeople · June 17, 2012, 7:08pm

How many bits do you think you would need to accurately cover the full range of financial transactions calculated by businesses today?
How would you handle mixing of very small and very large numbers?
Do you think there might be some issues with rounding?
Can your system accurately calculate repeated transactions with many decimal positions and end up with the exactly correct answer?

Not as “easy” as you think.

Senegoid · June 17, 2012, 7:18pm

Ah! But as I’ve been incessantly harping, a lot of people ARE “foolish enough” to think that!

And to add a nitpick upon your nitpick (that would be a nit[sup]2[/sup]pick?): The outcome of such cases, even worse that being wrong, is often unpredictable.

I’ve found, for example, that 0.1 + 0.1 + … (ten times) typically raises this problem, but 0.1 * 10 does not!. At least, in one language with one compiler on one machine where I tried it.
In another thread, Chronos suspected it had to do with that mixed-mode arithmetic (the integer 10 has to get converted to the float 10. first). I suspect, on the other hand, that it might just be some compile-time optimization that did away with the computation altogether.

In writing the sum 0.1 + 0.1 + … one might wonder if the compiler will optimize it in some way. You can never be sure, from one compiler to another, or from one machine to another, or from one language to another, what will happen. I tried this in several languages, several years ago. I recall ONE of them (I think it was Visual Basic) produced the “right” answer! But I don’t know why.

septimus · June 17, 2012, 9:02pm

Many computers today, including the cheap laptop on which I’m writing this, support 64-bit integer arithmetic with simple C programming.

If we “waste” one of those bits as a sign-bit, and represent amounts to the nearest mill (tenth of a penny), this still allows dollar amounts as large as $9200 trillion to be represented. That’s with integer arithmetic, and without any floating-point precision issues.

Francis_Vaughan · June 17, 2012, 9:15pm

It was nonsense in that on these older machines the instructions were actually slow, and could be trivially outrun by doing using the normal integer instruction set to do the exact same calculations (including all the care needed to avoid the issues above.) It is interesting that internally the x86 actually converts the BCD representation to double extended precision, and only converts it back when you want it in BCD. It does however guarantee precision doing this, but limited to 18 decimal digits.

The IBM z10 is the current version of the 360 architecture. From the instruction set architecture point of view it isn’t a Power6. So it has to keep compatibility with legacy code. Moving the decimal coding from firmware emulation to hardware will make it a lot faster. This is for the same reasons that you would prefer to remove the instructions entirely. Instruction decode is slow, and the fit of the instruction semantics will be poor to the underlying pipeline implementation. But if the code you are running contains the instructions, you have no choice, you must execute them correctly. The zSeries machines are all about legacy. There is probably a lot of COBOL running on those machines. They are also all about mainframe like reliability, and are a truley heroic design. The price reflects this.

That said, there are gains to be had on more modern, very large, processors. The ability to add logic is almost unconstrained (at least in terms of numbers of gates) so that the speed can swing back in favour of more complex instruction sets. As Intel did in the x86 (certainly the Netburst architecture, and likely the modern successors) they simply decode the instructions into an intermediate RISC instruction stream, and implemented a simple RISC like machine inside. However some instructions became much faster than others. If your instruction decode is very slick, and you can keep the pipleline free of hazards, the gain in memory bandwidth and instruction cache with a more compact instruction stream may start to claw back. But you can’t manage this with arbitrary instructions, they still need to be a good fit for issues like exception handling, and most importantly, hazards involving the processor state flags.

Senegoid · June 17, 2012, 11:23pm

Where, or where in this modern world, is the IBM 1620 when you need it?

[sub]ETA: Native decimal arithmetic, variable- and arbitary-length “data words”[/sub]
[sub]Oh, and did I mention memory in 20Kbyte increments up to 100Kbytes max?[/sub]

RaftPeople · June 18, 2012, 1:40am

And if you need to store values with 8 decimal places - and use even more than that for accuracy of intermediate results, then how large of a number can you store?

And again, how do you solve the rounding problem?

Senegoid · June 18, 2012, 2:14am

A 64-bit integer (I think that’s what the above quote is talking about) could store a number up to 21 decimal digits long or so. If integers are stored in BCD form with four bits per digit, that would hold 16 digits. With suitable programming, you could get a lot of accuracy out of that. Or you could program your own multi-word integer arithmetic.

[storytime]
My first quarter in college, summer quarter right out of high school, they let me into an upper-division assembly language class (because I already knew Fortran). Our very first assignment was a warm-up Fortran review problem: Write a Fortran program to compute (down to the last digit) the 999th and 1000th Fibonacci numbers and the product thereof. This was on a computer that had 60-bit integers.
[/storytime]

ETA: As I described in post #10 above, if you simply use more bits = more precision in your computations, that will solve SOME kinds of approximation problems, but not some other kinds of problems.

Francis_Vaughan · June 18, 2012, 3:42am

Eventually you get to the question of asking what problem you are trying to solve.

Fixed point arithmetic for other than financial purposes is rare. If you need precision, use proper floating point, and if you need insane precision use quad precision (128 bit) float. Scientific codes that need more than quad precision (and I would argue most things that need more than 64 bit) almost always implicitly have some very serious problems in definition or implementation. Floating point has the advantage that the precision is well defined, and usually defined by the IEEE floating point standard - which means that algorithms will normally port between architectures unscathed. So most science and engineering is well covered.

Arbitrary precision arithmetic is a specialised beast, but when you need it there is no choice. But it uses arbitrary storage, and can get pretty slow.

Senegoid · June 18, 2012, 4:13am

Do we have any rocket scientists here?

I have long wondered: Just how many significant digits do you need for various kinds of problems, where getting The Right Answer really matters?

If you are planning a spiral trajectory from earth to moon, how many significant digits to you need, say, to hit the moon? How many digits to hit a target 1 kilometer in diameter?

How many digits do you need to compute a trajectory to Mars, or Neptune? How many significant digits to you need to compute the timing of neutrinos vs photons coming from Andromeda? (Or something like that…)

With 32-bit floats, you typically get 8 significant digits, IIRC. With 64-bit floats, you get about 14. Are either of these precisions enough for even the most precision-sensitive problems that engineers deal with today (like the above-mentioned rocket questions)?

fumster · June 18, 2012, 5:32am

It’s a bit more complicated than that. Converting binary numbers to strings so that they can be printed is an expensive operation. Many business applications in the old days consisted of reading alphanumeric data from punch cards, doing a simple calculation, and then printing the result and/or punching a new card. Numeric data stored as BCD is trivial to convert to it’s external EBCDIC representation. So even though the arithmetic operation itself may have been slower the overall performance of the application was better.

God this makes me feel old.

Shalmanese · June 18, 2012, 7:12am

Senegoid:

Do we have any rocket scientists here?

I have long wondered: Just how many significant digits do you need for various kinds of problems, where getting The Right Answer really matters?

If you are planning a spiral trajectory from earth to moon, how many significant digits to you need, say, to hit the moon? How many digits to hit a target 1 kilometer in diameter?

How many digits do you need to compute a trajectory to Mars, or Neptune? How many significant digits to you need to compute the timing of neutrinos vs photons coming from Andromeda? (Or something like that…)

With 32-bit floats, you typically get 8 significant digits, IIRC. With 64-bit floats, you get about 14. Are either of these precisions enough for even the most precision-sensitive problems that engineers deal with today (like the above-mentioned rocket questions)?

It’s not quite as simple as that, it all depends on the type of computation that you’re doing and how errors compound. In the best, case, the error rate would remain constant and you could fly a vehicle from the earth to the moon and land it on a pinhead using 38 bit numbers. OTOH, I did some physics simulation work once using bouncing balls where the physics engine allowed you to compile in either 32 or 64 bit floats. The simulation behavior between the two diverged in a matter of seconds because of the inherent chaotic nature of the simulation.

Dr.Strangelove · June 18, 2012, 7:22am

I don’t know, other than that 16 bits are not enough (well, at least for one particular situation).

RaftPeople · June 18, 2012, 3:15pm

Well, you couldn’t run COBOL with that system, it requires 32 digits for intermediate calcs (for accuracy).
And I’m still curious how you are going to handle rounding with this system. (Of course it can be done and isn’t that hard, but the point is that there are more issues than realized by the comment “just use integers”).

Kevbo · June 18, 2012, 3:54pm

It wasn’t rockets, but back in the late 1980’s I worked on flight control hardware/software for military aircraft. (Sperry/Burroughs/Unisys/Honeywell…just one job, two acquisitions and a name change).

IIRC, we did it all with 32 bit integer arithmetic, sometimes just 16 bit, except one processor (I did NOT say microprocessor) was 18 bits. It was easier to test and or prove stability and consistent behavior with integer math. Also, there were not really any FPU’s for microprocessors in those days, the speed advantage of integer math was huuuuuuge. I think some VAXs may have had FPUs. The early systems were coded in assembly, and the later ones in Jovial.

Usually you have to interface with IRL signals. These are digitized. 12-14 bit resolution was typical in those days. 16 bit resolution is pretty common now. You can buy much higher resolution A-Ds, but it is difficult to design circuits with noise, accuracy drift, and bandwidth to support them. 16 bits will let you digitize an absolute pressure transducer and then calculate altitude +/- 3 to 4 feet up to 40,000 feet or so.

The accounting examples are a good illustration of why floating point representation can fail when newbs just assume it is better. Two other examples I have fixed:

A float was used for a loop counter because the loop needed to run for a very large number of loops. (yes, somebody actually did this!) For very large stopping points, it never completed because the count reached the point where the least significant digit of the count was worth 2 units. After that X+2 is still X+2, but X+1=X So the counter stopped incrementing. Fixed by nesting two loops with long int counters. This was easy for me to figure out because I had found the next one a few years earlier:

A float was used as a “hobbs meter” (total running time) for an industrial system.
The count was incremented once per second, and divided down to give years, days,hours,minutes, seconds. Problem was that after about a month, (less IIRC) you could no longer increment by 1 as explained above. Fixed by changing counter to long int. (I think I posted this example on the dope a couple years ago, but am too lazy to track it down)

Topic		Replies	Views
Error in binary representation (IEEE 754) Factual Questions	32	3478	January 21, 2014
Rules for rounding numbers Factual Questions	50	3040	June 19, 2003
Question for mathematicians (and/or Unix people) Factual Questions	19	1258	June 11, 2002
Quick Translation (for someone too stupid to comprehend scientific notation) Factual Questions	32	8650	November 13, 2009
Binary and Hexadecimal numbers question. Factual Questions	29	1761	April 3, 2004

So, what exactly is the problem with MS Excel for complex mathematics and statistics?

Related topics