NAN: What is it good for? (programming)

You could view a computation returning “NaN” as a form of returning an error message (“Error: You passed me an invalid argument!”). The fact that this error message doesn’t automatically bring the whole program down could then be viewed as providing the calling context an opportunity for exception handling (i.e., it can check the returned error message and react to it as it likes, including dying/propagating some error message upwards if it feels that is appropriate, or carrying out some other form of damage control more to its liking). The fact that future computations with “NaN” also return “NaN” is just more (appropriate) error messages (“You passed me an invalid argument”).

From that perspective, NaN is not an unreasonable design choice; it’s the programming culture of not checking for or thinking about how to respond to such error messages which is the problem. [But, then, perhaps a better design choice would find a way to enforce, or sidestep the need for, a more careful programming culture…]
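For concreteness, here is a minimal sketch (in Java, with a made-up computeRatio() routine standing in for “some computation that can go wrong”) of what that “exception handling” amounts to: the caller inspects the returned NaN and reacts however it sees fit.

public class NanCheckSketch {
    // hypothetical routine that may hand back NaN for invalid input
    static double computeRatio(double a, double b) {
        return Math.sqrt(a) / b;  // NaN whenever a < 0
    }

    public static void main(String[] args) {
        double result = computeRatio(-1.0, 2.0);
        if (Double.isNaN(result)) {
            // the caller's "exception handling": log, substitute a default,
            // propagate the failure upwards, whatever it feels is appropriate
            System.err.println("computeRatio got an invalid argument; using fallback");
            result = 0.0;
        }
        System.out.println(result);
    }
}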

Oh. Sorry.

engineer_comp_geek and Bytegeist: Excellent examples. Exactly the sort of stuff I was asking for. Thank you very much. I can go to bed now. Good night!

But it might not. If a is larger than one most of the time, the code will silently succeed most of the time. If 0 < a < 1 is a very rare case, QA testing will not discover it, but it will turn up in the field. If this incorrect code is buried very deep in a system (e.g., as a precompiled library), it may not be at all obvious what is causing the error, and it may be written off as an “every once in a while the thing just fails” error.
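A contrived sketch of that failure mode (this is not the code from the earlier posts, just a stand-in routine that is only valid for a >= 1):

public class SilentNanSketch {
    // hypothetical library routine: only meaningful for a >= 1,
    // but nothing enforces that; it just quietly returns NaN
    static double scale(double a) {
        return Math.sqrt(a - 1.0);
    }

    public static void main(String[] args) {
        double[] inputs = {3.0, 7.5, 2.2, 0.4};  // 0.4 is the rare bad case
        double total = 0.0;
        for (double a : inputs) {
            total += scale(a);  // a single NaN poisons the running total
        }
        System.out.println(total);  // prints NaN, far from where things went wrong
    }
}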

Yeah. Unfortunately, far too few of these routines allow you to pass these messages up and down the chain. Without them, it’s very difficult to figure out where the problem lies. But each programmer thought his routine was bulletproof and wouldn’t need to pass along any error messages. Aargh!

Indeed (you appear to have caught me mid-edit, though, to be fair, that’s not hard to do :))

Here’s something I learned as a programmer: “Almost never”, “rarely”, “sometimes”, and “usually” all mean the exact same thing: “Yes, it does happen, and the code had better be able to handle it.”

Those are the sort of errors that made me grateful that my code (or rather, the code that my bosses rushed through) was used by business, and not by healthcare.

This is the philosophy that Java’s checked exceptions are based on. A function can declare that it throws a certain exception, and the user of that function **must** either handle that exception or declare that his function might throw that exception as well. This linguistically enforces an obligation to handle the edge cases.
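A minimal sketch of what that obligation looks like in practice (the exception and method names here are invented for illustration, not taken from any real library):

public class CheckedSketch {
    // a checked exception of our own
    static class InvalidArgumentException extends Exception {
        InvalidArgumentException(String msg) { super(msg); }
    }

    // the throws clause forces every caller to deal with the edge case
    static double safeSqrt(double x) throws InvalidArgumentException {
        if (x < 0) throw new InvalidArgumentException("negative input: " + x);
        return Math.sqrt(x);
    }

    public static void main(String[] args) {
        try {
            System.out.println(safeSqrt(-1.0));
        } catch (InvalidArgumentException e) {
            // the compiler will not let us omit this (or a throws declaration)
            System.err.println("handled: " + e.getMessage());
        }
    }
}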

Checked exceptions are a bit unwieldy for math though. I’d hate to have to enclose every division with a try { … } catch (DivideByZeroException) { System.out.println(“You Suck!”); }

Only if he runs it with the particular input that produces the error. For sqrt, it’s pretty obvious, but what if it’s “x = foobar(a,b,c)”?

Then worry no longer. Under Java 7, division is accomplished as follows:
[ol]
[li]Request a singleton instance of the dividend via static methods in the java.math.Dividend class.[/li]
[li]Similarly for the divisor from java.math.Divisor.[/li]
[li]Define and create an instance of an inner class that has a callback method for receiving the quotient. It should derive from java.math.QuotientReceiver.[/li]
[li]Call the dividend object’s beginDividing() method, passing it references to your divisor object and quotient receiver object.[/li]
[li]Returned from this call will be a serial number identifying your division request. Store this somewhere handy.[/li]
[li]When the division is complete, the quotient will be passed as an instance of java.lang.Number to your quotient receiver’s receiveQuotient() method, along with the serial number of the originating request. By default this key-value pair will be stored in an internal hash (of type java.util.HashMap), but you can override this behavior. Error reporting commonly goes here.[/li]
[li]At some later time, if or when you need the result, call the quotient receiver’s getQuotient() method, passing it the serial number stored earlier. This is your answer.[/li]
[li]When you’re done with the dividend and divisor singletons, call their close() methods. Otherwise later divisions by 6, for example, might fail.[/li]
[li]close() can of course throw a variety of exceptions, so be sure to surround the calls with a try block.[/li]
[/ol]

Next week I’ll cover how to compare two Unicode strings in Java — which under SE 7 can now be achieved in under 50 lines of code.

I’d go further: for programmers, the opposite of impossible is inevitable. If there is some finite chance of a thing failing, it will fail, given a large enough userbase (millions).

A small exception is where you can demonstrate mathematically that a failure is rare enough to be implausible–say, hash collisions in a 256-bit cryptographic hash.

For what it’s worth, NaNs allow you to have the best of both worlds: errors that fail loudly in the development environment (so you are forced to fix them), plus reasonable robustness against failures in the field. You can write your code so that intermediate NaNs are checked in debug builds (so that you see them at the point where they occur), while in release builds you have just one check at the end that allows a graceful failure in case of problems.
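One way that pattern can look in Java (a sketch only; the assert fires when the JVM is run with -ea, i.e. in a debug run, and costs nothing in release runs where assertions are disabled):

public class NanDebugSketch {
    // hypothetical intermediate computation; NaN for x < 0
    static double step(double x) {
        return Math.log(x);
    }

    static double compute(double[] data) {
        double acc = 0.0;
        for (double x : data) {
            double y = step(x);
            // debug-only check: fails loudly at the point of origin
            assert !Double.isNaN(y) : "step() produced NaN for input " + x;
            acc += y;
        }
        return acc;
    }

    public static void main(String[] args) {
        double result = compute(new double[]{2.0, -3.0, 5.0});
        // single release-mode check at the end: fail gracefully
        if (Double.isNaN(result)) {
            System.err.println("computation failed; falling back to defaults");
        } else {
            System.out.println(result);
        }
    }
}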

BTW, the first Ariane 5 rocket blew up due to an unhandled numeric conversion exception. Probably would have been ok had the software clamped the final value or done something “sensible”.

I don’t think it’s too much odder than having int indexOf(element) methods return -1 for a nonexistent element. Yes, it would be better to use a boolean contains(element) in many cases, but in large systems there’s always a CHANCE that something won’t be where it should be, and it cuts down on nearly redundant checks. For instance, if I had to ensure an element existed and, if so, get its index, I could use:

if (structure.contains(element)) {
    whatever = structure.indexOf(element);
}
// do stuff

or I could just use:

whatever = structure.indexOf(element);
if (whatever >= 0) {
    // do stuff
}

Which at the very least saves me a function call (assuming the optimizer doesn’t find some automagic way of cleaning it up). Further, it’s much more acceptable to be able to pass in invalid arguments than to have it crash your program with an NPE every time you try to get the index of a nonexistent object. NAN fills a similar role: sure, you shouldn’t be passing -1 into sqrt, but in large libraries (especially ones you have no control over the source for), being able to get a graceful notification that what you want simply can’t happen is sometimes better than throwing a temper tantrum and killing your program. This is especially true given that many integral systems (i.e., ones that control precision machinery) are very fond of floating-point arithmetic, and allowing the odd NAN is much safer for actual, fleshy meatbags than throwing a tantrum and dropping a bucket of slag (and much prettier than checking DivideByZeroErrors all the time).
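As a sketch of what that “graceful notification” might look like in a machinery-control setting (the feedRate() routine and the hold-position fallback are invented for illustration):

public class SafeCommandSketch {
    // hypothetical: derive a feed rate from two sensor readings
    static double feedRate(double distance, double time) {
        return distance / time;  // NaN if both are zero, Infinity if only time is
    }

    public static void main(String[] args) {
        double rate = feedRate(0.0, 0.0);
        // refuse to command the machine on a bad value; hold position instead
        if (Double.isNaN(rate) || Double.isInfinite(rate)) {
            System.err.println("bad feed rate; holding position");
        } else {
            System.out.println("commanding feed rate " + rate);
        }
    }
}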

This explains why our code stopped running properly on my co-worker’s computer when she inadvertently replaced 6.0.26 with 7…

I’d recommend the lengthy but rather accessible article What every computer scientist should know about floating-point arithmetic. It gives a pretty good explanation of why the standard was written the way it was.

The first example given in the NaN section is a zero-finder for functions. If a numeric solver has to make initial guesses, it will sometimes pick invalid points for the function. But the programmer has no way of knowing in advance what function is going to be passed, so he can’t just avoid testing those points (without essentially implementing multiple methods of analyzing the function).
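A toy illustration of that situation (just a naive bisection sketch in Java 8 syntax, not the solver from the article): the user-supplied function may be undefined at a probe point, and the NaN lets the solver notice that and report failure instead of crashing.

import java.util.function.DoubleUnaryOperator;

public class ZeroFinderSketch {
    // naive bisection; a NaN probe means "invalid point", so we give up
    // gracefully and report failure upward instead of blowing up
    static double findZero(DoubleUnaryOperator f, double lo, double hi) {
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            double fm = f.applyAsDouble(mid);
            if (Double.isNaN(fm)) {
                return Double.NaN;  // invalid probe point
            }
            if (fm == 0.0) return mid;
            if (fm * f.applyAsDouble(lo) < 0) hi = mid; else lo = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // sqrt(x) - 1 has a zero at x = 1, but is undefined (NaN) for x < 0
        System.out.println(findZero(x -> Math.sqrt(x) - 1.0, 0.0, 4.0));   // 1.0
        System.out.println(findZero(x -> Math.sqrt(x) - 1.0, -4.0, 2.0));  // NaN (bad probe)
    }
}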

I call bullshit on your entire premise. Since the 1980s almost all computer floating-point arithmetic uses the IEEE 754 standard, or some minor variation thereof, and this standard quite clearly indicates that division by zero does not result in a NaN, except for the pathological case in which the dividend is also zero.
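Quick demonstration with Java doubles, which follow IEEE 754:

public class DivisionSketch {
    public static void main(String[] args) {
        System.out.println(1.0 / 0.0);   // Infinity
        System.out.println(-1.0 / 0.0);  // -Infinity
        System.out.println(0.0 / 0.0);   // NaN (the pathological case)
    }
}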

Which is horrible, horrible design. If you change something, you make sure the old version still works, at least for a while, or you let both versions exist simultaneously. Breaking old code when old code is still heavily in use is bad, bad programming.

Division of non-zero numbers by zero is not particularly vital to the OP, is it? There’s still the matter of all the operations which do standardly produce NaNs.
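For instance (again Java doubles, i.e. IEEE 754), all of these come back NaN:

public class NanSourcesSketch {
    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(Math.sqrt(-1.0));  // square root of a negative
        System.out.println(Math.log(-1.0));   // log of a negative
        System.out.println(0.0 * inf);        // zero times infinity
        System.out.println(inf - inf);        // infinity minus infinity
        System.out.println(inf / inf);        // infinity divided by infinity
    }
}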

Honestly, if NAN “propagat[es] that NAN all over the place, and thrashing the internals of the program in a manner which can be extremely frustrating to track down,” then you have problems anyway. There are plenty of things that can cause similar errors; NAN is probably one of the least common. Simply misplacing your parens in an expression is probably more likely than a NAN to send your program into a similar, rather spectacular, failure cascade, though I suppose NANs can be caused by misplaced parens.

There are tons of little, unnoticeable errors that are a complete pain to track down. NAN contributes to those, for sure, but depending on what types of data you frequently work with, it’s usually pretty uncommon. There are tons of logic errors that simply do not throw anything; the worst are the ones where you had something like an off-by-one error somewhere that makes the program work MOST of the time, return something strange (but not a complete outlier) 5% of the time, and then segfault the whole damn thing 3 months later when it finally gets the right pointer. I think NAN is just noticeable because when you get burned by it you tend to say “Not a Number!? Then why the hell is it even IN the floating point NUMBER I declared?”

I didn’t get that impression, since the only example he offered was division by zero. Though if he’s arguing against NaNs in general, then his “olden days” almost predate the microcomputer era, in which case the behaviour he is talking about must have been idiosyncratic to whatever mainframe or minicomputer he happened to be using. Before floating point standardization, each computer used its own system for dealing with undefined or unrepresentable values: such a system could have thrown an error, but it could just as easily have involved setting status register bits, or even using special values like IEEE NaN. He is therefore not correct in his blanket assertion that computers in the “olden days” crashed upon encountering an undefined or unrepresentable value. Maybe they crashed on his system, but certainly not on all of them (nor, I think, on even the majority).

This is incorrect. Right after the first mention of division by zero in the OP come two other examples.

I stand corrected; my apologies.