Why does SQL assume you don't want to indlude NULLs in queries?

Example…

“select * from clients where state not in ('NY,‘CA’)”
In this query I want to see all ‘clients’ except those who are in Calif or New York.

But SQL actually goes one further - It shows me all clients except those who are in Calif or New York or have a null in the state column!!!

This pisses me off because it is clearly not what I asked for. And it means queries I’ve run in the past have been imperfect.

Why does SQL do this?? It means I have to extend a query just to accomodate this annoying trait…

“select * from clients where (state not in ('NY,‘CA’) or state is null)”

But more importantly it means queries I did before discovering this have been wrong.

It is what you asked for - it’s just not what you wanted. You can fix this by saying “WHERE COALESCE(STATE,‘ZZ’) NOT IN(‘NY’,‘CA’)”. COALESCE returns the value of its first non-null argument, that is, either what’s in STATE if STATE is not null or ‘ZZ’ if it is. As to your prior queries, I guess that’s an RTFM error. :stuck_out_tongue:

It does it that way because that’s the nature of three-value logic. (Or at least, the type of three-value logic that SQL uses.)

Let’s consider a simpler example: “SELECT * FROM clients WHERE state != ‘NY’”

This will find all records for which the comparison state != ‘NY’ results in a true value. Why doesn’t it include the NULLs? Because while the comparison ‘CA’ != ‘NY’ is true, the comparison NULL != ‘NY’ is NULL. In other words, the result of any logical comparison against a NULL is also NULL.

Thus, when the server looks at a record with a NULL state, and executes the comparison NULL != ‘NY’, the result is NULL, which is false, and the record is excluded.

Why do it this way? Because NULL means “unknown” and the result of comparing something to an unknown value is also unknown. Thus, even NULL = NULL returns NULL. That’s why SQL has a separate operator, IS NULL, to determine if something is NULL. Because saying = NULL doesn’t make sense.

So that’s why you have to say: SELECT * FROM clients WHERE state NOT IN (‘NY’, ‘CA’) OR state IS NULL;

There’s really no need to use a COALESCE in this example. A simple IS NULL will do the trick.

One way to think of null is as “unknown”. The database can’t say definitively that states that are null are not actually NY or CA so it doesn’t return them unless you specifically add that to your criteria.

ETA: Or, what friedo said.

Thanks for the answers all :slight_smile:

It seems, to me at least, a bit pedantic (specifically the idea that SQL says " ‘NULL’ Could be NY or CA" so I won’t include it in your results dude"
So I guess I’ll have to be more careful with my queries on columns containing null values.

Some people are afraid of NULLs and insist they shouldn’t be used. Those people tend to just not understand how they work. They are a valuable tool, when used properly, and when you know what to expect.

There’s really no need to use an IS NULL in this example, a simple COALESCE will do the trick. Then you don’t need to dick around with OR, although it’s possible OR has ceased to be Stage 2 since the last SQL training course I went on.

I think we’re better off with our computers being “pedantic.” To me that just means they’re adhering to spec and not leaving anything up to the (dbms) developer’s personal interpretation, despite what some people might think of as making more sense.

Null isn’t anything, so it is not like, unlike, equal to, greater than or less than anything.

To be a bit pedantic myself:
“select * from clients where (state not in ('NY,‘CA’) or state is null)”
Won’t give you any different results than your first query.
It would just give you anything where state is in (‘NY’,‘CA’) or where state is null.

You need something more like:
select * from clients where state not in ('NY,‘CA’) AND state is not null

It’s a bit like the question “Have you stopped beating your wife yet?”

I agree with the prior statement about standards. Variation from a standard should be accompanied by clear and complete documentation. Lack of compatability is worse, but acceptable if done honestly, and accompanied by the docs. That said, SQL implementations have a lot of variance, and many are SQL ‘adaptors’ trying to get a good match with common SQL usage. The trouble is that in many databases, in a non-SQL context, null is the 0 length string, rather than something different. I deal peripherally with these issues often.

Somebody said: I love standards, because there are so many to chose from.

That would have to include C.J. Date who probably knows as much about relational databases as anybody. I will quote his recommendations regarding usage of nulls in A Guide to the SQL Standard - 4th edition in its entirety:

Indeed. He’s also wrong. :stuck_out_tongue:

FWIW, EF Codd, the other half of Date & Codd, suggests that not only should relational databases have NULLs, but they should have two distinct NULLs, one for “missing but applicable” and another for “missing but inapplicable.”

So it’s not like the guys who invented this stuff are in agreement on the issue.

Please tell me that no other database vendor has copied Oracle’s idiocy on NULLs and zero-length strings.

I’ve seen a few different implementations of null handling in SQL - I think it’s Visual dBase that lets you have WHERE [somefield]=Null;

This always irked me intensely, because from a philosophical POV, I don’t think null should equal anything (even another null). There should be a way of asking ‘Is it Null?’, but ‘equal to Null’ is an inherently broken statement (IMO, YMMV)

This can come up in subqueries as well. If you do something like:

select * from Customers where custid not in (select custid from Orders)

You’ll get no records if there’s a null value anywhere in Orders.custid.

Tell that to this guy. Or, more likely, don’t. You might give the poor guy another stroke.

From here:

Also, You Suck:

Doesn’t really mean Codd was right or wrong, it all depends on your goal and can be situational.

I personally think the negatives (increased stmt complexity and potentially more difficult to optimize) outweigh the positives (perfect accuracy and clarity, but this can be achieved with simpler methods many times).