Why does SQL assume you don't want to indlude NULLs in queries?

Lobsang · May 27, 2010, 3:33pm

Example…

“select * from clients where state not in ('NY,‘CA’)”
In this query I want to see all ‘clients’ except those who are in Calif or New York.

But SQL actually goes one further - It shows me all clients except those who are in Calif or New York or have a null in the state column!!!

This pisses me off because it is clearly not what I asked for. And it means queries I’ve run in the past have been imperfect.

Why does SQL do this?? It means I have to extend a query just to accomodate this annoying trait…

“select * from clients where (state not in ('NY,‘CA’) or state is null)”

But more importantly it means queries I did before discovering this have been wrong.

Malacandra · May 27, 2010, 3:40pm

It is what you asked for - it’s just not what you wanted. You can fix this by saying “WHERE COALESCE(STATE,‘ZZ’) NOT IN(‘NY’,‘CA’)”. COALESCE returns the value of its first non-null argument, that is, either what’s in STATE if STATE is not null or ‘ZZ’ if it is. As to your prior queries, I guess that’s an RTFM error.

friedo · May 27, 2010, 3:47pm

It does it that way because that’s the nature of three-value logic. (Or at least, the type of three-value logic that SQL uses.)

Let’s consider a simpler example: “SELECT * FROM clients WHERE state != ‘NY’”

This will find all records for which the comparison state != ‘NY’ results in a true value. Why doesn’t it include the NULLs? Because while the comparison ‘CA’ != ‘NY’ is true, the comparison NULL != ‘NY’ is NULL. In other words, the result of any logical comparison against a NULL is also NULL.

Thus, when the server looks at a record with a NULL state, and executes the comparison NULL != ‘NY’, the result is NULL, which is false, and the record is excluded.

Why do it this way? Because NULL means “unknown” and the result of comparing something to an unknown value is also unknown. Thus, even NULL = NULL returns NULL. That’s why SQL has a separate operator, IS NULL, to determine if something is NULL. Because saying = NULL doesn’t make sense.

So that’s why you have to say: SELECT * FROM clients WHERE state NOT IN (‘NY’, ‘CA’) OR state IS NULL;

friedo · May 27, 2010, 3:48pm

There’s really no need to use a COALESCE in this example. A simple IS NULL will do the trick.

Nars_Glinley · May 27, 2010, 3:51pm

One way to think of null is as “unknown”. The database can’t say definitively that states that are null are not actually NY or CA so it doesn’t return them unless you specifically add that to your criteria.

ETA: Or, what friedo said.

Lobsang · May 27, 2010, 4:30pm

Thanks for the answers all

It seems, to me at least, a bit pedantic (specifically the idea that SQL says " ‘NULL’ Could be NY or CA" so I won’t include it in your results dude"
So I guess I’ll have to be more careful with my queries on columns containing null values.

friedo · May 27, 2010, 4:38pm

Some people are afraid of NULLs and insist they shouldn’t be used. Those people tend to just not understand how they work. They are a valuable tool, when used properly, and when you know what to expect.

Malacandra · May 27, 2010, 4:57pm

There’s really no need to use an IS NULL in this example, a simple COALESCE will do the trick. Then you don’t need to dick around with OR, although it’s possible OR has ceased to be Stage 2 since the last SQL training course I went on.

arseNal · May 27, 2010, 6:21pm

I think we’re better off with our computers being “pedantic.” To me that just means they’re adhering to spec and not leaving anything up to the (dbms) developer’s personal interpretation, despite what some people might think of as making more sense.

Mangetout · May 27, 2010, 6:26pm

Null isn’t anything, so it is not like, unlike, equal to, greater than or less than anything.

M.Constant · May 27, 2010, 6:35pm

To be a bit pedantic myself:
“select * from clients where (state not in ('NY,‘CA’) or state is null)”
Won’t give you any different results than your first query.
It would just give you anything where state is in (‘NY’,‘CA’) or where state is null.

You need something more like:
select * from clients where state not in ('NY,‘CA’) AND state is not null

Mangetout · May 27, 2010, 6:43pm

It’s a bit like the question “Have you stopped beating your wife yet?”

TriPolar · May 27, 2010, 6:54pm

I agree with the prior statement about standards. Variation from a standard should be accompanied by clear and complete documentation. Lack of compatability is worse, but acceptable if done honestly, and accompanied by the docs. That said, SQL implementations have a lot of variance, and many are SQL ‘adaptors’ trying to get a good match with common SQL usage. The trouble is that in many databases, in a non-SQL context, null is the 0 length string, rather than something different. I deal peripherally with these issues often.

Somebody said: I love standards, because there are so many to chose from.

ticker · May 27, 2010, 7:11pm

That would have to include C.J. Date who probably knows as much about relational databases as anybody. I will quote his recommendations regarding usage of nulls in A Guide to the SQL Standard - 4th edition in its entirety:

friedo · May 27, 2010, 7:16pm

Indeed. He’s also wrong.

FWIW, EF Codd, the other half of Date & Codd, suggests that not only should relational databases have NULLs, but they should have two distinct NULLs, one for “missing but applicable” and another for “missing but inapplicable.”

So it’s not like the guys who invented this stuff are in agreement on the issue.

Rysto · May 27, 2010, 7:21pm

Please tell me that no other database vendor has copied Oracle’s idiocy on NULLs and zero-length strings.

Mangetout · May 27, 2010, 9:05pm

I’ve seen a few different implementations of null handling in SQL - I think it’s Visual dBase that lets you have WHERE [somefield]=Null;

This always irked me intensely, because from a philosophical POV, I don’t think null should equal anything (even another null). There should be a way of asking ‘Is it Null?’, but ‘equal to Null’ is an inherently broken statement (IMO, YMMV)

borschevsky · May 27, 2010, 9:09pm

This can come up in subqueries as well. If you do something like:

select * from Customers where custid not in (select custid from Orders)

You’ll get no records if there’s a null value anywhere in Orders.custid.

Derleth · May 27, 2010, 10:28pm

Tell that to this guy. Or, more likely, don’t. You might give the poor guy another stroke.

From here:

Also, You Suck:

RaftPeople · May 27, 2010, 10:34pm

Doesn’t really mean Codd was right or wrong, it all depends on your goal and can be situational.

I personally think the negatives (increased stmt complexity and potentially more difficult to optimize) outweigh the positives (perfect accuracy and clarity, but this can be achieved with simpler methods many times).

Topic		Replies	Views
What's wrong with my SQL Factual Questions	18	6191	October 14, 2010
Sas proc sql - null Factual Questions	7	1294	November 8, 2016
SQL help (ordering search results) Factual Questions	6	4685	October 21, 2010
is there a tool that tells you "why" a particular tuple is not in SQL query result set? Factual Questions	8	1134	April 14, 2011
My First MS Access Question: #Error Factual Questions	6	928	January 21, 2005

Why does SQL assume you don't want to indlude NULLs in queries?

Related topics