It is what you asked for - it’s just not what you wanted. You can fix this by saying “WHERE COALESCE(STATE,‘ZZ’) NOT IN(‘NY’,‘CA’)”. COALESCE returns the value of its first non-null argument, that is, either what’s in STATE if STATE is not null or ‘ZZ’ if it is. As to your prior queries, I guess that’s an RTFM error.
It does it that way because that’s the nature of three-value logic. (Or at least, the type of three-value logic that SQL uses.)
Let’s consider a simpler example: “SELECT * FROM clients WHERE state != ‘NY’”
This will find all records for which the comparison state != ‘NY’ results in a true value. Why doesn’t it include the NULLs? Because while the comparison ‘CA’ != ‘NY’ is true, the comparison NULL != ‘NY’ is NULL. In other words, the result of any logical comparison against a NULL is also NULL.
Thus, when the server looks at a record with a NULL state, and executes the comparison NULL != ‘NY’, the result is NULL, which is false, and the record is excluded.
Why do it this way? Because NULL means “unknown” and the result of comparing something to an unknown value is also unknown. Thus, even NULL = NULL returns NULL. That’s why SQL has a separate operator, IS NULL, to determine if something is NULL. Because saying = NULL doesn’t make sense.
So that’s why you have to say: SELECT * FROM clients WHERE state NOT IN (‘NY’, ‘CA’) OR state IS NULL;
One way to think of null is as “unknown”. The database can’t say definitively that states that are null are not actually NY or CA so it doesn’t return them unless you specifically add that to your criteria.
It seems, to me at least, a bit pedantic (specifically the idea that SQL says " ‘NULL’ Could be NY or CA" so I won’t include it in your results dude"
So I guess I’ll have to be more careful with my queries on columns containing null values.
Some people are afraid of NULLs and insist they shouldn’t be used. Those people tend to just not understand how they work. They are a valuable tool, when used properly, and when you know what to expect.
There’s really no need to use an IS NULL in this example, a simple COALESCE will do the trick. Then you don’t need to dick around with OR, although it’s possible OR has ceased to be Stage 2 since the last SQL training course I went on.
I think we’re better off with our computers being “pedantic.” To me that just means they’re adhering to spec and not leaving anything up to the (dbms) developer’s personal interpretation, despite what some people might think of as making more sense.
To be a bit pedantic myself:
“select * from clients where (state not in ('NY,‘CA’) or state is null)”
Won’t give you any different results than your first query.
It would just give you anything where state is in (‘NY’,‘CA’) or where state is null.
You need something more like:
select * from clients where state not in ('NY,‘CA’) AND state is not null
I agree with the prior statement about standards. Variation from a standard should be accompanied by clear and complete documentation. Lack of compatability is worse, but acceptable if done honestly, and accompanied by the docs. That said, SQL implementations have a lot of variance, and many are SQL ‘adaptors’ trying to get a good match with common SQL usage. The trouble is that in many databases, in a non-SQL context, null is the 0 length string, rather than something different. I deal peripherally with these issues often.
Somebody said: I love standards, because there are so many to chose from.
That would have to include C.J. Date who probably knows as much about relational databases as anybody. I will quote his recommendations regarding usage of nulls in A Guide to the SQL Standard - 4th edition in its entirety:
FWIW, EF Codd, the other half of Date & Codd, suggests that not only should relational databases have NULLs, but they should have two distinct NULLs, one for “missing but applicable” and another for “missing but inapplicable.”
So it’s not like the guys who invented this stuff are in agreement on the issue.
I’ve seen a few different implementations of null handling in SQL - I think it’s Visual dBase that lets you have WHERE [somefield]=Null;
This always irked me intensely, because from a philosophical POV, I don’t think null should equal anything (even another null). There should be a way of asking ‘Is it Null?’, but ‘equal to Null’ is an inherently broken statement (IMO, YMMV)
Doesn’t really mean Codd was right or wrong, it all depends on your goal and can be situational.
I personally think the negatives (increased stmt complexity and potentially more difficult to optimize) outweigh the positives (perfect accuracy and clarity, but this can be achieved with simpler methods many times).