Why does SQL assume you don't want to indlude NULLs in queries?

OK, now I’m curious. Taking Date as paradigmatic, how, in the general case, do you avoid NULLs in SQL? Take this example:

You’re trying to build a database that contains the names, hair colors, and ages of everyone in your family. Now, your uncle is bald and, as your maiden aunts are all ladies, well, a lady never tells her age and a gentleman never asks.

I guess you could make ‘bald’ a hair color by doing gross violence to the concepts involved, but how do you handle the ages of your maiden aunts?

Sorry I don’t think that’s right. I’m at home now wo I don’t have the query in front of me, but when I made the OP I was quoting an actual query (the important bits of a larger query anyway)

this query…

select * from clients where state not in (‘NY’,‘CA’)

returned less results than this query…

select * from clients where (state not in (‘NY’,‘CA’) or state is null)

also… the whole point of my OP was that I wanted to have the records with nulls returned. your query would not achieve that.

In that case, you would build a table with identifiers for the name, and then link each name to a table containing hair colors, and a separate table containign ages. For those which are known, you will have a relationship, but for those which are unknown, there will be no relationship. It’s the ultimate in normalized structure. :slight_smile:

Not that I am trying to pick a side or add an answer here, but in our system we have a state code of ‘NA’ which stands for ‘Not Applicable’.

Not saying that’s right or wrong, just providing an anecdote :slight_smile:

So what will be contained in the foregin key ‘hair colour’ field? NULL?

I’m curious, then. How would you do an optional foreign key, where a record in A may or may not have a parent record in B, without NULLs?

I’ve seen cases where people put a special record in B whose sole purpose is as a placeholder pkey for records in other tables that don’t reference any B’s, and therefore would otherwise have a NULL fkey.

OK, so you’ve eliminated pesky NULLs. Now, all you have to do is remember to filter out that magic record every time you do a query on table B! How has this made anything simpler? And why is perverting the purpose of a record to make a hardcoded special value for an fkey somehow better than simply not putting an fkey value where there’s no foreign record to reference?

Also, left joins. What placeholder value do you specify for records in the right-side table that don’t match the left side, if there are no NULLs? You can’t use 0, because 0 may be a valid value in an integer column. You can’t use empty-string, because that may be a valid value in a string column. If only we had some special marker to indicate that no data exists!

OK, rant over.

Unless I’m misunderstanding your question, one way you can accomplish this is by setting up a third table (table C) that contains records representing the relationship between the two tables (i.e., like what is done when creating a many-to-many relationship).

For example, this table might contain only two columns: one that references the record in A, and one that references the record in B. This wouldn’t interfere with your ability to use LEFT JOINs either, since the absence of a record in table C can be thought of as an implied NULL.

ETA: I’m of the mind that NULLs can indeed be useful in a database, but that their usage should always be individually justified.

So now you’ve added a whole new join table just to get rid of NULLs. In addition to making what should be a simple N:1 join more complicated, you’ve now lost transactional enforcement of the N:1 relationship, because the presence of a join table means you could accidentally make it N:N. Normalization ruined and complexity added, all for the noble cause of NULLlessness. :frowning:

You have my complete agreement there.

Don’t know the details of the database, but a quick answer is FKEY=-1. This eliminates a NULL from FKEY but maybe you were really asking about how to eliminate the nulls on left join? If that is really the question, I wouldn’t propose that, when you join you do need to know if you have a matching row or not.

Adding a magic record would probably be the last resort to solve just about any problem in the DB, although I probably have done it once or twice.

As I said, you need to know if the join found a matching row in the other table, need nulls in that condition.

But in some (not all) cases of a field in a table, it isn’t important to make the distinction between blank and “could be blank but might not be blank, not really sure, dang I wish we knew for certain” because at that point all you know is that you don’t have the information. If the information is not required (as in not “absolutely required for the proper functioning of the app”), then Lobsang’s query should be able to return the “desired” result without having to worry about how nulls are handled in conditionals. In other words, if the value is not NY or CA (for any reason, we don’t care why, which is quite often the “desired” query) then include it in the set.

Well, that was just an example of how a foreign key relationship could be implemented/enforced without the use of a NULLable column. Your point is valid: this type of implementation is not optimal in all situations. (However, it is still possible to ensure the table does not inadvertently allow many-to-many relationships from being created.)

IMO, an abundance of NULLable columns is kind of like database de-normalization; its validity will vary from database to database - but should be avoided when possible.

My knowledge of Oracle is limited. I have heard it said that in Oracle, SQL NULL is equivalent to a 0 length string (NULL STRING). I know of an associative database that predates SQL and Oracle. It doesn’t have an internal NULL data type (thats what SQL NULL is), so it uses special strings to represent SQL NULL. That occasionally causes a problem. But it can also maintain non-relational data structures that are extremely difficult to implement and inefficient in relational systems. It is also much more efficient at many SQL operations, and far more scalable than relational systems. So choose your poison. Everyone considers a system that doesn’t work the way they want it to, to be idiotic. But using a product that doesn’t do what you want it to is idiotic, as well as expecting a product to do what you want it to when it doesn’t is idiotic.

What gross violence? What concept? ‘Hair Color’ is the name of a column. If it is defined to contain values ‘Blond’, ‘Brown’, ‘Red’, ‘Black’, ‘White’, ‘Gray’ and ‘Bald’, what would the problem be? You can add values like ‘Not Known’ and ‘Only Her Hairdresser Knows For Sure’. If you think ‘Bald’ is not a color, then why would ‘Black’ be a color? ‘Black’ is the absence of color.

All of these arguments illustrate the often dismissed necessity of some formal education in data management. Sure, to a lay person, it looks simple, you just exclude the states you don’t want, and you should get all the other records, right? Except because of some necessary constructs such as NULLs, it isn’t quite that simple.

In other words, if Lobsang’s boss gets pissed because he discovers his reporting has been somewhat inaccurate, he (the boss) really has no one to blame but himself.

If that is not the behavior you want, you should have set up your table so that it would not allow NULL in that field. You could have used an empty string.

FYI most of my queries in the past are likely to be fine/unaffected as they do not work on fields containing nulls.

Also, I was kind of aware of this behavour of nulls for a while. It did turn up in one of my earlier queries where I ended up adding the ‘or [field] is null’ bit.

What prompted me to post this thread was that it reared its ugly head in a recent event where I had simply forgot about it (and also happened to not be aware that one of the fields I was working on contained nulls)
You can’t really blame my boss for my lack of formal training. My SQL has developed over the years, and the use of it is not a primary job requirement. I’ve just been able to do useful things for the company over the years with an ever expanding knowledge/ability to use sql. For the most part any disadvantage of not having had formal training has had little or no impact on the ‘service’ I provide as the company’s sql guy.

Maybe I misunderstand the whole discussion, but if I want to avoid NULL values for the hair_colour field, I would set up my tables like this:

Table Person
columns:
person_id Primary Key
person_name

Table Hair_colour
columns:
person_id Primary Key and also Foreign Key referencing Person
hair_colour

There can only be one row in Hair_colour for each Person. People that are bald will have 0 rows in table Hair_colour.

Confirming what Lobsang said:

If you want to return rows with a null value for state, then you would need to say (state not in (‘NY’, ‘CA’) or state is null)

If you want to eliminate nulls, then you would need to say
state not in (‘NY’, ‘CA’) and state is not null
which would be the same as
state not in (‘NY’, ‘CA’)

(see proof below)

The SQL behaviour makes sense to me, if you consider NULL to mean ‘unknown value’. For the customers that have NULL, I don’t know if they are in NY or CA - they might be! So when I ask “show me customers for which I know that they are not in NY or CA”, I should not show the NULL customers - those could be in NY or CA. If you ask “show me customers for which I know that they are not in NY or CA, and also show me the customers for which I do not know the location” then it’s obvious you have to include the “and state is null” condition.

Example (using Oracle)



SQL> create table state (state_id integer, state_code char (2)) ;
Table created.
SQL> insert into state (state_id, state_code)
  2  select 1, 'NY' from dual
  3  union all
  4  select 2, 'CA' from dual
  5  union all
  6  select 3, 'AK' from dual
  7  union all
  8  select 4, NULL from dual ;
4 rows created.
SQL> commit ;
Commit complete.

SQL> select * from state ;
  STATE_ID ST
---------- --
         1 NY
         2 CA
         3 AK
         4

SQL> select * from state where state_code not in ('NY', 'CA') ;
  STATE_ID ST
---------- --
         3 AK

SQL> select * from state
  2   where state_code not in ('NY', 'CA') and state_code is not null ;
  STATE_ID ST
---------- --
         3 AK

SQL> select * from state
  2   where state_code not in ('NY', 'CA') or state_code is null ;
  STATE_ID ST
---------- --
         3 AK
         4


Using a combined pkey/fkey on the hair color table is, indeed, a good solution. (I’ve used this strategy also.)

But consider that in order to use this structure, you have to left join People with Hair Color. The result of that left join is that anyone without a hair color gets a NULL in the result set. So you still have to deal with NULLs; they’re just transient instead of stored. God help you if you do it in a subquery, or try to filter on a boolean operation of Hair Color, without understanding the consequences. :stuck_out_tongue:

And people for whom you don’t know their hair color will also have 0 rows in the table. So now you don’t actually know if Uncle Otto is bald, or if you just don’t know what color his hair is.

I’d say the way to go would be to have column Hair Colour, and include both “bald” and “unknown” as possible values. Acknowledging that “bald” isn’t really a hair color, but is the closest you can get.

Such is the troubled lot of a database developer. :stuck_out_tongue:

I’m going to assume you misquoted here …