SQL query: MAX(CASE WHEN...

Honestly, this looks a bit like klunky database design. From what I can see, there are two tables, user, and user_meta. User contains basic user information, while user_meta contains additional user metadata, such as first name, last name and user_level. Frankly, this really should have been put into a single table, IMHO.

Anyway, so what we have is a set of tables a bit like this:

Users:



id          username
----------- ----------
1           bbrown
2           amurray
3           mobama
4           ttest


Users_meta:



user        meta       value
----------- ---------- ----------
1           first_name Bobby
1           last_name  Brown
1           user_level 1
2           first_name Andy
2           last_name  Murray
2           user_level 4
3           first_name Michelle
3           last_name  Obama
3           user_level 5
4           first_name Test
4           last_name  Test
4           user_level 6


What that query is trying to do is join these two tables together to give you a single resultset, with a list of all users, their username, first name, last name and user level.

If you do a straight join between the tables, without doing any GROUP BY, you get this query:



SELECT
	Users.id as id,
	Users.username as username,
	CASE WHEN Users_meta.meta = 'first_name' THEN Users_meta.value ELSE NULL END AS first_name,
	CASE WHEN Users_meta.meta = 'last_name' THEN Users_meta.value ELSE NULL END AS last_name,
	CASE WHEN Users_meta.meta = 'user_level' THEN Users_meta.value ELSE NULL END AS user_level
FROM
	Users
LEFT JOIN Users_meta
	ON Users_meta.[user] = Users.id


Here are the results:



id          username   first_name last_name  user_level
----------- ---------- ---------- ---------- ----------
1           bbrown     Bobby      NULL       NULL
1           bbrown     NULL       Brown      NULL
1           bbrown     NULL       NULL       1
2           amurray    Andy       NULL       NULL
2           amurray    NULL       Murray     NULL
2           amurray    NULL       NULL       4
3           mobama     Michelle   NULL       NULL
3           mobama     NULL       Obama      NULL
3           mobama     NULL       NULL       5
4           ttest      Test       NULL       NULL
4           ttest      NULL       Test       NULL
4           ttest      NULL       NULL       6


A few things to note here: we have multiple rows per user, and we also have a large number of NULL values, because the join is evaluated per row in the users_meta table, giving us 4 rows per user. What you want to do is collapse that into a set that only gives us one row per user, eliminating the NULLs at the same time. That’s where the GROUP BY clause comes in:



SELECT
	Users.id as id,
	Users.username as username,
	CASE WHEN Users_meta.meta = 'first_name' THEN Users_meta.value ELSE NULL END AS first_name,
	CASE WHEN Users_meta.meta = 'last_name' THEN Users_meta.value ELSE NULL END AS last_name,
	CASE WHEN Users_meta.meta = 'user_level' THEN Users_meta.value ELSE NULL END AS user_level
FROM
	Users
LEFT JOIN Users_meta
	ON Users_meta.[user] = Users.id

GROUP BY
	Users.id, users.username


However, you can’t run this query (at least in SQL Server), because it gives you the following error message:



Msg 8120, Level 16, State 1, Line 4
Column 'Users_meta.meta' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Msg 8120, Level 16, State 1, Line 4
Column 'Users_meta.value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


You can’t include a column in the SELECT portion of a GROUP BY unless it’s included in an aggregate function, or else is part of the GROUP BY clause. The logical thing to do is to wrap an aggregate function around each meta_value, which takes you back to your original query. MAX is just the most logical aggregate function to use; but MIN would work equally well, since in each column group there’s only one non-null value. MAX and MIN both eliminate NULLs when doing a column concatenation of this sort - the MAX of a column is only NULL if every value in the column in NULL, and the same applies to MIN.

Here’s your query with MAX replaced with MIN:



SELECT
	Users.id as id,
	Users.username as username,
	MIN(CASE WHEN Users_meta.meta = 'first_name' THEN Users_meta.value ELSE NULL END) AS first_name,
	MIN(CASE WHEN Users_meta.meta = 'last_name' THEN Users_meta.value ELSE NULL END) AS last_name,
	MIN(CASE WHEN Users_meta.meta = 'user_level' THEN Users_meta.value ELSE NULL END) AS user_level
FROM
	Users
LEFT JOIN Users_meta
	ON Users_meta.[user] = Users.id

GROUP BY
	Users.id, users.username


And here are the results: identical, as you can see:



id          username   first_name last_name  user_level
----------- ---------- ---------- ---------- ----------
2           amurray    Andy       Murray     4
1           bbrown     Bobby      Brown      1
3           mobama     Michelle   Obama      5
4           ttest      Test       Test       6
Warning: Null value is eliminated by an aggregate or other SET operation.

You also get a useful warning that the NULL values have been eliminated by an aggregate, which in this case is exactly what we wanted - though in other cases the warning might be an indication of something incorrect in our query.