What I am trying to do is the get the results of SELECT item, sum(qty) as tq, sum(total) as tp FROM items GROUP BY item ORDER BY tq DESC with rankings, so that the results look like this
Not that I can SQL my way out of a paper bag, but why are you trying to do this in one query? My first (albeit typically clumsy) instinct would be to base the query you’re looking for on a summation query one step earlier.
Yeah, I’d go that way. Not sure if there’s any way to get the actual ranking values in there - I’d probably do that with a loop counter in whatever program is calling mysql and outputting the results. sql by its nature isn’t the right tool for every job.
Inline view? SELECT… FROM (SELECT…) AS MYVIEW, then your inner SELECT is queried just as if it were a view or table in its own right - put your GROUP BY in there - and your outer SELECT can ORDER BY all it likes. Efficiency-wise, it should run just fine bearing in mind the necessary overheads (sorting and so on) and it’s for sure DB2 will run faster than your own program code.
Create a view V that has everything but the rank, and something like this should work:
SELECT count(SELECT Item FROM V as V2 WHERE V2.tq > V1.tq) + 1 as rank, Item, tq, tp FROM V as V1
So for each item in V, you’re querying V for all the items whose total quantity sold is less than the current item’s, counting them, and adding one. If you want to do this without an intermediate query, you can just replace V with the SQL used to define it everywhere it shows up.
On the other hand, if you want to do this in some imperative language, all you have to do is initialize a counter to 1 and walk over the records in order. For each record, you add a field “Rank” with the current value of the counter and add 1. That’s almost certainly going to be faster, unless the query optimizer is way smarter than I think it is.
Building on what Malacandra and ultrafilter have said, you can do it using an inline view, without creating a separate view:
select * from items ;
item qty price total
ItemA 12 10 120
ItemB 2 20 40
ItemA 2 10 20
ItemC 4 30 120
ItemD 1 40 40
ItemB 3 20 60
select e.the_rank, e.item, e.tq, e.tp
from
(select d.item, d.tq, d.tp,
(select 1 + count(*)
from (select a.item, sum(a.qty) as tq from items a group by a.item) b
where b.tq > d.tq
) as the_rank
from
(select c.item, sum(c.qty) as tq, sum(c.total) as tp
from items c group by c.item
) d
) e
order by e.item, e.the_rank desc ;
the_rank item tq tp
1 ItemA 14 140
2 ItemB 5 100
3 ItemC 4 120
4 ItemD 1 4
sorry, had a mistake in the order by in my previous post
select * from items ;
item qty price total
ItemA 12 10 120
ItemB 2 20 40
ItemA 2 10 20
ItemC 4 30 120
ItemD 1 40 40
ItemB 3 20 60
select e.the_rank, e.item, e.tq, e.tp
from
(select d.item, d.tq, d.tp,
(select 1 + count(*)
from (select a.item, sum(a.qty) as tq from items a group by a.item) b
where b.tq > d.tq
) as the_rank
from
(select c.item, sum(c.qty) as tq, sum(c.total) as tp
from items c group by c.item
) d
) e
order by e.the_rank ;
the_rank item tq tp
1 ItemA 14 140
2 ItemB 5 100
3 ItemC 4 120
4 ItemD 1 4
I think that any query solution which involves re-querying the entire select set for each record to generate the rank value will have disastrous O(n^2) performance on any meaningful quantity of data. The OP says he has hundreds of thousands of records. In MySQL no less.
Perhaps you can create a temp table with an additional integer autoincrement field (typically used for the PK). Then SELECT INTO the temp table with the original GROUP … BY ORDER BY … Finally, SELECT from the temp using the autoincrement as the rank values.
Even if you have to insert an intermediate step to first group, then order by sum(qty), and finally read back with ranks, that’s one O(n) operation to group plus two O(# of distinct items) operations to sort & rank which will be vastly faster.
An interesting side question for the OP is how does the number of items compare to the number of sales records? Is it 100,000 items each sold once or twice, or 50 items each sold 5,000 times?
SQL is like APL. The fact you CAN torture it into doing it all in one SELECT (or all on one line) doesn’t mean you SHOULD.
I agree with you, LSLGuy. The query will be horribly inefficient for large quantities of data. I’m more used to Oracle, and in Oracle you could easily write it using an ordered inline view and the rownum pseudo-column. I looked in the MySQL help I have but I didn’t see anything about a rownum pseudo-column.
Oracle example
SQL> select * from items ;
ITEM QTY PRICE TOTAL
------------------------------ --------- --------- ---------
ItemA 12 10 120
ItemB 2 20 40
ItemA 2 10 20
ItemC 4 30 120
ItemD 1 40 40
ItemB 3 20 60
6 ligne(s) sélectionnée(s).
SQL> select rownum as the_rank, b.item, b.tq, b.tp
2 from
3 (select a.item, sum(a.qty) as tq, sum(a.total) as tp
4 from items a group by a.item order by 2 desc
5 ) b ;
THE_RANK ITEM TQ TP
--------- ------------------------------ --------- ---------
1 ItemA 14 140
2 ItemB 5 100
3 ItemC 4 120
4 ItemD 1 40
SQL>
I can do the sub-select easily in MySQL
select * from (SELECT item
, sum(qty)
, sum(total)
FROM items group by item
order by 2 desc) a
but getting the ranking was hard?!?! (for a newbie like me)
I did some googling and found this solution, using the declaration of a MySQL variable (something I have to read more about
select * from items ;
item qty price total
ItemA 12 10 120
ItemB 2 20 40
ItemA 2 10 20
ItemC 4 30 120
ItemD 1 40 40
ItemB 3 20 60
select
@rownum:=@rownum+1 the_rank,
t.*
from
(select @rownum:=0) r,
(select item, sum(qty) as tq, sum(total) as tp
from items
group by item
order by 2 desc
) t ;
the_rank item tq tp
1 ItemA 14 140
2 ItemB 5 100
3 ItemC 4 120
4 ItemD 1 40
Whoa! One last thing - I found it that the MySQL variables are incremented at display time, not at row retrieval time, making them more convenient (in this case) than the Oracle rownum pseudo-column. The query doesn’t need to be written with an inline view. This will work as well:
select
@rownum:=@rownum+1 the_rank,
t.item, sum(t.qty) as tq, sum(t.total) as tp
from
(select @rownum:=0) r,
items t
group by t.item
order by 3 desc ;
the_rank item tq tp
1 ItemA 14 140
2 ItemB 5 100
3 ItemC 4 120
4 ItemD 1 40
I just popped in to add the completely useless (to the OP) nugget that Microsoft SQL Server 2005 has a function that does this without needing any subqueries etc. So to all of you MySQL and Oracle users: boo-yah, suckers!
Yes, it even has an intuitive name. The guys in Redmond must be slipping. (Although that said, the query builder in SQL Server Management Studio says it’s invalid SQL and won’t display queries in design mode if they have that function, despite the fact that the query runs just fine. :smack: )
My company seems to be stuck on Oracle 7.3, and it doesn’t seem to work there. But I guess that’s from, like, the 19th century or something, so I guess that’s not too surprising.