MySQL SELECT ranking question

FlyingCowOfDoom · November 14, 2007, 4:48pm

I have a table that is a list of names of items sold, the quantity, and price.


Item	Qty	Price	Total
ItemA	12	10	120
ItemB	2	20	40
ItemA	2	10	20
ItemC	4	30	120
ItemD	1	40	40
ItemB	3	20	60

What I am trying to do is the get the results of SELECT item, sum(qty) as tq, sum(total) as tp FROM items GROUP BY item ORDER BY tq DESC with rankings, so that the results look like this


Rank	Item	tq	tp
1	ItemA	14	140
2	ItemB	5	100
3	ItemC	4	120
4	ItemD	1	40

Is this even possible with one query? I can’t figure out how to do it. TIA!

–FCOD

Rhythmdvl · November 14, 2007, 5:06pm

Not that I can SQL my way out of a paper bag, but why are you trying to do this in one query? My first (albeit typically clumsy) instinct would be to base the query you’re looking for on a summation query one step earlier.

FlyingCowOfDoom · November 14, 2007, 5:33pm

The actual table on which this will run is hundreds of thousands of records, and I’d like it to run as quickly as possible.

–FCOD

Turek · November 14, 2007, 5:45pm

Can you do

ORDER BY SUM(qty)?

chrisk · November 14, 2007, 5:56pm

Yeah, I’d go that way. Not sure if there’s any way to get the actual ranking values in there - I’d probably do that with a loop counter in whatever program is calling mysql and outputting the results. sql by its nature isn’t the right tool for every job.

Turek · November 14, 2007, 6:09pm

I can get the rankings in there, but it’s in MS SQL Server and it uses a temp table.

Malacandra · November 14, 2007, 6:55pm

Inline view? SELECT… FROM (SELECT…) AS MYVIEW, then your inner SELECT is queried just as if it were a view or table in its own right - put your GROUP BY in there - and your outer SELECT can ORDER BY all it likes. Efficiency-wise, it should run just fine bearing in mind the necessary overheads (sorting and so on) and it’s for sure DB2 will run faster than your own program code.

ultrafilter · November 14, 2007, 7:43pm

Create a view V that has everything but the rank, and something like this should work:


SELECT count(SELECT Item FROM V as V2 WHERE V2.tq > V1.tq) + 1 as rank, Item, tq, tp FROM V as V1

So for each item in V, you’re querying V for all the items whose total quantity sold is less than the current item’s, counting them, and adding one. If you want to do this without an intermediate query, you can just replace V with the SQL used to define it everywhere it shows up.

On the other hand, if you want to do this in some imperative language, all you have to do is initialize a counter to 1 and walk over the records in order. For each record, you add a field “Rank” with the current value of the counter and add 1. That’s almost certainly going to be faster, unless the query optimizer is way smarter than I think it is.

Arnold_Winkelried · November 14, 2007, 11:08pm

Building on what Malacandra and ultrafilter have said, you can do it using an inline view, without creating a separate view:



select * from items ;
item    qty   price  total
ItemA    12      10    120
ItemB     2      20     40
ItemA     2      10     20
ItemC     4      30    120
ItemD     1      40     40
ItemB     3      20     60

select e.the_rank, e.item, e.tq, e.tp
 from
  (select d.item, d.tq, d.tp,
      (select 1 + count(*)
        from (select a.item, sum(a.qty) as tq from items a group by a.item) b
        where b.tq > d.tq
      ) as the_rank
    from
      (select c.item, sum(c.qty) as tq, sum(c.total) as tp
        from items c group by c.item
      ) d
  ) e
 order by e.item, e.the_rank desc ;
the_rank   item   tq   tp
   1       ItemA  14  140
   2       ItemB   5  100
   3       ItemC   4  120
   4       ItemD   1   4

Arnold_Winkelried · November 15, 2007, 1:10am

sorry, had a mistake in the order by in my previous post



select * from items ;
item    qty   price  total
ItemA    12      10    120
ItemB     2      20     40
ItemA     2      10     20
ItemC     4      30    120
ItemD     1      40     40
ItemB     3      20     60

select e.the_rank, e.item, e.tq, e.tp
 from
  (select d.item, d.tq, d.tp,
      (select 1 + count(*)
        from (select a.item, sum(a.qty) as tq from items a group by a.item) b
        where b.tq > d.tq
      ) as the_rank
    from
      (select c.item, sum(c.qty) as tq, sum(c.total) as tp
        from items c group by c.item
      ) d
  ) e
 order by e.the_rank ;
the_rank   item   tq   tp
   1       ItemA  14  140
   2       ItemB   5  100
   3       ItemC   4  120
   4       ItemD   1   4

FlyingCowOfDoom · November 15, 2007, 1:21am

Thanks everyone!

–FCOD

LSLGuy · November 15, 2007, 12:46pm

I think that any query solution which involves re-querying the entire select set for each record to generate the rank value will have disastrous O(n^2) performance on any meaningful quantity of data. The OP says he has hundreds of thousands of records. In MySQL no less.

Perhaps you can create a temp table with an additional integer autoincrement field (typically used for the PK). Then SELECT INTO the temp table with the original GROUP … BY ORDER BY … Finally, SELECT from the temp using the autoincrement as the rank values.

Even if you have to insert an intermediate step to first group, then order by sum(qty), and finally read back with ranks, that’s one O(n) operation to group plus two O(# of distinct items) operations to sort & rank which will be vastly faster.

An interesting side question for the OP is how does the number of items compare to the number of sales records? Is it 100,000 items each sold once or twice, or 50 items each sold 5,000 times?
SQL is like APL. The fact you CAN torture it into doing it all in one SELECT (or all on one line) doesn’t mean you SHOULD.

Arnold_Winkelried · November 15, 2007, 7:28pm

I agree with you, LSLGuy. The query will be horribly inefficient for large quantities of data. I’m more used to Oracle, and in Oracle you could easily write it using an ordered inline view and the rownum pseudo-column. I looked in the MySQL help I have but I didn’t see anything about a rownum pseudo-column.

Oracle example



SQL> select * from items ;

ITEM                                 QTY     PRICE     TOTAL
------------------------------ --------- --------- ---------
ItemA                                 12        10       120
ItemB                                  2        20        40
ItemA                                  2        10        20
ItemC                                  4        30       120
ItemD                                  1        40        40
ItemB                                  3        20        60

6 ligne(s) sélectionnée(s).

SQL> select rownum as the_rank, b.item, b.tq, b.tp
  2   from
  3    (select a.item, sum(a.qty) as tq, sum(a.total) as tp
  4          from items a group by a.item order by 2 desc
  5    ) b ;

 THE_RANK ITEM                                  TQ        TP
--------- ------------------------------ --------- ---------
        1 ItemA                                 14       140
        2 ItemB                                  5       100
        3 ItemC                                  4       120
        4 ItemD                                  1        40

SQL>

I can do the sub-select easily in MySQL


select * from (SELECT  item
	, sum(qty)
	, sum(total)
FROM items group by item
order by 2 desc) a

but getting the ranking was hard?!?! (for a newbie like me)

I did some googling and found this solution, using the declaration of a MySQL variable (something I have to read more about



select * from items ;
item    qty   price  total
ItemA    12      10    120
ItemB     2      20     40
ItemA     2      10     20
ItemC     4      30    120
ItemD     1      40     40
ItemB     3      20     60

select
   @rownum:=@rownum+1 the_rank,
   t.*
 from
   (select @rownum:=0) r,
   (select  item, sum(qty) as tq, sum(total) as tp
     from items
     group by item
     order by 2 desc
	 ) t ;
the_rank   item   tq    tp
   1       ItemA  14   140
   2       ItemB   5   100
   3       ItemC   4   120
   4       ItemD   1    40

Arnold_Winkelried · November 15, 2007, 7:35pm

Whoa! One last thing - I found it that the MySQL variables are incremented at display time, not at row retrieval time, making them more convenient (in this case) than the Oracle rownum pseudo-column. The query doesn’t need to be written with an inline view. This will work as well:



select
   @rownum:=@rownum+1 the_rank,
   t.item, sum(t.qty) as tq, sum(t.total) as tp
 from
   (select @rownum:=0) r,
   items t
 group by t.item
 order by 3 desc ;

the_rank   item   tq    tp
   1       ItemA  14   140
   2       ItemB   5   100
   3       ItemC   4   120
   4       ItemD   1    40

SCSimmons · November 17, 2007, 3:32am

I just popped in to add the completely useless (to the OP) nugget that Microsoft SQL Server 2005 has a function that does this without needing any subqueries etc. So to all of you MySQL and Oracle users: boo-yah, suckers!

LSLGuy · November 17, 2007, 2:27pm

And the name of that function would be???

SCSimmons · November 20, 2007, 6:10am

RANK()

Yes, it even has an intuitive name. The guys in Redmond must be slipping. (Although that said, the query builder in SQL Server Management Studio says it’s invalid SQL and won’t display queries in design mode if they have that function, despite the fact that the query runs just fine. :smack: )

K364 · November 20, 2007, 2:17pm

boo-yah to you…

Oracle has had a rich set of SQL extensions for many years, including RANK. Google “Oracle analytic functions” to have your eyes opened!

SCSimmons · November 20, 2007, 4:34pm

My company seems to be stuck on Oracle 7.3, and it doesn’t seem to work there. But I guess that’s from, like, the 19th century or something, so I guess that’s not too surprising.

Topic		Replies	Views
Simple SQL question Factual Questions	7	760	January 8, 2001
Can anyone reccomend a fast-response SQL forum. And can anyone help in the meantime? Factual Questions	3	681	March 2, 2008
Help Me Please!! SQL Query, Desperatly. Factual Questions	4	816	June 17, 2001
microsoft access question Factual Questions	3	606	August 23, 2000
I need some SQL query help, please Factual Questions	12	1138	March 20, 2009

MySQL SELECT ranking question

Related topics