Math geeks - need help

redtail23 · January 26, 2012, 7:21pm

Is there a way to calculate the top n% of a weighted distribution (if that’s the right term)? I’ve found a formula for calc’ing a weighted average, but that’s not what I need.

I have a set of values (number of times X was done, grouped by X) and need to calculate the top n% of that, by the count rather than grouping.



Cherries   	15
Bananas   	10
Cantelope	7
Oranges   	6
Apples      	5
Kiwifruit    	3
Persimmons	3
Grapes     	2
Mangos    	1
Papayas   	1
Pineapples	1
Plums       	1
Apricots    	1
Blueberries	1
Raspberries	1
Blackberries	1

So I’ve got a total of 59 pieces of fruit. I need to know the list of fruits in the top 30% by the number of pieces.

59 * .3 = 17.7, so that’d be 18 pieces of fruit. That would be 15 Cherries and 3 Bananas.

Is there a way to calculate this rather than just going through the sorted list and adding until you get to the right number?

Thanks!

P.S., no it’s not a homework assignment, I’ve actually got a much larger dataset I’m trying to crunch for work, but I’m no statistician, just a computer nerd.

standingwave · January 26, 2012, 7:37pm

Since you’re talking about a spreadsheet, I create a column that shows the sum from the top to that row.

=SUM(A$1:A6)

With the $ symbol pinning the first element, you can copy it for every element in the column. Then it’s simple to calculate the percentage (or to do it all in one step).

Xema · January 26, 2012, 7:52pm

It will be challenging to come up with a calculation method that avoids summing the counts.

redtail23 · January 26, 2012, 7:53pm

Yeah, it’s not in a spreadsheet, that would make it too easy.

It’s a database, with multiple tables that have to be put back together after finding my top list.

I can get there calculating RBAR like a spreadsheet, I was just hoping for a better way since I’ll have to run this multiple times for different breakdowns and datasets.

Thanks, though!

nivlac · January 26, 2012, 7:54pm

If you’re going to end up ranking the fruits by highest frequency there’s no way you can avoid a sort or some kind of search to find the highest frequency fruits. What’s required is not a math formula but an algorithm and it’s perfectly described by the OP.

redtail23 · January 26, 2012, 7:58pm

That’s what I was afraid of.

I can do that, I was just hoping that mathier people than me had a better idea.

But you did make me think of another way to search and it looks like that may do it. I needed to be looking for solutions to create set-based running sums.

Thanks!

redtail23 · January 26, 2012, 8:01pm

Yeah, sorting them into the proper order is easy, it was how to keep from adding each row and then comparing that to the grand total row-by-row that was hard.

SUM OVER will do it - you can compare partitioned sums to the total sum.

I’ve gotta get better at remembering the aggregations available.

Thanks for jogging the old brain into a new slot!

Punoqllads · January 26, 2012, 9:29pm

If the flavor of SQL you’re using supports analytic functions, you can do something like:



select fruit_name,
  count(*) as fruit_count,
  sum(fruit_count) over ( ) as total_count,
  sum(fruit_count) over
    (
      order by fruit_count
      rows between unbounded preceding and current row
    ) as rolling_count
from fruit_order_table
group by fruit_name
having rolling_count / total_count > 0.7;

to get the top 30%.

redtail23 · February 1, 2012, 10:57pm

Just to close out, I did finally get it.

I’m in MS-SQL. I think the Analysis Services might do that analytical stuff, but I haven’t gotten into it yet.

I ended up with a multi-part query to get the different pieces.

First one breaks out the sections, each with a list of items, count of those items, grand total for the section, and ranked in descending order by count. The second one calcs the percentage of grand total for each item. Third one calcs running totals on the item count and percentage. Final part selects all rows for each section with running total less than my goal.

Runs pretty well, and it will be fairly easy to change up the breakdowns as needed.

Thanks for all the suggestions!

Topic		Replies	Views
Mathematics: what would an unweighted average of a list of percentages measure? Need answer fast! Factual Questions	18	7003	September 22, 2011
Questions about making a ranking from submitted lists Factual Questions	11	969	January 8, 2020
MySQL SELECT ranking question Factual Questions	18	1401	November 20, 2007
Getting someone to write a program for me? Factual Questions	18	2104	March 25, 2009
I need some SQL query help, please Factual Questions	12	1144	March 20, 2009

Math geeks - need help

Related topics