Best 3 out of 5 query

I am currently teaching a class where I am expected to average the top three out of five grades. I use Access to calculate student grades, but I do know SQL as well.

Can anyone help me write an Access expression or SQL statement that can look at five distinct values for each student, and calculate an average for the three highest grades for each student out of those five grades?

I can handle this if all I have to do is drop the lowest grade:

=(SUM(Grades)-MIN(Grades)) / (COUNT(Grades) - 1)I can also use TOP to get the highest three grades in the Grades table, but not grouped by StudentID.

Any suggestions on other steps to try? I’ll even look at Excel formulas, if anyone knows of any that would do this.

Can you drop the lowest grade, take that cell out of consideration, and then drop the lowest grade of the ones left?

There is a function in Excel called SMALL(beginning cell:ending cell, m) that will give you the mth smallest value in your range. There is also a function called SUMIF(range again, condition, such as >SMALL(A2:A6,2), and the cells you want added.

So, say your grades are in columns A2:A5. A6 (or whatever) would be “=SUMIF(A2:A6, >=SMALL(A2:A6,2), A2:A6)”.

It’s hard to give specifics without knowing how your tables are laid out, but the TopValues function will perform do this easily.

I’m not sure that it will; we’re looking for the top three values per student; I thought I could do this by joining a couple of queries, but it has me stumped.

I know that this wasn’t in the ‘solution space’ you mentioned, but I would tend to think of solving this sort of problem with code, for instance, with a VBA macro. I came up with a quick database satisfying your description, and this procedure seems to work. You may have to customize it to your own application, of course.


Public Sub doCalcs()

    Dim conn As Connection, rst1 As New Recordset, rst2 As Recordset
    
    Set conn = CurrentProject.Connection
    
    rst1.Open "students", conn, adOpenStatic, adLockPessimistic
    
    Dim studentID As Long, totalscores As Long, k As Integer
    
    
    Do While Not rst1.EOF
        totalscores = 0
    
        studentID = rst1("studentID")
    
        Set rst2 = conn.Execute("select score from scores where studentFK = " & studentID & " order by score desc")
        
        k = 0
        Do While k < 3
        
            totalscores = totalscores + rst2("score")
            rst2.MoveNext
            k = k + 1
        
        Loop
    
        rst2.Close
        Set rst2 = Nothing
    
        rst1("result") = totalscores / 3#
        
        rst1.MoveNext
        
    
    Loop

    rst1.Close
    Set rst1 = Nothing
    
    MsgBox "Done!"
End Sub


Very generally, you’ve got two nested data loops there - one to consider each student, and the second to read the three highest scores from each student. As written, it’ll crash if any student doesn’t have at least three scores on file, and it writes its results into my ‘students’ table.

Hope it helps.

Yeah, this is a good one. As long as you’re not too concerned about performance, you might be able to get something by creating a query for each student’s grades, invoking TopValues on that, and then unioning the results together. There’s got to be a better way, though.

I still can’t think of a way to do this without some kind of iteration in code, but I’m just posting to mention my doubt about the idea of using unions; I have a feeling Access imposes some limit on the number of unions possible in a single query. Also, creating a top values query for each student would have to be done dynamically; to cope with new students in the list

Here’s one:

Since you already know how to compute the average minus the very bottom score, let’s use that. Do a query that will return each student’s lowest grade, and then do a query for all the records in the original table that aren’t in there. That’ll give you the top 4 grades, and you already know how to handle that situation.

Use Excel. List student IDs in column A and the five test scores in B-F. In column G do MIN to get the lowest. Then in the next five columns put something like=IF($G1=B1,“NADA”,B1). THat recreates your set of tests but with the lowest taken out–replaced with text.

Now when you repeat the process, you’ll take out the next lowest leaving you with the three highest umbers and the lowest two replaced as text.

In the next column do your averaging.

Once it’s set up hide all of the intermediary cells and it will be neat.

Pretty sure this will work though I only tried bits of it.

Ooops–repeated scores might throw a wrench in that–would maybe need some more columns–let me think some more–

but Excel HAS to be the simplest way–unless you’ve got > 64,000 students :slight_smile: JMHO

OK, this works:


SELECT studentname, grade FROM students

INNER JOIN grades ON students.studentID = grades.studentID

WHERE grade IN (SELECT TOP 3 grade FROM grades WHERE grades.studentID = students.studentID ORDER BY grade DESC)

ORDER BY studentname ASC, grade DESC

The results consist of (up to) three records per student showing the top three grades for that student; a summary query can be run using this query’s results as a source, averaging them.

Upon further review–Flipshod slinks back whence he came embarrassed. But thanks for the morning puzzler :slight_smile:

And the ANSI way to do this, without a TOP function, goes something like this: [Table T has columns STUDENT_ID and GRADE]



select student_id, avg(grade) 
from t t1 
where 2 >= (select count(*) - 1 
            from t t2 
            where t1.grade <= t2.grade and 
                       t1.student_id = t2.student_id 
            group by t1.student_id) 
group by student_id


I think that works. Could be wrong tho, I didn’t thoroughy test it, not having access to Access. It’s based on a Joe Celko “SQL for Smarties” solution.

Okay, I’d like to post a slight hijack. Both arsenal’s posted code and mangetout’s example involve a subquery which refer to tables that are not listed in the subquery. (t1 and students respectively.)

I like to think of myself as a SQL smarty, but I’ve never really been comfortable with this technique, or scenario, or whatever you call it. Is there a term for it? What are the rules for when you can do this and refer to tables in your main query and when you can’t? How should you plan queries like this, how do you ‘plan them out’ in your head, and when should you use them??

Also, does it confuse anyone else that in mangetout’s example, the subquery refers to one table (students) that is only referenced outside the sub, but also to a table (grades) that is referenced inside AND outside the sub, without differentiating between them?? (Can I assume that the inside reference ‘shadows’ the outside reference, making it impossible for the subquery to use the outside grades table unless there is an alias used?)

I really like subqueries, but am only really comfortable with the ones that could be run independently – which I’m pretty sure you can’t do these. The ones I use are easy to picture mentally… you run the query once, generate a logical ‘table’ from the results, and you can then join other tables to that logical table. Or you can seek through the table with an ‘in’ expression, if there’s only one column in it.

Since these other subqueries have a where condition using an outside table field, does that mean that the subquery gets run seperately for every outside query row?? (Because there’s a field value for each row.)

Sorry for rambling on and on so much, but I’d appreciate a little further info on this kind of technique please.

Google on Correlated Subquery I’m off for a run in a minute. Back tomorrow if there hasn’t been a dogpile on this by then :slight_smile:

By the way… if one student has a ‘tie for third’ among his scores, (say, 90, 89, 88, 88, 70) this query will return more than three records for the student. Summary can catch that, if it’s well designed.

I actually have an extraneous ‘group by’ in my previous snippet. It doesn’t affect the outcome but it’s clearer without it. It really should be:



select student_id, avg(grade)
from t t1
where 2 >= (select count(*) - 1
            from t t2
            where t1.grade <= t2.grade and
                  t1.student_id = t2.student_id
            )
group by student_id;


Essentially, yes. This may sound like it would perform poorly, and sometimes it does. However with proper indexes and a database with a decent query optimizer, it’s usually perfectly acceptable. Under the hood it’s typically achieved through a nested loop operation (as opposed to, say, a merge or hash join).