I am currently teaching a class where I am expected to average the top three out of five grades. I use Access to calculate student grades, but I do know SQL as well.
Can anyone help me write an Access expression or SQL statement that can look at five distinct values for each student, and calculate an average for the three highest grades for each student out of those five grades?
I can handle this if all I have to do is drop the lowest grade:
=(SUM(Grades)-MIN(Grades)) / (COUNT(Grades) - 1)I can also use TOP to get the highest three grades in the Grades table, but not grouped by StudentID.
Any suggestions on other steps to try? I’ll even look at Excel formulas, if anyone knows of any that would do this.
Can you drop the lowest grade, take that cell out of consideration, and then drop the lowest grade of the ones left?
There is a function in Excel called SMALL(beginning cell:ending cell, m) that will give you the mth smallest value in your range. There is also a function called SUMIF(range again, condition, such as >SMALL(A2:A6,2), and the cells you want added.
So, say your grades are in columns A2:A5. A6 (or whatever) would be “=SUMIF(A2:A6, >=SMALL(A2:A6,2), A2:A6)”.
I’m not sure that it will; we’re looking for the top three values per student; I thought I could do this by joining a couple of queries, but it has me stumped.
I know that this wasn’t in the ‘solution space’ you mentioned, but I would tend to think of solving this sort of problem with code, for instance, with a VBA macro. I came up with a quick database satisfying your description, and this procedure seems to work. You may have to customize it to your own application, of course.
Public Sub doCalcs()
Dim conn As Connection, rst1 As New Recordset, rst2 As Recordset
Set conn = CurrentProject.Connection
rst1.Open "students", conn, adOpenStatic, adLockPessimistic
Dim studentID As Long, totalscores As Long, k As Integer
Do While Not rst1.EOF
totalscores = 0
studentID = rst1("studentID")
Set rst2 = conn.Execute("select score from scores where studentFK = " & studentID & " order by score desc")
k = 0
Do While k < 3
totalscores = totalscores + rst2("score")
rst2.MoveNext
k = k + 1
Loop
rst2.Close
Set rst2 = Nothing
rst1("result") = totalscores / 3#
rst1.MoveNext
Loop
rst1.Close
Set rst1 = Nothing
MsgBox "Done!"
End Sub
Very generally, you’ve got two nested data loops there - one to consider each student, and the second to read the three highest scores from each student. As written, it’ll crash if any student doesn’t have at least three scores on file, and it writes its results into my ‘students’ table.
Yeah, this is a good one. As long as you’re not too concerned about performance, you might be able to get something by creating a query for each student’s grades, invoking TopValues on that, and then unioning the results together. There’s got to be a better way, though.
I still can’t think of a way to do this without some kind of iteration in code, but I’m just posting to mention my doubt about the idea of using unions; I have a feeling Access imposes some limit on the number of unions possible in a single query. Also, creating a top values query for each student would have to be done dynamically; to cope with new students in the list
Since you already know how to compute the average minus the very bottom score, let’s use that. Do a query that will return each student’s lowest grade, and then do a query for all the records in the original table that aren’t in there. That’ll give you the top 4 grades, and you already know how to handle that situation.
Use Excel. List student IDs in column A and the five test scores in B-F. In column G do MIN to get the lowest. Then in the next five columns put something like=IF($G1=B1,“NADA”,B1). THat recreates your set of tests but with the lowest taken out–replaced with text.
Now when you repeat the process, you’ll take out the next lowest leaving you with the three highest umbers and the lowest two replaced as text.
In the next column do your averaging.
Once it’s set up hide all of the intermediary cells and it will be neat.
Pretty sure this will work though I only tried bits of it.
SELECT studentname, grade FROM students
INNER JOIN grades ON students.studentID = grades.studentID
WHERE grade IN (SELECT TOP 3 grade FROM grades WHERE grades.studentID = students.studentID ORDER BY grade DESC)
ORDER BY studentname ASC, grade DESC
The results consist of (up to) three records per student showing the top three grades for that student; a summary query can be run using this query’s results as a source, averaging them.
And the ANSI way to do this, without a TOP function, goes something like this: [Table T has columns STUDENT_ID and GRADE]
select student_id, avg(grade)
from t t1
where 2 >= (select count(*) - 1
from t t2
where t1.grade <= t2.grade and
t1.student_id = t2.student_id
group by t1.student_id)
group by student_id
I think that works. Could be wrong tho, I didn’t thoroughy test it, not having access to Access. It’s based on a Joe Celko “SQL for Smarties” solution.
Okay, I’d like to post a slight hijack. Both arsenal’s posted code and mangetout’s example involve a subquery which refer to tables that are not listed in the subquery. (t1 and students respectively.)
I like to think of myself as a SQL smarty, but I’ve never really been comfortable with this technique, or scenario, or whatever you call it. Is there a term for it? What are the rules for when you can do this and refer to tables in your main query and when you can’t? How should you plan queries like this, how do you ‘plan them out’ in your head, and when should you use them??
Also, does it confuse anyone else that in mangetout’s example, the subquery refers to one table (students) that is only referenced outside the sub, but also to a table (grades) that is referenced inside AND outside the sub, without differentiating between them?? (Can I assume that the inside reference ‘shadows’ the outside reference, making it impossible for the subquery to use the outside grades table unless there is an alias used?)
I really like subqueries, but am only really comfortable with the ones that could be run independently – which I’m pretty sure you can’t do these. The ones I use are easy to picture mentally… you run the query once, generate a logical ‘table’ from the results, and you can then join other tables to that logical table. Or you can seek through the table with an ‘in’ expression, if there’s only one column in it.
Since these other subqueries have a where condition using an outside table field, does that mean that the subquery gets run seperately for every outside query row?? (Because there’s a field value for each row.)
Sorry for rambling on and on so much, but I’d appreciate a little further info on this kind of technique please.
By the way… if one student has a ‘tie for third’ among his scores, (say, 90, 89, 88, 88, 70) this query will return more than three records for the student. Summary can catch that, if it’s well designed.
I actually have an extraneous ‘group by’ in my previous snippet. It doesn’t affect the outcome but it’s clearer without it. It really should be:
select student_id, avg(grade)
from t t1
where 2 >= (select count(*) - 1
from t t2
where t1.grade <= t2.grade and
t1.student_id = t2.student_id
)
group by student_id;
Essentially, yes. This may sound like it would perform poorly, and sometimes it does. However with proper indexes and a database with a decent query optimizer, it’s usually perfectly acceptable. Under the hood it’s typically achieved through a nested loop operation (as opposed to, say, a merge or hash join).