So I figured I’d ask a ‘human’ question and someone can attempt to convert it into SQL.
but before I do that… What is the technical name (if there is one) for a query that combines a sum from grouped data with a first and/or last entry of a specific field for that data?
For the sake of argument I have a table with account number, balance forward, and amount.
I want the SUM of the amount field, and the FIRST and LAST value of balance forward field for each account.
so my ‘english’ question is this - “give me the first value for balance forward, the last value for balance forward, a sum of amount, where the date is yesterday, for each account”
The annoying thing is the above should be a straight-forward (and commonly required) thing to do… but the SQL is proving hard for me to work out.
So the top would be the balance at the beginning of the day, and the bottom would be the balance at the end of the day.
for one day (yesterday) for all accounts (so it’s grouped… so I end up with three values for each account - a start bal, an end bal, and a sum of the amount)
SELECT AccountNumber,
(SELECT TOP 1 BalanceForward FROM table1 WHERE AccountNumber = T.AccountNumber AND DateEntered >= GetDate() ORDER BY DateEntered ASC) AS First,
(SELECT TOP 1 BalanceForward FROM table1 WHERE AccountNumber = T.AccountNumber AND DateEntered >= GetDate() ORDER BY DateEntered DESC) AS Last,
SUM(Amount) As SumAmount
FROM table1 T
WHERE DateEntered >= GetDate()
GROUP BY AccountNumber
You’ll have to jiggle with the date stuff to get yesterday’s data and it depends on your your particular database server.
A properly designed correlated subquery shouldn’t need to resort to DB-specific TOP/LIMIT/ROWNUM tricks, as long as the data has decent recognizable ordering, such as a datetime or sequence number on each row.
SELECT accountnumber, SUM(amount),
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MIN(timestamp)
FROM thetable
WHERE accountnumber = T.account_number)) AS balance_start,
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MAX(timestamp)
FROM thetable
WHERE accountnumber = T.account_number)) AS balance_end
FROM thetable T
GROUP BY accountnumber
I apologize for that earlier link; I didn’t read it carefully. The author of that guide used the TOP device in their examples, which is fine for T-SQL, but that might not be the DB-flavor you’re using. Manduck’s example should work on any SQL-92-compliant database.
Am I missing something? If the rows are guaranteed to be in the correct order, it seems like this should do the trick:
SELECT AccountNumber, First(BalanceForward) AS Start, Last(BalanceForward) AS End, Sum(Amount) AS Total
FROM Table
WHERE TimeStamp = Date()-1
GROUP BY AccountNumber
Those functions aren’t Microsoft-specific, are they? (I know that Date() is, but there’s sure to be something like it in any language.)
If the rows aren’t always in the right order, it’s slightly more complicated, but I’d try to put a single subquery in the FROM clause, not two in the SELECT clause.
SELECT accountnumber, SUM(amount),
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MIN(timestamp)
FROM thetable
WHERE accountnumber = T.account_number)) AS balance_start,
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MAX(timestamp)
FROM thetable
WHERE accountnumber = T.account_number)) AS balance_end
FROM thetable T
GROUP BY accountnumber
I don’t believe that takes yesterday into account. I’t will find the life of the account.
SELECT accountnumber, SUM(amount),
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MIN(timestamp)
FROM thetable
WHERE accountnumber = T.account_number
AND timestamp like (to_date("DD-MON-YYYY"), (SYSDATE -1))
)) AS balance_start,
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MAX(timestamp)
FROM thetable
WHERE accountnumber = T.account_number
AND timestamp like (to_date("DD-MON-YYYY"), (SYSDATE -1))
)) AS balance_end
FROM thetable T
GROUP BY accountnumber
SELECT accountnumber, SUM(amount),
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MIN(timestamp)
FROM thetable
WHERE accountnumber = T.account_number
AND timestamp like (to_date("DD-MON-YYYY"), (SYSDATE -1))
)) AS balance_start,
(SELECT balance FROM thetable
WHERE timestamp = (SELECT MAX(timestamp)
FROM thetable
WHERE accountnumber = T.account_number
AND timestamp like (to_date("DD-MON-YYYY"), (SYSDATE -1))
)) AS balance_end
FROM thetable T
WHERE timestamp like (to_date("DD-MON-YYYY"), (SYSDATE -1))
GROUP BY accountnumber
First() and Last() sound like functions specific to some particular DB-flavor, and shouldn’t be relied upon. The OP never specified what DB they were using, unless I missed it?
The Date()-1 construct is also DB-specific. I hate translating date functions between DB flavors; they aren’t covered in the SQL-92 standard, so everybody does them differently.
One should never assume the rows are in a specific order in the database, this is why we have key and index fields.
Because the OP’s requirement is to join on 2 different rows, I don’t think there’s any way around doing 2 different subselects to get the data they want.
Wolfman’s altered example also relies on DB-specific date functions. Lobsang will have to look up the correct syntax for their DB, and also do whatever tricks are necessary to split out the day component of the datetime to cover one specific date.
Yeah, I took it as a given that anything date-related would need to be translated, and have conversion functions added.
Here’s my single-subquery version, without First, Last, or the assumption of record-order. It does assume that each TimeStamp is unique.
SELECT minmax.AccountNumber, small.BalanceForward AS Start, big.BalanceForward AS End, minmax.Total
FROM Table AS small
INNER JOIN (
[SELECT AccountNumber, Min(TimeStamp) AS FirstTime, Max(TimeStamp) AS LastTime, Sum(Amount) AS Total
FROM Table
WHERE TimeStamp = Date()-1
GROUP BY AccountNumber]. AS minmax
INNER JOIN Table AS big ON minmax.AccountNumber = big.AccountNumber AND minmax.LastTime = big.TimeStamp
) ON small.AccountNumber = minmax.AccountNumber AND small.TimeStamp = minmax.FirstTime