SQL experts please help with a monster query.

Given one table, a transaction file containing bets and positive and negative adjustments, with a ‘transaction type’ field and an account number,

and another table containing account numbers and a field used to filter (call it ‘filter’… for the purposes of this question it doesn’t matter what it filters on),

I want to build a table where column one is an account number, column two is a sum of bets (where transaction type is ‘bet’), and column three is the first transaction of type ‘deposit’, with everything filtered on ‘yes’.

The problem is that it combines a sum of records with a single record (the first deposit) for the same account number.
I built a query that uses TWO subqueries (one that gets the total and one that gets the first deposit), but this takes half a minute to run on only one account number (there is at the moment only one that satisfies the filter argument).
When ten account numbers satisfy the filter argument it is going to take ten times as long to run the query, and so on.
I’m sorry to not be very specific in the main OP. I will try to be more specific as I respond to replies but at the moment I’m desperate for ideas and my brain is whirring away looking for alternatives.
The whole thing is going to be run in ASP code using a recordset… another question is: could I use a recordset inside a new SQL query to build a second recordset?

Another question… possibly an alternative to the above…
is it possible to temporarily save the results of a query so they can be used more than once in another query without having to be RUN more than once?
I built a table of ALL records that satisfy the filter

then I want to use that table in two subqueries such as …

“select top 1 amount from query1 where transaction_type = 'deposit'”

and

“select sum(amount) from query1 where transaction_type = 'bet'”

thus running both queries on a much much smaller data set and so speeding them up.
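So in effect I’d be doing something like this, if it’s possible to keep a scratch table around (a sketch only; the table and column names are invented for illustration):

```sql
-- Run the expensive filter ONCE into a temp table...
select t.*
into #query1
from transactions t
join accounts a on a.accountnumber = t.accountnumber
where a.filter = 'yes'

-- ...then point both small queries at the temp table.
select top 1 amount
from #query1
where transaction_type = 'deposit'
order by transaction_date

select sum(amount)
from #query1
where transaction_type = 'bet'
```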

Well, because you need data that is both at the transaction level as well as an aggregate level, you will pretty much have to use multiple/nested queries in some fashion.

Of course, that’s what stored procedures and temp tables are for. What database engine are you using?

(Also, don’t forget to make sure the index on the account field in the transaction table is built correctly. In general, taking a look at the optimizer plan is important here.)
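For instance, something along these lines once you know the real names (placeholders here, covering the columns these queries join and filter on):

```sql
-- Placeholder names; adjust to the actual transaction table and columns.
create index ix_txn_account
  on transaction_table (account_number, transaction_type)
```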

I think it’s MSSQL (I’m using asp code to access the data)
If I have a stored procedure called ‘filter’ and I made a new record set in the code …
set rs1 = conn.execute("filter")

how would I use that in a new sql query???
sqlstring = "select * from rs1"

set rs2 = conn.execute(sqlstring)

of course the above is just what I want to be able to do. How can I put the recordset ‘rs1’ in the sql query text???

I’ve never worked with MSSQL, but if you design the join on the two subqueries correctly, the runtime won’t noticeably increase as the number of accounts that pass your filter grows. Does it have some kind of equivalent to Oracle’s EXPLAIN PLAN?

Something like:



select q1.account, q1.bets, q2.first_deposit
from
( select table1.account, sum(table1.transaction) as bets
  from table1, table2
  where table1.account = table2.account
    and table1.transaction_type = 'bet'
    and table2.filter_column = 'yes'
  group by table1.account ) q1
full outer join
( select table1.account, table1.transaction as first_deposit
  from table1,
  ( select table1.account, min(table1.transaction_time) as transaction_time
    from table1, table2
    where table1.account = table2.account
      and table1.transaction_type = 'deposit'
      and table2.filter_column = 'yes'
    group by table1.account ) sq1
  where table1.account = sq1.account
    and table1.transaction_time = table2.transaction_time ) q2
on q1.account = q2.account


You should put the filtering inside of each subquery, I think, since the number of transactions should be far larger than the number of valid accounts. The join is a full outer join since it’s theoretically possible to have bad data where a given account has no deposits or no bets. Make it an inner join if you’d rather just silently ignore those accounts.

I don’t know what explain plan is. (I’ll google it)

I guess there’s no harm in revealing my table and field names here. I built your query using my names. I removed the “and table1.transaction_time = table2.transaction_time” bit as it didn’t seem to fit and my table 2 (clients) doesn’t have such a field.

I also added the distinct keyword as it seemed to be duplicating results for the one test account I have so far with real data.

If it does take a similar amount of time for a larger set of accounts then the problem is solved and I will gladly kiss your feet and bring you things


select distinct q1.accountnumber, q1.handle, q2.firstdeposit from

(select amtoteaccountactivity.accountnumber, 
 sum(cast(total_bet_amount as real) - cast(refund_amt as real)) as handle
 from amtoteaccountactivity,clients
 where amtoteaccountactivity.accountnumber = clients.accountnumber
 and amtoteaccountactivity.transaction_type = 'Bet'
 and clients.bonuscredited='no'
group by amtoteaccountactivity.accountnumber) q1

inner join

(select amtoteaccountactivity.accountnumber, amtoteaccountactivity.amount as firstdeposit
from amtoteaccountactivity,(select amtoteaccountactivity.accountnumber, 
min(amtoteaccountactivity._date) as mindate
from amtoteaccountactivity,clients
where amtoteaccountactivity.accountnumber = clients.accountnumber
and amtoteaccountactivity.transaction_type in ('Visa','Neteller','Master Car')
and clients.bonuscredited = 'no'
group by amtoteaccountactivity.accountnumber) sq1
where amtoteaccountactivity.accountnumber = sq1.accountnumber) q2
on q1.accountnumber = q2.accountnumber

It doesn’t look like you have a lot of access to the database itself. Can’t you get help from whoever does the database administration? It can be … helpful. With building stored procedures as well as the optimization of queries (which is the purpose of explain plan – in the MSSQL Query Analyzer client, it’s the option “Show Execution Plan”).

I don’t want to speak for punoqllads but I’m pretty sure you’re going to need that. It’s how you get the transaction that is the earliest deposit transaction. Except that I think it should be table1.transaction_time = sq1.transaction_time.

Some unsolicited advice - when you are getting what appears to be duplicate data and you’re not expecting it, it’s often worth it to find out why rather than just slapping a distinct in there. In this case I suspect it’s because you dropped that transaction_time condition.
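Concretely, putting the dropped condition back means the q2 subquery should end with something like this (using your names from above):

```sql
-- Restrict each account's deposit rows to the one on its earliest date.
where amtoteaccountactivity.accountnumber = sq1.accountnumber
  and amtoteaccountactivity._date = sq1.mindate
```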

If you were to do this with a stored procedure, you would do all the work server side in the procedure, rather than executing a client-side query and then another based on the previous results. That is, the SP would do something like

create procedure the_proc as

begin

select * into #the_accounts -- this is the temp table that holds the accts you want
from account
where <apply whatever filter conditions>

select *
from transaction t1, #the_accounts t2
where t1.accountid = t2.accountid

<etc>

end

I have attempted a simulation of what I think your scenario is. This is a bit of a hack with a misuse of the MAX() function, but how is this lot?


create table bets ( trtype VARCHAR2(20), acctno NUMBER, value NUMBER );

create table acct ( acctno NUMBER, filter VARCHAR2(1) );

Is that a reasonable version of your tables?




create view depos as
select acctno, max(value) depo1
  from bets
group by acctno

Replace max() with whatever gets you the correct transaction



select SUM(value), bets.acctno, MAX(depos.depo1)
  from bets, acct, depos
 where acct.filter = 'Y'
   and bets.trtype = 'bet'
   and bets.acctno = acct.acctno
   and bets.acctno = depos.acctno
  group by bets.acctno, depos.depo1


I’m thinking that you could get round the need for the view (which you may not be in a position to create) with a self join of the bets table to itself. I’m busy right now but I may have another go if I have time.
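The self-join version would presumably look roughly like this (untested, and it assumes a transaction-time column that my create table above doesn’t include):

```sql
-- Each deposit row d1 survives only if no earlier deposit d2 exists
-- for the same account, i.e. d1 is that account's first deposit.
-- 'trtime' is an assumed column name.
select d1.acctno, d1.value as depo1
from bets d1
left join bets d2
  on d2.acctno = d1.acctno
 and d2.trtype = 'deposit'
 and d2.trtime < d1.trtime
where d1.trtype = 'deposit'
  and d2.acctno is null
```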

You were right. I put it back…


select q1.accountnumber, q1.handle, q2.firstdeposit, agent, forename, surname from
(select amtoteaccountactivity.accountnumber,
sum(cast(total_bet_amount as real) - cast(refund_amt as real)) as handle
from amtoteaccountactivity, clients
where amtoteaccountactivity.accountnumber = clients.accountnumber
and amtoteaccountactivity.transaction_type = 'Bet'
and clients.bonuscredited = 'no'
group by amtoteaccountactivity.accountnumber) q1
inner join
(select agent, forename, surname, amtoteaccountactivity.accountnumber, amtoteaccountactivity.amount as firstdeposit
from amtoteaccountactivity, (select agent, forename, surname, amtoteaccountactivity.accountnumber,
min(amtoteaccountactivity._date) as mindate
from amtoteaccountactivity, clients
where amtoteaccountactivity.accountnumber = clients.accountnumber
and amtoteaccountactivity.transaction_type in ('Visa','Neteller','Master Car')
and clients.bonuscredited = 'no'
group by amtoteaccountactivity.accountnumber, agent, forename, surname) sq1
where amtoteaccountactivity.accountnumber = sq1.accountnumber
and amtoteaccountactivity._date = sq1.mindate) q2
on q1.accountnumber = q2.accountnumber where q1.handle >= q2.firstdeposit

The big test will be when there is more than one account that qualifies for the bonuscredited = 'no' condition

and when there is more than one deposit (will it correctly pick the first?). I was using ‘top 1’ and sort by date… but I guess picking the minimum date and using that to filter achieves exactly the same end.
edit: forgot to remove the distinct keyword :smack: It still duplicates the results without it.

I think I see one possible issue with the dates. I assumed that the dates for deposits for an account would be distinct, but if they’re not, then you would get multiple rows for a given account. So, if there are multiple rows for the same date, do you want just one of them or the sum of them?

Am I the only one whose brain went:

select ANIMAL from BEASTS where MONSTER=true ;

?

This can have a tremendously bad effect on performance. Be careful with it.

You are right on that. Not a week goes by where I don’t tell one of my consultants to take out a DISTINCT or a MAX that they are using to cover a flaw in the query. There is a fundamental problem if it is doing that and it needs to be addressed. My specialty is Oracle SQL but subqueries are definitely the only way to go as the problem is described. If one of my Oracle SQL statements was having similar problems, I would have a DBA look into it because it doesn’t sound right. The time of execution isn’t linear however and you can’t just say X minutes times Y records will be equal for different datasets. The time of execution could be anything.

Just one… this is why I used ‘top 1’ and sort by date… which in hindsight would still not work.

I should be using time and date.
Time is in the correct order in the database anyway (within a given date), but date may not be due to the fact we’ve had to insert data out of sequence a couple of times in the past.
Also… I know for a fact the test subject (a test account) has only one deposit.

The query is listed in full above if anyone can figure out why it’s duplicating; that would be much appreciated. I will try too, but my SQL skills are clearly inferior to some others’ here.
edit: time and date are separate fields (unlike a sensible database, which would store them as one datetime field)
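(So to order a single known account’s deposits properly I’d need something like the following, where _time is my guess at the field name:)

```sql
-- For one known account: first deposit by date, then by time within the date.
select top 1 amount
from amtoteaccountactivity
where accountnumber = @acct
  and transaction_type in ('Visa','Neteller','Master Car')
order by _date, _time
```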

Actually, I don’t think top 1 will work anyway, because top is not an aggregate function. What you need is the earliest date/time per account, which requires a group by, which top does not work with (in other words, you can’t get a “top 1 per group”, only a top 1 from the entire resultset, period).
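The usual workaround for a “top 1 per group” is to correlate on the minimum date per account, along these lines (untested, using the table names from earlier posts):

```sql
-- One row per account: the deposit whose date equals that account's
-- earliest deposit date. Ties on the same date still need a tie-breaker.
select a.accountnumber, a.amount as firstdeposit
from amtoteaccountactivity a
where a.transaction_type in ('Visa','Neteller','Master Car')
  and a._date = (select min(b._date)
                 from amtoteaccountactivity b
                 where b.accountnumber = a.accountnumber
                   and b.transaction_type in ('Visa','Neteller','Master Car'))
```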

The problem here is that even if you manage to create a subquery that gets you the minimum date per account (something like Small Clanger’s example), in my experience MSSQL doesn’t do a very good job of applying the optimizer technique known as the ‘predicate push’ into subqueries/views (Oracle tends to be better at it, IME). What this means in your case is that MSSQL will have a hard time applying the account filter to that subquery/view before calculating the minimum date/time. Rather, it may decide to calculate the min date/time for every single account in your transaction table, and only then apply the account filter. This is something you want to avoid.

Given all this, I think your best approach is going to be a server-side stored procedure with temp tables, or perhaps a scalar function that will handle the minimum date/time part. Is this something you’re allowed to do in your environment or not? And is there someone who can verify that your transaction table has the required indexes built? Without the right indexes, no query you write is going to perform well on a large transaction table. (Are any other applications using this table, by the way, or is yours the only one?)

I can create stored procedures myself, but what I am writing is something that will be used on an intranet by people who don’t have that access.

Does anyone know about asp? Do you know how to use the results of running a stored procedure in a new query text, as if you were using a table? This is an important question because it will make life a lot easier than trying to build huge complicated queries that become impossible to debug.
Say I get the result of running a stored procedure like this “set rs =conn.execute(stored_procedure)”

how do I then use those results in a new query?

I realize the answer to my own question is have my stored procs build new tables in the db…

but it would still be good to know if it’s possible to use a vb recordset in a new query.

My original query in which I was using top 1 wasn’t using aggregate functions. It was using a subquery that was run for each account, one by one… so top 1 worked in that way.

Yes, this is what I’ve been trying so say but I guess I could have been clearer. Create the temp tables within the proc and then don’t worry about session scope or anything. That’s what they’re there for.

More unsolicited advice: it’s generally considered a better approach to do the required data manipulation as close as possible to the data itself. This way the stored procedure can be changed and tweaked for performance as the table structures change, without impacting your asp application. All it needs to know is the name of the proc and the structure of the result set.

I don’t know asp that well myself, but I don’t think there is any way to directly pass resultsets from one query as a parameter to a new query. I’m guessing that you would have to create an intermediate data structure to hold it, and then pass them as native parameters to the new query. But this approach isn’t often considered sound practice. It’s all data manipulation. As a rule of thumb, try not to take these intermediate steps, which among other things represent repeated round trips to the database.

Amen and hallelujah. I’m an Oracle developer myself but our philosophy has always been, “Any code that can go in the database, should go in the database.” (With the usual caveat that there are always exceptions) It’s much easier to keep track of where your code is if you KNOW it’s in the database. The backup and recovery procedures are much better than those for a filesystem (IMHO). And all of this is in addition to the main reasons of reduced network traffic and ease of use.