I’ve been trying to run this query in several different ways but each time it seems to take ages at an unexpected place. Here is the most recent ‘version’
create view tempcompanygenuine as select * from clients where companyid= 2 and requeststatusid = 2 and email not like ‘%mycompany.com’
Gets all accounts that are not company accounts, but belong to a particular website . runs very quickly.
*create view distinctaccountsbet as
select distinct accountnumber from bettingtable where _date >=‘09/08/06’
gets all account numbers that have bet in the last two months. runs very quickly.
*select accountnumber, registrationclosedate,forename, email
from tempcompanygenuine where accountnumber not in (select * from distinctaccountsbet)
Takes ages! So far it hasn’t completed. As I type it’s up to ten minutes and has pulled off 839 records (of a probable seven thousand)
I can’t understand why the last query is taking so long. All it’s doing is finding records from the first view where the account number isn’t in the second view.
Up to 30 minutes. 2857 records returned so far. (nearly half way!)
Could it be re-running the views for every single record it’s checking against??? It certainly seems like that. And this is a fairly powerful/fast sql server it’s running on.
What that seems to do is return one or no records in the sub query for each account number being compared, so potentially the subquery is being run thousands of times.
But running a query that returns one record thousands of times is probably quiker than running a query that returns 400 records thousands of times and then comparing the account number to each one of the 400.
But I was hoping my sql server was ‘smart’ in other words it should know to run each sub query only once, and then compare each account number in the main query to the results of the second sub query.
I have no idea what it’s actually doing. I’ll try reconstructing the overall query using not exists’ instead of ‘not in’
(select top 1 accountnumber from bettingtable a1 where transaction_type = ‘bet’ and not exists
(select top 1 accountnumber from bettingtable a2 where _date >=‘09/08/06’ and a1.accountnumber = a2.accountnumber)
and a1.accountnumber = clients.accountnumber)
It runs in just 1:21 (one minute 21 secs) but because it’s increased in complexity I’m no longer ‘confident’ it has pulled back the data I actually want.
If the “not exists” doesn’t help, I’d change “(select * from distinctaccountsbet)” to “select accountnumber”. There’s no reason (that I can think of) to tell it to select everything from that table when you only want one field, especially one that might be in an index.
The DB might be smart enough to figure this out on its own or it might not. You might as well be specific.
ETA::smack::smack::smack: Didn’t see that it was a view of only one column.
In the end I went with the hendel solution. it allows me to do it in stages.
ntucker Two of the tables I’m working on have indexes, but probably not ideal for my query. I have never had formal SQL database training (except for some basic stuff at University which I had almost entitely forgot by the time I started using sql at work)
In other words I wouldn’t know what to index, what type of index to use, what impact the index would have on the size and day-to-day running of the table.
In other words I’d feel like someone trying to fix a TV with no TV fixing experience.
The proper way to do things is to follow these 2 rules:
Make it fast enough to meet your requirements
Make it maintainable for the next person
Views are valuable, they allow you to simplify and abstract aspects of your data model that get used in many places. But the first requirement is to get the job done and if the optimizer is having a problem using a view, then you may need to avoid it for that particular query. I have run into situations where the complexity of a view caused problems with the optimizer as it was being used in my query.
Regarding Temp Tables:
Use temp tables when it makes your query more maintainable or if required for functionality or performance. I use them often for complex sets of queries.
Regarding Performance on Your Query:
My guess is your problem is the “NOT IN” clause. IN, NOT IN and EXISTS and NOT EXISTS have different performance characteristics (within SQL Server and between SQL Server and Oracle). If I remember correctly, a IN/NOT IN with a subquery may be executing the subquery once for every row in the driving table.
You could try the following:
Change NOT IN to NOT EXISTS
In addition to 1), add a WHERE clause in the NOT EXISTS subquery on account number if account number is indexed in the underlying table
Convert NOT IN to a LEFT OUTER JOIN with a WHERE somefield IS NULL from the joined table
Did you examine the plan for the query? That is usually a good way to determine what the optimizer is doing. (I’m not good with SQL Server so I don’t know how to get a query plan out of it. There is probably a way.)