Why does this sql query take so long?

Hi squirrels.
I’ve been trying to run this query in several different ways but each time it seems to take ages at an unexpected place. Here is the most recent ‘version’

create view tempcompanygenuine as select * from clients where companyid= 2 and requeststatusid = 2 and email not like ‘%mycompany.com

Gets all accounts that are not company accounts, but belong to a particular website . runs very quickly.
*create view distinctaccountsbet as

select distinct accountnumber from bettingtable where _date >=‘09/08/06’

gets all account numbers that have bet in the last two months. runs very quickly.
*select accountnumber, registrationclosedate,forename, email
from tempcompanygenuine where accountnumber not in (select * from distinctaccountsbet)
*

Takes ages! So far it hasn’t completed. As I type it’s up to ten minutes and has pulled off 839 records (of a probable seven thousand)
I can’t understand why the last query is taking so long. All it’s doing is finding records from the first view where the account number isn’t in the second view.

Up to 30 minutes. 2857 records returned so far. (nearly half way!)
Could it be re-running the views for every single record it’s checking against??? It certainly seems like that. And this is a fairly powerful/fast sql server it’s running on.

It occurs to me I could get this done much faster by removing the email address filter and then paste the results in to excel and remove the emails that way…

but I’m too afraid to stop it now! 36 minutes!

I’m not a SQL expert but occasionally write simple queries. Our DB team told me that for that type query to use a "where not exists (select ‘x’ from distinctaccountsbet…) construct.

I’m in an Oracle SQL environment if that makes a difference.

What that seems to do is return one or no records in the sub query for each account number being compared, so potentially the subquery is being run thousands of times.

But running a query that returns one record thousands of times is probably quiker than running a query that returns 400 records thousands of times and then comparing the account number to each one of the 400.

But I was hoping my sql server was ‘smart’ in other words it should know to run each sub query only once, and then compare each account number in the main query to the results of the second sub query.

I have no idea what it’s actually doing. I’ll try reconstructing the overall query using not exists’ instead of ‘not in’

Right…

*select accountnumber,forename,email from clients

where exists

(select top 1 accountnumber from bettingtable a1 where transaction_type = ‘bet’ and not exists

(select top 1 accountnumber from bettingtable a2 where _date >=‘09/08/06’ and a1.accountnumber = a2.accountnumber)

and a1.accountnumber = clients.accountnumber)
*
It runs in just 1:21 (one minute 21 secs) but because it’s increased in complexity I’m no longer ‘confident’ it has pulled back the data I actually want.

I have run into similar performance issues when joining together two views in SQL Server.

While this probably isn’t the best solution, I found that replacing one of the views with a temp table or table variable in the query really speeds things up.

If the “not exists” doesn’t help, I’d change “(select * from distinctaccountsbet)” to “select accountnumber”. There’s no reason (that I can think of) to tell it to select everything from that table when you only want one field, especially one that might be in an index.

The DB might be smart enough to figure this out on its own or it might not. You might as well be specific.
ETA::smack::smack::smack: Didn’t see that it was a view of only one column.

I have been tempted to do that. I just thought using views was the ‘proper’ way.

I’ll give that a go if all else fails. But I think tim-n-va’s ‘exists’ suggestion has worked…

but there IS only one field in that table. Accountnumber. It’s a view.

I think I’d also try removing the distinct from the view. It will cause a sort that isn’t needed and there’s no telling how many times it will do it.

I thought of that :slight_smile: Changed it to ‘top 1’ instead of distinct.

Have you tried some judiciously-placed indexes on some of the columns you’re using?

In the end I went with the hendel solution. it allows me to do it in stages.

ntucker Two of the tables I’m working on have indexes, but probably not ideal for my query. I have never had formal SQL database training (except for some basic stuff at University which I had almost entitely forgot by the time I started using sql at work)

In other words I wouldn’t know what to index, what type of index to use, what impact the index would have on the size and day-to-day running of the table.

In other words I’d feel like someone trying to fix a TV with no TV fixing experience.

For the interest of those who are interested, here’s how I did it in the end…

**select distinct accountnumber into temp_aa_2months from bettingtable where _date >= ‘09/08/06’
**

Took about 40 seconds. Then…

**select distinct accountnumber into temp_aa_allexceptlast2months from amtoteaccountactivity where transaction_type = ‘Bet’ and accountnumber not in (select * from temp_aa_2months)
**

Took about 1.20 seconds. Then.

**select distinct accountnumber,forename,email from clients where trackingid = 2 and requeststatusid = 2 and accountnumber in (select * from temp_aa_allexceptlast2months)
**
Almost instant.

I then pasted the results into excel, and used excel’s sort to remove the email addresses I didn’t want.

The proper way to do things is to follow these 2 rules:

  1. Make it fast enough to meet your requirements
  2. Make it maintainable for the next person

Views are valuable, they allow you to simplify and abstract aspects of your data model that get used in many places. But the first requirement is to get the job done and if the optimizer is having a problem using a view, then you may need to avoid it for that particular query. I have run into situations where the complexity of a view caused problems with the optimizer as it was being used in my query.
Regarding Temp Tables:
Use temp tables when it makes your query more maintainable or if required for functionality or performance. I use them often for complex sets of queries.
Regarding Performance on Your Query:
My guess is your problem is the “NOT IN” clause. IN, NOT IN and EXISTS and NOT EXISTS have different performance characteristics (within SQL Server and between SQL Server and Oracle). If I remember correctly, a IN/NOT IN with a subquery may be executing the subquery once for every row in the driving table.

You could try the following:

  1. Change NOT IN to NOT EXISTS
  2. In addition to 1), add a WHERE clause in the NOT EXISTS subquery on account number if account number is indexed in the underlying table
  3. Convert NOT IN to a LEFT OUTER JOIN with a WHERE somefield IS NULL from the joined table

I had this idea on the way home from work today as well. Of course, the NULL field would either need to be the primary key or some column with a NOT NULL constraint.

I agree - I think it’s the lack of a join that’s slowing it down - without that, it might be running a cartesian query, then pruning out everything that isn’t required.

Yeah - I’d suspect the same thing. It’s probably re-running the sub-query with every row in the outer query. I don’t have my books to hand or I’d look it up.

Try the alternate version (‘not exists’) to see if that isn’t a bit friendlier, speed-wise.

Did you examine the plan for the query? That is usually a good way to determine what the optimizer is doing. (I’m not good with SQL Server so I don’t know how to get a query plan out of it. There is probably a way.)