Does narrowing a SQL search put more strain or less strain on a SQL server?

  1. select * from table where field1 = 'bob'
  2. select * from table where field1 = 'bob' and field2 = 'monday'
  3. select * from table where field1 = 'bob' and field2 = 'monday' and field3 = 'breakfast'
Which of the three queries will put the most strain on the server? Which will put the least strain?

Assume that there’s nothing unusual about the data (i.e. that 1 will return more records than 2, and 2 will return more records than 3)

And a followup question…

If a wider query is easier on the SQL server… would it make sense to put any desired filters on the ASP output? (i.e. put an if statement in your table generation code which says: if rs("field2") = 'monday' then display the record.)

I am a database expert, but I work mainly with mega-corp Oracle and queries thousands of lines long; I have also done tons of Access development as well as some SQL Server development.

Those queries are all so simple that I don't think the question is worth asking. None of them should put any unusual strain on the server. I have no idea which one would win, because different optimizers are available; in any case, any difference should be in milliseconds. I would not worry about it at all unless you have thousands or tens of thousands of transactions going through each second.

I say it has to be a non-issue.

Those queries are fake ones to illustrate what I am asking - does SQL query complexity increase or decrease server load?

Imagine I have a query which applies seven filters and still returns thousands of records from a table of billions…
would I put less load on the SQL server if I took the seven filters out of the query and put them in the ASP file that will be displaying the results?

In this simple case, the more criteria you add, the more slowly the query runs, though you'd have to be working with very large tables to see the difference. It probably still makes sense to do the selection in SQL, just because it'll almost certainly be faster than doing it in your favorite interpreted server scripting language (but check and see). Once you start joining multiple tables in your queries, your ability to be more clever than the query execution engine degrades rapidly, so it's best just to let it handle those executions.
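One quick way to "check and see", assuming you're on SQL Server and can run the candidates from a query window (the columns are the OP's made-up ones, with the hypothetical name `mytable` standing in for the reserved word `table`):

```sql
-- Ask SQL Server to report compile/execution times and page reads
-- for each statement, so the three variants can be compared directly.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

select * from mytable where field1 = 'bob';
select * from mytable where field1 = 'bob' and field2 = 'monday';
select * from mytable where field1 = 'bob' and field2 = 'monday' and field3 = 'breakfast';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The IO figures (logical and physical reads) are usually a better proxy for "strain on the server" than raw elapsed time.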

ETA: On reflection, there’d have to be something extremely bizarre happening on several different fronts for the ASP filtering to be faster than constraints on a query. Query engines are seriously optimized to run quickly, and your ASP code isn’t.

The third query will put the most strain on the server, unless the number of rows returned by the first query is significantly larger, in which case sending those rows across the network will take up a significant amount of time as well. It will also depend on how you have configured the database, the indices you have created, and so forth. There's no way to answer it generally.

The answer to your second question is a definite no. Databases are designed to perform things like this very efficiently. Sending all the rows across the network for your application to filter them would be a heavy load in itself - not to mention that it would take far more effort for your simplistic filtering to find the desired rows than it would for the database to find them itself.
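To put the two approaches side by side, using the OP's made-up columns (`mytable` is a stand-in table name):

```sql
-- Filtering in the database: only the matching rows ever leave the server.
select * from mytable
where field1 = 'bob' and field2 = 'monday';

-- Filtering in ASP: the query keeps only the first condition, so every
-- 'bob' row is read and shipped across the network, and the ASP loop
-- then discards the rows whose field2 isn't 'monday'.
select * from mytable
where field1 = 'bob';
```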

Are you familiar with the rules of performance optimization?

  1. Don’t optimize.
  2. For experienced developers only: Don’t optimize yet.

It is basically never worth worrying about this kind of stuff. You should use all your tools for the purposes they are designed for - don't try to second-guess them because you think it will produce a minor performance improvement. Chances are, you'll just make performance worse - and even if, by some stroke of luck, you eke out a 1% improvement, you're left with a crappy, bizarre architecture that will be more difficult to maintain and debug - and that 1% performance gain will be irrelevant, as two months later Intel will release a processor that's 15% faster than whatever you were using.

Oh, god no. Even if you ignore the fact that the database uses techniques vastly more advanced than “iterate through all the rows and check the values”, your technique still involves sending billions of rows across the network instead of thousands. Even with a dumb database, it would still be more efficient to have the database perform the filtering itself in memory and then send just the results.

Network bandwidth maxes out at around 1 Gbps. Memory bandwidth in modern machines is on the order of 100 Gbps.

Well…
Edit: Absolute's reply came after this post. I am talking thousands of records down to hundreds (I'd still put sensible filters on the original query)… and the iteration is being done anyway, so why not check the data as I'm iterating through it? (The construction of the table involves iterating through every single returned record.)

I want to minimise SQL load at the expense of the time it takes for the end user to see his data.

In other words, the method that puts the least strain on the SQL server (not caring one jot about how long it takes for the end user to see his data) is the preferable method…
and the alternative filtering wouldn't be too difficult - it would simply be a case of not bothering to display a table row if its data doesn't fit the criteria.
Having said all that… I will take the advice given and use the tools for their intended purpose… it's just that my IT manager has been mentioning SQL load to me lately.

Thanks for the replies peeps :slight_smile:

Yes. yes. I see the flaw in my logic…

If there’s less data in the data set, there’s less iterating to do. :smack:

If load is an issue, this is where the "don't optimize yet" part kicks in. In most environments it is a pretty quick step to see which queries are running most often and which queries are taking the most time. Use that log to figure out where to focus your optimization effort. Once you have the query that is really hitting you the hardest, examine it. Make sure the fields it filters and joins on are indexed.
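As a sketch of that first step (this assumes SQL Server 2005 or later, where the plan-cache statistics are exposed as dynamic management views):

```sql
-- Top 10 cached statements by total elapsed time, with how often they ran
-- and how many page reads they did - a rough "worst offenders" list.
SELECT TOP 10
    qs.execution_count,
    qs.total_elapsed_time,
    qs.total_logical_reads,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```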

Also, if the boss is mentioning SQL load as a problem, how is he figuring this out? Are responses taking too long for end users, or for intermediate applications? Is he looking at the CPU load on the server? Or is he saying that specific queries are taking longer than expected based on long-term evaluation of those queries (it was running in .02 seconds last week; now it is taking 2 minutes)?

I think it's important to understand what the bottleneck is. Unless you are able to put the entire database into RAM (and trying to keep as much of it in RAM as possible may be a good idea), you're probably bottlenecked by your storage, so you may want to add more spindles to your RAID array, move to RAID-10 instead of RAID-5, etc. If CPU is your bottleneck, look into multiple processors/cores so you can run transactions in parallel.

Anyway, do some performance monitoring first!
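A quick way to see what the server is actually waiting on (again assuming SQL Server 2005 or later):

```sql
-- Aggregate wait statistics since the server last started.
-- Lots of PAGEIOLATCH_* time points at disk I/O, SOS_SCHEDULER_YIELD at
-- CPU pressure, and LCK_M_* at blocking/locking problems.
SELECT TOP 10
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```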

I realize that this thread is three days old, but I’m going to attack this anyway because it’s my area of expertise and my logic goes against what some people on this thread have said.

With the databases I am familiar with (in order: Sybase, SQL Server, MySQL, and, to a much lesser extent, Oracle), with a single table SELECT like in your examples, there can be one of three scenarios:

  1. The table has a single index that includes all the columns you are filtering.
  2. The table has an index on some of the columns, but not all of them.
  3. The table has no index on these columns.

In each case, however, the outcome is the same: the more filtering you have, the less load will be put on the server (this is assuming a disk-based database; if in-memory, the conclusion would be a bit different).

This may seem counter-intuitive, but hear me out:

First of all, you need to remember that disk I/O is several orders of magnitude slower than in-memory operations, so in the case of a database where many thousands of disk pages are involved, the amount of time to do memory operations can be considered to be insignificant, relative to the time spent doing I/O.

Now, let’s take case 1: You have an index covering all columns in your WHERE clause. The database engine traverses the index and retrieves only the rows you need. If your criteria are less narrow, there are more rows to be read from disk and more to send back to the client.

Case 2: The database traverses the index, retrieves all rows that match the indexed columns, then checks each retrieved row against the non-indexed columns and sends the relevant ones back. If you widen your criteria and choose to select only on indexed columns, the number of rows retrieved from disk is the same, but you still need to send more data back to the client, which is more expensive than doing in-memory comparisons.

Case 3: The database has to read each row in from disk and then send only the matching ones back to the client. Again, if you use wider criteria, you’re reading the same number of rows from disk, but you’re still putting a heavy load on the DB to send that data back.
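To make case 1 concrete, a sketch (SQL Server syntax, with the hypothetical name `mytable` standing in for the OP's table): with a composite index on all three filter columns, and a select list restricted to those columns, the query can be satisfied from the index alone.

```sql
-- One index whose key contains every column in the WHERE clause.
CREATE INDEX ix_mytable_f1_f2_f3
    ON mytable (field1, field2, field3);

-- Case 1: every filtered column is in the index. Because the select list
-- is also limited to indexed columns, this is an index seek that never
-- touches the table's data pages.
select field1, field2, field3
from mytable
where field1 = 'bob' and field2 = 'monday' and field3 = 'breakfast';
```

If the query is select * instead, the engine still uses the index to find the matching rows and then fetches only those rows' data pages.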

Running with Scissors, DBA

Say what? You have that backwards, right? :dubious:

Well the parity calculation on RAID-5 will penalize your writes. RAID-10 (or 1+0) should give you good read and write performance, at the cost of the additional drives you need to provide the same amount of space.

What Running With Scissors said. Without knowing what indexes you have to support your query, it's not possible to give a definite answer, but in the best case, your last query involves fetching one or a few index pages and getting the row IDs of the table entries you want. In the worst case, any of them entails running down the whole damn database, bringing back every single row and then deciding you don't want it after all. Next to that, this question is like worrying about how sharp your tool is before deciding whether to cut down a giant redwood with a cut-throat razor or a big honkin' saw.
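If you want to know which of those two extremes a given query falls into, SQL Server will show you the plan without running the query (a sketch; run it from a query tool where GO separates batches, and `mytable` is again a stand-in name):

```sql
-- Show the estimated plan as text instead of executing the statement.
-- An "Index Seek" on a suitable index is the cheap case; a "Table Scan"
-- (or "Clustered Index Scan") means the whole table gets read.
SET SHOWPLAN_TEXT ON;
GO
select * from mytable
where field1 = 'bob' and field2 = 'monday' and field3 = 'breakfast';
GO
SET SHOWPLAN_TEXT OFF;
GO
```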