I’ve always used count(*) in queries where I was grouping, but one of our DBAs said it’s faster to use count(1). However, I’m not totally confident in his knowledge of these kinds of things, so I’m looking for confirmation. Is count(1) really better than count(*) performance-wise for queries such as the following:
select State, count(1)
from Customers
group by State
(Obviously, that’s a very simple example. Generally I’m using this on queries with multiple group by fields, where clauses, etc.)
Are there situations where count(*) is better or is count(1) always more efficient?
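For what it’s worth, the two forms are easy to compare side by side on any small engine. Here’s a minimal sketch using Python’s sqlite3 with a toy Customers table (the table name matches the question; the data is made up for illustration):

```python
import sqlite3

# In-memory database with a toy Customers table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Alice", "TX"), ("Bob", "TX"), ("Carol", "NY")],
)

# The grouped query from the question, once with count(1), once with count(*).
count1 = conn.execute(
    "SELECT State, count(1) FROM Customers GROUP BY State ORDER BY State"
).fetchall()
countstar = conn.execute(
    "SELECT State, count(*) FROM Customers GROUP BY State ORDER BY State"
).fetchall()

print(count1)     # [('NY', 1), ('TX', 2)]
print(countstar)  # same rows
```

Both queries return identical rows; the question is purely whether the engine executes them differently.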
As Sanity Challenged states, in practical terms the two are equivalent. The conventional wisdom used to be that COUNT(*) was a special-case optimization by SQL engines that didn’t require the actual row data. It’s a safe bet that modern implementations will also recognize the presence of the literal as the same situation.
Philosophically, I prefer the asterisk, as the meaning seems to be more intuitive to me. YMMV
Nothing like a little empirical testing to throw up some actual results.
I did both a SELECT COUNT(1) and a SELECT COUNT(*) with several GROUP BY clauses on a table that contains over 22 million rows, and on the second execution (to avoid disk latency issues) both queries took 44 seconds to execute and returned the exact same execution plan. This is on SQL Server 9.0.3054 (SQL Server 2005).
So it looks like there’s no difference at all, and you can crow. Incidentally, the reason I have access to tables with 22 million rows is because I work for a data warehousing company, which is convenient when questions like this are asked.
I use count(*) because * is ‘all fields’ or ‘a record’ so I am counting records.
Counting how many 1s there are is completely nonsensical, even if it does work.
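One place the choice does carry meaning is COUNT(column): unlike COUNT(*) and COUNT(1), it skips NULLs. A quick sqlite3 sketch (the toy table is an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# COUNT(*) and COUNT(1) both count rows; COUNT(x) ignores NULL values of x.
star, one, col = conn.execute(
    "SELECT count(*), count(1), count(x) FROM t"
).fetchone()
print(star, one, col)  # 3 3 2
```

So COUNT(1) really is counting rows, not 1s, because the literal 1 is never NULL.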
Oh, we’ve got databases with far higher volumes than that - foreign exchange transactions for a major bank over several years and advertising ratings for all programmes for a major TV channel. I think each of those is over several billion rows in total. I was just going along (tongue-in-cheek) with the comment about 22 million being a Data E-Z U-Store-It Company, and wondering what individual person would need that sort of storage.
Am I spending too much time explaining? I suspect so…
Lobsang - you have to be careful with tests like this because the second one should almost always run faster (all things being equal) because of data caching.
The only way to really test fairly is to bring up the server fresh with one user. Then run Q1, Q2. Restart the server fresh again and run Q2, Q1.
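Short of restarting the server between runs, a rough harness can at least alternate the order and time each run, which makes cache warm-up visible. A sketch using sqlite3 as a stand-in for the real server (timings on a real engine will of course behave differently):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i % 50,) for i in range(100_000)])

def timed(sql):
    """Run a query and return (elapsed seconds, result rows)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, rows

Q1 = "SELECT x, count(1) FROM t GROUP BY x"
Q2 = "SELECT x, count(*) FROM t GROUP BY x"

# Run in both orders so neither query always benefits from a warm cache.
for first, second in [(Q1, Q2), (Q2, Q1)]:
    t1, r1 = timed(first)
    t2, r2 = timed(second)
    assert sorted(r1) == sorted(r2)  # identical results either way
    print(f"{t1:.4f}s vs {t2:.4f}s")
```

On a real SQL Server you’d restart the instance (or flush its caches) between orders, as described above; the in-process version only controls for query order.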
I’m guessing the * in count(*) leads people to the analogy with select *… they think they’re asking the server to count “everything”. It doesn’t work that way: it’s special syntax that means “give me the number of rows returned”. That in itself doesn’t really require the server to do extra work, because it probably counts the rows anyway.
The second run (count(1)) was an afterthought upon returning to the thread. I had shut down the server (well, Enterprise Manager at least), and I actually assumed the second query would run a lot faster due to caching (or, to be more accurate, due to the server knowing I’d run the same or a similar query recently), but it didn’t; it was only about three percent faster.
Are we playing whose data set is bigger? 'Cause the one I am responsible for gets 40-60 million records a day. At one time we were up to 100 million a day. And each record is a couple hundred fields long.