SQL experts: Is count(*) or count(1) better?

I’ve always used count(*) in queries where I was grouping, but one of our DBAs said it’s faster to use count(1). However, I’m not totally confident in his knowledge of these kinds of things, so I’m looking for confirmation. Is count(1) really better than count(*) performance-wise for queries such as the following:

select State, count(1)
from Customers
group by State

(Obviously, that’s a very simple example. Generally I’m using this on queries with multiple group by fields, where clauses, etc.)

Are there situations where count(*) is better or is count(1) always more efficient?

Most current DBs will optimize them to exactly the same thing. Use whatever makes your boss feel smart.

As Sanity Challenged states, in practical terms the two are equivalent. The conventional wisdom used to be that COUNT(*) was a special-case optimization by SQL engines that didn’t require reading the actual row data. It’s a safe bet that modern implementations will also recognize a constant literal as the same situation.

Philosophically, I prefer the asterisk, as the meaning seems to be more intuitive to me. YMMV

I’m with Cerowyn… COUNT(*) means “count the rows returned”. Any other column name, function or expression means “count the non-nulls”.

As the others have said, the db engine might optimize a literal because it’s always non-null, but why rely on that?
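To make the non-null distinction concrete, here’s a small sketch using the Customers table from the OP (and assuming State allows NULLs, which the thread doesn’t actually say):

select count(*) as all_rows,            -- every row, nulls or not
       count(State) as non_null_states, -- only rows where State is not null
       count(1) as literal_count        -- the literal 1 is never null, so this matches count(*)
from Customers

If State contains any NULLs, the middle number comes back smaller than the other two; count(*) and count(1) always agree.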

If I were dubious, I would write two queries on a big table:
select count (*) from big_table
select count (1) from big_table

and see which one runs faster. If there is no difference, I would ask the DBA why he says there is a difference.
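If wall-clock time feels too crude, SQL Server will also report CPU and I/O per statement; a minimal sketch, reusing the big_table placeholder from above:

set statistics time on;
set statistics io on;

select count(*) from big_table;
select count(1) from big_table;

set statistics time off;
set statistics io off;

The Messages tab then shows CPU time, elapsed time, and logical reads for each query, which is harder to argue with than a stopwatch.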

What database system are you using? What version?

I would be very surprised if there were any difference between the two methods.

We’re using SQL Server 9.0.3186.

I did have my doubts about count(1) being faster, which is why I asked here. Thanks.

Nothing like a little empirical testing to throw up some actual results.

I did both a SELECT COUNT(1) and a SELECT COUNT(*) with several GROUP BY clauses on a table that contains over 22 million rows, and on the second execution (to avoid disk latency issues) both queries took 44 seconds to execute and returned the exact same execution plan. This is on SQL Server 9.0.3054.

So it looks like there’s no difference at all, and you can crow. Incidentally, the reason I have access to tables with 22 million rows is because I work for a data warehousing company, which is convenient when questions like this are asked. :wink:
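If anyone wants to repeat the plan comparison on their own data, here’s a rough sketch; the table and grouping columns are invented placeholders, not the actual warehouse table:

set showplan_text on;
go

select State, Region, count(*)
from big_table
group by State, Region;

select State, Region, count(1)
from big_table
group by State, Region;
go

set showplan_text off;
go

With SHOWPLAN_TEXT on, the queries aren’t executed; the server just returns the plan text for each one, so you can confirm the two plans are identical without waiting 44 seconds for results.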

Only 22 million? More like a data E-Z U-Store-It company. :wink:

Eh, I was using one of the smaller tables in the database, because quite frankly I didn’t want to wait longer than necessary for results.

Well okay then. :stuck_out_tongue:

How many people do you know who want to store 22 million rows of anything? :stuck_out_tongue:

I use count(*) because * is ‘all fields’ or ‘a record’, so I am counting records.
Counting how many 1s there are is completely nonsensical, even if it does work.

Individual bet transactions over six years for thousands of customers.

Oh, we’ve got databases with far higher volumes than that - foreign exchange transactions for a major bank over several years and advertising ratings for all programmes for a major TV channel. I think each of those is over several billion rows in total. I was just going along (tongue-in-cheek) with the comment about 22 million being a Data E-Z U-Store-It Company, and wondering what individual person would need that sort of storage.

Am I spending too much time explaining? I suspect so…

Actually our bet transaction DB only has 12,501,479 records, which is just over half the 22 million.

Took my SQL server 32 seconds to count them.

I was bored so I ran the query with count(1).

Took 31 seconds. So it’s one second quicker.

Lobsang - you have to be careful with tests like this, because the second one should almost always run faster (all things being equal) thanks to data caching.

The only way to really test fairly is to bring up the server fresh with one user. Then run Q1, Q2. Restart the server fresh again and run Q2, Q1.
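Short of bouncing the whole server, SQL Server does let you flush its caches between runs; a rough sketch, with the usual caveat that these commands need sysadmin rights and have no business being run on a production box:

checkpoint;              -- write dirty pages to disk so they count as clean
dbcc dropcleanbuffers;   -- empty the buffer pool (data cache)
dbcc freeproccache;      -- empty the plan cache
select count(*) from big_table;

checkpoint;
dbcc dropcleanbuffers;
dbcc freeproccache;
select count(1) from big_table;

That gives both queries a cold cache without a restart, though the Q1/Q2 then Q2/Q1 ordering trick is still a good sanity check.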

I’m guessing the * in count(*) leads people to the analogy with select *… they think they’re asking the server to fetch and count “everything”. It doesn’t work that way - it’s special syntax that means “count the rows returned”. That in itself doesn’t really require the server to do any extra work, because it’s probably counting the rows it returns anyway.

Bah. I’m looking at a SQL Server table right now with 31,776,848,544 rows in it. Data space is almost 2 TB.

I’m not going to try count(1).

The second run (count(1)) was an afterthought upon returning to the thread. I had closed down the server (well, Enterprise Manager at least) and I actually assumed the second query would run a lot faster due to caching (or, to be more accurate, due to the server knowing I’d run the same or similar query recently), but it didn’t; it was only about three percent faster.

Are we playing whose data set is bigger? 'Cause the one I am responsible for gets 40-60 million records a day. At one time we were up to 100 million a day. And each record is a couple hundred fields long.

Do I win?

Susan