SQL Help: SUM function

I’m having some SQL trouble, and I’m hoping someone here can help.

I have three tables: one which lists information about products, one which lists information about stores selling those products and when, and one which lists information about how many customers have viewed a product (it has two columns, a date and the number of views on that date).

What I want to do is list, for a given date range, for each product, the number of stores selling that product and the number of customers who viewed that product.

I have a query like this:



SELECT [Products].[ProductID], 
            SUM([DailyViews].[Views]) as Views, 
           count(DISTINCT([ProducttoStoreLinkDetails].StoreID)) as Links 

FROM [Products], [DailyViews], [ProducttoStoreLinkDetails] 

WHERE 
            [Products].ProductID = [DailyViews].ProductID 
     and [ProducttoStoreLinkDetails].ProductKey =[Products].ProductKey 
  
     and [DailyViews].ViewDate > '2012-12-31' and [DailyViews].ViewDate < '2013-04-01' 

      and [ProducttoStoreLinkDetails].DateCreated > '2012-12-31' and [ProducttoStoreLinkDetails].DateCreated < '2013-04-01' 

group by [Products].ProductID 
order by Links desc

Here’s the problem: the count of Links comes out fine. The sum of views comes out as the sum of views * the number of days in the range (90). If I use SUM(Distinct([DailyViews].[Views])) as Views, I get the correct sum – except that, for any days which had the same number of views, that value is only counted once (which is what distinct does, I know).

Where am I going wrong? How do I get the sum of the views value for each day, without it being multiplied times the number of days?

I suspect you’re missing a condition in your table join somewhere, which is causing it to evaluate the join as a Cartesian product. Without looking at the structure of the tables in detail, I think you need to add


and [DailyViews].ViewDate = [ProducttoStoreLinkDetails].DateCreated

to your WHERE clause.

Interesting. I had thought one join for each table was all that was needed, but that makes sense. I’ll try that and report back. Thanks!

Adding that, the query executes successfully and I get 0 results.

It turns out that ViewDate is in date format while DateCreated is in datetime format. I’m now looking up conversion methods…

Remember that when you join, you are basically creating all combinations of rows from all 3 tables and whittling that list down to what you want with the join/where clauses.

But, unless I am misinterpreting, you are joining 2 independent/marginally related sets of data together when you shouldn’t be (views and store links).

Alternatives:
Subqueries in select:
Select
productid
,(select sum(views) from … where …) as num_views
,(select count(stores) from … where …) as num_stores
from products
where …
Subqueries combined with union all and final sum:
select
product_id
,sum(num_views) as num_views
,sum(num_stores) as num_stores
from (
select
productid
,sum(views) as num_views
,sum(0) as num_stores
from products
inner join views
where …
UNION ALL
select
productid
,sum(0) as num_views
,count as num_stores
from products
inner join stores
where …
) t1
group by
productid

You’re not misinterpreting; I munged the backstory slightly. In real life, these are both independent actions happening to the product, so it makes sense to report on the product and the two actions.

I tried a variant of this, with a single join between tables 2 and 3 and subquery in the where, but I’ll try this – and your other suggestion – if I can’t get the datetime conversion to work (or if that method, itself, doesn’t work).

The “subquery in select” idea would get my vote as probably being the easiest to implement correctly.

Your original sql has another common, somewhat hidden problem. Because your where conditions are really just inner joins, you could miss out on products that have views but no links, or links but no views. Simply joining up the other two tables by date would not remove this problem; it would merely cause the problem on each individual date. The subquery solution should avoid this problem, although you still need to be careful with your where conditions in the subqueries.

If you had many more products than views or links, then the “subquery with union” might be faster, but I don’t expect that this is the case with your system. In any event, you’d have to analyse the execution plan of both ideas to see which seems to work more efficiently with your system.

Sorry, folks, I just didn’t have time to work on this today, so no progress to report. I’m hoping to take some time tonight.

This is a really interesting point, and one I hadn’t considered – thank you! For the reasons you suggest, which are very real dangers, I will make that change.

There are about 30,000 products … so this actually is a real danger. I’ll only be running the query once (well, once successfully, umpteen times to get wrong results!), so I’m not too concerned, but I appreciate the caution. If/when this goes into production, I’ll let my DBAs worry about this angle. :slight_smile:

in mssql:

and datediff(d,[DailyViews].ViewDate , [ProducttoStoreLinkDetails].DateCreated)=0

it’s the difference in days between the two dates.

all rdbms’s will have their own date methods.

I think this should work (ignoring issues in date conversion). For future reference, it would be very helpful if you specify database platform and version. SQL syntax can vary widely.

SELECT 
	q.ProductID, q.Views, SUM(l.StoreID) AS Links
FROM
(
	SELECT p.ProductID, p.ProductKey, SUM(dv.Views) as Views
	FROM
	[Products] p
		LEFT OUTER JOIN
	[DailyViews] dv
			ON dv.ProductID = p.ProductID
	WHERE dv.ViewDate > '2012-12-31' and dv.ViewDate < '2013-04-01' 
	GROUP BY
		p.ProductID, p.ProductKey
) q
	LEFT OUTER JOIN
[ProducttoStoreLinkDetails] l
		ON l.ProductKey = q.ProductKey
WHERE l.DateCreated > '2012-12-31' and l.DateCreated < '2013-04-01' 
GROUP BY 
	q.ProductID, q.Views

Oops, the first line should have a COUNT not a SUM:


SELECT 
	q.ProductID, q.Views, COUNT(l.StoreID) AS Links
FROM
(
	SELECT p.ProductID, p.ProductKey, SUM(dv.Views) as Views
	FROM
	[Products] p
		LEFT OUTER JOIN
	[DailyViews] dv
			ON dv.ProductID = p.ProductID
	WHERE dv.ViewDate > '2012-12-31' and dv.ViewDate < '2013-04-01' 
	GROUP BY
		p.ProductID, p.ProductKey
) q
	LEFT OUTER JOIN
[ProducttoStoreLinkDetails] l
		ON l.ProductKey = q.ProductKey
WHERE l.DateCreated > '2012-12-31' and l.DateCreated < '2013-04-01' 
GROUP BY 
	q.ProductID, q.Views

AnalogSignal, your method can work, but you need to move your WHERE clauses up into the JOIN clause on the left outer (for both of them).

Otherwise, if rows exist for the product in DailyViews (for example), the LEFT OUTER JOIN finds a match and basically acts like a normal INNER JOIN, but the where clause then removes that row, so the ProductID is not returned at all.

Do this (WHERE switched to AND in join):



FROM 
    [Products] p 
        LEFT OUTER JOIN 
    [DailyViews] dv 
            ON dv.ProductID = p.ProductID 
      ** AND **   dv.ViewDate > '2012-12-31' and dv.ViewDate < '2013-04-01' 


convert(char(10),cast(ViewDate as datetime),101)

and/or

convert(char(10),cast(DateCreated as datetime),101)

should convert to “mm/dd/yyyy” format.

If this is SQL Server, no need to convert both fields to char, just convert the datetime as date.

CONVERT(Date, DateCreated)

Not sure about other DBMS.

Yes, good point! Agreed.

Genius! AnalogSignal, RaftPeople, this worked. Thank you, very, very much. I am heartily impressed by your skills, and I believe I have learned something. (At the very least, I’ll be saving this for future use and reference.)

I’m glad this worked. Thanks StephenG for letting us know. Several times I have posted a solution here only to have the OP never return. Thanks again to RaftPeople for providing the bug fix.