Anyone up for some SQL Server query fun? Sure you are!

Here’s the deal…I need to create a simple bar graph with price distillations for Crystal Reports XI. CR has a fun feature where is a datapoint has no data it skips it in the graph. That is, say I run it and there is no data that meets the <$100,000 criteria. Well, the graph then visually starts at $125,000 skipping <$100,000 completely. This applies when using straight up SQL Server tables to drive the graphs. Using stored procedures to specify that a specific datapoint is zero is the way around it.

So, I need create a query that will make sure all datapoints are captured. I’m working with a consultant who created a stored procedure using a loop, but I’d be MUCH happier to rely on just sql queries.

Here is the output from his stored procedure that I’d like to mimic:


Idx	valuerange	valuecount
1	<100	180
2	125	78
3	150	88
4	175	60
5	200	48
6	225	44
7	250	45
8	275	26
9	300	34
10	325	16
11	350	19
12	375	8
13	400	11
14	425	3
15	450	7
16	475	4
17	500	8
18	525	0
19	550	2
20	550+	13

Note that idx 18 with the 525 value is 0, yet 525 still exists to be reported on.

And here is the stored procedure he used to make it happen (spoilered for brevity’s sake):

[spoiler]USE [Builder Report MatrixSQL]
GO
/****** Object: StoredProcedure [dbo].[RptBarActiveResale] Script Date: 07/21/2011 21:27:09 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[RptBarActiveResale]
(
@Region varchar(25) = NULL,
@County varchar(25) = NULL,
@Schooldistrict varchar(50) = NULL,
@Zip varchar(25) = NULL,
@DateTo smalldatetime
)

AS
SET NOCOUNT ON
CREATE TABLE #Report (Idx int identity(1,1), valuerange varchar(200), valuecount int)
declare @i integer
SET @i = 125

INSERT INTO #Report (valuerange) VALUES (’<100’)

WHIlE @i <= 550
BEGIN
INSERT INTO #Report (valuerange) VALUES (Convert(varchar, @i))
SET @i = @i + 25
END

INSERT INTO #Report (valuerange) VALUES (‘550+’)

UPDATE #Report Set ValueCount =
COALESCE((SELECT SUM(a.RecordCount)
FROM ActiveRngAggregates as a
WHERE a.ValueRange = #Report.valuerange
and (a.county like isnull(@County, a.county) + ‘%’)
and (a.mlssource like isnull(@Region, a.mlssource) + ‘%’)
and (a.schooldistrict like isnull(@Schooldistrict, a.schooldistrict) + ‘%’)
and (a.zip like isnull(@Zip, a.zip) + ‘%’)
and a.dateaddedmonth = month(@DateTo)
and a.dateaddedyear = year(@DateTo)
and a.new <> ‘Yes’
and a.status <> ‘Sold’) ,0)

SELECT * FROM #Report
DROP TABLE #Report[/spoiler]

What I’m looking for is to find out if this can even be done without loops, and if so how would I begin? and if loops are indeed required, is the code above the way to do it? I’m no sql (in general) dummy yet far from an expert, but this I just can’t quite get my head around.

Thanks!

Jeff

I probably have a similar amount of SQL expertise, so maybe there’s an obvious better way I don’t see. That said, I wouldn’t worry about the loop at all. I was expecting to see looping through table data and incrementing counts or some other insanity. But creating a list of 20 numbers? Adds so little to the execution time it’s not worth any amount of stress.

The only somewhat-sane alternative I have off the top of my head is creating a Ranges table that you join against and group on the range value to get a sum. It might be quicker then creating a temporary table and updating its counts (and gets rid of the coalesce). It might also be more of a headache to maintain. And now that I’ve thought about it for more than a second, a table that exists only for a single Crystal Reports graph? That is insane.

Mithras is right that the loop doesn’t hurt anything.

Some alternatives (these aren’t exact but you get the idea):
#1



select
 idx
,((idx-1)*25)+100 as value_range
into #temp
from (
    select 1 as idx union all
    select 2 union all
    ...
    select 18
) t1


#2
I have a table with digits 0 through 9 that I use to create enumerated lists



select
 (t1.num*10)+t2.num as idx
,( ((t1.num*10)+t2.num) -1)*25)+100 as value_range
into #temp
from nums t1
cross join nums t2
where (t1.num*10)+t2.num between 1 and 18


There’s no way I’m going to look through all that code to figure it out. I’m not paid to do that kind of thing.

Wait, yes I am.

But I would create a dummy or temp table that has all of the value ranges that you want, then update the valuecount to 0 for every row. Then left join it to your main table. It’ll hold that slot open for you.

Recursive CTE version:


WITH RECURSIVE a (range) AS (
     SELECT 100, 0 AS count UNION ALL
     SELECT range + 25, 0 FROM a WHERE range < 550 
) SELECT a.range, a.count + count(b.range) AS count FROM a 
LEFT JOIN  ActiveRngAggregates AS b ON a.range=b.range
GROUP BY a.range, a.count ORDER BY range;

This works in PostgreSQL, may need to use “WITH” rather than “WITH RECURSIVE” in MS-SQL, I’m not sure.

Missed the edit window but here’s one that has the Idx column as well:


WITH RECURSIVE a (idx) AS (
     SELECT 1, 100 as range, 0 AS count UNION ALL
     SELECT idx + 1, 100 + idx * 25, 0 FROM a WHERE idx < 19
) SELECT a.idx, a.range, a.count + count(b.range) AS count FROM a 
LEFT JOIN activeRngAggregates AS b ON a.range=b.range
GROUP BY a.idx, a.range, a.count ORDER BY range;

Thanks for the pointers, all!

The reason I don’t like the loop method I posted in my OP is that is seems way too limiting. One thing I should have mentioned is that “ActiveRngAggregates” in the stored procedure is the product of another stored procedure that pulls out Region, County, SchoolDistrict and ZipCode and adds aggregate data to make the loop in the SP I posted even faster. I’m not happy with that method at all. After all, what if I want a report that has Township instead of SchoolDistrict? Well, then we need a whole new aggregation table! Ugh.

So again, thank you all for the input! I think I like the CTE method, but only because I just yesterday learned they existed and really want to play with them more!

You can likely make a query with the CTE method that will remove the need for the aggregate table as well. Good luck, I only just learned about them as well, but they’re very handy! That way requires no temporary table either.

I’ve build similar queries in the past to create histograms. Basically I would create a reference table (REF) that looks something like this:

IDX (or display bucket) VALUERANGE RANGEMIN RANGEMAX
1 <100 0 100
2 125 100 125
3 150 125 150

And then you create a join to your data table

select r.IDX, r.VALURANGE, count (*) as VALUECOUNT
from DATA as d
inner join REF as r
on d.VALUE < r.RANGEMAX
and d.VALUE >= r.RANGEMIN
group by r.IDX, r.VALURANGE
order by 1

I would suggest this is the best and most flexible way to manage this, because if your range values change all you need to do is update the table, not re-write the query (which may be used in multiple places). Just a quick note, in SQL Server you can do a BETWEEN join, so for brevity, the above would become:


INNER JOIN Ref as r ON d.Value BETWEEN r.RangeMin AND r.RangeMax