Basic SQL--Calculating a maximum (with a twist)

My dataset looks like the following:



                                        Hp1       Hp2       Hp3
                                     
                                     0.372815  0.372815  0.372815
                                     0.777116  0.777116         0
                                            0   0.75623   0.75623
                                     0.392454         0  0.392454
                                     0.801172         0         0
                                            0  0.999872         0
                                            0         0  0.780745


What I’d liked to do is find the maximum of each column. Simple, right? Well, the problem is, I will be working with datasets that have varying number of columns. I want to be able to tell SQL to select all of the columns (however many there are), and compute the maximum of each column.

What I’m trying to avoid is having to alter the SELECT statement each time I change the number of columns that will be in the dataset.

Is this possible?

I only know very basic SQL (I didn’t know you could have a field with separate columns!) - but could you do it in PHP? Basically do a preliminary query to find the number of columns and then dynamically create the real query from the result?

If you’re not regularly adding tables or adding columns to existing tables you could just run the PHP once and store the queries for running in the future so there would be no real overhead.

I really don’t know much about SQL (I just learned about using it in SAS tonight!), and I don’t know PHP at all. Is it a steep learning curve?

There’s no way to do it in standard SQL, but if your database product supports Information Schema or another metadata standard then you can query that in a stored procedure or subquery to get all the column names.

But don’t do that. This sounds like a normalization problem. In a properly designed relational data set you should never have variable arity amongst tuples in the same relation. So maybe you need to perform some sort of transformation on these data beforehand to make them suitable for a relational model. Or maybe a relational database is not the appropriate tool for this problem.

Can you munge the data into a single two column table like so?


Col1		Col2                                     
0.372815 	Hp1
0.372815 	Hp1
0.372815 	Hp1
0.777116 	Hp2
0.777116 	Hp2
0		Hp1
0		Hp3
0.75623		Hp3
0.75623		Hp2	
0.392454 	Hp2
0		Hp1
0.392454 	Hp1
0.801172 	Hp2
0		Hp3
0		Hp3
0		Hp2
0.999872 	Hp2
0		Hp1
0		Hp2
00.780745	Hp3

It then becomes trivial using max() and group by.

How come you’re not doing it using SAS?

That’s where I started from, but when Googling “find maximum of a column in SAS” I kept finding references to SQL. The MAX function in the SAS language finds the maximum across observations, i.e., for across variables. I wanted to find the maximum within a variable. SQL kept popping up when I searched for help on the problem. And so far, it does work, but only in the case where I’m hardcoding the number of columns.

This problem is a subtask in a much larger statistical simulation program I’m writing in SAS. It has to do with specialized hypothesis testing for multiple comparisons using p-values. The reason for the variable number of columns is that one of the parameters I’d like to allow to vary is the number of null p-values using a macro variable rather than having to recode everything.

**Small Clanger, **that might do the job. I’ll try it!

Well it’s absolutely possible using SAS, you don’t need SQL. Like **Small Clanger **indicated, just use proc transpose and then the last. function.

Migrating from VB I found PHP reasonably easy to use, although I suspect migrating from C or Java would be easier (bloody semicolon terminators!). No idea how much easier it is in SAS but it sounds like you’re familiar with that so its probably a better bet than PHP. Good luck!