SQL Question

MrWhy · June 4, 2002, 10:37pm

I don’t know how many SQL gurus there are here so if anyone doesn’t know the answer but knows a site where I can post SQL questions, that would be appreciated too.

Anyway, the question:

I have a table with a few hundred thousand rows, each of which has an identifier of C1, C2 etc up to C21. The identifier is known as the segment.

To illustrate, if I execute

SELECT segment, count (*)
FROM my_table
GROUP BY segment

I get:

What I need to do is set 5% of each of the segments as a control. ie I want to set the “segment” column to be set to ‘CTRL’ for 5% of each segment.

Does anyone know a way to do this with a single command? It can be as complex as necessary, but must be a single statement.

Thanks in advance for any help.

ratatoskK · June 5, 2002, 12:00am

Is there some other column that you could use in the Where clause that has a regular value range, like between 1 and 100 evenly spaced, then you could say Where columnX < 20.

Or use a stored procedure.

ratatoskK · June 5, 2002, 12:24am

Typo: I meant < 5.

Whenever I have to do something like this, I write something quick in whatever language I’m developing on, and use a loop and counter.

MrWhy · June 5, 2002, 12:40am

Unfortunately there is no column like that.

I can’t do that either. The data is on a system to which I have no access except trough some archaic third party software which will only let me send one statement at a time. And I can’t load up stored procedures on the remote system.

Yeah, I’d happily use PL/SQL, but I can’t send it to the remote system. Things would be so much easier if they were reasonable about the access privileges around here. :mad:

Amok · June 5, 2002, 1:10am

If they won’t let you use a stored procedure, could it be done by writing an external program that makes ODBC calls to the database? That would probably be a big pain in the ass, but I can’t think of any way to do this using just a single SQL statement.

The only way I’m aware of to return a random selection of rows that is some percentage of your whole is with the SAMPLE clause. For instance:

SELECT *
FROM my_table SAMPLE (5)
WHERE segment = ‘C1’;

will give you 5% of the rows that have an identifier of ‘C1’. However, SAMPLE is pretty picky (for reasons I don’t really understand) about where it can be used. I don’t think you can ever use it in the WHERE clause of an UPDATE statement, as that would involve joining the table you want to update with some other table (even if it’s just a different version of the same table). Oracle says (and I think this is true of any SQL database):

Even if each row had a primary key that you could reference, a statement like:

UPDATE my_table mt1
SET mt1.segment = ‘CTRL’
WHERE mt1.id IN (SELECT m2.id
FROM my_table mt2 SAMPLE (5)
WHERE mt2.segment = ‘C1’);

wouldn’t work. Running it produces a syntax error.

Wait a minute, if I’m reading you right, you can issue multiple statements, it’s just that it has to be one at a time. If each row does have an unique identifer, then I think something like this would work:

CREATE TABLE my_new_table
AS SELECT *
FROM my_table SAMPLE (5)
WHERE segment = ‘C1’;

UPDATE my_table mt
SET mt.segment = ‘CTRL’
WHERE mt.id IN (SELECT mnt.id
from my_new_table);

DROP my_new_table;

Of course, that only works if there’s some column in the table that you can use to join it with a table containing a random subset. If that’s not the case, and you can’t add a column like that, then I’m out of ideas.

aramis · June 5, 2002, 4:16am

How important is it that you get exactly 20% for each set? Could you tolerate a little variation?

If you can, then you might try a modulus function on some nearly random column, i.e. UPDATE TBL SET COL = ‘CTRL’ WHERE (OTHERCOL MOD 5) = 0. I’m not sure of the modulus operator but that’s the basic idea.

Topic		Replies	Views
SQL Help! Factual Questions	2	963	June 5, 2014
Anyone up for some SQL Server query fun? Sure you are! In My Humble Opinion	9	1846	July 23, 2011
Help needed with a Oracle SQL query question. Factual Questions	23	1326	April 25, 2007
SQL Server - column names as variables? Factual Questions	9	2321	August 28, 2009
Quick SQL question that I should know the answer to Factual Questions	6	708	May 20, 2006

SQL Question

Related topics