Actually, it is probably my explaination of the question that is complicated, not the solution
Let me try and explain what I need to do.
I pull out results from a database. I am not allowed to touch the database myself (or this would be easy) but have to use a special (crap) interface. This will give me 3 comma separated files.
What I want to do is:
From each of those files take a certain number of randomly chosen records, say 300 from the first, 200 from the second, 100 from the third. I want to output them to a new file to give to the client.
The next time I do this procedure I want to remove previously chosen records from the selection, I assume the easiest way to do this is to say “remove from file1 any records that exist in file2”. I am guessing that part is easy enough to do, I just don’t know how.
The client wants csv’s but I rekon it is easiest to do this with excel (access was bitching about some of the cell contents, and I daren’t mess with the data). I could put my 3 csvs into a sheet each in an excel workbook and then have excel create a new sheet with my compiled results.
Can anyone work out what the hell I am talking about? Can anyone help me?
I don’t have excel to hand, but my guess would be:
Open the CSV files in Excel: hopefully it will convert them. Otherwise look for an import option.
Add another column each cell containing =rand() (I think this is the correct function; check the help file.)
Then you have a random number associeated with each row. Somewhere there should be a sort menu option (data?) - sort on this column.
Cut and paste the top 100 cells to another sheet. Export that as CSV. Save the first one, and repeat on that to get another 100 cells.
Do they have to be actually random, or just not have to be in order? If you just want any 300, ignore the rand steps.
As a matter of curiosity, why do they want this?
Oh, and the other option would be to import the CSVs back into another database, if you’d prefer that.
Doesn’t need to be truely random, just not in order. The =rand() and taking the top X amount should be perfect. Thanks!
Any ideas on how to remove them on the next months sort? That is, that I say remove from the new list the ones I took out last time and kept in another file for this purpose…
As to the why, I realise when you ask that my post makes me sound like a mad spammer. Don’t worry I am not It is to do with taking selections monthly from a changing (growing) pool of items, if that makes any sense.
So what do you want to do next month? Send 100 different records? But next month’s version of them? Ah, right. (BTW - will you be getting new records, or will you eventally resend some, or will you stop, or what?)
If so, umm, well, it’d be easy in access (“SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id” where table2 is the previously sent records, and id is a unique identifier. Oh, and you’d have to append the query to table2… umm, hang on, let me think)
In Excel? I’m not sure. There should be some sort of wizard that’d select everything not in another sheet, but I’ve no idea what.
Oh, or you could use a lookup function to check if the record is in the other sheet. That is, a formula in the last column of sheet 1 that 'lookup’s on the id column of sheet 2, returning yes or no, and then sort by that first, and rand second. Sorry I can’t be more help, I don’t have an excel help file here.
So you want to wait until next tie to remove them? Why? I’d suggest you output two files: one with the chosen records, and one without. Then, next time you do this load the second output file. This way you’ll end up with a back at each step.
On the rand() issue, I see possible problems in Shade’s suggestion. There are actions in excel that reset the function rand(), (sorting is one of them) and you don’t want to be worrying about that. Create a new row, assign one of the cells the formula rand(), and copy the value to each row. Note that I said “value”: you need to be sure not to copy the formula. Copying the value should reset rand(). Or create two new coumns. Give the first the formula rand(), and copy the value of the first to the second (again, note that I said value).
Use the Tools->Data Analysis->Sampling menu option. If you don’t have data analysis installed, install it. This will work easier than trying to use the rand() function.
Will your data be changing between times you need to do this? If not, you can simply toss the records you’ve selected and start over next time.
If your data does change, I would suggest you keep a record of which records you’ve previously selected in a separate spreadsheet. The next time you need to do this, you will have two spreadsheets: 1 will have the new data and 2 will have the record of data already output.
Use VLOOKUP to mark the records in 1 that appear in 2. Delete those records and start over with the Tools->Data Analysis->Sampling method mentioned above.
Confusing? I hope not. I’ll give more details if you need them.
Are you sure you can’t do this in Access? It can be done in Excel (including removing last months picks) but it would be a whole hell of a lot easier in a database. What data doesn’t Access like? If it seems to be screwing up your data make sure you set each field to Text (or Memo for long strings of text > 255) when importing or linking your csv files. Other than binary data I haven’t seen anything these two field types can’t handle.
If you like I could come up with a VB macro that would automate all of this down to a single click and prompt you for number of rows desired from each file. Are your input files always named the same thing? Maybe give us a few sample rows as well?
As someone already pointed out, eventually you will run out of records unless more than what you sample are added each month. Is that the case?