I don’t usually make a practice of this sort of thing, but I’ve been staring at this table for two days now trying to figure out how to reduce this data. So in desperation, I turn to the Dopers to lend me a hand. I know there’s probably a simple method of doing this, but at this point I’m not seeing the forest for the trees, and could use another plan of attack.
I have a table that is a compilation of data from several spreadsheets. The key fields are Name, Parent, Phase, and Checkpoint.
The Checkpoint is the name of the source file, which tells me exactly where in the sequence of events the data came from. The Phase is one of two values, A or B. Phase A has over a dozen Checkpoints, and Phase B has about the same number.
What I’m trying to do is filter out duplicates on the Name and Parent fields. A Name could have one Parent for Phase A and a different one for Phase B. Or a Name could move from one Parent to a different one and then back again, all within the same Phase; in that case the Checkpoint name is very important for that unit, because it tells me when the changes took place.
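To make that rule concrete, here’s a rough sketch in Python/pandas of the logic I think I’m after. The CSV file name and the assumption that Checkpoint filenames sort in chronological order are both mine, and my real data actually lives in a database, so this is just to illustrate the idea, not something I’ve gotten working:

```python
import pandas as pd

# A rough sketch of the rule I'm after, assuming the table is exported to
# records.csv and that Checkpoint filenames sort in chronological order --
# both of those are my assumptions, not guaranteed facts about the data.
df = pd.read_csv("records.csv")  # columns: Name, Parent, Phase, Checkpoint

# Put each unit's history in checkpoint order within its phase.
df = df.sort_values(["Name", "Phase", "Checkpoint"])

# Within each Name/Phase group, keep a row only when the Parent differs
# from the Parent at the immediately preceding Checkpoint. Runs of
# identical Parents collapse to one row, but a back-and-forth like
# Parent1 -> Parent2 -> Parent1 survives intact.
changed = df.groupby(["Name", "Phase"])["Parent"].shift() != df["Parent"]
deduped = df[changed]

deduped.to_csv("deduped.csv", index=False)
```

In other words, I want to collapse runs of identical Parents while keeping every genuine change, including the back-and-forth cases.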
I’ve filtered out a lot by appending the data into a table structure that has Name, Parent, and Phase as primary key fields, but I’m still left with a lot of records that are duplicates. With over 100,000 records, going through them all by hand is not really my first option.
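For what it’s worth, here’s roughly what that append step looks like if I sketch it with Python’s built-in sqlite3. The table and column names here are made up for illustration; the real thing is an append query against my actual tables:

```python
import sqlite3

con = sqlite3.connect("records.db")

# Composite primary key on Name/Parent/Phase, matching my append target.
con.execute("""
    CREATE TABLE IF NOT EXISTS deduped (
        Name       TEXT,
        Parent     TEXT,
        Phase      TEXT,
        Checkpoint TEXT,
        PRIMARY KEY (Name, Parent, Phase)
    )
""")

# INSERT OR IGNORE silently drops any row whose Name/Parent/Phase combination
# already exists, which is how the first round of duplicates fell out.
# raw_records stands in for my staging table of combined spreadsheet data.
con.execute("""
    INSERT OR IGNORE INTO deduped (Name, Parent, Phase, Checkpoint)
    SELECT Name, Parent, Phase, Checkpoint FROM raw_records
""")
con.commit()
```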
So, any suggestions for how to delete the duplicate records? Thanks in advance for any advice at all.