Access Query Building help...

I don’t usually make a practice of this sort of thing, but I’ve been staring at this table for 2 days now trying to figure out how to reduce this data. So in desperation, I turn to the Dopers to lend me a hand. I know there’s probably a simple method of doing this, but at this piont I’m not seeing the forest for the trees, and could use another plan of attack.

I have a table that is a compilation of data from several spreadsheets. The key fields are Name ; Parent ; Phase ; and Checkpoint.

The checkpoint is the name of the file, that lets me know where exactly in the sequence of events that data came from. The phase is one of two values A or B. Phase A has over a dozen Checkpoints, and Phase B has about the same number.
What I’m trying to do is filter out duplicates for the Name and Parent fields. So there could be a Name that has one parent for Phase A, and another for Phase B. Or there could be a Name that went from one Parent to a different one, then back again, all in the same Phase, so the checkpoint name would be very important for that unit, so that I know when the changes took place.

I’ve filtered out a lot, by appending the data into a table structure that has Name, Parent, and Phase as primary key fields. But I’m still left with a lot of records that are duplicates. With over 100,000 records, going through them all by hand is not really my first option.

So, any suggestions for how to delete the duplicate records? Thanks in advance for any advice at all.

I’m not 100% sure that I understand your question, but the simplest way to de-dup records is to set up a query, pull down the fields that you need, then click on the little sumnation symbol. This will then add a row titles “Total:” down where you put your query criteria… set them all to “Group By” and run. You can make this a Make Table Query to save your de-duped table.

Alternatively you can make a query with the fields you need and then set the query property “Unique Values” to Yes (right click in a blank area in the pannel where the tables are to get a list of optoins, the bottom one should be “Properties”).

Here, lemme see if I can give a sample:
[li]Name-----------------Parent----------------Phase-------Checkpoint[/li][li]One--------------------Red--------------------A ------------Day1 1200 [/li][li]One--------------------Red--------------------B-------------Day 6 1300[/li][li]One--------------------Green-----------------B-------------Day 7 0200[/li][li]Two--------------------Yellow-----------------A-------------Day 2 1100[/li][li]Two--------------------Green-----------------A-------------Day 3 1200[/li][li]Two--------------------Green-----------------A-------------Day 4 1300[/li][li]Two--------------------Yellow-----------------A-------------Day 5 0200[/li][li]Two--------------------Yellow-----------------B-------------Day 6 1300[/li][li]Two--------------------Yellow-----------------B-------------Day 7 0200[/li][li]Three------------------Brown-----------------A-------------Day 1 1300[/li][li]Three------------------White------------------B------------Day 7 0200[/li][li]Three------------------Grey--------------------B------------Day 8 1300[/li][li]Four--------------------Black-------------------A------------Day 1 1200[/li][li]Four--------------------Blue--------------------A------------Day 3 1300[/li][/ul]

so as you can see, on Two, the parent changed, then changed back. So I’d need to record the checkpoint from the last point that it had the original parent, the checkpoint that it changed, and the checkpoint that it changed back. All the other checkpoints that it stayed the same, I don’t care about.

Sorry for the mess in the semi table there, but I don’t know how to do pre-formated text here.

zoid, the problem is that it’s not one field that’s unique, it’s a combination that needs to be unique. It could be that I have to do this in a multiple step query, or table…sigh…or do it by hand.

If I’m understanding you correctly, you consider a duplicate one with the same name, parent and phase regardless of the checkpoint?

I can see why this is driving you crazy. I don’t think I can do it in one step, but it would sure beat hand editing 100k lines of data.

Start by writing a query pulling down Name, Parent, Phase, and Checkpoint two times. Hit the sumnation button, and group by Name, Parent, and Phase. Set the first Checkpoint to Min, and the second one to Max. Make it a save table query, and you will then have a table with each individual value and the max and min dates for that value.

Next, make a copy of this table (there is probably an easier way, but this should work). Lets say you now have Temp and Temp 2 tables. Add a Yes/No field in Temp and call it something like Back.

Now, write a query that uses Temp and Temp 2 joined by Name. Pull down MinOfCheckpoint, MaxOfCheckpoint and Back from Temp. Set the Criteria for MinOfCheckpoint to <[Temp2]![MinOfCheckpoint] AND set the criteria for MaxOfCheckpoint to >[Temp2]![MaxOfCheckpoint]. Make this an update query and update the value for Back to TRUE.

What you will end up with is your Temp table with the Max and Min Dates and any that “wrap around” another value having their Back value set to TRUE. Finally, you can write a query pulling down Name, Parent, Phase, MaxOfCheckpoint, and Back from Temp. Set the Criteria for Back to =TRUE. Now, make it an Append Query and append those fields to Temp 2 EXCEPT append the MaxOfCheckpoint value into the MinOfCheckpoint field. This will give you an unique row for each change. Using your example you would end up with (just for the Two example) and using the MinOfCheckpoint for your new Checkpoint value.

Two/Yellow/A/Day 2
Two/Green/A/Day 3
Two/Yellow/A/Day 5
Two/Yellow/B/Day 6

This isn’t perfect, and I’m not sure how it would work if they bounce back and forth repetedly, but hopefully it will get you moving again.

Good Luck

Spud, I’ll give that a try, thanks…means I’ll have to go back to the original spreadsheets, and update the checkpoint name to something that I can do a min/max on…I oversimplified what the names actually are. Still, that’ll probably still be faster than going through this stuff one line at a time.

Sort of. A duplicate will have the same name, parent, and phase, but if the parent changed during the sequence of checkpoints, I need to know when it changed, and then changed back. If it did. I’m fairly sure none of them changed more than twice. That is to a new parent, then back to the original.

Ok, I just thought of a much easier way (assuming your checkpoint can be simplified).

Simply add a field called “Change” to your table (yes/no field) and set all initial values to TRUE. Now, write a query pulling in all fields from your original table, and create a new field called Next Day, and have its value set to Checkpoint + 1 (this will require the simplification of your checkpoint data). Make this a Make Table Query.

Write a new query using your original table and this new one. Join Name to Name, Parent to Parent, and Phase to Phase. Also join Checkpoint to Next Day. Bring down the Change field from your original table, and make it an update query… update Change to FALSE.

Now you should be able to query against your table and pull any rows with Change = TRUE. This way it can bounce back and forth all it likes and you will just pull records that have changed since the previous day.

Good Luck