Help Needed - Normalizing data for an Access Database

I need a little help with a data file I’ve reccieved - the problem is that it is not normalized. The original file has several values in one field and looks like this:



Field1	Field2
1	A, B, C
2	A, C
3	C, D, E


I need it normalized to look like:



Field1	Field2
1	A
1	B
1	C
2	A
2	C
3	C
3	D
3	E


Obviously this it is not practicle to do this manually for a very large amount of data. So is there a way to automate this conversion? I’m willing to entertain any creative solutions you dopers can come up with.

As always - thanks in advance! :wink:

This may be handled best through excel actually, or at least that’s how I would handle it. Otherwise you can create a new table in access and then write VBA within a button on a new form. The code would essentially read the data from the original table and sort it properly out in your new table.

For using Excel, you can easily export the access table to excel. Then I would write a simple macro to arrange the data properly. After the macro is complete, copy the data back into your access table.

Do you have any experience writing macros or with VBA?

I know a little VBA (plus I have some good books and could probably fumble through it) - no real experience with Excell macros.

I’m struggling with the logic though - how would you code something like this when there could be any number of values in the second field?

Run these three queries (in the sequence given, assuming here that your source table is called Table1 and you want the data rearranged into Table2):


SELECT Table1.Field1, "A" AS Field2 INTO Table2
FROM Table1
WHERE (((Table1.Field2) Like "*A*"));


INSERT INTO Table2 ( Field1, Field2 )
SELECT Table1.Field1, "B" AS Expr1
FROM Table1
WHERE (((Table1.Field2) Like "*B*"));


INSERT INTO Table2 ( Field1, Field2 )
SELECT Table1.Field1, "C" AS Expr1
FROM Table1
WHERE (((Table1.Field2) Like "*C*"));

(if your table actually contains such values as “D, E” and so on, you’ll need some more queries in the pattern of the second two).

This is a really dirty and awful way to do it; I’d normally recommend doing it in VBA, but that is more convoluted to explain.

If you’re doing it in VBA, what you need to do is first strip any extraneous elements (commas, spaces) from the field data, then iterate through the remaining contents, treating it as a string (i.e. For X= 1 to Len(string)), then write a record to a new table for every character in the field in the old one.

Assuming that you have already created an empty Table2 of the desired structure, the following code will cope with any variety of values in field2


Private Sub Command1_Click()
Dim db As Database
Dim rst1 As Recordset
Dim rst2 As Recordset
Dim strtemp As String
Dim IntLoop As Integer
Set db = CurrentDb()
Set rst1 = db.OpenRecordset("SELECT Table1.* FROM Table1;", dbOpenSnapshot)
Set rst2 = db.OpenRecordset("SELECT Table2.* FROM Table2;", dbOpenDynaset)

With rst1
.MoveFirst
Do Until .EOF     
        strtemp = rst1.Fields!field2.Value
        strtemp = AllowOnly(strtemp, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        For IntLoop = 1 To Len(strtemp)
            rst2.AddNew
            rst2.Fields!field1 = .Fields!field1
            rst2.Fields!field2 = Mid(strtemp, IntLoop, 1)
            rst2.Update
        Next IntLoop
        .MoveNext
Loop
End With

End Sub
Public Function AllowOnly(yourstring As String, Allowed As String)
If Allowed = "" Then Allowed = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Dim IntLoop As Integer
For IntLoop = 1 To Len(yourstring)
    If InStr(Allowed, Mid(yourstring, IntLoop, 1)) > 0 Then
        AllowOnly = AllowOnly & Mid(yourstring, IntLoop, 1)
    End If
Next IntLoop
End Function


That was fun - my VBA is exceedingly rusty though - there’s probably a fair bit of bad coding in there.

I’d use various string editors to search through the string for the common separator. You can set up a loop that scrolls through each character to find the “,” and ends at Len(string) which is the length of the string.
Here’s a little bit of what I would use if each component separated by a “,” is only one character:

counter1 = 1
counter2 = 1
counter3 = 1
Do
If Range(“B” & counter1) = “” then exit Do
x = Range(“B” & counter1)
Do
If counter2 = Len(x) then exit Do
If Right$(Left$(x, counter2), 1) = “,” then
y = Right$(Left$(x, counter2 - 1), 1)
Range(“D”, counter3) = Range(“A” & counter1)
Range(“E”, counter3) = y
counter3 = counter3 + 1
End if
counter2 = counter2 + 1
Loop
counter1 = counter1 + 1
Loop

Now things would get a little more complicated if the number of characters for each component varies.

**Note: Obviously you have to declare the variables above the code.

Hope this helps.

BTW, I forgot to mention it, but the above code would distribute the data you want into columns D and E from columns A and B. You can easily alter that as necessary and even put the new data onto a separate worksheet.

I would do it in Excel without macros or VBA or any of that via the following method:

First, use “text-to-columns” and specify comma-delimited to separate the values in Field2. Here is what it would look like after this operation:



Field1  Field2  Field3  Field4
1       A       B       C
2       A       C
3       C       D       E

Then, sort on Field3 to bring all the values to the top (assuming some are blank). Now just cut and paste the Field3 values at the end of the Field2 values. Also, you need to cut and paste the corresponding Field1 values to the end of Field1.



Field1  Field2  Field3  Field4
1       A               C
2       A           
3       C               E
1       B
2       C
3       D

There, you’ve just effectively normalized the Field3 values.

Now, just repeat the process for Field4, using creative cut and paste.



Field1  Field2  Field3  Field4
1       A                    
2       A           
3       C                    
1       B
2       C
3       D
1       C
3       E

Now, just sort on Field1-Field2 and there you are. No VBA required.

Thanks for the advice!

OK, here’s another somewhat off-the-wall approach; again, assuming you have prepared a table2 for the results, and that you know all the possible comma-separated values that might appear in Field2 (put them in strScope).


Private Sub Command0_Click()
Dim db As Database
Dim strSQL As String
Dim strScope As String
Dim intLoop As Integer

Set db = CurrentDb()

strScope = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

For intLoop = 1 To Len(strScope)

    strSQL = ""
    strSQL = "INSERT INTO Table2 ( Field1, Field2 ) SELECT Table1.Field1, '"
    strSQL = strSQL & Mid(strScope, intLoop, 1)
    strSQL = strSQL & "' AS Expr1 FROM Table1 WHERE (((Table1.Field2) Like '*"
    strSQL = strSQL & Mid(strScope, intLoop, 1)
    strSQL = strSQL & "*'));"
    
    db.Execute (strSQL)
    
    
Next intLoop

End Sub


It builds and executes a series of append queries.