I need a little help with a data file I’ve reccieved - the problem is that it is not normalized. The original file has several values in one field and looks like this:
Field1 Field2
1 A, B, C
2 A, C
3 C, D, E
I need it normalized to look like:
Field1 Field2
1 A
1 B
1 C
2 A
2 C
3 C
3 D
3 E
Obviously this it is not practicle to do this manually for a very large amount of data. So is there a way to automate this conversion? I’m willing to entertain any creative solutions you dopers can come up with.
This may be handled best through excel actually, or at least that’s how I would handle it. Otherwise you can create a new table in access and then write VBA within a button on a new form. The code would essentially read the data from the original table and sort it properly out in your new table.
For using Excel, you can easily export the access table to excel. Then I would write a simple macro to arrange the data properly. After the macro is complete, copy the data back into your access table.
Do you have any experience writing macros or with VBA?
If you’re doing it in VBA, what you need to do is first strip any extraneous elements (commas, spaces) from the field data, then iterate through the remaining contents, treating it as a string (i.e. For X= 1 to Len(string)), then write a record to a new table for every character in the field in the old one.
Assuming that you have already created an empty Table2 of the desired structure, the following code will cope with any variety of values in field2
Private Sub Command1_Click()
Dim db As Database
Dim rst1 As Recordset
Dim rst2 As Recordset
Dim strtemp As String
Dim IntLoop As Integer
Set db = CurrentDb()
Set rst1 = db.OpenRecordset("SELECT Table1.* FROM Table1;", dbOpenSnapshot)
Set rst2 = db.OpenRecordset("SELECT Table2.* FROM Table2;", dbOpenDynaset)
With rst1
.MoveFirst
Do Until .EOF
strtemp = rst1.Fields!field2.Value
strtemp = AllowOnly(strtemp, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
For IntLoop = 1 To Len(strtemp)
rst2.AddNew
rst2.Fields!field1 = .Fields!field1
rst2.Fields!field2 = Mid(strtemp, IntLoop, 1)
rst2.Update
Next IntLoop
.MoveNext
Loop
End With
End Sub
Public Function AllowOnly(yourstring As String, Allowed As String)
If Allowed = "" Then Allowed = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Dim IntLoop As Integer
For IntLoop = 1 To Len(yourstring)
If InStr(Allowed, Mid(yourstring, IntLoop, 1)) > 0 Then
AllowOnly = AllowOnly & Mid(yourstring, IntLoop, 1)
End If
Next IntLoop
End Function
That was fun - my VBA is exceedingly rusty though - there’s probably a fair bit of bad coding in there.
I’d use various string editors to search through the string for the common separator. You can set up a loop that scrolls through each character to find the “,” and ends at Len(string) which is the length of the string.
Here’s a little bit of what I would use if each component separated by a “,” is only one character:
counter1 = 1
counter2 = 1
counter3 = 1
Do
If Range(“B” & counter1) = “” then exit Do
x = Range(“B” & counter1)
Do
If counter2 = Len(x) then exit Do
If Right$(Left$(x, counter2), 1) = “,” then
y = Right$(Left$(x, counter2 - 1), 1)
Range(“D”, counter3) = Range(“A” & counter1)
Range(“E”, counter3) = y
counter3 = counter3 + 1
End if
counter2 = counter2 + 1
Loop
counter1 = counter1 + 1
Loop
Now things would get a little more complicated if the number of characters for each component varies.
**Note: Obviously you have to declare the variables above the code.
BTW, I forgot to mention it, but the above code would distribute the data you want into columns D and E from columns A and B. You can easily alter that as necessary and even put the new data onto a separate worksheet.
I would do it in Excel without macros or VBA or any of that via the following method:
First, use “text-to-columns” and specify comma-delimited to separate the values in Field2. Here is what it would look like after this operation:
Field1 Field2 Field3 Field4
1 A B C
2 A C
3 C D E
Then, sort on Field3 to bring all the values to the top (assuming some are blank). Now just cut and paste the Field3 values at the end of the Field2 values. Also, you need to cut and paste the corresponding Field1 values to the end of Field1.
Field1 Field2 Field3 Field4
1 A C
2 A
3 C E
1 B
2 C
3 D
There, you’ve just effectively normalized the Field3 values.
Now, just repeat the process for Field4, using creative cut and paste.
Field1 Field2 Field3 Field4
1 A
2 A
3 C
1 B
2 C
3 D
1 C
3 E
Now, just sort on Field1-Field2 and there you are. No VBA required.
OK, here’s another somewhat off-the-wall approach; again, assuming you have prepared a table2 for the results, and that you know all the possible comma-separated values that might appear in Field2 (put them in strScope).
Private Sub Command0_Click()
Dim db As Database
Dim strSQL As String
Dim strScope As String
Dim intLoop As Integer
Set db = CurrentDb()
strScope = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
For intLoop = 1 To Len(strScope)
strSQL = ""
strSQL = "INSERT INTO Table2 ( Field1, Field2 ) SELECT Table1.Field1, '"
strSQL = strSQL & Mid(strScope, intLoop, 1)
strSQL = strSQL & "' AS Expr1 FROM Table1 WHERE (((Table1.Field2) Like '*"
strSQL = strSQL & Mid(strScope, intLoop, 1)
strSQL = strSQL & "*'));"
db.Execute (strSQL)
Next intLoop
End Sub
It builds and executes a series of append queries.