help debug my python conversion tool (QIF to IIF)

I’ve been tasked with converting a year-long record of business transactions from Gnucash to Quickbooks. Google helped me determine that the IIF format will be importable by Quickbooks, and I’ve already managed to dump the Gnucash data into Quicken Interchange Format. So now it’s a simple matter of parsing through this text file and reading off the details of each transaction.

I figured out a workable conversion program when there aren’t any splits in the transaction. However, my attempt to loop through the splits and faithfully copy the data into the IIF output file is broken. I’m hoping some python expert will be able to pinpoint the bug easily.

Here’s a sample transaction block from the QIF input file.


D03/03/2010
N6381
U-409.09
T-409.09
P51.25 cl * 10
LCost of Goods Sold:Labor:Unnamed Employee
SCost of Goods Sold:Labor:Unnamed Employee
$-512.50
SExpenses:Taxes:FICA Payable
$39.20
SExpenses:Taxes:Federal Payroll Tax Payable
$29.00
SExpenses:Taxes:State Payroll Tax Payable
$35.21
^

Here’s the function in my python program that reads such a block. It’s more wasteful of memory than reading line-by-line and defining the relevant variables based on the first character in the current line. I tried that method too, but I still ran into a problem with extracting the splits.


def qifreadblock():
    keys = []
    values = []
    v = (0000,"",0.,"","","N")
    (date,num,amt,payee,memo,cleared) = v   # now at least everything we want to return is initialized
    while 1:
        line = qif.readline()  # gulp on a line of the file with handle 'qif'
        if line == "^
" or line == "
" :
            break  # don't want to chew on more than one transaction at a time
        elif line == "" :
            raise IOError('EOF')  # tell the main calling program that we've reached the end of the file
        else :
            keys.append(line[0])
            values.append(re.sub('
','',line[1:]))  # break the current line in two and append to the lists
    while "D" in keys :  # extract the date and shorten the two lists
        date = values[keys.index("D")]
        del values[keys.index("D")]
        keys.remove("D")
    while "N" in keys :  # extract the reference number and shorten the two lists
        num = values[keys.index("N")]
        del values[keys.index("N")]
        keys.remove("N")
    while "T" in keys :  # extract the transaction amount and shorten the two lists
        amt = values[keys.index("T")]
        del values[keys.index("T")]
        keys.remove("T")
    while "L" in keys :  # extract the third party name and shorten the two lists
        payee = values[keys.index("L")]
        del values[keys.index("L")]
        keys.remove("L")
    while "P" in keys :  # extract the memo line and shorten the two lists
        memo = values[keys.index("P")]
        del values[keys.index("P")]
        keys.remove("P")
    while "C" in keys :  # extract the cleared status and shorten the two lists
        cleared = re.sub(r'*','Y',values[keys.index("C")])
        del values[keys.index("C")]
        keys.remove("C")
    splits = []
    while "S" in keys:  # go through every explicitly-named split and find out where the money went
        Iacct = keys.index("S")
        Iamt = keys.index("$")
        try :
            Imemo = keys.index("E")
            Smemo = values[Imemo]
            del values[Imemo]
        except ValueError : Smemo = '' # it's not often a split will have its own memo, you know.
        acctree = values[Iacct].split(":") # taking apart the branches of the account hierarchy
        Sacct = acctree[-1]  # only the last branch is relevant to Quickbooks
        sli = [Sacct,values[Iamt],Smemo]
        del values[Iacct]
        del values[Iamt]
        del keys[Iacct]
        del keys[Iamt]
        splits.append(sli)  # add the just-defined split to the master list for this transaction

    if len(splits) == 0:  # even if no splits were explicitly defined, the IIF file still needs one.
        sli = [payee,-float(amt),'']
        splits = [sli]

    # debug time !
    print splits

    return [date,num,amt,payee,memo,cleared,splits]


I know the error lies somewhere in this function, because the printout of the list of splits gives me


converting block 1 ...
[['Unnamed Employee', '-512.50', ''], ['Federal Payroll Tax Payable',
'-512.50', ''], ['State Payroll Tax Payable', '39.20', '']]

which is clearly wrong. I think I’m misunderstanding something fundamental about how the ‘while’ code block is executed. Are all the enclosed commands attempted before the condition is tested again? Or is the condition tested after each command? If the latter, I would appreciate some helpful suggestions for refactoring the code so that it does what I want.

I think I know the problem. You seem to understand while loops. The condition is tested once before the loop starts and then again at the end of the list of commands, to see if it goes through another time.

The problem seems to be with the way you are deleting items from the keys and values lists.

For each split, it runs


Iacct = keys.index("S")
Iamt = keys.index("$")


This gets the line numbers for the “S” line, and the corresponding “$” line.

You get the relevant details, and then delete the items from keys and values. There is a problem with both, but consider values first:


del values[Iacct]
del values[Iamt]

The first command is fine, deleting the “S” line from the values. However, this means that the value Iamt is now wrong. It is a line number*, but it is one on from where it should be. Suppose Iacct and Iamt are 6 and 7. The “" starts as line number 7, but when you delete Iacct, the elements are all moved up one, so the "” is now element 6, and element 7 is now the second “S” line. This means that the second command (“del values[Iamt”) deletes the next “S” line, rather than the “$” line it should.

This applies equivalently to keys.

This then causes problems with the second “S” line, which is now SExpenses:Taxes:Federal Payroll Tax Payable", having deleted the “SExpenses:Taxes:FICA Payable” line. Also the “$” line is repeated, as it wasn’t deleted previously. The other errors follow from this.

To solve this, I would replace this block:


del values[Iacct]
del values[Iamt]
del keys[Iacct]
del keys[Iamt]

with
del values[Iacct]
del values[keys.index("")] del keys[Iacct] del keys[keys.index("")]

This will mean that the second set of deletions use the updated index for the “$” lines. Give it a go, and see if it works.

The lesson here is to be careful with indexes in lists. If you delete an element of a list, then the following elements all move up one place to fill the gap. This little example might be worth looking at:


a = [1,2,"right","wrong"]
n = 2
print a[n]
del a[0]
print a[n]

It should say “right” then “wrong”.

  • Actually the index of an element in the list “values”.

[QUOTE=silverfish]
The lesson here is to be careful with indexes in lists. If you delete an element of a list, then the following elements all move up one place to fill the gap.
[/quote]

Coming from a FORTRAN background, it’s a bit disconcerting that using a mutable object (in this case a list) to assign an integer value (here a list index) to a new variable makes that variable sensitive to subsequent changes in the original mutable object. More specifically, I didn’t expect Iacct or Iamt to change when I deleted the referenced elements in the list of keys.

Another workaround besides silverfish’s solution might be to copy the lists of keys and values to temporary variables, using these temporary variables (whose elements are now unchanging) to extract the transaction details, and deleting the elements in the original variables so that the ‘while’ block eventually terminates. This method has the potential to generate twice as much debugging work, so I went ahead and tried silverfish’s suggestion. The resulting function, quoted below, works just as I had hoped. Thanks silverfish!


def qifreadblock():
    keys = []
    values = []
    v = (0000,"",0.,"","","N")
    (date,num,amt,payee,memo,cleared) = v
    while 1:
        line = qif.readline()
        if line == "^
" or line == "
" :
            break
        elif line == "" :
            raise IOError('EOF')
        else :
            keys.append(line[0])
            values.append(re.sub('
','',line[1:]))
    date = values[keys.index("D")]
    if "N" in keys :
        num = values[keys.index("N")]
    if "T" in keys :
        amt = values[keys.index("T")]
    if "L" in keys :
        payee = values[keys.index("L")]
    if "P" in keys :
        memo = values[keys.index("P")]
    if "C" in keys :
         cleared = re.sub(r'*','Y',values[keys.index("C")])
    splits = []
    while "S" in keys and "$" in keys:
        Iacct = keys.index("S")
        Iamt = keys.index("$")
        try :
            Imemo = keys.index("E")
            Smemo = values[Imemo]
            del values[Imemo]
        except ValueError : Smemo = ''
        acctree = values[Iacct].split(":")
        Sacct = acctree[-1]
        sli = [Sacct,values[Iamt],Smemo]
        del values[Iacct]
        del values[keys.index("S")]
        del keys[Iacct]
        del keys[keys.index("$")]
        splits.append(sli)

    if len(splits) == 0:
        sli = [payee,-float(amt),'']
        splits = [sli]

    # debug time !
    print splits

    return [date,num,amt,payee,memo,cleared,splits]


I’m glad you got the program working. I should clarify that Iacct or Iamt don’t change in values when you delete elements of the lists they refer to, but which element values[Iamt] and similar refer to.

I would generally avoid deleting elements of lists you are looping through if possible, to avoid this sort of situation. One alternative might be to re-label the keys, rather than deleting them. If you replace the 4 deletion lines with:



keys[Iacct] = "X"
keys[Iamt] = "X"


Then, each split will be detected once, then the relevant lines have the keys turned to “X”, so they won’t get detected again.

Another alternative is to replace the deletion lines with this:


keys = keys[Iamt+1:] 
values = values[Iamt+1:]

The idea here is to chop off the start of the lists, including the split you have just processed. The Iacct and Iamt now don’t get used until the next time around the loop, when they have been updated.