Extract... data... Gnnnn... life... force... draining... away...

My mild rant of the month is about this bloody accounting/stock control system from which I’m trying to extract data to transfer across to an entirely new system (that I had a hand in developing, I might add).

Anyway, this old software (which I won’t name, in case it causes trouble) cost the company tens of thousands of pounds to install and costs three grand a year to license; enough was enough - it only has to cope with about 10,000 products and half that many customers.

So I got the job of trying to extract the customer and product data to pile into the new system.

I. Have. Never. Seen. Anything. Like. It.

The database is a fucking complete mess, there is no other way to describe it; the customer account information is spread across four or five tables, sort of like a relational model, but not, as there are the same number of records in each table - the relationship is one-to-one… WHY?!? The field naming is inconsistent, there is duplication and redundant data everywhere.
As far as the products data goes, again, there are multiple narrow tables with a one-to-one relationship and the bloody PRICING structure is unbelievable; as far as I have been able to work out so far, it is like this:

There are ten selling price value fields. (numbered 1 to 10). There are ten currency legend fields, if the first of these contains a $ sign, it means the first of the value fields is a dollar value, no, wait, it actually means look at the first of ANOTHER set of ten fields and get the price from there.
If the second legend field contains a £ symbol, then it means the second of the currency fields is pounds Sterling, oh and also if the currency legend field is blank it means Sterling too, unless the value field is zero, in which case it means that there isn’t a price for that currency (not that the price is zero). The order of the currencies in the ten fields is completely free-form, so you can have dollars, pounds, Euros in one record and Euros, dollars, nothing in the next.
Suffice to say that I don’t think I’d have done it that way.
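For what it’s worth, the decoding rules above boil down to something like this Python sketch. The field names (LEGEND, VALUE, ALTVALUE) are invented, since the real ones are undocumented, and the zero-means-no-price rule is applied across the board as described:

```python
def decode_prices(record):
    """Return a {currency: price} dict for one product record."""
    prices = {}
    for i in range(1, 11):  # the ten price "slots", order is free-form
        legend = record.get(f"LEGEND{i}", "").strip()
        value = record.get(f"VALUE{i}", 0)
        if legend == "$":
            # A dollar legend means the price actually lives in the
            # *other* set of ten fields.
            value = record.get(f"ALTVALUE{i}", 0)
            currency = "USD"
        elif legend in ("£", ""):
            # A pound sign means Sterling; so does a blank legend.
            currency = "GBP"
        else:
            currency = legend  # e.g. a Euro symbol
        # A zero value means "no price for this currency", not "free".
        if value != 0:
            prices[currency] = value
    return prices
```

Even the tidy version takes a dozen lines and three comments to explain, which says it all really.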

What a lame rant, sorry, I just had to vent.

As a DBA I can feel your pain. It is amazing how badly people can construct a database data-dictionary and still end up with a system that seems to work. Then years later some poor sod has to come along and untangle the mess. I wish you well.
Cheers, Bippy

(I doubt I can help, but my first thought on your problem is to create a set of well organised views of the tables in the legacy database, and when you are happy that they collect the legacy data into a set of properly designed table-views then copy the views data into actual tables on your db)
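That suggestion looks roughly like this in practice - here sketched with Python’s sqlite3 and made-up table/column names (cust_main, cust_extra), since the real schema is unknown: join the one-to-one legacy tables into a tidy view, then materialise it once it looks right.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Stand-ins for two of the legacy one-to-one customer tables.
    CREATE TABLE cust_main  (acct_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cust_extra (acct_id INTEGER PRIMARY KEY, postcode TEXT);
    INSERT INTO cust_main  VALUES (1, 'ACME LTD');
    INSERT INTO cust_extra VALUES (1, 'SW1A 1AA');

    -- One well-organised view per logical entity in the new design.
    CREATE VIEW customer_v AS
    SELECT m.acct_id, m.name, x.postcode
    FROM cust_main m
    JOIN cust_extra x ON x.acct_id = m.acct_id;

    -- Once the view is verified, copy its data into a proper table.
    CREATE TABLE customer AS SELECT * FROM customer_v;
""")
print(con.execute("SELECT * FROM customer").fetchall())
# → [(1, 'ACME LTD', 'SW1A 1AA')]
```

The nice thing about the view-first approach is that you can keep tweaking the SELECT and eyeballing the output without touching the legacy data at all.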

Oh my. I’m taking my first introductory GIS class and I’m lost just dealing with the little tiny databases in ArcMap. The OP is definitely not a lame rant.

Lots of sympathy to you.

I added some spaces to your title so it won’t mess up the index page.

Carry on.

Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you.


[sub]Paid for by the Bib For President 2006 campaign.[/sub]

As a recent CS graduate, I swear that if called upon to create a database, it will be in at least 3rd Normal form if not BCNF, and (at design time at least), contain no internal oddities.

I feel for you; may your queries defeat this hideousness you have just described!

An admirable goal, Nanoda. Unfortunately it seems to me that the vast majority of people tasked with creating a database don’t even know WTF a Normal Form is. My current place of employment asked me if I knew what it was at my interview. It made me glad that I would be working with competent people.

Annoying, but easy enough to concatenate into a single table. Sounds like 3 hours’ work in FileMaker, assuming you’re talking about ~ 5-10 tables and no more than 200-300 fields per table.

Coping with legacy amateur databases can be a hair-puller, no doubt about it. You know it’s gonna be a long day when you discover that half the data has been typed into an undifferentiated text field with hard returns and labels in ALL CAPS that they want you to abstract out into separate fields; and the dates have all been stored in plain text fields and contain strings like “Jan 3rd”, “4/7”, “Tuesday”, “Feb2”, “Nov-Dec 2002” and “17-3/03” which you’re supposed to parse out into real dates. And one secretary never got the concept of “new record” and duplicated the entire file every time a new client had to be input, and has 722 copies in a folder on her desktop.
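That kind of free-text date field usually forces you into a try-the-known-patterns parser. A hedged sketch (the default year is an assumption; anything unrecognised comes back as None so a human can eyeball it):

```python
import re
from datetime import date

MONTHS = {m: i + 1 for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}

def parse_messy_date(text, default_year=2002):
    t = text.strip().lower()
    # "Jan 3rd", "Feb2" - month name (possibly glued to the day) plus day
    m = re.match(r"([a-z]{3})[a-z]*\s*(\d{1,2})(st|nd|rd|th)?$", t)
    if m and m.group(1) in MONTHS:
        return date(default_year, MONTHS[m.group(1)], int(m.group(2)))
    # "4/7" - assuming UK day/month order
    m = re.match(r"(\d{1,2})/(\d{1,2})$", t)
    if m:
        return date(default_year, int(m.group(2)), int(m.group(1)))
    # "Tuesday", "Nov-Dec 2002", "17-3/03": flag these for a human
    return None

print(parse_messy_date("Feb2"))     # → 2002-02-02
print(parse_messy_date("Jan 3rd"))  # → 2002-01-03
print(parse_messy_date("Tuesday"))  # → None
```

The 722 duplicate files, alas, have no regex-based cure.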

Yes, that’s what I’m having to do, but it isn’t easy, as the only way to identify one of the tables as, say, part of the product data is that it contains the same number of records as a known product table and has some kind of primary key that looks like the primary key in the known table; the field and table names are complete bollocks.

There is also a significant element of this too, but that is largely the fault of the previous users, rather than the database designer.
(I’ve spent the whole week writing functions to convert ALL CAPS to Smart Proper Case, to parse dates, and to strip out multi-line addresses from memo fields, just so I can get at the post codes and country names.)
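Rough sketches of two of those helpers, for the curious. The small-word list and the UK postcode pattern are assumptions of mine, not the actual functions:

```python
import re

SMALL = {"of", "and", "the"}  # words to leave lower-case mid-name

def smart_proper_case(text):
    """ACME TRADING AND SONS -> Acme Trading and Sons."""
    words = text.lower().split()
    return " ".join(w if w in SMALL and i > 0 else w.capitalize()
                    for i, w in enumerate(words))

# A loose UK postcode pattern, searched for anywhere in a memo field.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")

def extract_postcode(memo):
    m = POSTCODE.search(memo.upper())
    return m.group(0) if m else None

print(smart_proper_case("JOHN SMITH AND SONS"))         # → John Smith and Sons
print(extract_postcode("1 High St\nLondon\nSW1A 1AA"))  # → SW1A 1AA
```

Proper-casing names is a rabbit hole in itself (McDonald, van der Berg, IBM...), so the real version inevitably grows an exceptions list.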

bibliophage thanks (here’s me whining about data integrity and all)…

[nitpick]There are times when one-to-ones are valid. Say, if you want to preserve the possibility of changing it to 1-to-many, or if you want all the blobs in a separate table. Also, normal form is a good starting point, but occasionally it’s more effective not to meet it perfectly, e.g. city/zip.[/nitpick]
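The blob case is worth a concrete sketch - a deliberate one-to-one that keeps big binary data out of the table you scan all day (table names invented, SQLite for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product       (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    -- Deliberate one-to-one: the photo lives apart from the hot columns.
    CREATE TABLE product_image (id INTEGER PRIMARY KEY REFERENCES product(id),
                                photo BLOB);
    INSERT INTO product VALUES (1, 'Widget', 9.99);
    INSERT INTO product_image VALUES (1, x'FFD8FFE0');
""")
# Routine queries never drag the blob through memory.
print(con.execute("SELECT name, price FROM product").fetchall())
# → [('Widget', 9.99)]
```

The difference from the OP’s system, of course, is that this split is a design decision with a reason attached, not five narrow tables for no reason at all.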

[cynical]Wouldn’t it be convenient for the company who designed this if you couldn’t parse it, and decided to go back to hiring it from them?[/cynical]

For everything there is a season, and a time for every purpose under Heaven … a time to normalize, and a time to denormalize …

Most legacy databases, though, are just crap. As we all know full well … I’ve dealt with ones that had field names like “spare_numeric_1” and “thursday”, or had “ATUBSWDTYA” fields (stands for “Acronym Thought Up By Someone Who Died Ten Years Ago”, and only he knew what it meant). The world is full of crap legacy databases.

Nothing sucks more than databases designed by someone who doesn’t know how to design databases.

Or one designed by a company who sell it to a customer, then do quick ‘n’ dirty fixes on the data structure to meet the customer’s requirements. And then sell the same modification to another customer, adding further quick ‘n’ dirty fixes on top of the existing quick ‘n’ dirty fixes.

Oh the stories I could tell…

I must admit that this crossed my mind, but no, it really does look just like plain old poor design in this case. If they wanted to lock me out of the data there are more effective and secure ways.

Shade, you’re right, but there’s no evidence that they intended to create a one-to-many here (or they would have done it that way instead of having ten separate fields to store the ten separate prices for a product, surely). Furthermore, there are some fields that appear in more than one of the tables in the group (in addition to the primary key, of course).

Incidentally, Steve Wright, you hit the nail on the head with the field names: the primary discount for a product is stored in a field named CUDISCMAIN, the penalty discount (for small orders) is stored in a field named CU_USER_DATA_FIELD_1 - there isn’t even consistency in the way that the fields are named (underscores in some, not in others).
Of course it can be argued that the penalty discount was added later, but since they had to do a lot of development to get it to actually behave the way it should in the application (not to mention appearing in the right place in all the forms), why wouldn’t they take an extra minute or two to name the field sensibly?
The answer is probably to maintain compatibility with their own development database and to avoid the complications of adding a field to a database that has already been distributed.