Database Geeks! Professional Opinion on a Migration/Upgrade Strategy Proposal, Please!

I work for a company using a high-volume database, at the core of which are two main tables, Orders and Line Items. The system in its entirety is massively complex as well as old and creaky, so with their affirmative blessing I rebuilt it from scratch*. The new system is now being beta tested.

The following may or may not be my idea; it may or may not be an idea proposed instead by someone else at my company:

What if, instead of siphoning off some workers’ time and energy to put trial records into the newly rebuilt system and then, once it seems to be working reliably, doing a fresh import of all the data from the old system and switching to full-time use of the new one, we instead have the workers gradually start using the new system as the “real” system?

I mean, let’s say we begin with orders of a certain type (say, orders for electronic devices of a certain type, with warranties of a certain period, placed by new customers only, and by customers not ordering any other items at the same time). We would enter those orders in the new system ONLY, with all other data entry taking place in the old system. We would do that for four days, and if it goes smoothly we would then also start entering orders of a second type (say, an order otherwise exactly like the first type, except that electronic devices of a different model series would constitute this second type; or perhaps the second type would be customers who are purchasing a second item at the same time). This second type would now be entered ONLY in the new system, the first type would CONTINUE to be entered ONLY in the new system, and all other remaining types would continue to be entered in the OLD system.

For any given order type, if there were problems within the new system, we would halt order entry in the new system for that order type and go back to doing it in the old system until the problem was sorted out. Then we would try relying on the new system for that order type once again, and after a few successful days go on to a third, fourth, etc., order type, until eventually the new system is the only system being used for entering new orders of any type.

I should perhaps clarify that the proponent(s)** of this plan consider its primary advantage to be avoiding any redundant data entry during “beta testing”: a few types of orders would be entered in the NEW system and would NOT be entered, at all, at any time, in the OLD system. No order gets entered twice. As time goes on and additional order types are switched from the old system to the new system, bit by bit the old system becomes obsolete. The new system would become the sole location of real-time order entry, one order type at a time, until all the types have transitioned.
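To make the mechanics concrete, here is a minimal sketch (in Python, since FileMaker script steps don’t paste well as plain text) of the routing rule the plan implies. The order-type predicate and the system names are made-up illustrations, not our actual business rules:

```python
# Minimal sketch of the routing rule the proposal implies.
# The order-type predicate and system names are hypothetical illustrations.

# Order types that have been cut over to the new system so far.
MIGRATED_TYPES = {"new_customer_single_device_type_A"}

def classify(order) -> str:
    """Assign an order to one of the agreed order types (illustrative only)."""
    if order["customer_is_new"] and len(order["items"]) == 1:
        return "new_customer_single_device_type_A"
    return "everything_else"

def route(order) -> str:
    """Decide which system receives this order as its ONLY copy."""
    if classify(order) in MIGRATED_TYPES:
        return "NEW_SYSTEM"   # entered here and nowhere else
    return "OLD_SYSTEM"       # all remaining types stay in the old system
```

Whichever branch an order takes, that system holds the only copy of it; nothing is ever entered twice.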

=======

Your reactions, thoughts, and comments please.

=======

* It’s FileMaker. In case you are FileMaker-savvy and your inclination is to wonder why a developer would craft the next version of a mere core-two-table FileMaker Pro system offline instead of just modifying it on the fly: the original (live) version consists of more than 500 freaking tables. The central core table-pair is where most of the action lies, but it’s not the whole story. The whole rat’s nest needed a serious start-over rebuild. You’ve never seen such a tangled mess of relationships, illicit table names, and inappropriately separated files pretending to be a unified system. But now that it’s switchover time, it’s the core-table functions that are cause for concern regarding how best to switch.

** Sorry for the deliberately vague wording; it is the wish of all people involved that any attitudes towards AHunter3 per se not color the responses of the people whose opinions are being solicited. I know that’s bloody pretentious in so many ways, but believe me, it’s useful if you will tolerate it briefly. I will come clean about my own role and position on this. Promise.

Environment: FileMaker Pro, hosted on FileMaker Server 9 (7 in the interim), accessed by FileMaker Pro 8.5/9.0; a multiuser environment of fewer than 70 (perhaps fewer than 40) concurrent people for the immediately foreseeable future (including the beta-testing/switchover timeframe). Hosted at a local server-farm company on a dedicated host machine with no other processes or databases; widely dispersed corporate clients; WAN access with standard high-speed internet connectivity at all locations.

I think it’s a recipe for disaster, honestly. I can see so many things going wrong with this scenario. The whole point of doing a parallel upgrade is to have redundant data. I would suggest that you revisit the proponent(s) of this idea and remind them that while “redundant” can mean “useless,” it also means (per dictionary.com) serving as a duplicate to prevent the failure of an entire system upon the failure of a single component.

The way you’re planning to do it, if something goes wrong, you lose all data entered in the new system - or worse, the data is inconsistent with the old system.

Has any testing with real data been done on the new system? That’s the first thing to do: run it in parallel with the old one to test it on live data, and verify that it works before transitioning to it. I’d keep the old one running in parallel, perhaps introducing order types into the new one as you describe, until I was sure the new one was fairly robust; then jettison the old one.

I would (very broadly speaking) do the following; a rough sketch of the verification step follows the list:

1. Back up
2. Verify backups
3. Import the old data into the new system
4. Verify the old data in the new system
5. Run new data entry in parallel until satisfied with the new system
6. Take the old system offline (or remove access)
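For the two “verify” steps, even a crude per-table record-count comparison between old and new goes a long way. Here is a rough sketch, assuming you can export each table to CSV from both systems; the table names and directory paths are placeholders:

```python
import csv

# Rough sketch of a per-table verification pass after the import.
# Assumes each table has been exported to CSV from both systems;
# the table list and paths below are placeholders.
TABLES = ["Orders", "LineItems"]

def row_count(path: str) -> int:
    """Count data rows in a CSV export (excluding the header row)."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f)) - 1

def verify(old_dir: str, new_dir: str) -> bool:
    ok = True
    for table in TABLES:
        old_n = row_count(f"{old_dir}/{table}.csv")
        new_n = row_count(f"{new_dir}/{table}.csv")
        if old_n != new_n:
            ok = False
        print(f"{table}: old={old_n} new={new_n} "
              f"{'OK' if old_n == new_n else 'MISMATCH'}")
    return ok

if __name__ == "__main__":
    verify("exports/old", "exports/new")
```

Record counts alone won’t catch field-level corruption, but they will tell you very quickly whether an import silently dropped records.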

I have migrated very large, high-transaction databases in the past.

What you’ve outlined is a relatively high-risk strategy. You’ve done some things to mitigate it, such as only migrating one transaction type at a time, but you still carry significant risk: if the new database doesn’t work, you could lose or corrupt that transaction type, with no mechanism for recovery. You also face the risk that the 10th transaction type you migrate has massive problems that can’t be fixed, and you now have a reverse-migration problem for all the transactions made over the past X days (weeks?) for transaction types 1-9.

You should also not kid yourself that pumping a few transactions through constitutes proper testing. (Now the former software tester in me comes out.) You’re testing with a few “normal” cases, but not coming close to potential corner cases or volume-related issues.

That’s not to say the proposal doesn’t have potential benefits: it gets load off your original database quickly, and it does simplify testing somewhat.

Only your organisation can evaluate the risk against the benefit, but I know in the organisations I’ve worked with, the risk far outweighs the benefit.

Dervorin’s broad plan is pretty close to what I would propose in the first instance. However, I have also seen many cases where the parallel run is simply not feasible. If you don’t have a parallel-run period, you do introduce the risk that your destination database develops a problem a week down the track and you have a difficult or impossible rollback to your old database. Reasons a parallel run may not be suitable include:

  • The hardware needs to be reused for the new system (not a good reason, but a reality for many organisations)
  • Costly (or complex and thus risky) software development is required to provide the transaction pass-throughs
  • The transaction pass-throughs also have to be tested in pre-implementation testing, which can extend testing

It really comes down to how much risk your organisation is prepared to accept. It’s a perfectly valid strategy to build your new database, throw the data into it without testing, and cut users across immediately, deleting the old database as soon as you’ve done so. But it’s also exceedingly risky, and I would never, ever recommend it. At the other extreme, you could test this thing for a year and then run it in parallel with the old one for a year, complete with automated cross-checks; that’s very low risk, but extremely costly. The former strategy might be OK for a throwaway DB for a high school project, the latter for a system where errors could put lives at risk.

I’m no database geek, but I am an IT guy and I’ll echo that this is simply a recipe for disaster. This is a core business application and it absolutely must be properly tested before it goes live.

You’re describing a beta test, which is what happens after you’ve done extensive testing.

What do you do with this data once it’s in the system? You must run reports or something off it. How are these reports going to make any sense if some of the data is in one system and some is in the other?
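To illustrate: while data entry is split between the two systems, even a simple daily order-totals report has to pull from both and merge the results. A rough sketch of that merge, with invented field names:

```python
# Illustration only: during the proposed transition, any report has to union
# data from BOTH systems. The "date"/"amount" field names are invented.
def daily_totals(orders):
    """Sum order amounts per day for one system's orders."""
    totals = {}
    for o in orders:
        totals[o["date"]] = totals.get(o["date"], 0.0) + o["amount"]
    return totals

def combined_report(old_system_orders, new_system_orders):
    """Merge the two partial views; miss either one and the report is wrong."""
    merged = daily_totals(old_system_orders)
    for date, amount in daily_totals(new_system_orders).items():
        merged[date] = merged.get(date, 0.0) + amount
    return dict(sorted(merged.items()))
```

And that is the optimistic case, where everyone remembers that every report now needs two data sources.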

I don’t have much to add, other than another vote for “recipe for disaster”. If the new system tubes up your live transactions, you’ve got no way to revert to trustworthy data. I do a lot of these kinds of upgrades and cutovers in my current job, and I’d never recommend this approach. The users might hate testing, but they’ll hate it worse if they have to call a customer and tell him his order is fubar.

I’m not specifically a database guru, but I do work with them peripherally in some mission-critical applications*, and I agree with the general consensus so far. The proposed plan would be a very risky way to handle the migration. Anything that puts the only copy of live data in an effectively untested system sets off all sorts of warning bells for me.

*For the record, I work with one of the creakiest databases known to man. It runs on our only VMS box, because it was originally created on VMS and was never ported to anything else.

I also have some experience in migrating large mission-critical databases, and I agree with everyone else. In my opinion, you test, then beta test, then perform a real-world pilot with several users, then migrate. You also have a roll-back plan for a worst-case scenario. It’s about 90% planning and 10% implementation, but I’m sure you already knew that.

Time to end the suspense and explain what’s going on. I appreciate all the feedback.

Here’s the deal: my company’s owner/executive is the person who has proposed this plan, as an alternative to my standard plan of having the regular users put a small percentage of the live data (5-8%) into BOTH systems, enough dual data entry to give the new system a good solid shakedown cruise. After that, the data in the new system would be dumped and all the data from the old system reimported, and from then on the new system would be THE system, with the old one available only as a reference.

The company owner has been reluctant to let workers spend time beta testing (they are understaffed, and it takes them away from “real” data entry), and is also reluctant to hire some former employees in the area who left the org on good terms and know it well enough to do beta testing (this idea was floated).

I said: hell no, that is totally not the way it is done, you don’t DO that, that would be like driving down the highway with one leg in the window of a Pontiac and the other leg in the window of a Chevrolet in the next lane over. Really bad idea. You hired me, but I am a professional. It would not be ethical of me to do this.

He is not accustomed to this. He said (not directly to me, but through the person to whom I directly report): I’ve worked with databases and people do this, it’s a reasonable strategy. I said: Umm, no it isn’t. I said: would it be useful if I poll my peers to get some second opinions?

Well, I got them. Thank you. Via the TechTalk Digest, FileMaker Cafe, Worldwide FileMaker Forums, the New York FileMaker Developers’ Group, and, for a more general pool of database geeks, the Straight Dope Message Board, I got replies. A surprising 3 people out of 22 respondents said it might be OK, although two of those three went on to say it’s not how they would do it. Three people said it was a recipe for disaster. The overwhelming response was “you should do parallel testing instead,” and a great many specific concerns were voiced about the kinds of things that happen when your “live data” is not in one system or the other but awkwardly distributed between the two, concerns that mirror my own thinking.

Thank you again for giving me ammunition to claim that I’m not holding an unusually conservative opinion here, and that pretty much any other developer would tell him the same thing.

Now I have to go explain to this guy why being a hospital administrator does not authorize one to tell the surgeon how to do the surgery.

From someone who spent 3 and a half years migrating a rather massive billing database for Australia’s largest Telco, the proposed plan is nuts.

Doing things right the first time in massive systems is infinitely better than doing them wrong, pausing, rolling back, fixing data, trying again; lather, rinse, repeat ad nauseam.

6 months to get it right followed by 12 months of migration would have been better than 32 weekends over 3.5 years. 18 months seems like a long time when the consultants say it should take 6-8 months all up, but 18 months is still faster than 3.5 years.

You don’t say how massive your DB is, so we could be worlds apart, but even so, the proposal is nuts.

Remember, though: at the end of the day it’s his choice, and if it goes astray he will try to play the blamestorming game, but as long as you document your (sane) proposal and your objections you’ll be safe(ish).

Well, to cap it all off, they want me to be making “little changes” and “critically important fixes”. In the OLD system. Which is a crash-prone, corruption-plagued monstrosity with bad naming conventions.

Just shy of a million records in the 34 largest tables, and under a thousand apiece for the remaining 40; about 8 gigs total (there are some big chunky container-field objects in there, aka “BLOBs”). There are 75 native tables, down from approximately 470 (no kidding) in the old system, which, like the proverbial Topsy, was never planned but just grew; the old system consists of file after file after file after file, each of which looks kind of like this under the hood. I selected the 414 primary-use tables and rewrote from the ground up in a single file, consolidating the redundancies and weeding out the deadwood, resulting in a somewhat cleaner if still sprawlingly convoluted structure. The permissions architecture is vastly simplified: only 8 privilege sets, with some optional flags loaded per user via field values in their account record at logon. There are 346 layouts (screens), not all of them for end-user purposes and quite a few unnecessary except for data import from the old system, organized alphabetically by underlying table occurrence. Scripts have been rewritten in modern FileMaker 8 parlance, organized in logical groups, and alphabetized within each group. Import of data from the old system is fully automated, with subscripts in the various old files and a log documenting the record count per table in the old system, the count in the new system at the conclusion, the elapsed time and records-per-second import performance for each operation, and the next serial number for each table, updated after the import. Not counting the import scripting (which is not a permanent part of the structure), there are 714 pages’ worth of scripting. (That’s a bit deceptive: a page break between scripts means lots of short scripts still eat a full page each.)
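The FileMaker subscripts themselves don’t reproduce well as plain text, but per table the import logging amounts to something like this sketch (the names here are placeholders, not the actual script structure):

```python
import time

# Sketch of the per-table import log described above. In reality the work is
# done by FileMaker subscripts; this just shows what gets recorded per table.
def log_table_import(table_name, old_count, run_import):
    """Run one table's import and record counts, elapsed time, and throughput."""
    start = time.time()
    new_count = run_import()   # performs the import, returns records created
    elapsed = time.time() - start
    rate = new_count / elapsed if elapsed else float("inf")
    print(f"{table_name}: old={old_count} new={new_count} "
          f"elapsed={elapsed:.1f}s rate={rate:.0f} rec/s")
    # ...then bump the table's next serial number past the highest imported ID
```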

It’s still a convoluted contraption, but vastly sleeker than what it’s replacing, yet 98% of the end-user experience (workflow, appearance of screens, etc.) is identical or very close to the old one. Vast acres of custom homemade functions were ripped out and replaced with standard system functions that any FileMaker developer would know without having to climb into Define Custom Functions, and none of the handful I kept cascade. (In the old system there are CFs that reference other CFs that reference yet other CFs; many of these are corrupted: they work, but their definition code cannot be read, the definition trailing off into gibberish symbols, so they had to be reverse-engineered.)

I did the whole damn thing between May and September, and since then it has been gathering digital dust. The developer can’t really test the system; only those who actually WORK in the environment know their own workflow and can readily notice structurally correct but factually erroneous stuff onscreen and say “that ain’t right”.

Can’t get them to beta test and now the boss is proposing this damn-fool idea. Yeesh.