Spectacular IT failures generally happen for several reasons:
- Poor feasibility/requirements analysis up front
- Scope creep
- Chaotic management
- Bad architecture
- Hubris: taking on more complexity than you can handle
Big systems can look simple from a high level. It’s when you drill down into the details that the real complexity emerges. When you have multiple systems that have to interact with each other in real time, the permutations grow large. When the data structures differ a lot between systems, normalizing them can be very difficult. And when several legacy systems all have to take part in the same transaction, it gets hairy.
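One rough way to see how fast the permutations grow is to count the links, assuming (this is an illustration, not something stated above) that every system may need its own point-to-point interface with every other system:

```latex
\text{interfaces}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\text{interfaces}(30) = \frac{30 \cdot 29}{2} = 435
```

Each of those links can carry its own field mappings and failure modes, so doubling the number of systems roughly quadruples the number of interfaces to specify and test.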
Consider a simple problem: you’re trying to look up a name, so you specify an interface to another system that includes a name query. Your system has separate fields for first name, last name, and middle initial. The other system just has a single ‘name’ field.
So you want to look up:
First Name: John
Middle Initial: S
Last Name: Smith
Now, in the other database you find records like this:
Smith, John S
John Steven Smith
John Sebastian Smith
Smith, John Steven
Making a query that will guarantee a match can be hard. You have to parse the many different ways that a name can be written. You have to handle duplicate names, names that aren’t qualified with a middle initial, names that have data entry errors in them… And this is a trivial example. When you want to pass form data containing hundreds of fields and the field lengths are different and the structure is different and data is separated at different places, it can be a nightmare.
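To make the parsing problem concrete, here is a minimal sketch that handles only the two layouts in the records above ("Last, First Middle" and "First Middle Last") and compares middle names by initial. The `Name` type and the `parse_single_field` and `matches` helpers are hypothetical, and real data would need many more rules (suffixes, hyphenated surnames, typos, and so on).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Name:
    """Hypothetical structured name, like the three-field record on our side."""
    first: str
    middle: Optional[str]  # full middle name or just an initial; None if absent
    last: str


def parse_single_field(raw: str) -> Name:
    """Parse a free-text 'name' field in 'Last, First Middle' or 'First Middle Last' form."""
    raw = re.sub(r"\s+", " ", raw.strip())  # collapse stray whitespace
    if "," in raw:
        last, rest = (part.strip() for part in raw.split(",", 1))
        tokens = rest.split(" ")
        first, middle = tokens[0], " ".join(tokens[1:]) or None
    else:
        tokens = raw.split(" ")
        first, last = tokens[0], tokens[-1]
        middle = " ".join(tokens[1:-1]) or None
    return Name(first, middle, last)


def matches(query: Name, candidate: Name) -> bool:
    """Case-insensitive match on first and last name; middle names compared by initial.
    A missing middle name on either side is treated as unknown, not as a mismatch."""
    if query.first.casefold() != candidate.first.casefold():
        return False
    if query.last.casefold() != candidate.last.casefold():
        return False
    if query.middle and candidate.middle:
        return query.middle[0].casefold() == candidate.middle[0].casefold()
    return True


records = ["Smith, John S", "John Steven Smith", "John Sebastian Smith", "Smith, John Steven"]
query = Name("John", "S", "Smith")
hits = [r for r in records if matches(query, parse_single_field(r))]
print(hits)  # all four records match
```

Note that every record matches, including both "Steven" and "Sebastian": the query simply doesn't carry enough information to tell those two people apart, which is exactly the duplicate-name problem described above.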
Now try doing that fast, with 30 different databases, and pulling all the data together into a common record that makes sense. And do it in a way that scales, and can be wound back if any of those queries fails.
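Winding things back when one system fails is often done with compensating actions (sometimes called the saga pattern) when the systems can't share a real transaction. The sketch below is a minimal, single-threaded illustration of that idea, not a description of any particular product: the `Step` type and `run_all_or_nothing` function are hypothetical, and each step is assumed to supply its own undo action.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """Hypothetical unit of work against one system, paired with the action that undoes it."""
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]


def run_all_or_nothing(steps: List[Step]) -> bool:
    """Run steps in order; if one raises, undo the completed steps in reverse order.
    Returns True only if every step succeeded."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.do()
        except Exception:
            for done in reversed(completed):
                done.undo()  # compensate; failures here are not handled in this sketch
            return False
        completed.append(step)
    return True


log: List[str] = []


def system_c_fails() -> None:
    raise RuntimeError("system C timed out")


steps = [
    Step("A", lambda: log.append("do A"), lambda: log.append("undo A")),
    Step("B", lambda: log.append("do B"), lambda: log.append("undo B")),
    Step("C", system_c_fails, lambda: log.append("undo C")),
]
ok = run_all_or_nothing(steps)
# ok is False; log is ["do A", "do B", "undo B", "undo A"]
```

Even this toy version shows where it gets hard in practice: an undo can itself fail, other clients may already have seen the intermediate state, and doing all of this quickly across 30 systems means running steps concurrently, which the sketch does not attempt. Systems that support it can use two-phase commit instead, but many legacy systems can't participate in one.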
As for why organizations keep running such old systems: almost always it’s because they have accumulated reams of custom software over the years, and all of it would have to be rewritten to move to modern hardware. I work in factory automation, and this is always a problem when trying to digitize a factory. Most factories that have been around a long time have accumulated all sorts of one-off programs written in-house or by consultants long gone. They absolutely rely on all that software, and it’s hideously expensive to rewrite it all. So the factories muddle along with ancient hardware until the maintenance costs or efficiency losses overwhelm them, then they bite the bullet and upgrade. Maybe.