Obamacare "Tech Surge"

So, Obama is calling for a tech surge to fix the Obamacare IT problems.

Looks like someone hasn’t read “The Mythical Man Month”…

Here’s the original press release: Doing Better: Making Improvements to HealthCare.gov

My first thoughts as a software engineer were, “If you’re looking at bringing in masses of new IT people, this is not a fix that will be measured in days or weeks”.

Anyone else with an IT background care to discuss the rollout of the Obamacare websites/integration hub - what went wrong, what’s likely to happen now?

That book is decades old. Nobody understands the Mythical Man Month. :sad:

I am an old (59) software developer who has two copies of The Mythical Man-Month that I have been lending to managers and executives for years. No one seems to read it any more, and it shows.

I haven’t experienced the issues with the ACA websites personally, and the stories I have read about the situation have been typically tech-ignorant. I don’t feel like I know anything about it technically.

The press release cites problems that relate to high-traffic/scalability issues as well as problems that relate to crappy code. The scalability issues are probably more easily fixed than the crappy code. Crappy code is like an iceberg: the crap you see is dwarfed by the crap you haven’t seen yet. Scalability can be helped to some extent by tuning databases and throwing hardware at the problems.
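To illustrate the software side of that scalability point: one cheap lever, alongside database tuning and more hardware, is caching hot reads so repeat lookups never touch the database at all. A minimal sketch in Python, where the slow query and the plan data are entirely made up:

```python
import time
from functools import lru_cache

def fetch_plan_from_db(plan_id):
    """Stand-in for a slow database query (hypothetical data)."""
    time.sleep(0.01)  # simulate query latency
    return {"plan_id": plan_id, "premium": 250}

@lru_cache(maxsize=10_000)
def fetch_plan_cached(plan_id):
    # Same result, but repeat lookups are served from memory.
    return fetch_plan_from_db(plan_id)
```

Caching only helps read-heavy traffic, of course; it does nothing for the enrollment writes, which is part of why tuning and hardware can't rescue crappy code.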

Throwing new people at crappy code is not going to be a quick fix. To quote Brooks’s Law, coined by Frederick P. Brooks, author of The Mythical Man-Month: Adding manpower to a late software project makes it later.

I know this project is not technically late, but is something that is delivered on time but not functional truly delivered yet?

I have a degree in computer science, though I haven’t worked on a big software project in years. I predict failure, at least for the next few months.

There are millions of websites out there that let customers shop among various options and buy the one they want. It should not have been hard for the government to build a website that does the same thing. The administration instead gave the contract to CGI Federal, a company based in Canada that does work for governments worldwide. Reports say that CGI Federal has a history of cost overruns and an “uneven record” when it comes to completing the work they’re paid to do. There are reports that CGI Federal was the only bidder on the contract. The administration denies it, but won’t tell who else submitted bids. Meanwhile costs have soared.

Contracting for the government: it’s good work if you can get it.

This sounds like a scaling problem. But if the software were scalable, then this would have already been fixed by simply adding more capacity. So, some unfortunate people are in for several months of long hours, sleepless nights, and unwelcome attention.

Having lived through problems like this before, I would bet a minimum of 2 months to basic stability, and a minimum of 4 months to being highly reliable.

When they get the web pages fixed … I hope they put myth busters about ACA at the top of the page.

The Obamacare IT system is not like other web sites. Building a scalable, user front end should have been the easy part. It’s something we know how to do very well, and it’s relatively straightforward. That so many fundamental mistakes were made with the front end/web server code is amazing.

The hard part of the system is the back-end integration. That’s what would scare me, and it’s what we have the least amount of information about because the code can’t be inspected like client code can.

Here’s a diagram from Xerox showing the high level architecture of the system. As is typical with such diagrams, it looks simple enough. That is, until you realize that all those purple boxes represent external systems that have to be connected to the ‘data hub’. Dozens of them. Each one with its own formats and capabilities. Some of them use hardware dating back to the 1960s. And some of those purple boxes (like “Health Plan Carriers”) could actually represent many different legacy systems.

If you’ve ever worked on a project that required building transactional interfaces to legacy databases, that diagram should terrify you. Having to do it in a way that scales to millions of queries per day, when some of the back-end systems may never have been designed for that kind of throughput in the first place, makes it much harder.

There are stories coming out that indicate the back-end code is as bad or perhaps worse than the front end code. If so, this is a trainwreck that will never be fixed. It’s not unprecedented for ambitious projects like this to fail. The FBI blew hundreds of millions on a new case file system that was eventually scrapped. The California DMV did the same when trying to consolidate license applications in the 1990s. The private sector has had some pretty big failures as well.

What shocked me was when I read that there were 55 different contractors involved, with no integration lead. Final requirements were still being pushed out just a few months ago, and integration testing only started about a month before the system was rolled out. That’s crazy.

The reason it went wrong is that there were only scant months of daylight between the rollout and the enormous political muscle threatening to sink it. They couldn’t have done this years ago because the law was still being litigated. Up until this year, there was still a question of which states were going to sign up and which weren’t. Add to that a majority of states with Republican governors that have not only declined to sign up but are committed to making its implementation as hard as possible, and you’ll get some problems.

With all these problems likely in the near future, it’s a credit to Obama’s team that the system got up and running as much as it did. The bugs will be worked out. A brand-new website launch such as this is bound to have problems; plenty of companies and new online offerings have been crashed by demand and unexpected problems. This sort of thing is the rule rather than the exception.

In the case of the FBI case file system and the California DMV system, why did they fail?

Why use hardware that dates back to the 1960s, given how cheap computing is?

That’s nonsense. You can’t blame Republicans for this - this was a project internally managed by the federal government. It was fully funded. The IT plan started soon after Obamacare was passed.

Do you have any actual evidence that Republicans have thrown a wrench into the works anywhere? Something besides idle speculation?

Are you being serious? Yeah, that web site is just a giant feather in the cap of Obama. The whole IT industry is singing the praises of the team that rolled it out. After all, achieving anything at all in a world that contains Republicans is a miracle, right?

Name the last rollout of any large web site that was plagued by problems anywhere near this level.

There aren’t ‘bugs’ in this software. The system doesn’t even function. The problem wasn’t demand - the site still isn’t working and the demand is a small fraction of what it was on the first day. It just plain doesn’t work.

Spectacular IT failures generally happen for several reasons:

  1. Poor feasibility/requirements analysis up front.
  2. Scope creep
  3. Chaotic management
  4. Bad architecture
  5. Hubris - taking on more complexity than you can handle

Big systems can look simple from a high level. It’s when you drill down into the details that the real complexity emerges. When you have multiple systems that have to interact with each other in real time, the permutations grow large. When the data structures are very different between systems, normalizing them can be very difficult. When you have to deal with legacy systems transactionally, and many of them are part of the transaction, it gets hairy.

Consider a simple problem: You’re trying to look up a name, so you specify an interface into a system that contains a name query. Your system has separate fields for first name, last name, and middle initial. The other system just has a ‘name’ field.

So you want to look up:

First Name: John
Middle Initial: S
Last Name: Smith

Now, in the other database you find records like this:

Smith, John S
John Steven Smith
John Sebastian Smith
Smith, John Steven

Making a query that will guarantee a match can be hard. You have to parse the many different ways that a name can be written. You have to handle duplicate names, names that aren’t qualified with a middle initial, names that have data entry errors in them… And this is a trivial example. When you want to pass form data containing hundreds of fields and the field lengths are different and the structure is different and data is separated at different places, it can be a nightmare.
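To make the parsing side concrete, here's a toy normalizer in Python that handles only the two layouts above ("Last, First Middle" and "First Middle Last"); a real matcher needs many more cases, plus handling for typos, missing parts, and duplicates:

```python
def normalize_name(raw):
    """Parse a free-form name into (first, middle_initial, last).
    Handles only 'Last, First [Middle]' and 'First [Middle] Last',
    and assumes at least a first and last name are present."""
    raw = raw.strip()
    if "," in raw:
        last, rest = [p.strip() for p in raw.split(",", 1)]
        parts = rest.split()
    else:
        parts = raw.split()
        last = parts.pop()  # final token is the surname
    first = parts[0]
    middle = parts[1][0] if len(parts) > 1 else None
    return (first.lower(), middle.lower() if middle else None, last.lower())

def matches(query, record):
    """First and last names must agree; the middle initial must
    agree, or be absent on either side (that's the ambiguity)."""
    qf, qm, ql = query
    rf, rm, rl = record
    if (qf, ql) != (rf, rl):
        return False
    return qm is None or rm is None or qm == rm
```

Run the four sample records through it and every one of them matches "John S Smith", which is exactly the problem: normalization gets you candidates, not an answer.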

Now try doing that fast, with 30 different databases, and pulling all the data together into a common record that makes sense. And do it in a way that scales, and can be wound back if any of those queries fails.
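And here's a sketch of the "wound back" part: fan out to every backend, and if any one lookup fails, undo whatever the earlier ones did before reporting failure. The `Backend` class and its `release` step are hypothetical stand-ins for the real legacy systems:

```python
class Backend:
    """Toy stand-in for one legacy system (hypothetical)."""
    def __init__(self, data, fail=False):
        self.data = data
        self.fail = fail
        self.released = []  # track undo calls for illustration

    def lookup(self, key):
        if self.fail:
            raise LookupError("backend unavailable")
        return self.data[key]

    def release(self, key):
        # Undo whatever the lookup reserved (locks, holds, etc.)
        self.released.append(key)

def gather_record(applicant_id, backends):
    """Query every backend in turn; on any failure, wind back the
    ones that already completed and return None."""
    merged = {}
    completed = []
    for name, backend in backends.items():
        try:
            merged[name] = backend.lookup(applicant_id)
        except LookupError:
            for done in completed:
                backends[done].release(applicant_id)
            return None
        completed.append(name)
    return merged
```

Even this toy version hints at the hard questions: what if a `release` call itself fails, and how long do you hold partial results while a 1960s-era system decides whether to answer?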

Almost always it’s because these systems accumulated reams of custom software over the years, and it would all have to be rewritten if they move to modern hardware. I work in factory automation, and this is always a problem when trying to digitize a factory. Most factories that have been around a long time have accumulated all sorts of one-off programs written in-house or by consultants long gone. They absolutely rely on all that software, and it’s hideously expensive to re-write it all. So the factories muddle along with ancient hardware until the maintenance costs or efficiency losses overwhelm them, then they bite the bullet and upgrade. Maybe.

My experience is that this is not so simple. Bidding on government work tends to involve conforming to all sorts of rules and regulations and meeting all sorts of irrelevant technical qualifications that aren’t present in the case of private clients, and the projects themselves frequently have a lot of technicalities of this sort themselves.

These rules and regulations are put in for very good reason - to prevent abuses - but they cumulatively have the effect of making the bidding and work much less efficient than it would be otherwise.

One big reason that you left out is that the Obama administration stopped issuing regulations during the election campaign, afraid that anything they put out might be used against them and not wanting to attract attention to the program generally. A lot of the private (and public) sector actors were awaiting the regulations to see whether and to what extent to participate. Until the regulations and participants became clearer, it was hard to move things forward.

The FBI system failed because it was poorly managed from the outset, in every way that a software package can be poorly managed. The list of reasons from Wikipedia could be summed up as "they read The Mythical Man-Month and did the opposite of what it said."

As Sam said, people use old hardware because of the expense of rewriting the software, which is almost never portable.

Oh, please. Shall we all simply define the word “success” as meaning whatever the Obama administration accomplishes, regardless of what that happens to be? If Obama trips over a root and falls on his face, his supporters will call it “an excellent example of successful face-ground intercommunication”.

The notion that the exchanges will start working at some point down the road and that this will be just fine has two rather large problems:

  1. First, everyone’s required by law to purchase health insurance. Those of us who don’t will face heavy fines. At the moment, purchasing health insurance through the exchanges is physically impossible for almost everyone. Fining people for failing to do something that the federal government made it impossible for us to do would probably be classified as a bit unfair, even by Democrats.

  2. Second, even supporters of the law acknowledge the basic fact that it can only succeed if a lot of people purchase insurance. If the number of people buying insurance is too small, then insurance costs will go into a “death spiral”. At the moment, virtually no one can buy individual plans on the exchanges. Unless that changes soon, guess what happens?

Would it be possible, or is it even an option, to start over with a brand-new sign-up and redirection page, now that President Obama is asking for outside help to solve the problem?

Just extend the deadline for signing up from March 31st 2014 to end of June or July?

He could hang up a banner that says “Mission Accomplished”.

Slight problem: tax returns are due April 15. The Obama administration expected to be able to start punishing those miscreants who didn’t buy insurance during this tax year. Push the deadline back a few months and they won’t be able to punish us until the following tax year. That means losing quite a bit of money.

The other problem with that is that the delays in rolling out the ACA have already been quite embarrassing for the administration. A big delay in the biggest portion of the law would be vastly more embarrassing, and his poll numbers are already dropping.

Two articles from Slate on the topic, both worth reading:

The first notes that the entire program contains 500,000,000 lines of code and quotes an estimate that 5,000,000 lines will need to be rewritten.

The second delves into the fact that the front end and back end were developed by different contractors and contains nuggets like this:

Likewise, the bugs around username and password standards—for example, the fact that the username required a number but the user interface didn’t tell the user about it—are not problems of scale. They’re problems of poor cross-group communication. I’d bet that plenty of people knew what was going to happen when the site rolled out, but none of them were in a position to mitigate the damage.
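That username bug is a classic symptom of the validation rules and the user-facing messages living in two different codebases. One common defense is a single source of truth that both the server-side check and the UI hints are generated from, so they can't drift apart. A hypothetical sketch in Python (the specific rules are made up):

```python
import re

# Single source of truth: each rule pairs the check with the message
# shown to the user, so the UI and the server can never disagree.
USERNAME_RULES = [
    (re.compile(r".{6,}"), "must be at least 6 characters"),
    (re.compile(r"\d"), "must contain a number"),
]

def username_errors(username):
    """Return the message for every rule the candidate violates."""
    return [msg for pattern, msg in USERNAME_RULES
            if not pattern.search(username)]
```

The server rejects a username exactly when `username_errors` is non-empty, and the sign-up page renders the same list as its requirements text, which is precisely what HealthCare.gov's front end failed to do.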

Yep, that first article mentions all the things that came to mind when I heard that number - they have to have been writing their own replacements for perfectly good standard libraries, duplicating functionality in multiple places, and doing all the other classic bits of stupid that increase the chances that your project ends up featured on The Daily WTF.

Personally, I think ‘months of additional failure’ is probably a little too severe - we’re more likely to see gradual improvements along with a few issues that just won’t go away.

This was a massive undertaking that was massively mismanaged and massively underestimated. Somebody should, and probably will, lose their job over it. Scapegoats are being identified and skewered as we speak. However, I can’t imagine pointing the finger at any one single problem or person and claiming that it alone is responsible for the entire clusterfuck. The fact that the usual suspects are trying to make the president personally responsible for this is not surprising at all, and reeks of the desperation that is the GOP, et al.

Per OP, I’m an IT professional with 25+ years experience in development and analysis and architecture and project management in a variety of large system implementations.