Y2K bug: a valid worry, not humbug.

The Y2K bug scare consisted of IT people worrying that an unknown number of computers and their software would be thrown significantly out of whack because those computers kept track of the year using only two digits (00, 01, …, 98, 99), and would therefore behave unpredictably when the year changed from 1999 (99) to 2000 (00).

For example, if the interest on a loan were based on how long ago you took it out, and the arithmetic were based on subtracting one two-digit “year” from another, then a loan taken out in 1995 would accrue interest normally ’til 1999 (99 - 95 = 4 years), but the math would get screwy when we reach the year 2000 (00 - 95 = negative 95, not plus 5).
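
A minimal sketch of that arithmetic in C (the specific years and the loop are my own illustration, not anything from a real billing system):

#include <stdio.h>

int main(void) {
    int purchase_yy = 95;              /* loan taken out in 1995, stored as "95" */
    int current_yy[] = {98, 99, 0};    /* 1998, 1999, then 2000 stored as "00" */

    for (int i = 0; i < 3; i++) {
        int years_elapsed = current_yy[i] - purchase_yy;
        printf("year %02d: %d year(s) of interest accrued\n",
               current_yy[i], years_elapsed);
    }
    /* Prints 3, 4, then -95: the "screwy" math described above. */
    return 0;
}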

Many people feel that the Y2K bug scare was exaggerated, since few or no Y2K-related problems actually occurred. For a while, the paranoia was wonderful for sales of survival gear, and horrible for airline companies, some of which GAVE AWAY tickets on the first flights of January 1, 2000, in order to restore consumer confidence quickly.

Well, I’m here to tell you, it wasn’t paranoia. The Y2K bug was nicely averted through the hard work and foresight of a savvy IT industry. It could’ve been so much worse, as the example I cite below reveals.

Around the year 1999/2000, I worked at a large PC company’s national call-in support center. We answered questions from novices, IT managers, and even from engineers making repairs in the field.

One of the PC models that we supported was based loosely on the PC/XT, one of IBM’s earliest microcomputers. It had premiered years before, and frankly, we didn’t expect that many of these “dinosaurs” were still in mission-critical roles. We knew that come January 1 of that year, these computers would stop tracking the year properly, since they were only capable of keeping track of 8 years in total. (8 = 2 to the power of 3. Computer engineers loooove to use 2, 2 squared, 2 cubed, and so on.) Those 8 years covered the 3 years prior to the computer’s sale and the 5 years after it. Beyond that, their calendars simply scrolled the year back 8 years.
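
Here’s a rough sketch in C of how a year crammed into 3 bits scrolls back every 8 years (the base year is a made-up example; I don’t have the actual firmware in front of me):

#include <stdio.h>

int main(void) {
    unsigned base = 1986;                   /* hypothetical base year */
    for (unsigned offset = 0; offset < 12; offset++) {
        unsigned stored = offset & 0x7;     /* only 3 bits survive: values 0-7 */
        printf("real year %u -> clock shows %u\n", base + offset, base + stored);
    }
    return 0;
}

Once the offset hits 8, the stored value wraps back to 0 and the clock shows the base year again, 8 years in the past.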

Well, since we thought these old PCs were probably only in kids’ bedrooms and dusty warehouses, we didn’t bother to prepare much. We had a software program that could correct the date, but didn’t bother distributing it. We didn’t put any extra staff in the call center, either.

On January 1st, we got a tidal wave of phone calls: 20,000+ in a single day. In some cases, lawyers were threatening to sue because their paralegal secretaries’ PCs were sending out incorrect bills. We were miserable and humiliated, but this incident and similar ones prompted the IT industry to erase the Y2K bug before it exploded.

Here’s a rather lengthy (and ultimately boring-as-a-phonebook) list of date-bugs similar to the Y2K bugs. I include it only to show how commonplace such bugs can be.
http://www.csl.sri.com/users/neumann/cal.html

I assume you’re talking about this column:

As someone who actually wrote a book about preparing for Y2K, in my opinion the whole thing comes down to this:

Prior to Y2K, experts claimed that there were not enough man-hours left to track down and correct all the faulty code out there. As a result, major errors were inevitable. Yet, in the end, nothing catastrophic really happened.

The problem was way overhyped. Period.

I don’t know too many people who claim that there were no Y2K bugs, or that no work was needed to fix them. The problem was the hysteria that airplanes were going to fall out of the sky, banks would be thrown into chaos, there would be riots in the streets, etc. One example is my sister, who refused to fly home for Xmas because she thought it would be dangerous to fly.

I’m not sure I’d call lawyers’ billing systems a “mission critical” application.

Yeah, I get the “loans could go wonky” worry, but the catastrophe angle was weird. What, did the government decide that the only way they could keep the nukes from launching was to write:

while (date.POSIX > 0){ LaunchStatus = DO_NOT_LAUNCH; }
LaunchStatus = DO_IT;

Edit: Though I’ve heard some funny stories about having to reverse engineer unmaintained COBOL systems because no one bothered to keep the editable portion (the source code) around, just to store the date better.

The problem there is memory overruns. While the Java programming language is pretty good at making sure that data stored in variable A doesn’t leak into variable B, there are lots of programming languages (Assembly, C, etc.) that are far more prone to this type of error.

Famously, one way of inserting a virus into a particular password-protected operating system was simply to type so much, so fast, at the log-in prompt that the extra characters wound up in a memory area that should have been reserved for safe, trusted executable programs. Those extra characters ran as a trusted program, effectively taking over the computer.

Hence, it’s believable that putting 4 pounds of mud in a 2 pound bag (4-digit years in a 2-digit memory space) could lead to unpredictable errors as “fun” as misspelled names or as serious as planes not knowing what altitude they’re at.
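
To make that concrete, here’s a tiny C sketch (my own illustration, not from any real product) of a record layout where a four-digit year simply doesn’t fit in the two bytes set aside for it; a careless program that copied all four characters anyway would stomp on whatever field happens to sit next in memory:

#include <stdio.h>
#include <string.h>

struct record {
    char year[2];     /* two-digit year, e.g. "99" */
    char balance[8];  /* the field that happens to live right after it */
};

int main(void) {
    struct record r;
    memcpy(r.balance, "00123.45", 8);

    const char *new_year = "2000";          /* four digits, field holds two */
    if (strlen(new_year) > sizeof r.year) {
        /* A careful program stops here.  A careless one copies all four
           bytes into r.year and silently corrupts r.balance -- the
           "4 pounds of mud in a 2-pound bag" overflow described above. */
        printf("\"%s\" will not fit in a %zu-byte year field\n",
               new_year, sizeof r.year);
    }
    return 0;
}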

The former, yes. The latter, no, as those systems had long since been verified.

In my experience it was definitely overhyped. I worked as a software consultant from 1994 through 1999. (In my own business from 1995 on.) I worked for several Fortune 500 companies and every single one of them already had a plan to deal with Y2K by 1995. The dev community was already aware of the problem well before the media caused the scare. It was indeed averted.

But that is not to say that the hype was real. Had the industry ignored the issue, there would have been problems, but they were on top of it long before it caught the imagination of the media.

This is actually a response to Cecil’s column (as linked by toadspittle) so I’m moving it to the appropriate forum.

How many “non-critical” systems would have to fail before the resulting chaos made life difficult enough to be described as a “crisis”? IIRC, one of the early failures was a taxi-meter system; since the rates had a lifetime, the system had to be ready to take 4-digit dates as early as a year beforehand. What about payroll systems? Is it a crisis if you don’t get paid that month? It’s not a crisis if your bills don’t arrive, unless the power company shuts down.

Most places, like where I worked, started planning up to 10 years ahead for this. We still went down to the last few months. The new payroll went live Sept. 1999; the old one would not handle 4-digit dates. Fudging 100 programs and hundreds of thousands of lines of spaghetti code was not an option.

Most process control systems did not depend on dates, so our systems were not going to dump giant vats of molten metal on the floor because of a Y2K glitch. (Mind you, one did a few years later because of a different computer glitch.) They had us check network switches, which made no use of any date settings. The Honeywell HVAC system. The laser printers. Programmable fire/security alarms. It was overkill, but basically there were no significant issues with Y2K - not because we ignored it, but because we planned ahead and spent the money - lots of money! We replaced our entire purchasing, warehousing, accounting, and payroll systems. The original mainframe home-written systems were using a database that was no longer supported, and could only run on a version of the mainframe OS that would not work after 1999, even if we wanted to spend a few man-centuries re-writing the COBOL mess.

PDP-11’s, BTW, had a “Y16” glitch. The earlier RSX-11M O/S (? I think?) stored the year as 4 bits, 0-15; IIRC 0 started in 1977, when PDP-11’s were first made. In 1993 the “scheduler” suddenly stopped. Process control tasks would not run. In our case nothing critical shut down the manufacturing; you just couldn’t get production reports. An emergency OS patch was needed to permit the computer to schedule jobs past 1993.

It seems to me that Y2K mania was a key element fueling the dot-com boom/bubble, by encouraging sweeping replacements and upgrades of IT hardware and associated software and infrastructure that might otherwise have been deferred for years more. The investment bubble burst in early 2000, once the collective apprehension was past.

It was a mixed bag. Some companies had an equipment “lock down” and delayed purchasing new equipment until after 2000.

I think Y2K is not an example of an extreme either way. There was a real problem, it was addressed well, and some small bugs still cropped up. Just like you’d hope and expect it would work.

This is generically known as a buffer overrun. I don’t think it was a big part of the Y2K problem. You already had that issue if someone typed in 1999 rather than 99. Now, if someone tried to fix a program and made changes to allow a 4-digit date while somewhere else in the program it still expected a 2-digit date, you could have a problem. But the way older languages like COBOL handled character data would not have created buffer overrun problems.
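
Roughly what that means, sketched in C (my own illustration; COBOL’s actual behaviour depends on the field’s PICTURE clause): a fixed-length move just drops the characters that don’t fit, so neighbouring memory is never touched, even though the stored value may be wrong.

#include <stdio.h>
#include <string.h>

int main(void) {
    char year[2];                 /* like a PIC XX field: exactly two characters */
    const char *input = "1999";   /* someone typed four digits */

    /* Copy at most sizeof(year) characters, the way a fixed-length move
       behaves: the excess is simply discarded, so nothing outside `year`
       is overwritten.  (Alphanumeric COBOL fields keep the leftmost
       characters; numeric fields drop the high-order digits instead.) */
    memcpy(year, input, sizeof year);

    printf("stored: %.2s\n", year);   /* prints "19" -- wrong, but no overrun */
    return 0;
}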

I’ve got a couple of questions for the OP. First, exactly which lawyers had offices open Jan 1 to be calling with complaints? Second, could you give the exact model of the computer with the problem, and the application that caused it? I wonder about a knucklehead who would use 3 bits for a year. A nibble just possibly, which would give a 16 year leeway, or more likely a byte.

As for the Y2K problem itself, I’m of two minds. Of course it was overblown, since embedded systems care little about the date, at least not for critical things. And of course it was in the business interest of many people to blow up the problem to get work. However, I doubt that letting things fail, as Cecil suggests, would have been a good solution. I’d hate to be the person who recommended that strategy when an app critical to the business failed, and then had to scramble to find COBOL programmers who could comb through 10,000 lines of undocumented code looking for all the problems. I’d be interested in how many bugs were actually found during the remediation efforts.
But Y2K did have benefits, and not just in providing good jobs for aging COBOL programmers. It stimulated lots of people to upgrade just to be safe, which both made the industry money and got rid of hard-to-maintain legacy code. I kind of wish there were a Y2K+10 problem, which might help the IT economy.

In the old days a buffer overrun could expand into other data or even code areas and cause all sorts of problems - but that was just a standard bug, one I’ve committed more than once. The real danger is if someone who knew that buffer overruns were not being checked could use the bug to write code into memory and then execute it.

Yes, I understand that. I always wondered how that could work as pages that are executable are not generally marked as writable.

sethness said:

Something is wrong with your cause and effect. You are saying that a computer problem on Jan 1, 2000, triggered IT people to start addressing the Y2K problem months and even years before? Color me confused.

Most of the alerts I’ve seen on this have come from someone being able to do it basically in the lab. I’m not sure how many times someone has compromised a computer in the field using this method. There appear to be lots of much easier ways of doing it.

The first Y2K event that I know of occurred in the early 70s, on the date 9,999 days before Y2K, when “keep forever” tapes on IBM mainframes started being marked “recycle” as soon as they were created.

I also had a failure in 1970 on a system that had allowed only one digit for the year.

In the good old days, you were happy to get any memory you could. An older co-worker told me he learned assembler to pare down a COBOL program that wouldn’t fit in the mainframe’s 40KB of memory. Many of the compilers just jammed all the data together, in the order declared, into space at the end or beginning of the executable; this was done for each executable, so odds were that data at the end of one subroutine sat right at the beginning of the next. Later, more complex programs would have separate code and data pages.

But exploiting this stuff was reserved for PC days, when the same executable could be found on millions of PC’s, so you could reliably expect that what you analyzed on your PC was likely true on the next one. Windows and other programs didn’t help by recycling code that did pretty much the same thing through multiple newer versions. Only when viruses became a big problem did MS and others start combing the code looking for buffer-overrun vulnerabilities.

Yes, that’s more like it. We knew in the late 80’s that our programs were toast in 1999. But then, they were only written in the late 1970s and later, so the usual attitude was - “This will be re-written or replaced by 1999 anyway.” Yeah, right. I still see people running DOS-based applications in some businesses. If it runs like a tank, why kill it?

It just meant that the effort to replace these programs detracted from any other work for half a decade. The only bonus was that we went straight to replacement off-the-shelf programs which added significant features, and replaced a shop of programmers with annual maintenance fees.

The problem with this was that the experts’ estimates were wrong (as usual). But this time they erred on the side of overestimating, rather than the typical underestimate. So there was a lot of work to do, but programmers were able to do it much more efficiently than had been expected.

I spent several years working on Y2K fixes. In many systems, the date logic part of the program had not really been changed since it was originally written, so the whole system would have similar logic, often written by the same person. So I would just be making nearly identical changes to dozens of programs, one after the other. That can be done much faster than a normal bug fix, which involves searching through many undocumented programs looking for the bug. So there turned out to be efficiencies of scale, which is normally not true of software bug fixing.
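
For anyone who hasn’t seen one, a typical change of this sort was a “windowing” routine like the sketch below (the pivot value of 50 is my own illustrative choice; every shop picked whatever cutoff suited its data):

#include <stdio.h>

/* Expand a two-digit year to four digits using a fixed window:
   00-49 are taken to mean 2000-2049, 50-99 to mean 1950-1999. */
static int expand_year(int yy) {
    return (yy < 50) ? 2000 + yy : 1900 + yy;
}

int main(void) {
    printf("%d %d %d\n", expand_year(95), expand_year(99), expand_year(3));
    /* prints: 1995 1999 2003 */
    return 0;
}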

Also, in a whole lot of shops, the Y2K problem was knowingly hyped by IT management – they used this ‘crisis’ to get the company management to finally budget the money to replace creaking old systems that had been patched over for years, when they should have been replaced years earlier. But company management was so focused on short-term profits for their stock options that they never thought about long-term maintenance of their software assets.

This was a boon for ‘aging COBOL programmers’ not only then, but for years to come. These older systems were re-written during the Y2K crisis, and are now set to last for another few decades – so the COBOL programmers will still be needed into the 2020’s & 2030’s.