Computer people - share your stranger bug fixes

I was running a multi-user database system (order taking, invoicing, etc.) written in Foxbase, a much improved version of dBase III, at a company with a few dozen networked workstations. Foxbase had a “get random, unique, file name” function that I used all the time for temporary files. I never found out how it generated the filename across multiple users all sharing the same server (or even the same workstation) without conflicts, but it worked very well. For years.

Over the years, we gradually upgraded our computer equipment. We also gradually acquired program errors that I could trace to the random file name function, although I forget the symptoms. But every time I tried calling the function to test, it worked perfectly.

So I wrote a routine to generate filenames continuously and ran it on several workstations overnight, comparing each name with the previously generated one. Sure 'nuf, after hundreds of thousands of iterations, one filename came back a duplicate. Not often, but once in 24 hours is not acceptable.
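For anyone wanting to try the same kind of soak test, here is a minimal sketch in Python rather than Foxbase. The `next_name` callable is a stand-in for whatever name generator is under test, and the function name is made up for illustration.

```python
from typing import Callable


def count_back_to_back_duplicates(next_name: Callable[[], str], iterations: int) -> int:
    """Call next_name() repeatedly and count how often it repeats the previous result."""
    duplicates = 0
    previous = next_name()
    for _ in range(iterations):
        current = next_name()
        if current == previous:
            duplicates += 1  # the generator handed out the same name twice in a row
        previous = current
    return duplicates
```

Left running overnight with a large `iterations` value, a non-zero result would be the smoking gun.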

Probably, I thought, because the original Foxbase program was written on slower machines, and relied on a millisecond-range delay loop. Since ours were faster, something was happening too fast. Perfectly logical, but how to fix it? If we put a short delay in each filename fetch routine, it might break again after the next upgrade.

So I finally rewrote the routine to get two filenames in sequence and only if they were different would the first one be returned, knowing that it would be unique. I ran a test routine on the fastest CPUs we had for a week before implementing it, and it never broke again.
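Roughly, the workaround looked like the sketch below. This is Python, not the original Foxbase code; `next_name` stands in for the built-in function, and the `max_tries` limit is an addition of mine so the loop cannot spin forever.

```python
from typing import Callable


def safe_temp_name(next_name: Callable[[], str], max_tries: int = 100) -> str:
    """Fetch two names in a row; return the first only if the second differs."""
    for _ in range(max_tries):
        first = next_name()
        second = next_name()
        if first != second:
            # The generator has visibly moved on, so `first` will not be reissued
            # by the back-to-back failure described above.
            return first
    raise RuntimeError("name generator kept repeating itself")
```

The appeal of this approach is that it does not depend on any particular delay, so a faster CPU should not break it again.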

We had a bank of servers - I’d say about twelve (this was years ago, when servers doubled as end tables). The first six or seven in a physical line would go down in rapid succession right after lunch each day.

The cause: a supplemental air conditioning unit would kick in, and those servers were close enough to the unit that its vibration would bring them down.

I run some software that has a bug in it, so it occasionally freezes up and makes the keyboard and mouse totally unresponsive, even though the OS is running fine.

Since all I really needed was a way to communicate with my OS that didn’t need a mouse or keyboard, I wrote a script that would kill the buggy software whenever I eject the CD tray. Once the software is killed, my PC becomes responsive again and I can restart my script and the buggy software for another few hours till it freezes again.

Works pretty well =)

A couple of stories from many years ago, working for a big chain of banks, most of them small-town ones scattered all over the Midwest USA.

At the time, we were enhancing the check processing procedure: instead of all these banks sending all the deposited checks by courier each afternoon to the computer center, where they were data-entered into the system (and absolutely had to be processed by midnight!), the new system installed entry machines (early PC-based) at each local bank. Then the local bank did data entry on these machines, using their existing clerical staff and ‘user-friendly’ software. The data was written onto 8-inch floppies, copied, and the floppy was sent with the physical checks to the computer center. (Still by courier, at this point, though there were plans for eventual dial-up transmission.) The computer center then read the data from the floppy and sent it on to the mainframes (or, if the read failed, fell back to doing data entry from the physical checks).

These clerical people were older bank tellers (nearly all women) or a secretary to the Bank President. The designers of the system were young computer nerds. Guess what? They didn’t communicate very well.

The 8-inch floppies tended to wear out, and since the data was so valuable compared to the price of the disk, it was decided to use each floppy only 5 times. So they printed labels for them, which had only 5 spaces to write the bank ID, date, etc. One space was filled out each time the floppy was used; when all 5 were full, the computer center kept it and sent the bank a new blank floppy instead. (So the computer center ended up with a bunch of used floppies to discard, each used only 5 times! Since I and some others had PCs that used 8-inch floppies (mine was a TRS-80 Model II), we took lots of them home – a nice fringe benefit.)

But there were lots of failed floppies, eventually traced to the clerks filling out those labels with a hard pen – pressing down on those floppies with a sharp pen could damage them enough to make reading them iffy. So the bank bought a few gross of a soft felt-tip pen in a strange purplish color, sent them with every floppy, and told all the clerks that the floppy labels “absolutely MUST” be written with this purple-colored pen. It worked; the number of unreadable floppies dropped way down.

But an unintended consequence: many of those senior clerks (and the new hires they trained over the years – most of the original ones are long since retired) are convinced that computer disks MUST be labeled in purple ink. Those old 8-inch floppies, the 5-1/4’s that replaced them, the 3-1/2 inch ones, even modern flash drives – anything other than purple ink will surely damage that storage. The supplies department of that bank chain is still stocking great numbers of purple felt-tips, 25+ years later!

Another story: those local machines were not very reliable, and when needed, the bank sent a team from the computer center out to these small towns to repair them. One time, they asked the clerk if she had a backup copy of the data disk. (Not expecting much; experience had taught them that backups were frequently skipped.) She said “of course I do. I copy it every day, just like the instructions say.” And she took out a file folder, and handed them – a careful xerox copy of the 8-inch floppy disk! When they protested that this would not work, it was not a copy, she (a formidable woman with 20+ years experience at the bank) looked at them sternly and said “young man, I know what it means to make a copy. I’ve been making copies and filing them since before you were potty trained. That is a good copy.”

When they were telling this story back at the computer center, I upset some people by saying she was right. The system was supposed to be specifically designed for use by existing clerical people, with no computer experience. The instructions said only “make a copy of the floppy disk, and file it.” Experienced clerical staff are far more likely to think of xerox copies than anything else. Also, the instructions for copying a floppy disk were hidden in an appendix of the instructions, and not mentioned at all on this page. They did change the instructions after this. And I don’t think she was the only clerk doing this, based on the sudden increased demand for floppy disks after the new instructions went out.

I was asked to help a family friend with her computer. Apparently her browser wasn’t working any more and was showing the standard error screen instead of bringing up her start page. She was running Win XP.

In cases like this, I usually will have them try a different browser, but she didn’t have one. She only had IE 7. So I installed Firefox from my flash drive. Same problem.

I checked the Internet settings in Control Panel, figuring that her connection settings were set to some sort of proxy. Nope. It was set to “Direct connection”.

Then she told me that it was working fine until she had some sort of weird pop-up. When she closed it, that was when she started having problems. I ran my usual spyware and malware scans. They found some stuff, but she still had the same problem.

That was when I realized there was one other place I had not looked… her hosts file. Sure enough, when I checked out her hosts file, it was chock full of listings all pointing to 127.0.0.1. Once those were cleared out and her PC rebooted, she was in business.
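If you would rather not eyeball the file, a check like this can be scripted. The following is a rough Python sketch, not the tool I used; the function name and the idea of whitelisting only `localhost` are my own assumptions.

```python
from typing import List

LOOPBACK = {"127.0.0.1", "::1"}
ALLOWED = {"localhost"}  # assumption: the only name that should map to loopback


def suspicious_hosts_entries(hosts_text: str) -> List[str]:
    """Return hostnames mapped to a loopback address, other than localhost itself."""
    flagged: List[str] = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + one or more names.
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment-only line, or malformed entry
        address, names = entry[0], entry[1:]
        if address in LOOPBACK:
            flagged.extend(name for name in names if name.lower() not in ALLOWED)
    return flagged
```

Feed it the contents of the hosts file; a long list coming back is a good hint that something has been redirecting sites to nowhere.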

I used to work as a programmer on airline reservation systems. Back then, these were old, clunky mainframes. After a travel agent (remember those?) entered a reservation, they’d get back a six-character, pseudo-randomly-generated “record locator” to identify the reservation (you will still see these in your online reservation info today). The system would say something like,

“END OF TRANSACTION - ABC123” where ABC123 is the record locator.

Unfortunately, due to the randomness, something like the example below was bound to happen. The agent was not impressed, and filed an angry bug report after receiving this response:

“END OF TRANSACTION - FUCKOF”

The agent not being placated by the engineers’ assertion that at least it wasn’t “off”, a fix was demanded. Initially we considered creating a table of “forbidden words” - if the record locator came out as something in the table, roll the dice and try again. Fun as it would have been to come up with every swear word we could imagine, this seemed unwieldy, especially since the system was in 30+ languages, so we settled for a much simpler solution: ban vowels from the middle of the record locator. I guess a few Polish swearwords could still slip through, but we never got any complaints after that.
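The real thing ran on a mainframe, so the following is only a Python sketch of the idea. I am assuming an alphabet of uppercase letters and digits (as in the ABC123 example) and taking “the middle” to mean every position except the first and last; neither detail is from the original system.

```python
import random
import string

VOWELS = set("AEIOU")
ALL_CHARS = string.ascii_uppercase + string.digits            # assumed alphabet
NO_VOWELS = "".join(c for c in ALL_CHARS if c not in VOWELS)  # used for interior positions


def record_locator(rng: random.Random, length: int = 6) -> str:
    """First and last characters may be anything; interior characters are never vowels."""
    first = rng.choice(ALL_CHARS)
    middle = "".join(rng.choice(NO_VOWELS) for _ in range(length - 2))
    last = rng.choice(ALL_CHARS)
    return first + middle + last
```

Under these assumptions the offending locator could not come out, since both of its vowels sit in interior positions, and no word list in 30+ languages needs maintaining.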

A few years ago I was watching a consultant troubleshoot a program. He was typing in a SQL query to determine the average length of time each transaction was taking. This was a promising approach, but it kept coming up with errors.

I pointed out that he had a typo in every instance of the field COMMONNAME – they were all entered as COMMONNNAME. This prevented it from correctly calculating the average.

So you see, the Ns didn’t justify the mean.

The monitor power came from the PC through the single cable to the monitor, from what I recall.

.CSV (comma separated value) files can be a handy way to put large data sets into your program, because you can create and edit them with common spreadsheet apps like Excel, and the resulting files are pretty easy to parse. This allows you to use a common off the shelf tool that lots of people already know how to work, rather than writing and debugging and training users on a customized editing program.

If part of your data is text, though, bad things may happen if the text contains commas… line feeds too, but you really have to work at it to get those into a spreadsheet cell. The OpenOffice spreadsheet also uses a 12:34:56 AM time format by default. It will open a file with 13:34:56 data and display it as such, then “helpfully” convert it to AM/PM values when you save it. If the values represent elapsed time rather than time of day, it doesn’t really make sense for the application to parse for different time formats. Fortunately our applications don’t need elapsed times nearly as long as 24 hours, or I’m sure the spreadsheets would find other ways to annoy me.
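On the reading side, both problems can be sidestepped if the program uses a real CSV parser and treats the time column as a duration. Here is a small Python sketch of that idea; the two-column layout (label, elapsed time) and the function names are invented for the example and are not our actual file format.

```python
import csv
import io
from datetime import timedelta


def parse_elapsed(text: str) -> timedelta:
    """Parse 'H:MM:SS' as a duration, not a time of day (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def read_rows(csv_text: str):
    """Yield (label, elapsed) pairs; csv.reader copes with quoted commas and line feeds."""
    for label, elapsed in csv.reader(io.StringIO(csv_text)):
        yield label, parse_elapsed(elapsed)


sample = '"Smith, John",13:34:56\n"two\nlines",0:05:00\n'
rows = list(read_rows(sample))
```

This only works if the text cells are quoted when the file is saved, and it does nothing about a spreadsheet rewriting 13:34:56 as an AM/PM value on the way out.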

And yeah, the last few weeks I’ve had some time, so I’ve been working up some Excel macros that will address these and other issues.

I work with a large amount of legacy software which uses an off the shelf package for the GUI. The old versions of this software had a bug in that the supposed blinking color function didn’t work. The previous programmer used large numbers of these blinking colors that didn’t blink. The new version of the software package fixed the bug, and now our displays will induce epileptic seizures. The old version will only work with single-core processors, so we get to spend some fun time going through and fixing all the blinking graphics to make it work on currently available hardware.

This happened just today.

For a final project for a class, my group made an electro-mechanical Pacman game. It’s pretty sweet. Anyways, we used a joystick as a controller. The innards of the joystick are pretty standard - Chinese made, basically four 2-terminal microswitches. These microswitches are functionally equivalent to pushbuttons - you activate the switch by moving the joystick, which moves the metal lever, which pushes the switch, which causes 2 contacts to touch, completing the circuit.

This morning, after the joystick had been working fine for a month, it stops working. Uh oh. We have to demo the project at the local science museum in a few hours. But the joystick’s dead. We jerry-rigged a solution - four pushbuttons mounted in a cardboard box - the box that the joystick came in! That worked fine enough for the demo and the kids loved it.

Later today, I took apart the microswitches to figure out what happened. We made a box for the joystick out of acrylic and decided to use superglue to fasten the box. Bad idea. Apparently, when you superglue acrylic it creates a very fine mist of superglue and it gets EVERYWHERE. What happened was the superglue mist made its way into the microswitches and prevented the metal contacts from conducting electricity. I scraped the residue off each of the contacts with a pocket knife, reassembled everything, and now it works perfectly fine. Luckily, we haven’t been graded yet.

I don’t think that this really qualifies as a bug fix, but it was definitely an a-ha moment.

About a million years ago I asked someone to get me some data. He finally came through last week.

It’s a list of every surgery performed by certain surgeons over the past nine years. But the data looked weird. And I had no idea who the surgeons were. Varand? Robmol? Morrob? Who the hell were they? They’re not in our department!

I searched the company directory. No such people. Did the data extraction guy use some completely different hospital?

I was about to shoot an e-mail off to the guy, asking who those doctors were. I was expecting to see certain names, like Andrew Vargas, Molly Robinson, and Robert Morrison.

Then I had a :smack: moment.
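For anyone who hasn’t spotted it: the codes look like the first three letters of the surname followed by the first three of the given name. That is only my guess at what the extraction did; a Python sketch of the guess, with a made-up function name:

```python
def short_code(first_name: str, last_name: str) -> str:
    """Guess at the scheme: 3 letters of surname + 3 letters of given name."""
    return (last_name[:3] + first_name[:3]).capitalize()


codes = [short_code("Andrew", "Vargas"), short_code("Molly", "Robinson")]
```

Under that guess, `codes` comes out as the first two mystery surgeons.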

http://boards.straightdope.com/sdmb/showthread.php?t=311773 A similar thread I started a while back with my best story.

I once had an involvement with repairing IBM mainframes; I could tell 1st- and 2nd-hand stories of interesting bugs, with the bugs often more interesting than the fixes.

These days I’m a low-power user, but have experimentally developed a fix for a problem I have with my Internet connection.

I connect via a Novatel Wireless CDMA (Mobilink) modem that looks like a flash drive. If I don’t access the Internet for a few minutes, down-transfer ceases. (I can click “Try again” for quite a while. Sometimes it heals after several minutes.)

But through trial-and-error I’ve found a way to get reconnected in 27 seconds, that almost always works!

  1. Remove the modem from its USB port.
  2. Wait until the Mobilink window acknowledges “No device”
  3. Reinsert the modem.
  4. Wait until Mobilink acknowledges “Ready.”
  5. Remove the modem again!
  6. Wait, reinsert as above.
  7. Click Connect.

If you omit steps 5-6 you’ll get a “Modem already in use” error message.
If you click Disconnect before step 1 and don’t wait for the time-consuming Disconnect acknowledgment, the fastest recovery may then be a reboot!

(I think the underlying problem is with my provider, but it could be defective modem or laptop virus for all I know.)

Am I going to be the only Doper who will admit to an actual Grace Hopper moment? Yes, a server went down because an insect got stuck in the CPU fan.