I'm an idiot, or bugs you've created

I made some changes to a script filtering a 29 million row file of summary statistics for genetic variants to remove all variants close to one of a few hundred known variants.

The changes required replacing a for loop with a while loop and increasing the row index manually for some but not all iterations.

The new version didn’t finish in the time I expected and was clearly in an infinite loop, but it took me a very long time, and the removal of one other bug, to notice that one of the four i = i + 1 lines actually read i = 1 + 1.

The signs were all there in the output, and I just didn’t interpret them the straightforward way!
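For anyone who wants to see the failure mode in miniature, here is a rough bash sketch of a while loop with manual increments (the original was in R). Everything in it is hypothetical: count_kept, the "skip every fifth row" rule, and the "buggy" switch are stand-ins, not the actual filter. The step counter is one cheap way to make a stuck loop fail loudly instead of spinning.

```bash
#!/bin/bash
# Hypothetical stand-in for the row filter described above: walk rows 1..n,
# skip every fifth row together with its neighbour, and count the rest.
# Pass "buggy" as the second argument to reproduce the i = 1 + 1 slip.
count_kept() {
  local n=$1 mode=${2:-ok}
  local i=1 kept=0 steps=0
  while [ "$i" -le "$n" ]; do
    # Guard: a forward-only walk over n rows never needs more than n passes.
    steps=$((steps + 1))
    if [ "$steps" -gt "$n" ]; then
      echo "loop is not advancing (i=$i)" >&2
      return 1
    fi
    if [ $((i % 5)) -eq 0 ]; then
      i=$((i + 2))        # manual jump past the skipped rows
      continue
    fi
    kept=$((kept + 1))
    if [ "$mode" = "buggy" ]; then
      i=$((1 + 1))        # the typo: always resets i to 2
    else
      i=$((i + 1))        # the intended increment
    fi
  done
  echo "$kept"
}
```

Running count_kept 10 prints a count, while count_kept 10 buggy trips the guard and returns a non-zero status rather than looping forever.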

Was this in a language where you couldn’t use i++ or i += 1?

I left out an End-If once.
(VAX BASIC)

The worst bug I ever created was in a stack-based language (Forth). I figured that even after I pulled something off the stack, it was still there, just below the actual stack. What I forgot was that, at least on an original IBM PC, the clock interrupts 18 times a second to run its routine, which can overwrite whatever sits just below the top of the stack. The result was the worst thing possible: an extremely rare bug that just caused an occasional error that was very hard to track down.

A colleague came to me with the same kind of intermittent bug. Also in Forth. I asked him one question. Had he ever reused something that had been popped from the stack? Yes he had. The expression on his face when I explained was priceless.

Yeah. It’s in R. So the line was actually i <- 1 + 1

This isn’t mine, but one somebody here posted about years ago – I can’t remember who it was or when they posted.

They were debugging, and added code that captured the state of something and dropped it into a little text file before the program could crash, so they could at least see what was happening shortly before the crash.

But when they went to look at that text file, they got “File not found.” They kept working on it for quite a while, certain that the program should be creating the text file, but unable to find it.

Only after fighting this battle within a battle for some time did they realize that “File not found” was the text their file contained.

I’ve done something very similar: created some loop, and then realized after several runs that I had failed to replace a couple of the placeholder/testing digits with i or k or whatever the loop variable is. I’ve done it often enough that it’s the first thing I look for now when a loop misbehaves.

Going way back, when I first joined my graduate research group, I was doing some circuit analysis in Fortran, programmed on punch cards and run on a CDC mainframe in the Computer Science department. For bookkeeping purposes, the time on the mainframe was billed at $XX per CPU hour and each professor was given a “budget” of $Y,000 for computer time.

One morning, my professor called me into his office and told me that the job I had started the night before had run all night and eaten his entire budget for the group for the year. Turns out I had a loop in the program that somehow managed to run without crashing or violating any system constraints, so a program expected to take a few CPU minutes had run for something like 12 hours.

I don’t program, but I do dabble in putting CSS and HTML tables on my webpages. On one particular page I had a typo or omission in a table that caused words to spell out vertically instead of horizontally. It was fascinating to look at, sort of like rain running down a windowpane with longer words creating longer “raindrops”, but virtually impossible to read.

In my first job after college I had to write some code to do something along the lines of reading some values from a config file and using them to configure some sort of sensor. We tried my code with some known good values in the config file, but the sensor wouldn’t produce good data. The guy who designed the sensor hardware had a simple script he used for his initial prototyping with those same good values just hard-coded in it; when he ran his script it worked perfectly. I kept telling him, I swear, my code is doing the same thing as your script; I don’t know why it doesn’t work. Finally he sat down with me and we went through my code line by line and compared it to what his script did, and we found the bug: many of the numbers we were using were in hexadecimal, and in most places I correctly handled that. But for one of them I apparently had a brain fart and treated the number as a regular decimal value, and was therefore programming the sensor with some really wrong setting.
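To make the hex-versus-decimal slip concrete, here is a small bash sketch. The helper names parse_hex and parse_dec are made up for illustration, and the config values are invented; nothing here is from the actual sensor code. It only shows how far apart the two readings of the same digits can land.

```bash
#!/bin/bash
# Hypothetical helpers: turn a config value into a decimal integer.
# parse_hex accepts "1F" or "0x1F" and reads the digits as base 16.
parse_hex() {
  local v=${1#0x}          # drop an optional lowercase 0x prefix
  echo $((16#$v))
}

# parse_dec reads the same kind of string as plain base 10.
parse_dec() {
  echo $((10#$1))
}
```

For a value written as 10 in the config file, parse_hex gives 16 while parse_dec gives 10, which is exactly the kind of quietly wrong setting described above.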

Also in genetics…every year or two, one of the graduate students manages to reinvent the fork bomb.

Many genetics analyses are (technical term) embarrassingly parallel. Some tasks can iterate over whole chromosomes, and others over sets of 10,000 SNPs, or whatever. That often means there are lots of files to which the same thing needs to be done.

Combine that with a supercomputer, which is really just a bunch of regular computers connected together and managed by some fancy software. So, tell the fancy software do-thing block1 and it will dispatch the job to a free computer to do the thing. If there are 10,000 blocks, the obvious thing is to write a script to tell the fancy software to dispatch jobs.

So, people will write a script to start a bunch of jobs, but sometimes the jobs are slightly different, so it would be nice to be able to use the same script for different kinds of jobs. Write the script so exactly what it should do is on the command line:
launcher-script.sh analyze-script.sh --output run1 --test all --input
or
launcher-script.sh analyze-script.sh --output run2 --test all --input

Intent: The launcher-script.sh will dispatch a bunch of jobs, each one running analyze-script.sh, but in a way where the arguments to analyze-script.sh can be easily changed.

launcher-script.sh:

#!/bin/bash
# My script to launch a new job for each block to analyze
for i in /data/inputfiles/block*
  do
    sbatch $0 $i
  done

Those who have done a bit of bash scripting will notice that somebody has used $0 as the argument to sbatch (the program that schedules a job). In bash, $0 is a special parameter that expands to the name of the script that was run.

So, when launcher-script.sh is run, it will schedule itself to run once for each input file, and when each of those runs, it will in turn schedule more instances of launcher-script.sh.

sbatch launcher-script.sh block1
sbatch launcher-script.sh block2

etc. In a cool sci-fi movie world, this would cause the supercomputer to shoot out sparks and smoke. In the real world, the second time it happens the user hits the limit for the number of jobs that can be submitted by a single user, eventually notices the problem, and scancels all of their own jobs. The first time it happens there is no job limit, the scheduler grinds to a halt (or just dies), and an admin has to clean up all of the jobs and then implement a per-user job limit.

What should have been used was the variable $@, which is automatically replaced with everything that came after the name of the program on the command line. When working properly the launcher-script.sh would create lines like

sbatch analyze-script.sh --output run2 --test all --input block01
sbatch analyze-script.sh --output run2 --test all --input block02

etc. Each of the analyze-script.sh jobs would do their thing, and then exit.
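A minimal sketch of the launcher with "$@" in place of $0, for anyone who wants to try the fix safely. The SUBMIT and INPUT_GLOB variables are assumptions added for this sketch (they are not in the original script) so that the loop can be dry-run with echo instead of really calling sbatch.

```bash
#!/bin/bash
# Sketch of launcher-script.sh with the fix applied.
# SUBMIT and INPUT_GLOB are hypothetical knobs added for this sketch only;
# their defaults match the original script (sbatch, /data/inputfiles/block*).
launch_jobs() {
  local submit=${SUBMIT:-sbatch}
  local block
  # INPUT_GLOB is left unquoted on purpose so the shell expands the pattern.
  for block in ${INPUT_GLOB:-/data/inputfiles/block*}; do
    [ -e "$block" ] || continue   # pattern matched nothing: skip
    # "$@" forwards the caller's arguments; $0 would resubmit this launcher.
    "$submit" "$@" "$block"
  done
}
```

In a real script the last line would be launch_jobs "$@". Running it once with SUBMIT=echo prints the commands that would be submitted, which is a cheap check before handing anything to the scheduler.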

Remember when the IBM PC and the Apple ][ came with BASIC installed? And you could peek and poke directly into memory locations? And they published things like how memory addresses mapped into hardware settings? And if you modified certain memory locations, that you could change things like how the display works?

Oh, and did you know that the instruction POKE(35, PEEK(35) OR 7) doesn’t mean you get to choose between poking either the value of PEEK(35) or the value 7? Oh, it’s a bitwise OR, not a logical OR? I was amazed at how quickly I was able to turn the computer off before it started smoking (it was just fine), or even worse, before anyone else noticed what stupid thing I just did.
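The difference is easy to see in bash arithmetic, which has both operators. This is only an illustration of bitwise versus logical OR on made-up values; the function names are hypothetical and it says nothing about what address 35 actually controls.

```bash
#!/bin/bash
# Bitwise OR: combines the two numbers bit by bit.
bitwise_or() {
  echo $(( $1 | $2 ))
}

# Logical OR: only asks whether either number is non-zero (result is 0 or 1).
logical_or() {
  echo $(( $1 || $2 ))
}
```

With a stored value of 32, bitwise_or 32 7 gives 39 (the low three bits get switched on), whereas logical_or 32 7 gives 1.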

Doesn’t matter much for me… even with a language that has features like that I’m perfectly capable of coming up with even more dumb bugs!

This reminded me of a group project in a programming class at university. We were to program a small electronic circuit simulator capable of simulating the response of a capacitor, resistor or coil when a voltage was applied for a given time and then shut off.

We were given a set of values this should work for in a practical demo at the end of the project and I was tasked with writing the functions doing the simulation, while the other guys did the interface and graphics.

The demo worked flawlessly, but there was a crossover to a basic electronics course where we were to use the simulator to solve some very simple problems. With at least one combination of values, the capacitor behaved normally only until the voltage was switched off, and then it kept increasing …

We borrowed a different group’s simulator to finish the project. Their code produced the right result, but they’d made the incredibly annoying choice of requiring people to insert an actual μ when adding e.g. microvolts, which required doing the alt+0181 (maybe) combination. Our superior interface allowed the use of a simple u.

Um … … what? … … :flushed:

The worst ones I’ve made aren’t simple ones like in the OP, but the ones where I miss a corner case, which promptly comes up during testing. I’ve done something like the OP too, but since I could fix it easily I don’t consider it anywhere close to the corner-case kind.
When I interviewed candidates with programming experience one of my questions was “what’s the worst bug you committed.” If they didn’t have one I’d question how much programming they actually did, and as a believer in egoless programming I wanted to see if they could admit it and laugh about it.

Also, I learned as a TA in a PDP assembler class that people tend to make the same mistakes. When the students came to us, asking if we could help debug the code they’d just spent an hour on, I could usually find it in a few seconds. They thought I was a genius; actually I had seen exactly the same bug five times before.

You had to be there I guess. :wink:

Can we include simple errors that our students have made that totally stumped them but are easy to find, mostly because you’ve done it yourself? If so, I had a student in a Unix/Linux class who was trying to write a quite simple shell script.

“I spent hours on this last night and it just doesn’t work.”

I looked at it for about five seconds, and said “Look at line 17.” He said, “It looks right to me.” “Look at the end of the line.” “I don’t see anything.”

Finally I told him to increase the font size on his editor. “Bigger! Even bigger!” Then he finally saw it.

Yep. He had typed a colon : instead of a semicolon ;

I program primarily in R, which is great for doing some statistical analysis on the fly from the command line but can be dangerous.

Particularly dangerous is that it uses “=” for assignment, and “==” for comparison.

So you might for example want to run the line

is.ABC = Important_data == "ABC"

which creates a vector is.ABC indicating which elements of the vector Important_data are equal to “ABC”.

But if you type

is.ABC = Important_data = "ABC"

You have just erased all of your important data and replaced it with the single value “ABC”.

Doh!

And speaking of that, in C and C++ code like this…

if (a = 0) {
    // Do some stuff
}

…is perfectly valid code and will happily compile. But it won’t do what the programmer probably intended. What it will do is assign ‘a’ the value 0, then evaluate whether ‘a’ is “true” or “false”. Since ‘a’ is now 0, it evaluates to false, and doesn’t execute the code inside the if block, ever. And ‘a’ no longer has the value it’s supposed to have.

Modern compilers will, I believe, at least give you a warning if you do something like that – “Did you mean ‘==’?” But the old crappy compiler in the computer lab in college would just happily compile that code and give you no indication that something might be wrong.
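The same slip can be reproduced in bash, whose (( )) arithmetic follows C-like rules. A small sketch, with hypothetical function names, showing that the single = both skips the branch and clobbers the variable:

```bash
#!/bin/bash
# Assignment inside the condition: a becomes 0, so the test is false.
check_assign() {
  local a=$1
  if (( a = 0 )); then
    echo "branch taken"
  else
    echo "branch skipped, a is now $a"
  fi
}

# Comparison inside the condition: a is left untouched.
check_compare() {
  local a=$1
  if (( a == 0 )); then
    echo "a is zero"
  else
    echo "a is still $a"
  fi
}
```

Calling check_assign 5 reports that the branch was skipped and a is now 0, while check_compare 5 leaves a at 5.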