Computer “glitches”. Why?

Oly · May 10, 2023, 2:53am

Alas, my friend. What I so desperately hoped would be a panacea has turned out, as before, to be just a “See, ya”.

Didn’t work today

Goin’ back to my chill pills.

Francis_Vaughan · May 10, 2023, 3:23am

A zillion years ago I was talking to the plant manager for what at the time was a DEC fab. He related how they discovered there was a spot on the wafer where they got more than average failures. Turned out that was where a jet of cleaning fluid impinged.

Nothing is easy.

Once we get to hardware glitches all bets are off.

I once spent a solid week chasing a bug that looked for all the world like a processor issue, but turned out to be a flaw in a kernel exception handler.

Writing and debugging software on beta hardware next to the guy designing the hardware is fun. We had a deal that whoever was responsible for the bug of the day bought dinner. (Lots of late nights on that project.) We ended up about even.

Voyager · May 10, 2023, 5:35am

That problem I know. When I joined the processor quality group, the first thing that happened was that we were having tons of failures, all due to the processor. I got asked if we should recall all the parts from the field. My intuition was that we shouldn’t, which was the answer our executive VP wanted, it being cheaper. It turned out I was right. The boat carrying the wafers at our supplier contaminated some of them, and they failed early in their lives. If we had recalled them, we would have taken the parts without the problem and replaced them with parts that might be bad, so our quality would have gone down. It got me thinking about no-trouble-found parts (ones returned which pass everything), and I got a good paper out of it as well.
In my experience, as a computer scientist who worked in hardware, software accounts for most of the bugs. So I think you made a bad deal.

Voyager · May 10, 2023, 5:39am

When you reboot and try it again, you are not doing the same thing. You’re working in a totally new environment, so Einstein’s observation does not apply.
When you test you have to ensure you are doing the same thing over and over, and we sure don’t let random programs run while we do it.

solost · May 10, 2023, 12:05pm

Perhaps my post wasn’t entirely clear. Let me emphasize the important part:

So in my scenario, I try a computer action, it glitches, I briefly consider rebooting but then decide to just try the exact same thing again, and it works the second time.

Also, just for the record, I also mentioned that that quote / observation is incorrectly attributed to Einstein.

TwoCarrotSnowman · May 10, 2023, 6:39pm

The action you perform may very well be exactly the same, but the state of the system upon which you are performing it may have changed in invisible but significant ways.

Voyager · May 10, 2023, 6:40pm

Sorry, I missed that. However, other processes are in different states, the memory allocation is different, you might be running on a totally different core, etc., etc. Oh for the days of minis when no one was on the computer except you.
As for Einstein, I suspect, like Yogi Berra, that he never said half of the stuff he said.

solost · May 10, 2023, 7:29pm

Guys, it was really just an off-the-cuff joke: “Whoever said ‘The definition of insanity is repeatedly doing the same thing and expecting different results’ never used a computer, amirite?”

I am aware of the fact that background processes and memory are constantly changing, but it’s amazing how often I have experienced “try computer action → FAIL…try exact same computer action 10 seconds later → SUCCESS”

Voyager · May 11, 2023, 1:26am

I spent over 35 years trying to prevent hardware glitches and debugging some, so thanks for the opportunity to blather on about it!

solost · May 11, 2023, 12:31pm

Hey, glad to help!

Max_S · May 13, 2023, 8:51pm

At that age, the physical components of the machine will start to break from wear. They simply aren’t designed to last that long - think blown capacitors, dead random-access memory, and hard bad sectors on hard disk drives. Even with a 20-year-old car, you’re going to have to learn diagnostics and repair because mechanical failures (glitches) will start to appear. Or at least hire someone for that purpose.

~Max

Cervaise · October 26, 2023, 9:40am

Bump, because I just happened across this recent article and remembered the old thread:

It’s a very clearly written article with a minimum of technical jargon. It effectively explains why this annoying little defective behavior (very similar to the floating message in the OP) was happening in the first place, as well as why the bug persisted for over twenty years before somebody decided to fix it (a combination of uncommon trigger conditions plus the fact that it was just an inconvenience and didn’t really break anything major, so it wasn’t anyone’s priority).

For anyone in the thread who would find it useful to see a concrete example of some of the concepts discussed, this should be an informative story.

Hari_Seldon · October 27, 2023, 8:17pm

I will tell my little story. This happened nearly 40 years ago on an IBM PC. No multi-tasking, no windows, etc. I wrote a mini-TeX interpreter to print out a book I was working on. The language was the stack-based language Forth. The program read a character from the floppy and either output a character, rolled the printer a half-step up or down, or executed a Forth command (I’m simplifying a bit). For some reason that now escapes me, sometimes I needed to take a byte off the stack and then take it again. So I used the stack - 1 (Don’t do that!), figuring that no process was going to interfere here. Every long once in a while (it took a few pages of printout to happen), the wrong character printed. It took me a long time to find the cause, which was that it was not quite true that no other process was running. Around 18 times a second the computer was updating its internal clock. The clock program used the stack and restored it, but it certainly had no obligation to restore what was below the stack pointer. Several years later a colleague came to me asking for help with a similarly uncommon glitch. He had done the same thing. I immediately asked him if he had ever gone below the stack pointer. He had. When he stopped doing it, his problem disappeared. He thought I was a genius for finding it so quickly.
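
For anyone who wants to see the mechanism, here is a toy model in Python. It is not the original Forth, and every name in it (`Stack`, `peek_dead`, `timer_interrupt`, `take_byte_twice`) is made up for illustration. It only assumes what the story says: a popped value stays in memory until something overwrites it, and the clock handler restores the stack pointer but not the dead slots beyond it. The toy stack grows upward in a list, so "below the stack pointer" on the real machine corresponds to the slot just past `sp` here.

```python
class Stack:
    """Toy stack: `sp` counts live items; slots at or past `sp` are dead but still readable."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.sp = 0

    def push(self, value):
        self.mem[self.sp] = value
        self.sp += 1

    def pop(self):
        self.sp -= 1
        return self.mem[self.sp]

    def peek_dead(self):
        # The "don't do that" trick: re-read the slot the last pop just vacated.
        return self.mem[self.sp]


def timer_interrupt(stack):
    # Hypothetical clock handler: borrows the stack for scratch data,
    # then restores the stack pointer, but not the dead slot it scribbled on.
    stack.push(0xDEAD)
    stack.pop()


def take_byte_twice(stack, interrupt_fires):
    first = stack.pop()
    if interrupt_fires:          # on the real machine this timing is out of your hands
        timer_interrupt(stack)
    second = stack.peek_dead()   # assumes nothing touched the dead slot
    return first, second


s = Stack()
s.push(ord("A"))
quiet = take_byte_twice(s, interrupt_fires=False)
s.push(ord("A"))
clobbered = take_byte_twice(s, interrupt_fires=True)
print(quiet, clobbered)
```

In the quiet case both reads agree; when the handler runs between them, the second read returns the handler's scratch value. The window is tiny, so it only bites once in a long while.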

Guapo · November 1, 2023, 9:40am

I was looking at some C code one day and noticed that the last parameter of one of the function calls was omitted. The code had run for years with no errors because, by coincidence, the correct value was on the stack. Of course if someone had rearranged the code some different value would have been on the stack and it would no longer have worked. Lint probably would have caught it, but obviously was not run.
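
A rough way to picture this, as a toy Python model rather than real C: assume the caller writes only the arguments it supplies into a reused argument area, while the callee reads every slot it declared. The names `call`, `scale`, and `frame` are hypothetical, and a real compiler and calling convention may behave differently (in C the call is simply undefined behavior).

```python
def call(frame, func, nparams, *args):
    # The caller writes only the arguments it actually supplies...
    for i, arg in enumerate(args):
        frame[i] = arg
    # ...but the callee reads every slot it declared, stale or not.
    return func(*frame[:nparams])


def scale(value, factor):
    return value * factor


frame = [0, 0]                      # reused argument area (assumption of this toy model)
ok = call(frame, scale, 2, 10, 3)   # proper call, leaves factor=3 behind
lucky = call(frame, scale, 2, 7)    # last argument omitted, stale 3 gets reused
call(frame, scale, 2, 1, 5)         # "rearranged" code leaves a different value behind
unlucky = call(frame, scale, 2, 7)  # same buggy call, now picks up 5
print(ok, lucky, unlucky)
```

The buggy call never changes; only what an earlier call left behind does, which is why the code could run for years and then break after an unrelated edit.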

Voyager · November 1, 2023, 6:16pm

Exactly why object-oriented languages were invented. In the early days stacks were used as a simple example of an object.
I had a friend who loved Forth, and I read the manual. Never did it for me.
