What happens when a computer freezes up, and why can't we make computers that don't?

When your computer totally freezes up and has to be turned off, what has actually happened? And with all our technology why can’t we make a computer that won’t freeze?

It can be a number of things: poorly designed or defective hardware, buggy system software, or overloading the processing capability of the system.

We can make computers that are extremely reliable and almost never freeze. The problem is that there is a very limited market for that type of computer. Most people want computers that are fast, cheap, and brimming with the latest bells and whistles. They don’t ask for MTBF or availability numbers, information on how the computer was designed and tested, or what fault-detection features are present. The thought process of the average customer is “Ooh, Shiny!”.

And even the computers that don’t stop sometimes do.

I have a customer who managed to stop a NonStop system. First time it had been done in the country too. How? Their datacentre wasn’t very good (temperature, whiskers, etc.). Without a highly controlled environment, external factors will sometimes play a part.

The average PC sits under people’s desks, gets dust in it, got cooked that hot weekend last summer, and keeps getting kicked accidentally when they turn around. None of that is good for the hardware.

And then there’s the software faults…

Most computer freezes, and the infamous blue screen of death, are caused by third-party drivers that have bugs or glitches.

Don’t recall ever having a complete freeze on my cheapo laptop which is now a year old and runs Windows XP.

This is actually a central problem in Computer Science. In essence, it’s impossible to always tell the difference between a program that’s frozen and one that is merely running for a very long time.

I thought that the halting problem was not being able to predict whether a program would halt or run forever.

It is, but many other problems cannot be solved because doing so would imply that the halting problem could be solved.

Yeah it is… but Shalmanese was also more or less right, in terms of Turing machine architecture. (Maybe joking a little too.)

‘Program will halt’ - in Turing machine terms, this is saying that it will give you some kind of results, and possibly accomplish something productive. (Not every Turing machine has been DESIGNED to do anything productive, of course.)

‘Program will run forever’ - This is saying that the Turing machine has gotten stuck in some sort of infinite loop or a series of feedback conditions from which it will never emerge and return results for you. This is their equivalent of a situation when you need to turn the Turing machine off and boot it up again… :smiley:

And as Shalmanese said, it’s fundamentally undecidable which of the two is occurring unless you’re willing to wait around forever.
I’m not sure that these results are directly applicable to modern-OS computer freezes though. It’s early here in Ontario. :wink:
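The "frozen or just slow?" point can be sketched in a few lines of Python. This is only an illustration, not a proof of anything: `run_with_budget`, `collatz_step` and `spin_step` are made-up names, and the step budget stands in for however long you are willing to wait. An outside observer with a finite budget can report "halted" or "don't know yet", but never "definitely frozen".

```python
def run_with_budget(step, state, max_steps):
    """Call `step` repeatedly until it returns None (halt) or the budget runs out.

    Returns ("halted", steps_used) or ("unknown", max_steps). "unknown" does
    not mean frozen: the program may simply need more steps than we allowed.
    """
    for used in range(max_steps):
        state = step(state)
        if state is None:
            return ("halted", used + 1)
    return ("unknown", max_steps)


def collatz_step(n):
    # Halts once n reaches 1; otherwise applies the "3n + 1" rule.
    if n == 1:
        return None
    return n // 2 if n % 2 == 0 else 3 * n + 1


def spin_step(n):
    # A genuine infinite loop: the state never changes and it never halts.
    return n


print(run_with_budget(collatz_step, 27, 50))    # budget too small: "unknown"
print(run_with_budget(collatz_step, 27, 1000))  # same program, now "halted"
print(run_with_budget(spin_step, 0, 1000))      # really stuck, still only "unknown"
```

With a budget of 50 steps, the slow-but-fine program and the truly stuck one look identical from the outside.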

We can. They’ve been around since the 1970s at least.

Hardware-wise, you can make a highly reliable PC by using the best components, running them well below their stress points, and making sure the PC doesn’t suffer electrical surges or excessive heat buildup.

Based on my experience with XP systems, the vast majority of hangups are software-related and seem to occur where you have devices or applications (USB, CD burners, song-ripping applets, etc.) that get hung up trying to process data. Programs are often written sloppily, and over the last few years I have discovered lots of things (mostly AV stuff) that can really jam XP. In those scenarios, even if you can shut down the misbehaving app, the system is still wonky until you do a restart.

The Windows XP OS is among the most complex machines (and make no mistake, software is a machine) ever devised. Given its complexity and what it’s asked to do, it’s amazing it works as well as it does.

I think that an explanation more in line with the tone of the OP might be: computers rely on an exact, deterministic process (i.e., syntactic computation). If one link in the chain of computation is broken, the process breaks (this is not accounting for endless loops and the like, which may or may not be “broken” – for instance, it may be due to a programmer’s logic error). For example, a computer must “know” where every piece of information is; if an instruction is located at memory address 10, but it is told the instruction is at memory address 50, it attempts to process nonsensical data and breaks.

Every level must be coherent; at the lowest level, an electrical signal out of specified range is enough to “flip” a bit, which can make the number 8 (1000 in binary) appear to be 12 (1100 in binary). Which leads to the next question:

As others have said, we can – assuming controlled circumstances. That is, under specified environmental conditions (e.g., 70 degrees F +/- 20), or limited signal ranges (e.g., 0-5 volts), or tightly regulated input (e.g., dates between 01/01/1900 and 01/01/2000). Proving correctness within specified conditions is absolutely possible; however, the conditions must be limited to such an extent as to make such a system practically unrecognizable as a computer by today’s standards (more like a calculator). Furthermore, the world is generally unpredictable and uncooperative as far as these things go – wires break, circuits fry, people feed computers unexpected input.

The time and effort needed to guarantee correctness exacts a price (in various ways) that few are willing to pay.
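The bit-flip example above (8 turning into 12) is easy to reproduce. The sketch below is illustrative only: `flip_bit` and `parity` are hypothetical helper names, and the single parity bit is just the simplest kind of fault detection, not a claim about what any particular machine uses.

```python
def flip_bit(value, bit):
    """Return `value` with one bit inverted, as a stray signal might do."""
    return value ^ (1 << bit)


def parity(value):
    """Return 1 if `value` has an odd number of 1 bits, else 0."""
    return bin(value).count("1") % 2


original = 0b1000                 # 8
corrupted = flip_bit(original, 2)  # flip the "4s" bit

print(original, format(original, "04b"))    # 8 1000
print(corrupted, format(corrupted, "04b"))  # 12 1100

# A parity bit stored alongside the value would no longer match,
# so the hardware could at least notice that something changed.
print(parity(original) != parity(corrupted))  # True
```

Detecting the flip is cheap; correcting it, or surviving two flips at once, takes more redundancy, which is part of the price mentioned above.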

Many non-MS OS based systems are quite reliable.

E.g., I had a Sun Sparc for many years. Was amazingly reliable. I finally found a way to get it to hang: installed a sound program to play music. Start up a song. Machine hangs. Tried it again for verification. Removed program.

So Yet Another Example of a third party program that directly accesses the hardware which causes problems.

But basically the kernels of Sun OSes are extremely robust. Apps might go south, but the machine would keep running and you could kill the bad app.

MS just doesn’t consider writing bullet-proof software a necessary thing. So the third party programmers don’t bother writing good software, etc. The bar has been set very low in that world.

You indeed cannot write perfect complex software, but you sure can do a reasonable job. Sun has been doing it for years.

Running Win XP SP2 on my new HP computer and also on my old HP computer. Never had a freeze on either of them.

Oh the irony. In the middle of reading this thread my XP machine crashed. I’ll have to admit though, OS crashes rarely happen compared to Windows 98 because you can mostly save it through quitting an unresponsive program.

A computer running a program is a little like a person walking a thin path, and if they step off that path they get stuck.

It’s reading a list of instructions, and performing those instructions. Sometimes, after running an instruction, it simply runs the next instruction in the list. Often, the instruction it runs tells it to go to a different instruction other than the next one.

In either case, it must go to a “runnable” instruction (i.e. something on the path), and not onto data that is not to be interpreted as instructions (i.e. off the path). For example, it might accidentally be told to run instructions in a particular area of memory that does not hold instructions, but data.

In other words, it’s like you were expecting a list of tasks, but instead were accidentally given a list of random items or letters. You might try to interpret these as instructions, but you’ll likely get stuck pretty quick.
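Here is a toy version of that "path" in Python. Everything in it is invented for illustration (the `run` function, the three instruction names, the memory layout). One big simplification: this toy notices when it lands on data and reports a fault, whereas a real CPU would typically just try to execute whatever bits it finds there.

```python
def run(memory, max_steps=100):
    """Execute a toy program stored in `memory`, a list of cells.

    Instruction cells are tuples: ("PRINT", text), ("JUMP", address), ("HALT",).
    Any other cell is data, and trying to execute it counts as a fault.
    Returns (output_lines, status).
    """
    output = []
    pc = 0  # program counter: index of the next cell to execute
    for _ in range(max_steps):
        if not 0 <= pc < len(memory):
            return output, f"fault: address {pc} is outside memory"
        cell = memory[pc]
        if not isinstance(cell, tuple):
            return output, f"fault: cell {pc} holds data {cell!r}, not an instruction"
        op = cell[0]
        if op == "PRINT":
            output.append(cell[1])
            pc += 1          # simply move on to the next instruction
        elif op == "JUMP":
            pc = cell[1]     # go somewhere other than the next instruction
        elif op == "HALT":
            return output, "halted"
        else:
            return output, f"fault: unknown instruction {op!r} at cell {pc}"
    return output, "gave up: step budget exhausted"


# Cell 2 is data. The good program jumps over it; the bad one jumps onto it.
good = [("PRINT", "hello"), ("JUMP", 3), 42, ("HALT",)]
bad = [("PRINT", "hello"), ("JUMP", 2), 42, ("HALT",)]

print(run(good))
print(run(bad))
print(run([("JUMP", 0)]))  # stays on the path, but goes round in circles forever
```

The only difference between the good and bad programs is a single wrong address, which is about how small the mistake needs to be.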

As to why it’s hard to make programs that don’t freeze up: to put it simply, it’s very, very easy to make errors when programming computers. It’s the nature of the task.

Let’s say I blindfold you and put you behind the wheel of a Ferrari. Your ears are stuffed with headphones, and I dictate to you where to turn or accelerate or stop, based on my projections of where you’ll be, where the other cars will be, and how the streets happen to be laid out.

If one of the streets has some buckled pavement, that could throw off a little red Audi and cause it to nose into another lane, and if that’s the lane I instruct you to turn into, maybe WHAM and that’s all she wrote. But even if not, even if you’re still conscious and the car is still running and rolling, the instructions you continue to receive via headphones don’t take that little collision into account. I say “OK, slow down and turn left” based on my projection of where you’ll be, but that’s not where you are just yet, so my instructions would steer you into the side of a building.

Similarly, the programmer of MiniSoft Weird says “OK, allocate the following block of application memory and stuff into it the values you get when you ask the system memory for the parameters of the contents of folder XYZ, those parameters being held at the following blocks of system memory”. But meanwhile the programmer of Hellish Packrat USB Printer Driver has said “OK, pass any application print request to system memory in the following format”, and the programmer of the operating system’s NextCallRoutine has stated “Whenever the values at system memory address are ABC, allocate a new block of system memory and stuff it with the results of calculation XYZ”, and unfortunately for you, the request received from MiniSoft Weird didn’t account for those values being moved there so it reads those in instead of what the MiniSoft Weird programmer expected to find at that memory address, and so MiniSoft Weird gets a chunk of printer queue excessive-spooler overflow data crammed into a memory address instead of the recently-accessed directories info.

No problem so far, your computer keeps trucking right along until you hit the “Save” button. But then it tries to write your saved file to a file path composed in part of printer spooler information and xsj k s92ui69d2g 289???

splat.

Look, it’s not easy when you’re a programmer to anticipate every single possible situation. In Israel, the fighter jets’ computers kept crashing every time they flew low over the Dead Sea. A programmer had written a simple formula for navigation based on relative-altitude and land-altitude parameters. This simple formula utilized division. Programmer forgot to consider the possibility that plane could be flying below sea level. Who the hell flies their plane underwater? Ah, but certain areas of the world’s surface are indeed below sea level. And guess what happens when you try to divide by zero?
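The actual navigation formula isn't given, so the sketch below uses a made-up one purely to show the failure mode: a division whose denominator the programmer assumed could never be zero. `climb_ratio` and `safe_climb_ratio` are hypothetical names, and returning `None` is just one possible way to handle the unexpected case.

```python
def climb_ratio(relative_altitude, land_altitude):
    # Hypothetical formula, not the real avionics code.
    # Works fine until land_altitude happens to be exactly zero.
    return relative_altitude / land_altitude


def safe_climb_ratio(relative_altitude, land_altitude):
    # Same hypothetical formula, but the "impossible" input is handled
    # explicitly instead of being left to crash the program.
    if land_altitude == 0:
        return None  # caller must cope with "no meaningful answer"
    return relative_altitude / land_altitude


print(climb_ratio(500.0, 250.0))      # 2.0
print(safe_climb_ratio(500.0, 0.0))   # None

try:
    climb_ratio(500.0, 0.0)
except ZeroDivisionError as err:
    print("unguarded version failed:", err)
```

The guard is one line. The hard part is realizing in advance that it is needed.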

My computer rarely if ever freezes. It slows down, and some websites cause my browser to crash or slow to a crawl, but that’s easily solved by ctrl-alt-delete. I’ve had very few freezes; most of the time, I’ll get a warning that my memory is overloaded, so I simply shut everything down and restart the computer.

What I’d like to see is a sort of emergency shut-off, for times when a storm comes up suddenly and I need to get off quickly without worrying about closing everything down.

We make computers that don’t freeze, and we make them all the time. They’re in devices all around you. It’s not hard to make a computer like this. Your car is full of them. They have dedicated software and real-time operating systems that are tightly controlled.

The problem with desktop computers is that they are designed to be general computing devices. The team building the operating system has no way of knowing what software will be run on the machine. And the software has to be given fairly low level access to machine resources for performance reasons.

Today’s PCs also operate in a hostile environment with viruses, code executing from web pages that was written by amateurs, many, many devices being connected to it with device drivers, etc.

Another thing about PCs is that they have an open architecture where different cards from different manufacturers can be plugged in to extend them.

In general, it all works amazingly well. Windows XP is extremely stable.

Computers “freeze” for generally one of two reasons.

The first is a hardware problem. Most often this is caused by the CPU overheating, but there can be other causes. We can get around this. Do a search on a “fault tolerant computer” though I’ll warn you in advance they aren’t cheap. Instead of paying hundreds of dollars for a computer you will be paying tens of thousands of dollars.

The second reason is that the software goes screwy. On older operating systems like Windows 98, a misbehaving program had access to the entire computer’s hardware, so if a program went sour it could trash the entire operating system with it. In this case the computer isn’t “frozen” despite the fact that nothing is moving on the screen and you have no control over anything. Actually, what is happening is that the system is running around like mad doing useless things because it is executing garbage instructions.

It is more difficult to get this kind of freeze on Windows 2000 or XP (or Linux, for those of you who don’t like Microsoft operating systems) because they have “protection layers” built into the operating system, meaning that when a program misbehaves (because of a bug or something), that program doesn’t have access to the entire computer, and instead of bringing the whole thing to a halt, the program just causes a protection violation and the operating system shuts it down.
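You can see the effect of that isolation from ordinary user code. In this sketch (the `run_isolated` helper is a made-up name), a child process deliberately kills itself with `os.abort()`, and the parent simply gets a non-zero exit status back and carries on. It only demonstrates process isolation as provided by the OS; it says nothing about how the protection is implemented internally.

```python
import subprocess
import sys


def run_isolated(code):
    """Run a snippet of Python in a separate process and return its exit status.

    Output from the child is discarded so only the status comes back.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


well_behaved = "print('ok')"
misbehaving = "import os; os.abort()"  # child terminates itself abnormally

print("well-behaved child exit status:", run_isolated(well_behaved))
print("misbehaving child exit status:", run_isolated(misbehaving))
print("parent is still running")
```

The exact non-zero status differs between operating systems, so the sketch deliberately doesn't depend on a specific value.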

For Windows 2000 or XP to completely freeze, it’s generally a bug in something that doesn’t have these protections, like something inside the operating system or a driver. Of course, that’s not to say it’s impossible to lock up these operating systems. One easy and common way to do it is if your program has a memory leak. Programs are often grabbing a chunk of memory, doing something with it, then releasing it back to the system. When they are doing a whole bunch of things at once and the program is very complicated, it’s very easy to grab a chunk of memory and lose track of it. This memory never gets freed back up, and the longer the program runs, the more and more memory it uses because of these lost bits of memory. This is called a memory leak. Eventually the computer runs out of memory and the whole thing grinds to a halt.
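A memory leak in miniature, with one caveat: Python frees memory automatically once nothing refers to it, so the "lost track of it" bug is modelled here by a list that quietly keeps every buffer alive. All the names (`_forgotten`, `handle_request_leaky`, `handle_request_clean`, `retained_bytes`) are invented for the example.

```python
_forgotten = []  # grows forever: nothing ever removes entries from it


def handle_request_leaky(payload):
    buf = bytearray(payload)   # grab a chunk of memory
    _forgotten.append(buf)     # bug: a reference is kept, so it is never released
    return len(buf)


def handle_request_clean(payload):
    buf = bytearray(payload)   # grab a chunk of memory
    return len(buf)            # buf is released once the function returns


def retained_bytes():
    """Total size of all buffers the leaky handler is still holding on to."""
    return sum(len(buf) for buf in _forgotten)


before = retained_bytes()
for _ in range(1000):
    handle_request_clean(b"x" * 1024)
    handle_request_leaky(b"x" * 1024)

# Both handlers did the same work, but only one left memory behind.
print("bytes still held by the leaky handler:", retained_bytes() - before)
```

Each call leaks only a kilobyte, which is why this kind of bug tends to show up after days of uptime rather than in testing.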

How do you prevent these types of freezes from happening? Well, you have to get all the software writers to write millions and millions of lines of code without a single mistake in there anywhere. It’s not an easy task.

I happen to design computers that run in manufacturing plants. These have to be rugged and reliable, and they can’t freeze. We don’t run Windows, because it’s not stable enough. We use redundant, fault-tolerant hardware, so that if any piece breaks the computer keeps on chugging. We also have watchdog circuits attached to the computers. The computer has to tell the watchdog that it’s alive at least once a second, or the watchdog resets the computer. These computers will often run for 5 years or more without freezing and without resetting.
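The watchdog idea is simple enough to simulate. A real watchdog is a separate hardware circuit; the `Watchdog` class below is only a software model of its logic, with hypothetical method names, and it takes the current time as an argument so the behaviour is easy to follow.

```python
class Watchdog:
    """Software model of a watchdog timer (the real thing is a hardware circuit)."""

    def __init__(self, timeout_s=1.0, now=0.0):
        self.timeout_s = timeout_s
        self.last_kick = now   # last time the computer said "I'm alive"
        self.resets = 0        # how many times the watchdog had to step in

    def kick(self, now):
        """Called by the monitored computer to signal that it is still alive."""
        self.last_kick = now

    def check(self, now):
        """Called periodically by the watchdog itself.

        Returns True (and counts a reset) if the computer has been silent
        for longer than the timeout.
        """
        if now - self.last_kick > self.timeout_s:
            self.resets += 1
            self.last_kick = now  # after a reset, the timer starts over
            return True
        return False


wd = Watchdog(timeout_s=1.0, now=0.0)
wd.kick(0.5)          # healthy: the computer checks in
print(wd.check(1.2))  # False: only 0.7 s of silence
print(wd.check(1.6))  # True: 1.1 s of silence, so the watchdog resets the machine
print(wd.resets)      # 1
```

Note that the watchdog doesn't need to know why the computer went quiet. It sidesteps the "frozen or just slow?" question by treating anything slower than the deadline as a failure.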