Why can't Windows be restart-free like Linux?

I’ve played with Linux a few times and I really love the way I can install, uninstall, and update software and even the OS itself without having to restart afterwards (except in rare instances). Windows, on the other hand, seems to want to restart after everything.

I know a large part of this difference is due to their architectures and how programs, plugins, libraries, etc. interact with the kernel.

But why hasn’t Microsoft remodeled their architecture (in one of their major overhauls that seem to happen every 2-3 versions) to enable this restartless functionality?

**Context:** I’m an experienced techie with an MCSA and A+, so I’ve got a fairly solid understanding of how both OSes work “under the hood” (and if not, I know enough to find and understand an explanation of something new).

I suspect there’s a good reason for this state of affairs and am curious about what it might be.

I always credited annoying MS procedures to bad design and lack of sophistication. Novell had reboot-free installations on their servers in the 1980’s. I can’t imagine having to inform 100 networked users to save their work and exit just because I installed a tiny utility on the Novell server and had to reboot a few times per day. If Novell could do it, why not Microsoft?

True story: Nobody I know can ever recall a time when they had to reboot a Novell NetWare machine. Those things would sit and hum quietly for decades.

There’s a recent Ars Technica article documenting a NetWare server being taken down after 16 years of uptime. Because the bearings on its drives were giving out.

That NetWare server was still running 3.12 - it had never been updated. You most certainly had to reboot NetWare if you applied an update. 3.12 was reliable, but it was also pretty simple, and that 16-year-old server cannot have done much (or quickly).

And you do (generally) have to restart Linux if you update the kernel (I have a kernel upgrade due on my server, after ~50 days of uptime). There are companies that produce in-place patches for kernel bugs (I think for Oracle and maybe Red Hat Linux), but it is very tricky - you have to be able to lock the subsystem to prevent calls, patch the entry points to redirect them to the new routines, and then swap the in-memory images without damaging any data structures. Not all patches can be done this way, but I think the concept will eventually work its way into the kernel/distribution mechanism. It is very hard to keep things consistent while doing this, though. Linux does have better module handling (for in-kernel features like device drivers) - you can update most driver modules on a running system, but you are still going to have to reboot for deep-kernel stuff. Linux is also getting better as more stuff is pulled out of the kernel into userspace - in particular, graphics drivers and the X server.
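A loose userspace analogy to that "patch the entry points to redirect them to the new routines" step, sketched in Python (toy names; real kernel live patching also has to cope with in-kernel data structures and concurrent CPUs, which this deliberately glosses over):

```python
# Toy analogy of live patching: redirect callers of a buggy routine
# to a fixed one without ever stopping the "system" (this process).
import threading

_patch_lock = threading.Lock()  # "lock the subsystem to prevent calls"

def buggy_handler(x):
    return x + 2  # pretend the off-by-one here is the bug

def fixed_handler(x):
    return x + 1  # the corrected routine

# This dispatch table plays the role of the kernel's entry points.
entry_points = {"handler": buggy_handler}

def live_patch(name, new_fn):
    # Swap the entry point under a lock so in-flight callers never
    # see a half-updated table. The hard part in a real kernel is
    # doing the same for shared data structures, not just code.
    with _patch_lock:
        entry_points[name] = new_fn

live_patch("handler", fixed_handler)
print(entry_points["handler"](41))  # 42
```

The same shape - indirect through a table, swap the target atomically - is roughly what kpatch-style tools do at the function level.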

As for Windows, you don’t always have to reboot that, either. I see some pretty impressive uptimes on our large estate. But it is heavily developed and security patches are regular, so applying them and rebooting is a good idea. In terms of server systems, I don’t think it is much worse than Linux, nor is the architecture really all that different (although low-level driver replacement isn’t as good as Linux modules). Windows OSes still retain a lot of privileged services (in particular, graphics drivers are in-kernel for performance), which means that updates to these require reboots.

The ultimate for reliability/updatability is microkernel systems (such as MINIX or GNU Hurd). These have a very small kernel that does little more than scheduling, message passing, and basic memory management - all other OS services, including i/o, file systems, and drivers, run as userspace apps launched by the microkernel. If a service goes down (failure or update), the microkernel just respawns a new copy that should hopefully pick up and carry on. But they are generally still research projects, although OS X does have a microkernel-ish core.

Most of the time in Windows you can ignore the restart warnings. That is to say, “You must restart your computer before using this software,” has more to do with customer support than any actual technical requirement, in my experience. YMMV.

Oh, and to answer a further question in the OP …

Modern processors have a number of privilege levels - the highest (ring 0) has the most power and the least restrictions. The core of the OS kernel runs at ring 0. The least privileged level is userspace, where user applications run. There are intermediate levels as well.

Any operation that crosses a privilege boundary is called a mode switch - these can be expensive operations for the CPU (particularly if you need to save the process state, called a context switch) - they take time and resources. You can reduce the number of mode/context switches by running most of your services at ring 0, but then none of them are isolated, so a problem with the disk driver could corrupt the scheduler, and thus the entire system. If you fully isolate the drivers into user space, you have process protection but more mode switches and a bigger performance hit.
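You can get a rough feel for that boundary-crossing cost from userspace by timing a call that enters the kernel against one that stays inside the process. A sketch, not a benchmark - absolute numbers vary wildly by machine:

```python
import os
import timeit

def user_only():
    return 1 + 1  # stays entirely in userspace

def enters_kernel():
    return os.getppid()  # a real system call: mode switch to ring 0 and back

n = 100_000
user_time = timeit.timeit(user_only, number=n)
kernel_time = timeit.timeit(enters_kernel, number=n)

# On typical hardware the syscall loop is noticeably slower - that
# per-crossing overhead is what microkernels pay many times over by
# splitting services into separate userspace processes.
print(f"user-only: {user_time:.4f}s  syscall: {kernel_time:.4f}s")
```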

Designing an OS is a matter of balancing those competing pressures. In early versions of Windows NT, the graphics drivers ran outside the kernel at a lower privilege level. With NT 4.0 they were partially moved back into the kernel, because the performance cost was so great.

Linux and Windows have different philosophies about this split. Linus Torvalds seems (in my opinion) to want to get stuff out of the kernel (particularly new services), but is very mindful of the performance issues, so wants people to focus on that as well. And he doesn’t like breaking existing stuff, which means a slow process of change. Microsoft are very mindful of backwards compatibility (which they do incredibly well), so they support a lot of older drivers, which limits the deep changes they can make. New versions of the OS now require signed drivers, which gives them the chance to move interfaces forward to improve reliability and maintainability.

Conversely, there’s a story (it might be only an urban legend) that an early version of Windows had a serious bug that made the clock stop after 22 days, but it wasn’t discovered for years because no one could keep Windows running that long.

My Win7 workstations frequently run in excess of 100 days before restart (usually manual, for things like swapping a UPS or something). My Win2K3 server runs many hundreds of days between restarts.

The Linux jiffies bug was similar, except it required a reboot every 497 days and that actually did piss people off.

Windows has improved significantly in the number of things you can do without a reboot, but it still lags behind Linux for historical reasons. Windows was originally designed as a consumer OS, where small amounts of downtime were not a huge deal. Linux came from a Unix tradition of workstations and servers, where downtime required planning and was expensive, so there was motivation to design things from the beginning to support rebootless updates.

However, the tradeoff of enabling updates without rebooting is that it adds significant complexity. Think of it like changing the tires on your car.

Most people change tires like this: you stop the car, change the tires, and then start the car back up again with new tires. It’s simple, reliable, and most people can live with the delay.

If you really need your tires changed faster, you might re-engineer your car so you can do something like this. It’s more work to change a tire, but at least it gets done fast enough that the user barely notices. (Most software updates don’t really happen seamlessly - there’s a small gap as the old service is spun down and the new one is spun up, but it’s short enough that you don’t notice.)

However, if downtime is really, completely unacceptable and you need the car to be running the entire time, then you might have to change your tires like this, and that’s not something you want to do unless you’re truly crazy. There are languages like Erlang that support dynamic software updating to achieve this, but you have to write your code in a very particular way to support it.
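A crude flavor of that Erlang-style hot swapping can be imitated in most dynamic languages. Here is a Python sketch (hypothetical module name `tire_service`) that replaces a "service's" code on disk and reloads it without ever stopping the process that uses it:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # always recompile from source on reload

# Version 1 of a tiny "service" module, written out to a temp dir.
moddir = tempfile.mkdtemp()
pathlib.Path(moddir, "tire_service.py").write_text("def handle(): return 'v1'\n")

sys.path.insert(0, moddir)
import tire_service
print(tire_service.handle())  # v1

# "Change the tires while driving": rewrite the code on disk, then
# swap it in live. Callers holding old function references keep the
# old code - the very consistency problem Erlang's DSU is built around.
pathlib.Path(moddir, "tire_service.py").write_text("def handle(): return 'v2'\n")
importlib.invalidate_caches()
importlib.reload(tire_service)
print(tire_service.handle())  # v2
```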

TL;DR: Windows is a Toyota Camry that’s being modified for F1 purposes and Linux is a F1 car that’s being modified for Toyota Camry purposes and neither is perfect at being the other.

Those videos are awesome, and perfectly illustrate the point - in-place no-reboot patching can be done, but it is tricky, takes more work, and it is usually easier to just shut down and restart. Or follow the VMS cluster approach - you could move processes around to upgrade and reboot nodes in the cluster without ever taking the entire cluster down. Some of the VMS cluster uptimes were huge, and covered multiple VMS upgrades (and even hardware upgrades - the VAX-11 CPU could emulate older PDP-11 hardware until software upgrades were available).

To be fair, Shalmanese, while Windows 1-3 and 95/98/ME were workstation-only OSes, all versions (server and workstation) beyond that point were built on the NT core, which was a proper server-class OS designed by Dave Cutler (of VMS fame). However, MS didn’t have the vision to drive the NT core into a clustered OS at the time, even though VMS had been doing it for years at that point.

It was Windows 95/98 that had the timer overflow bug (49.7 days) - as a consumer workstation OS, long uptimes were never part of the design. It was only as people started using them as low-spec workgroup servers that people found the bug - and those machines did stay up and running for 50 days to trigger it, so they were doing pretty well. The server-based OSes (Windows NT and following, until the Windows 2000 workstation/server merge) never had this issue.

One of the small differences is that MS will normally request a server restart, when only a service restart is required.

(I have heard that) in the last couple of years, it has become common for Linux/GNU updates to automatically stop/restart server daemons as part of the update process.

When I was a lad, ordinary *nix users didn’t even attempt to update services: like everything about *nix, knowing what to stop and how to start it required guru knowledge. They just waited, and installed a later version of the OS.

For an end-user, an update that automatically stops and starts daemons seems like a good thing.

For a server admin, not so much.

Which brings me to the obvious question: what is this fascination people have with leaving their workstation running? Even when I want to talk to my workstation at work, I just use WOL to wake it up, and most people never even need to do that.

For myself, I always remember a FidoNet discussion 25 years ago about whether it was better to leave your computer turned on, to avoid switching it on and off, or to switch it off and on, to avoid leaving it turned on. The discussion was terminated by two posts in quick succession:

One said: I was working at my computer one day, when a gout of flame erupted from the back and ignited the curtains. Now I always turn my computer off when I leave the room.

The other said: I was working at my computer one day, when the side of the monitor turned brown and caught fire. Now I always turn my computer off when I leave the room.

That’s less important now, but I still have no good reason to leave my workstation turned on for more than 10 hours at a stretch.

If you ignore the warnings, then you haven’t actually done the update…

I seem to recall MS DOS from early days and Windows 1.0, and then on and on.

My take is that the major obstacle was a dynamic of three concepts: (1) ease of use, (2) backward compatibility, and (3) initial limited design.

One of the primary examples of where sticking to these principles leads is the various non-standard operations the Windows team had to come up with as they moved from simpler to more complex computer architectures: http://searchwinit.techtarget.com/definition/thunk

However, on the bright side, those three concepts supported their business model and that’s why Bill Gates is the richest dude.

How long does it take for yours to start up? Mine is five minutes (on a good day).

I’m truly surprised that no one has mentioned the obvious: It’s a feature that appeals only to the geek niche. It would never be mentioned in mass advertising, so why bother?

Windows has gotten better about reboots since the old days. I remember when upgrading a video driver required you to:

  1. Uninstall the old driver.
  2. Reboot in 640x480 resolution.
  3. Install the new driver.
  4. Reboot again.
  5. Rearrange all your desktop icons that had been randomly tossed around due to the resolution change.

Now, I can just install a new video driver. The screen blanks out for a few seconds and that’s it.

So, back to the original topic, the answers I’m getting are that:
[ol]
[li] Windows started out that way and can’t change now because of backward compatibility
[li] Restartless updates require much greater complexity in the OS and any software that uses that update process (extrapolating a bit there). (Thanks to Si and Shalmanese for these two)
[li] Restarting isn’t a big issue for non-mainframes, and so it was never included
[/ol]

Am I missing anything, or is that it?

Software updates: Generally speaking, when an application installer asks you to reboot it’s a lying liar who lies. (Even device drivers.) Now, that said, it may require you to log out and log back in… and it may be excusable in that “restart your computer” is easier for the layman to understand than “log out and back in”. (A lot of Windows users don’t even know it’s possible to log out without shutting down.)

You’ll note that installers by companies who actually have competent programmers (for example, Microsoft themselves) never require a reboot. Don’t confuse “OS runs software made by bad programmers” with “OS is bad”.

OS updates: Linux needs to reboot for these also, but Linux distributions usually trust the user to know for sure whether they do or not and don’t reboot automatically. Why does Windows ask you to reboot after installing OS updates? Because most applications are using the same few shared code libraries… to explain:

Say there’s a security flaw in MSHTML.dll. Your computer is running six applications, all of which are in some way reliant on MSHTML.dll. When the OS updater runs, it has no way of killing off those six applications without a real risk of the user losing data. And it can’t replace the file without those applications quitting first. So the solution Windows uses is to place a little marker in the boot-up sequence that says “after you boot, but before any applications run, swap in this updated version of MSHTML.dll”, then force you to reboot.

Once that boot-up message is in place, there’s nothing technically requiring you to reboot except the danger of running a .dll file you know contains a security flaw.
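The shape of that boot-time swap mechanism can be mimicked in miniature - hypothetical file and marker names, purely to illustrate the idea of "record the rename now, replay it before anything runs":

```python
import json
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
old_dll = root / "MSHTML.dll"            # the in-use, insecure file
new_dll = root / "MSHTML.dll.pending"    # the patched replacement
marker = root / "pending_renames.json"   # our stand-in for the boot marker

old_dll.write_text("insecure v1")
new_dll.write_text("patched v2")

# The updater can't replace the in-use file, so it records the swap...
marker.write_text(json.dumps([[str(new_dll), str(old_dll)]]))

def apply_pending_renames():
    # ...and "boot" replays the recorded renames before any app runs.
    if marker.exists():
        for src, dst in json.loads(marker.read_text()):
            pathlib.Path(src).replace(dst)  # rename over the old file
        marker.unlink()

apply_pending_renames()     # simulate the reboot
print(old_dll.read_text())  # patched v2
```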

Now Linux works somewhat differently - when a program uses a shared library, the library is mapped into the process’s memory, and the on-disk file can be deleted or replaced out from under it; the running program keeps using the old, now-deleted copy that is still mapped. The advantage of this is that Linux updaters can swap in the fixed library without having to first quit all the applications using it. The disadvantage is that those applications will still be running the insecure version (from memory) until they are restarted anyway - or the computer is.
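You can see those POSIX semantics for yourself: delete a file while a process still holds it open, and the process keeps reading the old copy until it lets go - just as a running app keeps the old library after an update. A quick sketch (this relies on Unix unlink behavior and would fail on Windows, which is exactly the difference at issue):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"old insecure library code")

# The "package manager" deletes/replaces the file on disk...
os.remove(path)

# ...but our process, like a running app with the library mapped,
# still sees the old contents through its open descriptor.
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)
print(data)  # b'old insecure library code'
os.close(fd)
```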

So the practical difference is nearly nil: Windows requires a restart for security updates, and so, in effect, does Linux.

So why do people think Linux doesn’t? Because Linux updaters, generally speaking, don’t automatically restart for you, and also don’t tell the user that they need to restart to have a secure system. Linux OSes trust the user to be savvy enough to read the list of patched files and determine for themselves whether or not they need to restart affected applications.

Note also that some people promoting Linux will bend the truth and say things like, “you don’t need to restart Linux to apply updates-- you just need to quit every single app and service and log out all users.” What’s the difference between that and restarting? Not much, practically speaking.

I hope that helps answer the question.