CPU tailored to OS?

I don’t know a lot about the technical details of CPU design or inner workings, so forgive my ignorance displayed in this question, but I’ve always wondered… time and monetary issues aside, could there be an enormous benefit to designing a CPU with a particular OS in mind? For example, I’m sure particular OS behavior, GUI and all, requires certain types of complicated repeated algorithms that to a general purpose CPU are each a complex and arbitrary task. But does it make any sense to think of a CPU that has some of those complicated tasks built-in? I realize that this is done on a small scale – like a math component of a CPU might be able to take a square root directly and faster than relying on a slower software algorithm, but could it be done on a much larger scale? Similarly, does it make sense to design a CPU whose instruction set is, instead of, say x86, literally a language like c++? Would it make sense to design a CPU that itself contained the OS completely, sort of like an advanced version of a BIOS chip? Would it be able to be enormously faster?


Reduced Instruction Set Computing, I gather. Has the instruction set ever been reduced to something where the CPU can handle an instruction like cout << "hello world\n" directly? Where the reduced instruction set is literally just the C language. Couldn't that profoundly improve speed (at the cost of relying on one static version of a programming language)? Possibly the instruction set could be expanded to also include some static build of the Windows APIs, etc…
I’m totally out of my league here, so have pity on me. I’m curious.

There are a range of things that make a CPU’s ISA (Instruction Set Architecture) a good fit for an OS, but there are a few things to set down first.

You want to define what you think an OS is. The modern, rather simplistic thought is that the OS is what comes on the DVD when you “install the OS”, which includes all manner of things that are not what we would technically think of as the OS.

Next you need to look at what the performance issues are. The majority of time spent by almost all computers is not spent doing complex functions, but mostly just shovelling data about. The RISC (Reduced Instruction Set Computer) came about in part when it was realised that the effort needed to work out how to do complex functions in the CPU was costing performance due to a range of issues. Just decoding the instruction took a lot of effort, and the resources needed to execute it caused all sorts of issues in making the CPUs run faster. The aphorism “make the common case fast, and the uncommon case correct” became a critical one in the progress of RISC designs. This is mainly to illustrate that putting complex things in the CPU isn’t always a win. But if the action truly is common, then maybe there is value. The OS mostly concerns itself with moving data about, controlling devices, controlling synchronisation, and controlling access.
So, things that can help. If the OS defines a particular virtual memory paradigm, ISA support can be anything from useful to vital. The VAX architecture and VMS were closely coupled here, and MULTICS was dependent upon hardware support for its memory management. The DEC Alpha processor had specific features that allowed it to support VMS’s memory architecture. Closely related is support for security models. VMS made specific use of the VAX’s architecture here. (Or rather, VMS dictated the features.)

The x86 is a total mess of stuff that has grown over time. There are some happy accidents. If your OS allows multiple threads of execution on a multi-processor in a single address space (as most do), you need a way of identifying threads that is easily preserved across context switches. The x86 has a few redundant segment registers that can be used for just this.

Other useful things you can imagine: hardware assist for synchronisation. This can go very deep. Something like the Tera MTA, which provided full/empty bits on each word of memory, allowed for a radically different programming model and essentially latency-free synchronisation (with a bit of magic).

There have been not-so-successful attempts at putting a huge amount of smarts in the CPU. The Intel iAPX 432 was perhaps the biggest disaster. A full OO programming model in hardware. But the complexity of the chip was such that it simply ran so slowly that it was useless. The Linn Rekursiv provided for callable microcode that supported an OO model - including the garbage collector - in the CPU. Only about half a dozen were ever made. The great D machines from Xerox PARC provided some other ideas. The Dorado was microprogrammable to provide exactly what you ask about. Different OS designs or programming paradigms could be supported with specialised instructions. Or the various Lisp machines, which ran Lisp natively in the hardware. Even the OS was written in Lisp.

There are lots of amazing ideas that have been lost in the mists of time. Sadly the single biggest danger to progress in operating systems design is Linux. Its popularity has almost totally destroyed research in the area. Similarly the x86 ISA has all but destroyed architecture research. Maybe the ARM will provide some opposition, but even it is decades old.

It is actually exactly the opposite.

A CISC machine (Complex Instruction Set Computer) has instructions like MOVC3 - which can move an arbitrary string from one location to another in one instruction. Sounds good? Then consider the complexity. How does the instruction handle arbitrary byte boundaries? The CPU will work with 32 or 64 bit words, but the string locations will sit on byte boundaries. What if the string crosses a page boundary and a page fault occurs mid-instruction? What if the CPU is interrupted by an IO request? What if the target area for the string crosses into a protected page of memory? So the CPU has to be able to stop, save, and restart the instruction. Also, it burns three registers whilst it performs the work, which need saving and restoring.

What the RISC guys found was that if the CPU could be freed from worrying about all this complexity, it was smaller, cheaper, and ran faster. You could write a byte copy routine that was faster on the RISC hardware than the specialised instruction on the CISC hardware. For general programming, having the compiler spend the effort working out exactly how to optimise the program once was vastly more productive than trying to make hardware that had to get it right in one or two clock cycles. The compiler is able to apply much more knowledge about what is going on than the hardware designer has access to.

RISC architectures dramatically reduced the complexity of instructions available.

Why is this? Is there simply not enough profit motive to sustain that research?

I guess you could call support for VM architectural support for OS’s, but it is kind of generic and the same concept extends within the CPU below the place where the OS sees it.

The problem with concentrating on the OS is that you need to think about application programs too - back when this work was being done, the OS was not quite as bloated as it is in Windows machines. So, there were two areas of special purpose architectural design.

The simplest was to measure the instructions used in a set of applications, including pairs of instructions, and implement special purpose instructions in microcode to combine them. There were a couple of papers on this subject, but it never really caught on.

Second is to implement instructions to support a high level language, so any program written in that language would run faster. The Burroughs D-machine was a computer of this sort. In some sense the Patterson version of RISC did that, with the vertical microcode needed to support C and UNIX influencing the design. Dave did his dissertation in the microprogramming field, and did some work on it for Digital right after joining Berkeley.

And, as mentioned, CISC machines had this kind of thing also. Very complex IBM 360 instructions existed to do block memory moves that would otherwise take lots of code. The problems with this approach have already been mentioned.

I suspect it is because you need to come up with something that will be significantly better than what is out there now. It is the same thing with programming languages. When I was in grad school SIGPLAN notices would have one or two new languages each issue. Now there are very popular languages for almost every niche, why invent a new one? I say this as someone whose dissertation was about a new language - for microprogramming, in fact.

BTW, an excellent history of the mistakes we made in this area can be found in chapter 3 of “Processor Design,” edited by Jari Nurmi (from Springer). This chapter, called “Beyond the Valley of Lost Processors: Problems, Fallacies, and Pitfalls in Processor Design,” is by Grant Martin and Steve Leibson of Tensilica. Tensilica is a company which allows you to design custom embedded processors by designing an instruction set and letting their tools synthesize the processor. I’ve not used it myself (and am not associated with the company in any way) but as a former architect I can say it looks pretty good. So, if you want to play with OS-centric design, you can.

Francis Vaughan: Do you have any evidence Linux did anything to OS design research Unix hadn’t already done a decade before? Besides, it isn’t as if the Hurd doesn’t exist, or Xen, or even Plan 9.

Additionally, Linux isn’t a research project and it was never meant to be one. It was a hobby OS that grew into a useful one. Why is it even mentioned in the same sentence as research OSes?

Linux has its roots in Minux which has its roots in Unix. Unix used to be free for universities. Then when it started becoming popular, AT&T realized they could make money from it and no longer wanted to give it away for free. So a bunch of folks got together and wrote Minux, which is a stripped down version of Unix used to teach OS theory. There were two competing groups of Minux users, those who wanted to keep it small and simple for teaching purposes, and those who wanted it to be a full featured OS like Unix. Linus Torvalds was in the latter group, and re-wrote Minux to form Linux as his master’s thesis. He also chose to release it to the world, and thus Linux as we know it was born.

I don’t know if I’d call it a “hobby” OS, but it certainly was Linus Torvalds’ pet project. He also did release it to the world with the intention of it becoming a fully functional OS and not just limited to educational use.

You can also say that Linux is a research OS because there certainly is a lot of experimental stuff that goes into it. It’s a nice fully functional OS, which makes it a great platform to experiment with. A lot of real time experiments have been integrated into it over the years, and some of these have been so successful that they have been integrated into the mainline Linux kernel. NASA has even experimented with it and has created flight certified versions of Linux, which is a pretty significant achievement. Flight certified operating systems have pretty strict requirements, because let’s face it, you can’t just press ctrl-alt-del on a satellite orbiting Mars.

I do get the point though. A lot of new operating systems aren’t experimented with simply because Linux works so well and is so easily available. People would rather modify Linux to do what they need than create their own OS which doesn’t suffer from the peculiarities of unix/linux.

Up until the popular introduction of Linux there was a lot of OS research. So, say, in the mid 90’s it started to go bad. In the preceding decade you had a lot of work. Off the top of my head, OSes that I had encountered or used included: V-kernel, Plan 9, Mach, Chorus, Choices, Amoeba, Mungi, Grasshopper, Clouds, Opal. Lots more.

Of these, Mach was picked up by Gnu for the Hurd, and became Darwin that forms the basis of MacOS, and some Mungi guys fed stuff into L4. But in the 15/20 years since these projects, it has been an almost desert.

Linux isn’t a good platform for research. The kernel is very large and not very modular. Linus is very conservative in his approach to change. (This isn’t a bad thing - stability in the kernel has been critical to its success.)

An OS is not a trivial thing to create. And you need a serious ecosystem of tools to build it. There is really only one viable toolchain in existence - GNU. LLVM might start to supplant it, but not yet. Then you get all the user expectations. Window manager, lots of support facilities, networking. Everything is welded into an ecosystem that assumes the default paradigm of what people think an OS is.

Writing yet another OS that provides trees of files with textual names, processes in individual virtual address spaces, the same old same old same old, is a waste of time. There is half a century of experience. Windows, Linux, MacOS, all based on ideas little changed since the 70’s. So, research in OS requires a very significant leap of faith, and a large investment to get something actually innovative done. You can’t get research funding (as a rule) to create tools. The currency of research is publications. The ubiquity of a free and satisfactory OS (Linux) and the high barrier to entry to create something different conspire to push people out of OS research.

There are some influences this way.

The early Cray supercomputers had one CPU instruction added just because a specific customer (NSA) wanted it. [Popcount – count the population of bits that were on in a word.] Apparently that was quite important in some of the cryptographic work they did. This was contrary to Seymour Cray’s RISC design*, but they were his biggest customer.

A few years back, there was an idea of making more effective supercomputers (not necessarily faster) by making the hardware re-programmable on the fly. By re-programming microcode to make the CPU instructions more tuned to the needs of the specific problem program, and possibly by rearranging the interconnections between the CPUs. So a problem program might spend some time re-programming the hardware to run that program more effectively, before it actually starts executing the program. Something like re-burning the PROM chip holding the BIOS in a PC system. I haven’t heard if this was actually done, or how well it worked out.

  • The old joke was that RISC actually stood for Really Invented by Seymour Cray. He’d been designing that way since the CDC 6600, about 1963 – long before anybody gave a name to RISC.

Not quite. Since the 1950s, AT&T had been operating under a consent decree as a result of an antitrust trial: Instead of proceeding with the case, which AT&T was likely to lose, the government offered AT&T an out in the form of accepting certain restrictions. One of those restrictions was a prohibition from making money on operating systems. This bit AT&T pretty hard when Unix, invented in 1969, became really popular throughout the 1970s. When the consent decree timed out, the formerly very liberal licensing was reversed.

To begin with, it’s called Minix. :wink:

Secondly, it isn’t ‘stripped down’ so much as ‘founded on a completely different technical footing’: It’s a microkernel, not a monolithic kernel, which means it moves most of the ‘OS’ stuff out into userland software. This is a radical redesign and for the past couple decades it’s been the wave of the future.

(Incidentally, that microkernel thing was one of the things that utterly destroyed Ken Brown when he tried to claim Torvalds stole code from Minix. You don’t build a gasoline engine by nicking parts off an electric motor. The ‘wronged’ man speaks and yes, Brown’s a moron.)

It wasn’t for a thesis, it was for a computer: Torvalds had just gotten a 386 and wanted to test it out. Linux was certainly never intended to be a serious project; in fact, it started out as a terminal emulator.

Also, I really don’t get where you can claim Linux (a conservative monolithic kernel (initially) intimately tied to the 80386) was a ‘re-write’ of Minix (a radical microkernel designed to be (reasonably) portable).

No. He released it to give people who’d grown tired of how functional Minix had become a new toy to hack on and write drivers for. He never intended it to take over from the Hurd, which everyone knew was going to be the OS of Tomorrow. I think his announcement of the project on Usenet sums it up pretty well.

I don’t get this, though: If you want to do OS research, and there are people who do, why does the existence of Linux stop you? Obviously, you don’t think it’s good enough in some essential respect; otherwise why would you be doing OS research? Does the existence of a cheap Cessna stop people from doing research into new aircraft designs?

This is a rather detailed history of Unix and Linux, focusing on the history of Linux.

This is a very basic timeline of Unix history.

Sun invested considerable time and money about 10 years ago into Java CPUs, that could execute JVM bytecode on silicon. The problem is that the rate of performance increase in general purpose CPUs was faster than the development cycle for specific purpose CPUs, and so by the time the Java CPUs were ready, it was faster and cheaper to use general purpose CPUs.

This is often still the case (apart from some specific spaces, like DSPs and GPUs).


While there was no real distinction between OS and application, one of the [formerly] most widespread examples of this was the 1ESS CPU, used in the first production computer-controlled telephone switch. It had single instructions for things like scanning call registers. In addition to speeding operation of the system, it also eliminated a whole set of software architecture issues - if these instructions had been implemented as software subroutines, the system would have had to deal with interrupts in the middle of the subroutine.

Much later, the architecture of Digital Equipment Corporation’s Alpha CPU was designed in tandem with the port of the VAX/VMS operating system.

This was happening long before Linux arrived on the scene. It just happened to be a Unix-like operating system that was available in source form without paying a license fee, so lots of manufacturers jumped on the Linux bandwagon.

In the “old days”, companies that designed computer hardware also designed an operating system that ran on that hardware. [Originally, customers were expected to create their own OS / application. The time and expense involved in doing so led to the creation of many of the original computer user groups, so customers could share their work.]

This of course meant that it took longer and cost more to bring a computer to market, since the operating system work couldn’t really get going until there was at least a functional simulator for the new hardware. That was the main disadvantage from the point of the manufacturer. Customers who changed computers had to re-write their software to a greater or lesser degree. While this helped the manufacturers retain customers, the customers didn’t appreciate it.

Of course, since the hardware and software were from the same manufacturer and tightly coupled, there could be a number of useful innovations in the operating system due to that.

When Western Electric (and later entities) offered Unix licenses to various computer manufacturers, operating system “development” became a lot easier for manufacturers - mostly just creating device drivers and dealing with any unresolved quirks in the original Unix code that didn’t port cleanly to a new architecture. This work was often done by the new generation of microprocessor manufacturers (for example, Genix from National Semiconductor for their 16032). This also meant that users could more easily move their applications between systems from different manufacturers.

This didn’t go over well with manufacturers who had already invested substantial time and money in creating operating systems from scratch - those companies were very late to adopt Unix on their hardware. For example, DEC had a huge number of operating systems, even on one particular CPU family. Unix was not one of their offerings until much later on, despite much UNIX development work being done on DEC computers originally.

See, that’s interesting. I wonder about the possibility of something similar on a much larger scale. It sounds like one of the main stumbling blocks is how quickly OS’s need to be patched or otherwise are changing (like linux). Perhaps in the far off future the technology of processor design and production will be such that we will have the tools to quickly translate a high-level instruction set into a producible cpu design in an automated fashion. Sort of like a compiler, but on the hardware side rather than the software side.

This is close to the OP’s case of interest. I’m also going from 35-year old memory here, so this’ll be accurate at the high level but will muck up at least some details. …
Back in the 1970s HP introduced their minicomputer series, the HP 3000. It ran a bespoke OS called OS/3000.

The machine used a stack memory model with memory segments (& segmentation registers) for code, fixed data, and the stack area. There were no programmer accessible registers in the x86 or IBM360 architecture sense.

It came with a C-like language called SPL/3000 (SPL=“Systems Programming Language”). Like C, it compiled almost directly into machine code and had a bunch of idiosyncratic operators which mapped pretty much one-for-one into machine instructions. Like C, you could drop directly into inline assembler at any point in your code.

The original beta hardware + OS had truly pitiful performance, even for its time. They had implemented the OS memory management scheme as a header+arena model with all the headers forming a linked list whose origin was pointed to by a fixed location in low memory. Every *malloc* or *freemem* operation required traversing the linked list from the base to locate the relevant header. This took forever.

The solution was to create a new privileged-mode instruction LLSH which did linked list search. The instruction understood the format of memory arena headers and had two modes. One would help implement *malloc* by searching for free-area headers larger than the Top-of-Stack value. The other mode helped implement *freemem* by searching for the header which contained the address stored in the TOS.

When the CPU was first released to market it had the LLSH instruction and ran tolerably fast compared to its competition.

However … A moment’s thought about that simplistic *malloc* algorithm (i.e. first fit) should trigger alarm bells in anyone who took undergrad OS design. That algorithm is just about the canonical worst case for rapidly creating memory fragmentation.

For a machine which mostly processed long-running batch-type apps that wouldn’t be so bad. But the 3000 was intended to be a timesharing system running several (10-20) independent user code streams plus a dynamic set of short-running batch apps. The result was they ended up adding a quasi-garbage collector to the OS which would move the arenas around to compact the free space when needed. Which was real often. The only good news was that it was easy for them to implement & pretty easy for application devs to live with because of the rest of the segmented hardware architecture.
Bottom line: The CPU could be made smart enough to do a linked list search, but couldn’t be made smart enough to do a good job of the larger-scale meta-task: memory management.

As someone upthread said, it’s better to implement these complex algorithms in the OS or the compiler where a lot more brain power can be applied with no time constraint.

P.S. Don’t ask about the bespoke non-relational database system which implemented record storage on disk as a set of multiply-linked lists for the various key fields. Given the slow HDs of the era this too had a nasty (non-!)performance curve as row count expanded.

Later on the HP 3000 & its successors got to be pretty good systems. But the very first ones off the assembly line tried our patience, to say the least.

Both VHDL and Verilog meet this description. The development process from taking a Verilog program, for example, from source code to silicon is a lot more complex than doing the same for software, but it gets done all the time.