Is this kinda sorta how computers work?

It takes a palindromic quine to separate the geeks from the nerds.

Back in my day, we just had 1’s and 0’s. And sometimes we didn’t even have 1’s. I once wrote a whole database application using only 0’s.

Luxury! We only had tiny pieces of bits to work with. The whole family would pool theirs together at the holidays to make a single 1 or 0.

If you ever coded for a 68HC05 (Motorola 8 bit embedded microcontroller, little brother to an 'HC11) you pretty much had to do that to implement a look-up table longer than 256 bytes, or any kind of jump table.

The index on an indirect access was only 1 byte. To reach beyond that, you calculated the required direct-access instruction, loaded it into memory, followed it with a return instruction, and then called the instruction you had just created as a subroutine.

For a jump table, you read the address out of the table and used it to create a jump instruction which you then jumped to.

After the first couple times this seemed routine, and I’d start out reserving a small block of RAM to implement such bodges within. Eventually I made some macros to make it easier for other people to read my code. Good times!

The way the LGP-21 worked was that it had this stop code you put at the end of a chunk of data you wanted to read into the computer from either the console or paper tape.

I disagree that interrupts are important in understanding the execution of programs. They are important in understanding how the computer interacts with the outside world, but in the broadest sense they are just an asynchronous subroutine jump, complicated by priorities, etc. Very little of a CPU deals with them. You get blocks controlling instruction execution, or arithmetic, or the cache, or branch prediction, but interrupts are handled in a small part of an I/O block, I assume.

Right, I agreed with your disagreement in the penultimate line of that quoted post. (At least, I think I did…)

Take a look here, specifically the section called “Real Programmers Don’t Write Fortran Either.” The LGP-30 was the vacuum tube version of the LGP-21, and the stuff about sometimes picking the location under the disk head is very true. It came with a timing wheel to let you do this.
I never was quite as crazy as the guy in the story, but since my tic-tac-toe program ran out of memory, I figured out a way of switching the code from computing its own move to computing the opponent’s move by changing a bunch of Adds to Subtracts. The initialization portion made sure they were all Adds to begin with. After this experience I never had any trouble at the very lowest level - and it is probably why I became a microprogrammer.

It’s probably worth forgetting about desktop computers, which are insanely complex these days with pipelined CPUs and parallel processing on the GPU, and instead focusing on a simple processor — a computer in miniature. I’ll focus on the design of the Intel 8051/8052 8 bit processor, which is now 30 years old yet still in use almost everywhere and widely manufactured (and also because I’ve just been tortured by having to write an emulator for it).

First, some terms: a bit is a boolean value, 1 or 0. A byte is a vector of 8 bits, and a word is two bytes. Sometimes you may see the term “nibble” which here means four bits.

The 8051 has two types of memory: internal and external. Internal memory is kept on the silicon die itself and can be accessed quickly, whereas external is, well, external, and the specific size of the external memory can be varied depending on the manufacturer of the chip.

Internal memory is split into various classes: the 8 bit registers R0-R7 (think of these as a scratchpad area for storing intermediate results), a stack (a last-in-first-out data structure that grows as needed) and the SFRs (special function registers), which contain all sorts of control bits for configuring interrupts, timers and so on.

An SFR of special note is the PSW (program status word) which, despite its name, is 8 bits long, each bit being a flag dedicated to signalling some aspect of the processor state. For instance, two of the bits in the PSW correspond to the C and OV flags — the carry and overflow flags respectively, which signal a carry or an overflow after arithmetic operations such as addition. Also worth noting are several more special registers (these sit alongside the PSW, not inside it): the PC (program counter, which points to the memory location of the next instruction to be executed), the SP (stack pointer, which points just past the top of the stack) and the ACC and B registers, which act as a pair of accumulators where the results of additions and multiplications can be stored. More on these later.
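
Just to illustrate the “one flag per bit” idea, here’s a tiny C sketch you can run on a desktop. It is not 8051 code; the bit positions (carry in bit 7, overflow in bit 2) are the ones the 8051 data sheet gives for the PSW:

#include <stdio.h>
#include <stdint.h>

#define PSW_CY (1u << 7)   /* carry flag lives in bit 7 of the PSW */
#define PSW_OV (1u << 2)   /* overflow flag lives in bit 2 */

int main(void) {
    uint8_t a = 200, b = 100;
    uint8_t psw = 0;
    unsigned sum = a + b;    /* 300: too big for 8 bits */

    if (sum > 0xFF)
        psw |= PSW_CY;       /* carry out of the top bit */
    if (~(a ^ b) & (a ^ sum) & 0x80)
        psw |= PSW_OV;       /* signed overflow: same-sign inputs, different-sign result */

    printf("8-bit result %u, PSW = 0x%02X\n", sum & 0xFF, psw);
    return 0;
}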

External memory is also split into two. The first part is the code memory. This is where the code of the program to be executed is stored. The second part is general purpose external RAM (XRAM), which can be used to store whatever is necessary. Now, programmers will typically program the 8051 in either C or assembly. For instance, here’s a small C program I was using to test our emulator; it computes an integer square root:



/* Integer square root by repeated subtraction of odd numbers:
   since 1 + 3 + 5 + ... + (2a-1) = a*a, the number of odd
   numbers we can subtract from n before it goes negative is isqrt(n). */
unsigned int isqrt(unsigned int n){
  unsigned int a;
  /* Subtract 1, 3, 5, ... while n is still big enough; a counts the steps. */
  for (a = 0; n >= (2 * a) + 1; n -= (2 * a++) + 1);
  return a;
}

int main()
{
  int i = 0;
  i = isqrt(16);  /* expect 4 */
  return 0;
}


Compiling this with a C compiler for the 8051 produces 8051 assembly code along the following lines:



...
	mov	ar6,r4
	mov	a,r5
	xch	a,r6
	add	a,acc
	xch	a,r6
	rlc	a
	mov	r7,a
	inc	r6
...


Which is assembled into the standard Intel HEX format along the following lines:



...
:0A00A400900010120064900000228A
:06003700E478FFF6D8FD9D
:080015007900E94400601B7A48
:05001D00009000B27824
:030022000075A0C6
:0A00250000E493F2A308B8000205FE
:08002F00A0D9F4DAF275A0FF7C
:08003D007800E84400600A7934
:030045000075A0A3
...


This is basically just a huge numerically coded version of the previous assembly program. Each possible permutation of assembly instruction plus operands and addressing mode is assigned a fixed integer code. This mapping can be looked up in the manufacturer’s data sheets, which are freely available — so in theory you could disassemble this HEX file by hand into an 8051 assembly listing. Using this mapping, every instruction is converted into a sequence of one to three bytes and spat out as the above Intel HEX file. When you want to load a program onto the 8051, this series of numbers is copied into the code memory, where it is stored waiting for execution. Of course, at execution time the processor has to reverse the mapping as it decodes each instruction.
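
Each line of that HEX file is easy to pick apart, by hand or with a few lines of C. The layout is a byte count, a 16-bit load address, a record type, the data bytes, and a checksum. Here’s a sketch (happy path only, no error handling) that decodes the first record above and verifies its checksum:

#include <stdio.h>

/* Read two hex digits starting at s as one byte. */
static unsigned hexbyte(const char *s) {
    unsigned b;
    sscanf(s, "%2x", &b);
    return b;
}

int main(void) {
    const char *rec = ":0A00A400900010120064900000228A";   /* first record above */
    unsigned count = hexbyte(rec + 1);
    unsigned addr  = (hexbyte(rec + 3) << 8) | hexbyte(rec + 5);
    unsigned type  = hexbyte(rec + 7);
    unsigned sum   = count + (addr >> 8) + (addr & 0xFF) + type;

    printf("load %u bytes at 0x%04X (record type %u):", count, addr, type);
    for (unsigned i = 0; i < count; i++) {
        unsigned b = hexbyte(rec + 9 + 2 * i);
        sum += b;
        printf(" %02X", b);
    }
    sum += hexbyte(rec + 9 + 2 * count);   /* the checksum byte itself */
    printf("\nchecksum %s\n", (sum & 0xFF) == 0 ? "OK" : "BAD");
    return 0;
}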

Now, how the processor works is pretty simple. Each instruction cycle (corresponding to 12 clock cycles on the standard Intel 8051) the processor looks up in code memory the next coded instruction to be executed, pointed to by the PC, the program counter. After decoding, the processor does something different depending on the particular instruction. For instance, the opcode corresponding to the instruction:



mov R0, #5


Tells the processor to move the constant integer 5 into register 0. Other instructions “push” a value onto the top of the stack, “pop” a value off the stack, “add” two 8 bit integers together setting the various carry and overflow flags in the PSW as necessary, perform “no operation”, etc. (the 8051 has just under 256 distinct opcodes). Some other types of instruction manipulate the program counter directly. For instance, jumps of various forms add a constant to the program counter to move it forwards/backwards to a particular instruction. “Calls” (jumps to a subroutine/interrupt handler) do something similar, but remember the previous value of the PC in order to restore it after the subroutine has finished executing.
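
If it helps, here’s the whole fetch-decode-execute idea boiled down to a few lines of C. The opcodes are invented for illustration, not real 8051 encodings:

#include <stdio.h>
#include <stdint.h>

/* A toy processor: one register, three made-up opcodes. */
enum { OP_MOV_R0_IMM = 0x01, OP_INC_R0 = 0x02, OP_HALT = 0xFF };

int main(void) {
    uint8_t code[] = { OP_MOV_R0_IMM, 5, OP_INC_R0, OP_HALT };  /* "code memory" */
    uint8_t r0 = 0;
    uint16_t pc = 0;

    for (;;) {
        uint8_t opcode = code[pc++];                  /* fetch */
        switch (opcode) {                             /* decode */
        case OP_MOV_R0_IMM: r0 = code[pc++]; break;   /* execute: mov R0, #imm */
        case OP_INC_R0:     r0++;            break;   /* execute: inc R0 */
        case OP_HALT:       printf("R0 = %u\n", r0); return 0;
        }
    }
}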

(A minor point: some instructions, such as multiplication, may take more than one instruction cycle to complete, in which case everything is held up on the 8051 until finished. Modern processors have smart ways of getting around this problem.)

This whole process keeps going forever on the 8051: fetch-decode-execute/writeback-fetch-decode-execute. There’s no way to stop the processor from executing.

This is a decent description of a “pure” processor. However, the 8051 also has the ability to communicate with the outside world through a serial line and 32 input/output lines. There are special registers that hold values sent to the processor over the serial line. When a character is received, a flag bit is set, which the programmer must poll. Alternatively, the 8051 can be configured to issue an interrupt when a value is received over serial. Interrupts do just what they sound like: they stop all current execution and jump directly to a fixed address where the code for the corresponding interrupt handler is held. Once this interrupt handler finishes executing, either the next pending interrupt handler runs (there’s a fixed priority on the 8051 amongst the various types of interrupts, which the programmer can further modify), or normal execution of the processor resumes.
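
For a flavor of what the programmer actually writes, here’s a sketch of a serial receive interrupt handler. I’m assuming the free SDCC compiler’s syntax and register headers here, and I’ve left out the serial port and baud rate setup:

#include <8051.h>   /* SDCC's names for SBUF, RI, ES, EA, etc. */

volatile unsigned char last_byte;

/* Vector 4 is the serial interrupt on the 8051. */
void serial_isr(void) __interrupt(4)
{
    if (RI) {              /* receive-complete flag, set by the hardware */
        last_byte = SBUF;  /* read the received byte */
        RI = 0;            /* the programmer must clear the flag */
    }
}

void main(void)
{
    /* ...serial port and timer setup omitted... */
    ES = 1;                /* enable the serial interrupt */
    EA = 1;                /* global interrupt enable */
    for (;;)
        ;                  /* main loop; the ISR fires when a byte arrives */
}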

So, in actual fact, the 8051’s continuous cycle looks more like this:

fetch-decode-execute-check interrupts-perform I/O-fetch-decode-execute-check interrupts-perform I/O-etc. etc.

Err, I suppose that’s all that can be said about the 8051. I’m sure I’ve ballsed something up, or not been entirely clear. Basically, aside from the interrupt and I/O stuff, the 8051 behaves remarkably like a Turing machine with a more developed instruction set (i.e. not just move left one/right one, write, etc.) and a separate tape for the loaded program (corresponding to the 8051’s code memory). Even the Intel HEX stuff is similar to Turing’s numerical encoding of A-machines when he demonstrated that he could construct a universal Turing machine.

Thanks.

What I’m particularly interested in is something that may be over my head, but I’ll ask it anyway. What I was musing about was exactly how the thing decodes instructions. What gets it from the fact that a “mov” is stored in its memory to the fact that the machine, in fact, executes a "mov"ement?

That’s too much, isn’t it? :wink:

The program adjusts the IP (Instruction Pointer) to point to the particular spot in memory that has the “mov” instruction. The hardware CPU then “executes” the “mov”.

You have two different high-level contexts for “interpreting” the bytes that are sitting at a memory location. One is the DP (Data Pointer), and the other is the IP (Instruction Pointer).

Some examples,
The Internet Explorer executable is iexplore.exe
Notepad is notepad.exe

Embedded inside those .exe files are thousands of “mov” instructions. However, when the computer starts up, they’re just dormant bytes on disk. You first have to copy those bytes of “mov” instructions into working RAM. Even after copying them into the active RAM, no IP (instruction pointer) has been adjusted to start interpreting those RAM bytes as “instructions.” The operating system (Windows) does a little internal housekeeping and then it changes the IP to point at the specific bytes of iexplore.exe (in RAM). All of this takes milliseconds but before the IP has been adjusted, all the bytes of iexplore.exe are just arbitrary data. The IP increments billions of times per second. Also, you have thousands of “mov” instructions sitting simultaneously in your hot active RAM at this very moment but they aren’t executed because the IP doesn’t point to any of them.
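
You can actually watch the same bytes play both roles. Here’s a sketch, assuming Linux on x86-64 and skipping all error handling: the six bytes are the machine code for mov eax, 42 followed by ret, first printed as data, then jumped to as code (modern operating systems won’t execute a plain data array, hence the mmap into an executable page):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64 machine code for: mov eax, 42 ; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* Interpretation 1: the bytes as data. */
    for (size_t i = 0; i < sizeof code; i++)
        printf("%02X ", code[i]);
    printf("\n");

    /* Interpretation 2: the bytes as instructions. */
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(page, code, sizeof code);
    int (*func)(void) = (int (*)(void))page;
    printf("executed as code, the same bytes return %d\n", func());
    return 0;
}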

An analogy…

You have a piece of paper with a recipe to bake a cake. There are some “words” on it such as:
Step 1: add 4 cups flour
Step 2: add 1/2 cup sugar
Step 3: mix in 2 eggs

You could say there are two ways to “interpret” those words.
You put your finger on the first line and “copy” (hand rewrite) on a different piece of paper the words “add 4 cups flour.” In this case, your finger is a “data pointer” – no execution of the “words” takes place.

On the other hand, you could put your finger on the first line and then actually pour real flour into a bowl. Then you shift your finger to “step 2” and add in sugar. The context has changed. Your finger is acting like an “instruction pointer.”

The analogy is not perfect, because you’d use the same human finger to keep track of what step you were on whether you were interpreting the piece of paper as data to be copied or as actual instructions. Computer processors, on the other hand, have separate dedicated “fingers” (in the Von Neumann architecture, which lets arbitrary memory be treated as either “data” or “instructions”).

Computers are complex, so it’s hard to give very simple explanations. But I’ll give it a shot.

Computers start out executing a set of instructions in order:

  1. Apply shampoo to wet hair
  2. Work into rich lather
  3. Rinse thoroughly
  4. Repeat if necessary

Each instruction tells the computer to move some data from one location to another, or modify data at a location. Step 1 is a move instruction. It says get the shampoo (data at location shower shelf) and move it to your hair (data at location your head). Step 2 modifies data. It says lather (modify the data), and the location (your head) is implied. Step 3 is another move. It says move the shampoo (data at location your head) to the drain (location bottom of the shower). Step 4 changes the execution order conditionally. It says examine hair (data) for cleanliness, and move True or False into the HairIsCleanFlag (data somewhere in your brain). Then if HairIsCleanFlag is False, change the next instruction to execute back to 1.

Note that the computer is sequentially maintaining an Instruction Number as data. So after each instruction executes, it adds one (modify) to the Instruction Number (data location in the computer), except in step 4, which may move 1 into the Instruction Number instead. Your brain does something similar, you just don’t have to focus on it.

So the computer is always fetching instructions in a regulated sequence, and any of those instructions can say go use other memory for data.
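
For the programmers reading along, the recipe translates almost line for line into C, with instruction_number playing the part of the Instruction Number (the “clean after two washes” rule is just something to make the conditional jump fire):

#include <stdio.h>

int main(void) {
    int hair_is_clean = 0;
    int washes = 0;
    int instruction_number = 1;                    /* the computer's place-keeper */

    while (instruction_number <= 4) {
        switch (instruction_number) {
        case 1: printf("apply shampoo\n");         break;  /* a move */
        case 2: printf("work into rich lather\n"); break;  /* a modify */
        case 3: printf("rinse thoroughly\n");
                washes++;                                  /* another move */
                break;
        case 4: hair_is_clean = (washes >= 2);     /* examine hair, set the flag */
                if (!hair_is_clean) {
                    instruction_number = 1;        /* the conditional jump */
                    continue;
                }
                break;
        }
        instruction_number++;                      /* the "add one" bookkeeping */
    }
    return 0;
}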

I have to make a phone call about some computer instructions, and then wash my hair, but I’ll try to add some more information later.

The best way to understand this is to start very simple. Go search the internet for how to construct an encoder/decoder, as you would with a breadboard; I am guessing there are emulated breadboards you can play with on the screen.

Once you see that in operation for something very simple, then you can mentally extrapolate more complex versions of the same.

Thanks, I’ll look into this.

Again, if I understand what you’re asking, you’re concerned with microcode, which works like this:

A computer is really just a complicated version of a thing called a “finite state machine”. One simple way to make a finite state machine is to use a ROM and a register, with a clock to latch the register. The ROM’s address comes from the register, which latches part of the ROM’s previous output, plus your circuit’s input lines. The ROM’s output contains not only the address of the next state, but also the outputs that you want from the circuit at that particular state.

Here is a really simple example. Let’s say we want to make a sequencer with 3 outputs. At every clock cycle, a different output line will be turned on.

In other words:
State 0: 001
State 1: 010
State 2: 100

So our ROM needs 2 address bits (to hold 3 states). Let’s make the first two bits of our ROM the next state, and the last three bits of the ROM the output.

Line 0 of the ROM is therefore 01001 (01 for the next state, 001 for the output). Line 1 of the ROM is 10010 (10 for the next state, 010 for the output), and line 2 of the ROM is 00100 (00 for the next state so that it will go back to the start and keep cycling through, and 100 for the output).

So you power up the circuit, and since the register starts with everything cleared out, it starts out at state 0. The address fed into the ROM is 00, and after some settling time, the ROM’s data outputs 01001. At the next clock cycle, the 01 gets latched into our address register. After some settling time, the ROM outputs 10 for the next state, and 010 for the output lines. The address register still has 01 in it though since it holds its value until the next clock cycle. At the next clock cycle, the address register latches in the value 10, and after some settling time, the ROM outputs 00 for the next address and 100 for the output lines. At the next clock cycle, the address register latches in that 00, and after some settling time the ROM outputs 01 for the next address, and 001 for the outputs. The state machine has now cycled back to its initial state and it is going to keep cycling through these states over and over.
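
If you want to play with that, here is the same sequencer simulated in a few lines of C, with the ROM contents taken straight from the three lines above:

#include <stdio.h>

int main(void) {
    /* Each 5-bit ROM word: next state in the top 2 bits, output in the low 3.
       These are the words 01001, 10010, 00100 from above. */
    unsigned rom[3] = { 0x09, 0x12, 0x04 };
    unsigned state = 0;          /* the address register powers up cleared */

    for (int clock = 0; clock < 6; clock++) {
        unsigned word = rom[state];
        printf("state %u: output %u%u%u\n", state,
               (word >> 2) & 1, (word >> 1) & 1, word & 1);
        state = word >> 3;       /* latched on the next clock edge */
    }
    return 0;
}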

To make your control circuit even fancier, you can add input lines into your address register. Obviously, when you do this, you need to add an entry into your ROM for every possible address and input line combination, but you can very quickly build up a very complicated circuit. And you don’t have to cycle the next address in order, like the sequencer does. You can put the addresses in any order you want, as long as each next-address field in the ROM points to the appropriate next state.

To make a computer, you take a simple finite state machine like this, and you attach some registers to it. At a minimum, you are going to need a register to hold the instruction, and a couple of registers to hold data, and a register to hold the program counter.

The basic states that your state machine will cycle through are instruction fetch, instruction decode, execute, and write back. Each of these is going to be broken down into further states. For example, the instruction fetch is going to output the program counter onto the address bus and set the read line active, and then on the next clock cycle it is going to latch whatever data is on the data bus into the instruction register. When it gets to the execute states, you will find that the bits in the instruction opcode are probably set to directly drive the control lines of the ALU so that you tell it what you want it to execute. The write back stage is going to store the ALU result somewhere, either in a register or maybe in the program counter in the case of a jump instruction.

Computers don’t actually work this way any more, but more on that in a moment.

Using our fairly simple computer, the first state is instruction fetch. It is going to load the binary code for your mov instruction into the instruction register. Your state machine then goes to the next state, instruction decode. You didn’t specify what your move is, so let’s say you are moving the immediate number 12 into memory location 7. There are probably several different mov instructions on your machine; in assembly you would write mov for all of them and let the assembler figure out which one to use, but in this case you’ll end up with the “mov immediate into memory” opcode. Given that, the state machine knows it needs to fetch the immediate value and stick that in one register, and fetch the address and stick that in another register. So now the instruction is completely fetched and decoded. We have the mov code in the instruction register, one register containing our immediate value (12), and another register containing the address to move it to (7).

Our state machine goes to the execute state, but there isn’t anything for the ALU to do here. If this was an add or sub opcode then the ALU would perform whatever arithmetic operation needed to be done. The state machine goes to the write back stage, and the address from our address register (7) is put onto the address lines, the data from our data register (12) is put onto the data lines, and the write command line is set active. And voila! Our instruction has been executed.

Now that’s all fine and dandy, and that is how computers used to work many years ago, but these days computers use what is called a pipeline to make things faster. Instead of having one set of hardware that cycles through all of the states, each stage is given its own set of hardware. So, the instruction is fetched by the fetch stage of the pipeline. Then it is passed to the decode section of the pipeline. While the decode section of the pipeline is decoding the operands and fetching them, the instruction fetch has gone and fetched another instruction. The first instruction goes to the execute stage, the second instruction goes to the decode stage, and the instruction fetch stage fetches yet another instruction. Then the first instruction goes to the writeback stage, the second instruction goes to the execute stage, the third instruction goes to the decode stage, and a fourth instruction is fetched. Now, the pipeline is full, and from here on out, the processor is processing four instructions simultaneously, with each instruction in a different stage of the pipeline. In the ideal case, this makes the processor four times faster.
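
Here’s a crude C simulation of that overlap, just to show who is in which stage on each clock cycle (the instructions are just names):

#include <stdio.h>

int main(void) {
    const char *stage_name[] = { "fetch", "decode", "execute", "writeback" };
    const int stages = 4, n = 6;   /* 4 pipeline stages, 6 instructions */

    /* On clock cycle c, instruction i sits in stage (c - i),
       if that is a valid stage number. */
    for (int c = 0; c < n + stages - 1; c++) {
        printf("cycle %d:", c + 1);
        for (int i = 0; i < n; i++) {
            int stage = c - i;
            if (stage >= 0 && stage < stages)
                printf("  I%d in %s", i + 1, stage_name[stage]);
        }
        printf("\n");
    }
    return 0;
}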

And in the real world it gets even more complicated. Your instruction fetch wants to grab the next instruction from memory, but your decode stage will often be reading data from memory, and your writeback stage will be writing to memory too. There’s only one memory though, so these stages all have to wait for each other. To get around this problem, caches are used. Once the caches fill up, the pipeline goes a lot faster. Processors actually use a lot more than four stages too.

But hopefully this will give you the basic idea of it all.

Yes, thanks! Sorry I missed your previous reference to this.

Oh, no – no reason to apologize. The “again” was meant to attach to the subordinate “if I understand” clause, not the independent “microcode” clause; in a previous post I said a similar thing when referring to Harvard vs. Von Neumann architectures.

But I’m glad you found the reference informative (assuming you did).

Intel processors still use microcode. RISC machines don’t. In fact, RISC came out of microcode: the first RISC machines, from IBM, came from an environment where they did a lot of microprogramming, and Dave Patterson did his dissertation on microcode verification (and in fact did some microprogramming work for DEC after he started at Berkeley).

Bear with me - I used to be a World’s Leading Authority on this. :wink:

When you get down to it, instructions on a computer are executed by hardware. In a microprogrammed machine, the hardware implements a set of simple microinstructions. The normal machine instructions are “interpreted” by running microcode. One of the reasons IBM took it up was that in the 360 they were able to emulate an older computer with microcode, so people who owned the older machine did not have to throw away all their software, and actually got a machine that executed their old code faster.

So, here is how you execute a MOV instruction in microcode - let’s say mov R3, Mem(1234) - move the contents of memory location 1234 to register 3.

One of the registers in the microcoded machine is the program counter. The microprogram begins by copying the contents of the PC to the memory address register, incrementing it to point to the next word, and then issuing a memory read. You wait a bit, and the result comes back in a memory data register, which you then copy into an instruction register.

There is probably hardware to decode the operation. You take the bits of the word corresponding to the opcode, and often do a jump through a table in ROM to the beginning of the microsubroutine to handle a MOV.
You look up the destination, see that it is a register, and then look up the source. In our simple example, all you have to do is move the address to the memory address register, do a read, and put the result in a temporary register. You then move it to the destination register, and return to read the next instruction. If you have complicated addressing, you add the offset to the register holding the base location, or whatever.
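
Here’s the flavor of it sketched in C: the microsubroutine for that MOV written as a list of micro-operations run by a little dispatch loop. All the names are invented, the operand address is hard-wired instead of being pulled out of the instruction register, and real microcode is far wider and hairier, but the register-transfer steps are the ones described above:

#include <stdio.h>

/* Micro-operations: each is one simple register-transfer step. */
enum uop { MAR_FROM_PC, PC_INC, MEM_READ, IR_FROM_MDR,
           MAR_FROM_OPERAND, REG_FROM_MDR, DONE };

int main(void) {
    static unsigned memory[65536];
    unsigned pc = 0x0100, mar = 0, mdr = 0, ir = 0, r3 = 0;

    memory[0x0100] = 0xBEEF;    /* stand-in for the encoded MOV itself */
    memory[0x1234] = 42;        /* the value the MOV will fetch */

    /* mov R3, Mem(0x1234): instruction fetch, operand fetch, register write. */
    enum uop microcode[] = { MAR_FROM_PC, PC_INC, MEM_READ, IR_FROM_MDR,
                             MAR_FROM_OPERAND, MEM_READ, REG_FROM_MDR, DONE };

    for (int i = 0; ; i++) {
        switch (microcode[i]) {
        case MAR_FROM_PC:      mar = pc;          break;
        case PC_INC:           pc++;              break;
        case MEM_READ:         mdr = memory[mar]; break;
        case IR_FROM_MDR:      ir = mdr;          break;  /* decode would look at ir */
        case MAR_FROM_OPERAND: mar = 0x1234;      break;  /* really taken from ir */
        case REG_FROM_MDR:     r3 = mdr;          break;
        case DONE:             printf("R3 = %u (ir was 0x%04X)\n", r3, ir);
                               return 0;
        }
    }
}
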
Microcode was very popular back in minicomputer days, and Burroughs had a line of nicely microprogrammable machines. I used the Lockheed SUE for my initial work.

There were two varieties of microcode. Vertical microcode looked more or less like normal assembly code; this more or less led to RISC. Horizontal microcode was made of a bunch of microoperations that you could pack into a microinstruction, and thus do lots of stuff in parallel. This led rather directly to Very Long Instruction Word machines. Josh Fisher built his work on microcode compaction across blocks on top of our work on compaction within a basic block.

Hardware implementations do the same thing, but everything is built in. The advantage is that it is faster, the disadvantage is that it is more complex and less flexible. But the old dream of dynamic microcode is long gone.

I suppose that is true in a sense, but the way computers are implemented is by a whole bunch of small FSMs. Modern processors have hundreds of thousands of flip-flops, so considering it as one FSM is not very useful.

In fact the real division is between control and datapath. Datapath is found in stuff like the ALU, where the data moves between register stages and gets fairly clean stuff done to it. The control part is usually implemented as an FSM, and has a bunch of control signals which govern the movement of data along the datapath, as well as other things. Back 15 years ago, when I was closer to the design team than I am now, control circuits, which are complicated but don’t have to be very fast, were commonly synthesized from a register-transfer-level description. Datapath blocks, which have to be fast but are relatively simple, were designed by hand and often used all sorts of interesting and nasty custom circuit design techniques. The two types of blocks were distinguished by a naming convention; they were that different.

Well, the conversation has taken a turn: this is now about how instructions turn into the internal operations of a processor. With microcode, it’s just another turtle in the stack, a simpler processor executing simpler instructions to emulate the function of a more complex processor. But there is one turtle at the bottom, and he is a Rube Goldberg construction of lots of simple components that have mechanical analogs.

Going back to the model train example: if you have lots of track (wire) and track switches (flip-flops), and you add some of those cute devices that drop coal (electrons) into a car or dump it out as the train passes, you have the basics. You have to have the coal dumped from one car end up being loaded again at some other point on the track to get a programmable computer. Adding coal to the loaders is your initial input, and some crossing gates can be your output. It will be a lot easier to run a lot of independent trains and tracks in parallel, where the coal can be dumped from one train and loaded on another, but it’s possible to do it with only one train.