Don’t you need a language to begin with to write the interpreters and such? Were the first programming languages forged entirely out of 1s and 0s? How is this even possible?
The first programming languages were 1s and 0s, or machine code. This is the most basic thing the computer understands.
A programming language, at its core, consists of opcodes and operands. An opcode is a command or an operation, and an operand is usually some value which you are operating on.
I don’t really know how machine code works in detail, but at the lowest level there is a chart of opcodes. So 0 could be a move instruction, 1 could be add, and so on. You just use binary to tell the CPU which opcode you are using.
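To make that concrete, here is a tiny sketch in C of a completely made-up two-instruction machine (the opcode numbers and the layout are invented for this illustration, not taken from any real CPU), showing how a chart of opcodes turns into behavior:

[code]
#include <stdio.h>

/* A made-up machine: each instruction is one opcode byte followed by
   one operand byte. The numbers are invented for this example only. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

int main(void) {
    unsigned char program[] = {
        OP_LOAD, 5,   /* accumulator = 5  */
        OP_ADD,  3,   /* accumulator += 3 */
        OP_HALT, 0
    };
    int acc = 0, pc = 0;

    for (;;) {
        unsigned char opcode  = program[pc];
        unsigned char operand = program[pc + 1];
        pc += 2;
        if (opcode == OP_HALT) break;
        if (opcode == OP_LOAD) acc = operand;
        if (opcode == OP_ADD)  acc += operand;
    }
    printf("accumulator = %d\n", acc);   /* prints 8 */
    return 0;
}
[/code]

A real CPU does this dispatch in hardware rather than in a loop, but the idea of "look up the opcode, do the corresponding thing" is the same.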
Assembly is the next level up. If I get my theory right, when you are working with a high-level programming language, everything you do is just translated into assembly or machine code. A high-level programming language has opcodes and operands too, as well as some other structures. The code is broken down into smaller units called tokens, and through some algorithm the tokens are compared against the list of opcodes, operands and pre-determined language structures, and translated into a lower-level language such as assembly or machine code.
Hence, what goes into writing a programming language is the design of its syntax, a parser which is able to recognise language constructs, and a translator which translates the recognised constructs into machine code/assembly.
Pretty much, but the method of input may have been different. Gears, levers and switches can represent on and off, or 1 and 0. Early machines were programmed physically, and in some cases physical programming is still used (DIP switches, etc.). Remember back in the day when you had to program a remote by flipping the little switches on the back? Same idea.
Step 1: the computer only understands machine code; you have to translate assembly-language instructions into pure machine code and enter it as a load of numbers.
Step 2: Programming in raw machine code, you write a program that does the above for you (an assembler), or you write a compiler that translates high-level language instructions into machine code (or you wait for someone else to create these utilities).
Step 3: Once you have a fully-functioning compiler/assembler, you (in theory, at least) never need to touch machine code again; you can work entirely at the higher level.
In some cases, version X+1 of a programming language/compiler/IDE is created using version X of the same language/compiler/IDE - this is certainly the case with Delphi - Delphi 3 was written using Delphi 2, and so on.
In the early days of computers it was necessary to write the first language for a new computer architecture in machine language. It’s no longer done this way - when a new machine architecture comes along such that existing software won’t work on it, developers simply write a “cross compiler” - a compiler that runs on one machine/architecture and produces object code for a different machine. For example, if Intel came out with a Heptium XVII chip that couldn’t run code for any existing chip, they would write a compiler that ran on one of their existing chips (e.g. a Pentium) that would produce object code (machine language) for the Heptium.
This is actually step 2, and John von Neumann’s most significant contribution to the development of computing. In the early days, as XJETGIRLX alludes to, only the data lived in memory. The program was stored somewhere else, generally as hardware.
There are quite a few ways of doing it. You can write an assembler in assembly language, and then compile it by hand into machine language. Then, you can use it to produce new versions of the assembler. For high-level languages, a popular approach is to write a cross-compiler on system A that produces machine code for system B. That allows you to use the existing tools on system A to bootstrap a compiler on system B. Some systems, like microcontrollers, have no native tools. All of their software is cross-assembled or cross-compiled on a larger and more sophisticated system. This can get complicated. For example, I used to write programs in assembly language for a very specialized computer. To convert my programs into machine language, I used a cross-assembler, written in FORTRAN, that ran on an IBM mainframe.
In the really old days, the typical procedure for starting from scratch on a new system was to write an assembler for the new system in assembly language, and hand-assemble it into machine code. Once that had been done, it could be used to build new and improved versions of the assembler and other system tools. Compilers for high-level languages could then be written in assembly language and built with the assembler.
Today, it’s easier and cheaper to write the new tools in the high-level language of your choice, often C, and cross-compile them to the new system. The use of assembly language to write software has been declining for decades.
The first assembler I ever used I wrote myself, so I’ve actually done this. It was in high school, for the wonderful LGP-21.
There are several levels at which an assembler helps you program. The first is the translation of mnemonic opcodes for machine instructions into binary. On the LGP-21, the few opcodes were cleverly designed so that the binary representation of their letter was the opcode. Thus the coding for B(ring), 000101, truncated to 4 bits, was the Bring instruction opcode. (Astute readers will wonder about the coding - this was pre-ASCII.) So, this was done in hardware. There was a simple routine to translate decimal addresses into hex, for convenience. So, to write machine code, you put it on paper tape, ran it through this simple program, which translated the decimal and put it into memory, and then you pushed the right button to start the program running at location 0.
The real benefit of an assembler is allowing you to jump to labels instead of absolute addresses. Without this, whenever you added an instruction you had to change every jump - so you usually inserted a branch to another section of memory, added the code there, and jumped back. This was real spaghetti code! My assembler was a simple two-pass one that read through the program, figured out the label definitions, then inserted the real locations in the instructions. More sophisticated assemblers work with linking loaders that allow you to put the code anywhere - mine wasn’t that good.
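As a rough illustration of that two-pass approach, here is a heavily simplified sketch in C (the little one-token-per-line "assembly" and the JMP/HALT names are made up for the example, nothing like the LGP-21's real format): the first pass records the address each label lands on, and the second pass substitutes those addresses into the jumps.

[code]
#include <stdio.h>
#include <string.h>

#define MAX_LABELS 16

struct label { char name[32]; int address; };

int main(void) {
    /* A toy program: a line ending in ':' defines a label,
       and "JMP x" references one. */
    const char *source[] = {
        "start:", "LOAD", "ADD", "JMP end", "SUB", "end:", "HALT"
    };
    int nlines = 7;

    struct label labels[MAX_LABELS];
    int nlabels = 0;

    /* Pass 1: hand out addresses to real instructions and
       remember where each label falls. */
    int address = 0;
    for (int i = 0; i < nlines; i++) {
        size_t len = strlen(source[i]);
        if (source[i][len - 1] == ':') {
            strncpy(labels[nlabels].name, source[i], len - 1);
            labels[nlabels].name[len - 1] = '\0';
            labels[nlabels].address = address;
            nlabels++;
        } else {
            address++;
        }
    }

    /* Pass 2: emit the instructions, substituting the recorded
       address wherever a label is referenced. */
    address = 0;
    for (int i = 0; i < nlines; i++) {
        if (source[i][strlen(source[i]) - 1] == ':')
            continue;
        if (strncmp(source[i], "JMP ", 4) == 0) {
            for (int j = 0; j < nlabels; j++)
                if (strcmp(labels[j].name, source[i] + 4) == 0)
                    printf("%d: JMP %d\n", address, labels[j].address);
        } else {
            printf("%d: %s\n", address, source[i]);
        }
        address++;
    }
    return 0;
}
[/code]

Because all the label definitions are known after the first pass, you can insert or delete instructions and the jumps fix themselves on the next assembly, which is exactly the spaghetti-code problem described above.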
The third benefit of an assembler is allowing you to use macros, chunks of code, like subroutines, that get expanded in line.
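The C preprocessor gives a rough flavor of the same idea, expansion in place rather than a call (this is an analogy only, not how assembler macros were actually written):

[code]
#include <stdio.h>

/* SQUARE is expanded in place wherever it appears, like an assembler
   macro, rather than being called and returned from like a subroutine. */
#define SQUARE(x) ((x) * (x))

int main(void) {
    printf("%d\n", SQUARE(7));   /* the compiler actually sees ((7) * (7)) */
    return 0;
}
[/code]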
I used to teach PDP-11 assembler, and was quite good at it. I wrote a nice simulator in it. You can write recursive programs in assembler also.
Now, I’ve also designed and implemented a high level language for my dissertation, so we can talk about that also.
There is another way of doing this (not a better way!) which is writing a translator from one language to another. When I was in grad school Pascal was compiled by translating it into PL/1, which was the more or less native language for the Multics system we ran. I was the first person to get the Pascal compiler to compile itself. This was Wirth’s original Zurich compiler, which, though advertised as being portable, was written on a CDC 6600 and had many dependencies on there being a 60 bit word, especially with sets. After I got it to work I hacked it into a compiler for the language I designed for my dissertation.
…and if all of these explanations are still too high level, realize that 1 and 0 represent electrical signals in some way. The exact representation doesn’t matter, but 1 could be +5 volts and 0 could be no volts, or they could be positive and negative, or whatever.
Below the opcodes and operands that ExtraKun was talking about, you’ve got your CPU. This is a (usually silicon) chip that actually implements the instructions. Each instruction that the CPU can execute needs to be individually implemented in hardware, out of transistors, gates, and other similar electronic doodads. These take wires (OK, silicon traces, but let’s pretend they’re wires) carrying those electrical signals, perform some task on them, and put the output on other wires. Since these take up physical space on the chip and have interactions with other instructions, there’s a limited number of them: perhaps a few tens of instructions in the old days, a few hundred now.
A strange hardware/software layer called microcode acts as the interface between the physically programmed instructions and **ExtraKun’s** opcodes. If you want more details about that, search for “Computer Architecture” on your favorite online bookstore and grab an introductory book. Older books are generally simpler to read and understand in this field.
Skipping to a higher level than everybody else, the process of compiling a programming language has several steps, the biggest of which are (a small sketch of the tokenizing step follows this list):
Tokenizing - taking the input language and converting it into a list of “tokens”, such as: “Variable name mickey” “assignment operator” “variable name mickey” “plus operator” “constant literal 1”
Parsing - turning that list of tokens into something meaningful: assign the value of “mickey” plus 1 to “mickey”
(various steps skipped)
Code emission - emit the assembly/machine/byte code necessary to implement the parse.
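Here is the tokenizing step as a minimal sketch in C, just enough to split the example line into the kinds of tokens described above; a real lexer handles far more (strings, comments, multi-character operators, and so on):

[code]
#include <ctype.h>
#include <stdio.h>

int main(void) {
    const char *src = "mickey = mickey + 1";
    const char *p = src;

    while (*p) {
        if (isspace((unsigned char)*p)) {
            p++;                                   /* skip whitespace */
        } else if (isalpha((unsigned char)*p)) {
            const char *start = p;                 /* an identifier */
            while (isalnum((unsigned char)*p)) p++;
            printf("variable name: %.*s\n", (int)(p - start), start);
        } else if (isdigit((unsigned char)*p)) {
            const char *start = p;                 /* a number */
            while (isdigit((unsigned char)*p)) p++;
            printf("constant literal: %.*s\n", (int)(p - start), start);
        } else {
            printf("operator: %c\n", *p);          /* anything else: =, +, etc. */
            p++;
        }
    }
    return 0;
}
[/code]

The parser then takes that stream of tokens and works out the structure (assign the value of "mickey" plus 1 to "mickey"), and the code emitter turns the structure into instructions.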
These steps can be done in any language, so as others have mentioned, you just use some language you’ve got lying around to do it. If you don’t have one (generally a historical rather than modern case, because of cross-compilers), you bootstrap: use pure machine code to write a simple assembler, use the simple assembler to write the very basics of your parser, enough to implement variable declaration, assignment, and math, say. Then once you have a compiler for a simple subset of language “Gonzo”, you can simply write Gonzo in itself, adding capabilities as you go. Supposedly this is largely how C was developed; after C, most compilers were (and still are) written in C.
You can write recursive programs in any language, but in some you have to write a lot more of the required machinery yourself. Some computers provide more of that machinery than others as well. (Compare the VAX to the PDP-8.)
(Voyager, I’m guessing you know this. I’m doing this for the benefit of others.)
To write recursive code, you need a stack. Think of a stack as a stack of plates: You can push objects onto it, pop objects off of it, and everything remains in the order it was pushed.
It’s easy to see how it works in a diagram:
_ <- empty stack
Now, push A onto the stack:
A <- top of stack
_
Push B onto the stack:
B <- top of stack
A
_
Push C onto the stack:
C <- top of stack
B
A
_
Pop something off of the stack:
B <- top of stack
A
_
And the value you get back is C, which is what was pushed last.
So, what do you need to push onto your stack if you want recursive code? Numbers, called addresses, that tell the CPU where to look for the next opcode, or instruction the hardware understands, when it is done with the current stretch of code. That works like this: Code is structured into procedures, and for our purposes a procedure can do two things: Call another procedure, or return to the procedure that called it. When the first procedure returns, the program is over.
_ <- empty stack
This is how the stack looks before the first procedure has called any others.
A <- top of stack
_
The first procedure has called another, and A is the address of the opcode just past the opcode in the first procedure that made the call.
B <- top of stack
A
_
The second procedure has called a third. Again, B is the address of the opcode just past the opcode in the second procedure that made the call.
A <- top of stack
_
The third procedure has returned. The address B has been popped off the stack and loaded into the PC (Program Counter), a special area of memory inside the CPU that tells it where to look for the next opcode. The CPU continues executing opcodes in the second procedure.
_ <- empty stack
The second procedure has returned. The address A is popped and loaded into the PC, and execution continues in the main procedure.
It is even possible for a procedure to call itself. This is commonly useful when dealing with complex data structures, like lists or trees.
Now, a stack is really just an area of memory, just like software is really just patterns of electrical charge in a matrix of semiconductors. Many CPUs provide special opcodes that allow you to more easily treat a given region of memory as a stack. In the ones that don’t, you have to write the equivalents of those opcodes in the opcodes the CPU does provide if you want to have a stack. High-level languages either manage the stack for you, meaning the compiler will generate the right opcodes in a transparent manner, or, if you have one that doesn’t, you have to write a stack yourself using the facilities the language gives you. This is more difficult than doing it in assembly language. High-level languages insulate the programmer from the information he needs (that is, the addresses) to write a stack that allows recursive code.
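As a small illustration of both halves of that, here is a recursive factorial in C, where the compiler manages the call stack for you, next to a version that keeps an explicit stack by hand, the way you would have to on a machine or in a language that gives you no help (the 64-entry array size is an arbitrary choice for the example):

[code]
#include <stdio.h>

/* Recursive version: the return addresses and the pending values of n
   live on the call stack that the compiler manages for us. */
long factorial_recursive(int n) {
    if (n <= 1) return 1;
    return n * factorial_recursive(n - 1);
}

/* By-hand version: we keep our own stack of pending values in an array,
   pushing on the way "in" and popping on the way back "out". */
long factorial_explicit(int n) {
    int stack[64];               /* arbitrary size for the example */
    int top = 0;                 /* index of the next free slot */
    while (n > 1)
        stack[top++] = n--;      /* push */
    long result = 1;
    while (top > 0)
        result *= stack[--top];  /* pop */
    return result;
}

int main(void) {
    printf("%ld %ld\n", factorial_recursive(10), factorial_explicit(10));
    return 0;
}
[/code]

Both print 3628800; the only difference is who does the bookkeeping.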
In some machines. There were computers before the invention of microcode and computers after microcode had been ditched in an effort to make CPUs faster. The versions of the PDP-11 are examples of the former, and the versions of the SPARC are examples of the latter. The various chips made by Transmeta, AMD, and Intel that implement the modern x86 ISA (that is, the Pentiums and clones thereof) use microcode, but they are hardly in the majority of CPUs made worldwide.
Um, always silicon, unless you know something I don’t. Even if you meant GaAs as an alternative, it could never support enough gates for even a simple CPU.
The limiting factor in area for today’s CPUs is not instruction count, but rather cache and the control to allow scheduling of instruction execution to maximize throughput, including speculative execution and all that implies.
Microcode is only used for CISC machines, not RISC. (No microcode in Sparcs for instance.) The microinstruction set is very different from the user visible instruction set. I used to be a World’s Leading Authority ™ in this area.
You forgot one of the most important ones today - Optimization. Itanics, for one, basically don’t work without an optimizing compiler that understands how instructions have to be rearranged for maximal efficiency. I’d guess that optimization is about 90% of compiler complexity these days, and where most of the effort is spent on porting to new architectures.
[QUOTE=Derleth]
You can write recursive programs in any language, but in some you have to write a lot more of the required machinery yourself. Some computers provide more of that machinery than others as well. (Compare the VAX to the PDP-8.)
(Voyager, I’m guessing you know this. I’m doing this for the benefit of others.)
[/QUOTE]
One of my fondest memories is writing a recursive routine for factorials in PDP-11 assembler from scratch in front of my class. It worked too.
Excellent tutorial snipped.
PDP-11s were good at this, having a special stack pointer register. Return addresses for subroutine calls got pushed, and it was fairly simple to push parameters also. There were also auto increment and auto decrement instructions. When I first saw the Bell Labs Tech Report on C I kind of pooh-poohed it as looking too much like PDP-11 assembler.
It’s true that hard-coded machines are faster, but I think the real reason microcode got ditched was that as gates got cheaper, the tradeoff between gates to implement the ISA and ROM to store the microcode favored the gates a bit more. The second reason, which I mentioned in my simulpost, was that RISC architectures took some of the simplicity of microarchitectures and moved it to the ISA. Microcode is really cool for implementing things like IBM 360 block moves and the like.
I called this trend - I decided not to look for a microprogramming job when I got my PhD because I figured the field would be more or less dead in five years, and I was about right.
I vaguely remember hearing about ruby being used in radiation-resistant chips for NASA. I could be wrong.
To define these terms for the less technical among us:
[ul]
[li]cache: Memory physically inside the CPU. This is both a lot faster and a lot more expensive than memory outside the CPU. If you can fit a substantial part of your program in cache, it will run a whole lot faster.[/li][li]scheduling of instruction execution to maximize throughput: Modern CPUs don’t execute instructions in precisely the order they appear in the code. Instead, they take batches of instructions and use special hardware to reorder them so the resulting code is more efficient. Modern CPUs are scary smart. ;)[/li][li]speculative execution: Code often makes the CPU jump around a lot to find the next opcode. Problem is, modern CPUs are a lot more efficient if they don’t have to jump around. So CPUs will guess where the next jump is going to take them and start executing code there even before they know whether they guessed right. That’s what ‘speculative’ means.[/li][/ul]
RISC chips changed a lot about how CPUs are designed. The big philosophical shift, it seems to me, is that people decided that compilers were smart enough to do the work hardware (especially microcode) had been doing in older computers. This has knock-on effects in other areas and other factors were at play (accessing off-chip memory was now slower than ever relative to how fast CPUs were running), but I think that was a big reason.
Intel still calls the Itanic the Itanium, but it did sink pretty quickly. And this quote tends to support my statements above about how compiler advances (among other things you mentioned) made RISC more attractive.
The 11 does have a wonderful ISA. It is well supported by simh, which also has emulators for a load of other computers. Check out the software kits. There are some real gems.
Everyone mentions this, but C actually grew out of B, a simpler language that was first implemented to generate code for the PDP-7.
There were versions of the PDP-11 that used microcode. The PDP-11/60 even had a writable control store. One of my PDP-11 programming books has an example of how to write your own PDP-11/60 microcode for accelerated FFTs. I’ve always been disappointed that so few computers supported user written microcode. It looked like a lot of fun.
mks57: Well, color me mistaken. I thought the VAX was the first DEC mini to use microcode.
Here’s a random little factoid that I like.
According to Bjarne Stroustrup, the first ever C++ compiler was written in…
…C++!
He cheated, of course. He wrote a little preprocessor to convert C++ into C and used this on his compiler’s code.
Brian Kernighan and P.J. Plauger did something similar with Ratfor (Rational FORTRAN), which was a preprocessor for FORTRAN, written in Ratfor.
When I referred to non-silicon CPUs, I was thinking more esoteric things like mechanical, analog, and fluidic CPUs. I’ll admit that none of these are particularly, uh, mainstream.
Voyager’s other comments are correct, of course, although I’d claim that for the historical computers the OP is talking about, optimization was minimal or nonexistent. I left out a bunch of steps in that discussion, mostly to simplify the answer (lexical analysis? parse tree generation? symbol-table construction?). Reading some of the later replies, I agree that I oversimplified. Sorry about that, take it out of my consultation fee.
I suspect that OP is probably sorry she (?) asked by now, and will smite us with noodly appendages at any moment.
Most of these replies, while informative, have little to do with the OP. There were no cross-compilers or more primitive languages. Around 1955, Penn acquired a Univac 1 and I knew a lot of the people who programmed for it. In retrospect, it was an odd machine. But it was, after all, one of the very first commercial machines. For instance, it operated entirely in binary coded decimal. It had a built-in interpreter that converted mnemonic instructions into machine code; the programmers had no access to machine code. Its word size was 72 bits, organized as 12 6-bit bytes. It had 1000 memory slots, each storing one of these 72-bit words. The memory consisted of 100 mercury delay lines, each of which stored 10 words as acoustic signals. Transducers at each end converted the electrical signals to sound and vice versa.
Enough of this. Instructions were like this:
A 452 P 347
(two instructions to a word), which meant add what is at memory 452 to the accumulator (the data register) and put the result in 347.
This lies somewhere between machine language and assembly language. Addresses are absolute, but it is not a string of binary numbers either. The interpretation was done in hardware, and I guess from that point of view the engineers who designed it wrote this interpreter. Later on, a true assembler (and linker) came along that allowed you to use relative addresses (still not named) and compile in subroutines from a library.
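Restated in C, just to unpack what that instruction pair does (this is purely an illustration; it has nothing to do with how the UNIVAC actually executed anything, and the value 7 is made up):

[code]
#include <stdio.h>

int main(void) {
    long memory[1000] = {0};    /* the machine's 1000 memory slots, very loosely */
    long accumulator = 0;

    memory[452] = 7;            /* pretend some earlier instruction left a value here */

    accumulator += memory[452]; /* "A 452": add the contents of cell 452 to the accumulator */
    memory[347] = accumulator;  /* "P 347": put the result in cell 347 */

    printf("%ld\n", memory[347]);
    return 0;
}
[/code]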
The programs were fed in on punched tape and the memory was supplemented by mag tape. The tape servos were a real piece of work.
Incidentally, MTBF was about 5 minutes, and the machine dumped its entire memory to tape every minute or so, so it could take up where it left off when it crashed. These crashes were caused by vacuum tube failures. The arithmetic registers were in triplicate and the results were considered correct if any two agreed; when all three differed, it crashed.