How are computer languages written?

I was around (actually operating an analog computer) when people in the same lab were hired to write programs for the Univac I that Sperry-Rand had donated to Penn (it was already obsolete in 1956). The machine came with storage for 1000 double words, each double word being 12 bytes of six bits each (no, a byte did not invariably mean 8 bits). The machine automatically interpreted double instructions like
A__647?__734
where A means add, ? means my memory is failing, and _ represents blanks, to mean
add what is in memory location 647 to what is in the accumulator and store it (if only I could remember the store mnemonic; I do know S meant subtract) in memory location 734. You could think of this language as halfway between machine language and assembler. Soon, Univac provided an actual assembler. There were two real differences. First, it allowed symbolic rather than explicit addressing. Second, it had a linker that allowed you to have a library of procedures that could be incorporated into any other program. Obviously the second feature was heavily dependent on the first.

AFAIK, the first real compiler was Fortran (formula translator), created by IBM for their computers. Although primitive and heavily weighted towards arithmetic, it was a true compiler in the sense that you didn’t have to know machine language to use it. I assume that an assembler (itself programmed in machine language) preceded it. During the 60s, there were probably over 100 languages created, most of which have since fallen by the wayside (including Forth, my personal favorite language).

A few words on the design of programming languages, if the OP is at least partially interested in that. Some languages such as Java were originally designed by a single person (in that case James Gosling). Many other languages are designed by committees, some quite large. Examples include Cobol, Algol and Ada. And of course there is everything in between. C was designed by a single person; its follow-up C++ was by a single person, but recent changes and standardizations are by an international committee. Wikipedia can fill in more about this.

Each language has certain features and flaws that reflect the attitudes of the designers. Niklaus Wirth designed Pascal and Modula. One of his quirks is that he doesn’t understand how to set the precedence of operators correctly. So

x<y and z=3

is parsed as

x<(y and z)=3

which is a compiler error. When everybody and his uncle knows it should be

(x<y) and (z=3)

Sheesh.

Among more recent languages, many of Gosling’s choices in Java are “interesting” to say the least. I’ve got a lot of problems with it. E.g., it doesn’t allow first-order call by reference, supposedly for safety reasons, but does allow second-and-higher-order call by reference (which has all the usual safety problems). Hence you can’t write a function to swap two things! This alone makes programming in Java a nightmare. You have to create references to everything and pass references to do normal everyday things. He also considers operator overloading a bad thing unless he’s doing it (e.g., “+” for string concatenation). It also didn’t originally have generic types, making it essentially a non-object-oriented language. I think I’d better stop.
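For contrast, here is the kind of first-order call by reference being described, sketched in C (the language that comes up later in the thread): once you can pass pointers, a generic swap is a three-liner.

#include <stdio.h>

/* Swap two ints through pointers: the caller's variables really change. */
void swap(int *a, int *b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

int main(void)
{
	int x = 1, y = 2;
	swap(&x, &y);
	printf("%d %d\n", x, y);   /* prints "2 1" */
	return 0;
}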

The reason there are so many languages is that there are a lot of different tasks you want to write programs for, and some languages handle certain tasks better than others. Ralph Griswold co-wrote SNOBOL to make string processing easier. Years later he developed Icon to update things to a more structured approach. It is a really easy language to write searching/backtracking-type programs in: short, easy-to-code programs that take forever to run. There are a lot of tradeoffs to consider in designing a language.

Your entire post was one of the most informative I’ve ever read on anything. A personal mystery solved. Thank you!!!

If you wanted to do it from “first principles” you could write your compiler (or interpreter) in assembly code. However, by far the most common way to do this is to use one of the numerous high-level languages and tools designed to help you write compilers (by taking the grammar that describes your new language and generating the code necessary to process it). The most commonly used combination (in my experience) is Lex and Yacc. In my current job we use LLVM to do a lot of the legwork (in particular the optimization).
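To give a rough feel for what those tools save you from writing by hand, here is a minimal, hypothetical sketch in C of a recursive-descent parser for a toy grammar (just sums of integers). Real Lex/Yacc output is table-driven and far more general; this is only the flavor of it.

#include <ctype.h>
#include <stdio.h>

/* Toy grammar (hypothetical):  expr : NUMBER ('+' NUMBER)*             */
/* A generator like Yacc would produce the parsing code from a grammar  */
/* description; here it is written out by hand for a tiny case.         */

static const char *src;                 /* current position in the input */

static int parse_number(void)           /* NUMBER: a run of digits */
{
	int value = 0;
	while (isdigit((unsigned char)*src))
		value = value * 10 + (*src++ - '0');
	return value;
}

static int parse_expr(void)             /* expr: NUMBER ('+' NUMBER)* */
{
	int value = parse_number();
	while (*src == '+') {
		src++;
		value += parse_number();
	}
	return value;
}

int main(void)
{
	src = "1+2+30";
	printf("%d\n", parse_expr());   /* prints 33 */
	return 0;
}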

Using only the wooden type processor and bearings shown upthread in Sage Rat’s link, how big wood a computer have to be to possess the computing power of the ENIAC or UNIVAC?
What about a 1985 computer? How big would one that powerful have to be if it used wood and bearings instead of silicon and electrons?

There were personal computers before IBM. There were no PCs. The modern meaning of PC is IBM-compatible: what we’d call a Windows/Linux/Ubuntu/Unix (?) box today.

The Apple II, Amstrad, C64, Amiga, etc. were microcomputers, personal computers, but not PCs.

:smiley:

I think a Mac is a PC but maybe I am weird. To me PC is a generic term that doesn’t mean it is IBM compatible.

I know a lot of people say “Do you have a Mac or PC?” so maybe I am in the minority.

One other note about bootstrapping to write a compiler in its own language: Often, for just compiling the compiler itself, all you need is a stripped-down, bare-bones version of the language, so you don’t have to write the whole thing in the simpler language first. For instance, I would consider any C compiler to be woefully incomplete if it couldn’t handle the sin() function, but you never need to use the sin() function just to compile a compiler. So when you’re writing the very first C compiler ever, in assembly language, you don’t bother to include sin(). The next version, though, the one that’s compiled in your brand-new proto-compiler, does need it, so that’s when you build in support for all of the math functions.

When I studied Computer Science at Michigan State, 1978-1982, our junior year was spent mostly writing code in assembly language and writing device interrupt handlers (for example, the program that takes your keytaps from the keyboard, interrupts processing, and passes the keystrokes in a form the machine can use), and our senior year was spent writing a lexical analyzer and a language compiler.
I’ve never used any of that knowledge since. But I guess it has made me a better programmer.
We never once wrote an end-user application of any kind. Nor did we ever design a database for an end user application. Odd emphasis in that degree.

The C compiler I’m using doesn’t include a sin() function. You have to call a sin() function from a math.h file. That calculates it using… well I’m not sure what algorithm it is but it doesn’t use any higher math functions. The math.h file came with the compiler but it is just a source code file, not actually part of the compiler.

Well, I should have said part of the compiler system-- You’re right that it’s not part of the same binary as the compiler itself. It’s not actually in math.h, either, though: That just contains the prototypes and such telling the compiler how sin() ought to be called. The actual code that calculates sin() (probably using Chebyshev polynomials) is in a precompiled library.

More-or-less. (IBM doesn’t control the standard anymore.) It means the hardware is commoditized and built to open standards anyone with the money can implement. This was first done with computers built around the S-100 bus beginning with the Altair 8800, with CPUs running from the Intel 8080 to the Zilog Z80 to some others, I think. All of those computers could share a common base of peripheral hardware (text terminals, printers, hard drives, etc.) and other components. They also shared a real OS: CP/M, which was built on top of a hardware abstraction layer called the BIOS, allowing a single version of CP/M to serve multiple related hardware families. Most application software (WordStar, for example) could be used on all computers of that kind with relatively minimal hassle.

This is in opposition to how other micros were made. Apple, Commodore, Atari, and other companies, the ones who built computers around the 6502 and clones, each made their own kind of computer, which wasn’t compatible with any other company’s model and couldn’t use commodity parts. Some software could be shared, but not a lot, and there was no such thing as a real OS for such systems: They each had their own BASIC (or, in some cases, Forth) monitor in ROM. Each little company was an island and each userbase was isolated from the others.

Ubuntu is a Linux distro, meaning it’s the Linux kernel plus a lot of other software to make it usable. Linux is a variety of Unix but don’t tell the lawyers that: The name ‘Unix’ is owned by The Open Group and the Linux development team hasn’t licensed it. In every way that matters, though, Linux is a Unix.

The compiler doesn’t contain the sin() function. It allows your programs to call it via a somewhat complex process involving other programs called a linker and, eventually, a loader. sin() resides in an external library usually called libm.a.
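To make that division of labor concrete, here is a minimal sketch, assuming a typical Unix-style C toolchain:

#include <math.h>      /* only declares sin(); the code lives elsewhere */
#include <stdio.h>

int main(void)
{
	/* Built with something like "cc prog.c -lm": the -lm flag tells   */
	/* the linker to pull the sin() implementation out of libm.        */
	printf("%f\n", sin(1.0));
	return 0;
}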

Keep in mind that the above is a horrible lie. In reality, most C compilers for modern general-purpose hardware just replace all calls to sin() with a single occurrence of the SIN opcode (whatever the CPU involved wants to call it). It’s a lot faster that way, and if the function is wrong people can bitch at Intel or AMD, not the poor library writers. :wink:

math.h just tells the compiler that sin() expects a single floating-point number and returns a floating-point number. It doesn’t implement it at all: .h files merely define the interface. (This is in contrast with C++, where .h files might contain the bulk of the program.)

Not in this compiler (CCS) it isn’t. Math.h is the whole math library, not just prototypes, and it’s source code, not a binary.

I have to go now, but I’ll post the source when I get home.

Well, I’m with you on the operator overloading, at least. By “second order” pass by reference, do you mean the passing by value of references? Strictly speaking, everything in Java is passed by value, no exceptions. IMHO this isn’t really a bad thing – it’s just that Java’s designers thought the language would be better with simplified calling semantics. A well-defined system in the Java philosophy shouldn’t rely on a method being able to futz directly with its calling context’s variables. Bending over backwards to replicate that kind of behavior is really just striving against the language, and in that case it’s probably best to just use a different one. :slight_smile:
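For anyone following along, here is a rough C analogue of that distinction (a sketch, not Java itself): the pointer plays the role of the Java reference, and the pointer itself is copied into the callee.

#include <stdio.h>

/* The pointer (standing in for a Java reference) is passed by value:   */
/* reassigning the local copy is invisible to the caller, but writing   */
/* through it is not.                                                   */

void reseat(int *p)
{
	static int other = 42;
	p = &other;          /* only the local copy now points elsewhere */
}

void mutate(int *p)
{
	*p = 42;             /* writes through the pointer; the caller sees it */
}

int main(void)
{
	int x = 1;
	reseat(&x);
	printf("%d\n", x);   /* still 1 */
	mutate(&x);
	printf("%d\n", x);   /* now 42  */
	return 0;
}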

In C, as in many (probably most) languages, a lot of “built-in” functionality (like the sin() function in C) is actually part of the standard library. The difference between the standard library and the language is that the standard library is at least conceptually written in the language itself - so sin() is written in C. This means that the C standard library conceptually contains only functions, structure definitions and probably macros (though AFAIK macros are not really part of the C language, and many implementations of C standard libraries probably contain a lot of assembly code for optimizations).

In other words, the language proper provides all the functionality you need to implement the standard library, and the standard library contains the stuff that makes it a lot easier to write useful programs. For example, the C input/output routines are not part of the language itself, but you really want them in the standard library, because they abstract the operating-system specific ways of doing I/O.
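A tiny illustration of that split, assuming a Unix-like system underneath:

#include <stdio.h>      /* the standard library's portable I/O layer */

int main(void)
{
	/* fputs() is not a keyword; it is a library function that, on a   */
	/* Unix-like system, eventually calls the OS-specific write()      */
	/* system call for you. The language itself knows nothing of I/O.  */
	fputs("hello\n", stdout);
	return 0;
}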

Now Perl and Ruby and even C are already pretty complex languages even without the standard library, with all kinds of special constructs and syntaxes (like function definitions, while() loops, and all kinds of branching and logical operators), but it’s possible to create languages that have an extremely small number of primitives and write pretty much all of the stuff that’s usually thought of as “language” in the standard library (so, in the language itself). For instance, a complete Common Lisp system (and CL is a pretty large general programming system) can be built using only about 15 or so primitives. “Real” implementations of CL probably contain more primitives, again for optimization etc., but it does make implementing, bootstrapping and porting Lisp variants a lot easier than for most other languages.

Oh, another tip, if you’re interested in this subject and have some Java knowledge: a newish Lisp variant called clojure is implemented in what’s really a pretty small amount of reasonably easy-to-understand Java code, with the rest written in clojure itself. See http://clojure.org/ ; the source code is hosted on Google Code.

OK, so sin() is defined in terms of cos(). But they’re both in math.h.



////        (C) Copyright 1996,2003 Custom Computer Services            ////
#define PI_DIV_BY_TWO	1.570796326794896
////////////////////////////////////////////////////////////////////////////
//	float cos(float x)
////////////////////////////////////////////////////////////////////////////
// Description : returns the cosine value of the angle x, which is in radians
//
float cos(float x)
{
	float y, t, t2 = 1.0;
	int quad, i;
	float frac;
	float p[4] = {
		-0.499999993585,
		 0.041666636258,
		-0.0013888361399,
		 0.00002476016134
	};

	if (x < 0) x = -x;                  // absolute value of input

	quad = (int)(x / PI_DIV_BY_TWO);    // quadrant
	frac = (x / PI_DIV_BY_TWO) - quad;  // fractional part of input
	quad = quad % 4;                    // quadrant (0 to 3)

	if (quad == 0 || quad == 2)
		t = frac * PI_DIV_BY_TWO;
	else if (quad == 1)
		t = (1-frac) * PI_DIV_BY_TWO;
	else // should be 3
		t = (frac-1) * PI_DIV_BY_TWO;

	y = 0.999999999781;
	t = t * t;
	for (i = 0; i <= 3; i++)
	{
		t2 = t2 * t;
		y = y + p[i] * t2;
	}

	if (quad == 2 || quad == 1)
		y = -y;  // correct sign

	return (y);
}

////////////////////////////////////////////////////////////////////////////
//	float sin(float x)
////////////////////////////////////////////////////////////////////////////
// Description : returns the sine value of the angle x, which is in radians
//
float sin(float x)
{
	return cos(x - PI_DIV_BY_TWO);
}



I should really know better than to oversimplify like I did, in this crowd. I did not know, though, that trig functions are implemented in hardware nowadays… That’s interesting. Nor did I know that C++ stores actual meaty source code in the .h files.

The building block for a processor is the transistor (similar to how the marble adder had a teeter-totter thing). But transistors work off of electricity. The nice bit about electricity is that you can move electricity back up the chain fairly easily; how you suck a marble up and move it back to the top of the machine is going to be a bit more cumbersome. But say that we can create a transistor out of wood and marbles, plus a way to link all our transistors together such that the marbles can be fed back into the machine, with the logic to do this also run by the machine. And say each transistor takes up a square about 4 inches on a side. Currently, CPUs are up to something like 2 billion transistors. That’s about 45,000 × 45,000 transistors, and 45,000 × 4 inches is roughly 15,000 feet, which works out to about 2.8 miles per side for our marble adder.