how does a computer understand a language?

I’ve just begun to teach myself Java and C++ (well, it’s actually the books doing the teaching, not me) and a question occurred to me last night:

(I may have difficulty wording it, so bear w/me)

Any given language has a certain set of commands and functions that must be coded in the correct sequence in order for the computer to make sense of them. However, what is it that actually MAKES it a valid language? In other words, when someone creates a new language, how does the computer understand it? Why do the languages need to be complex? If people can create new languages, why can’t they just write a language that mimics a spoken language?

I know this will come across as a simplistic question, but what is the basic unit of a programming language, past the on/off bit?

Sorry in advance for the poor wording

Simply put: What gives any given language actual meaning?

Every computer language has a compiler, which converts the high-level commands you write into low-level machine code, which the CPU understands. It’s possible to write code in assembly language, which is a symbolic form of machine language. Here, every instruction you code corresponds directly to machine code on the target CPU.

Computer languages are far simpler than spoken languages … it would be very difficult to make a compiler to turn English into machine code.

You can learn more about machine language at How Stuff Works

Arjuna34

smoke

Well, I took a course in writing compilers in college, so I’ll try to dredge some of it back up.

As Arjuna34 said, the CPU of the computer understands machine code, and a program called an assembler translates assembly language into machine code. Assembly is termed a low-level language because it requires very little translation to turn into machine code.

Now, for any other language, someone (usually a moderate-sized team of someones) has to write programs that translate what you write into something the machine understands. The more English-like the computer language being developed, the more effort has to go into that translation. The thing that gives a particular set of letters meaning, making it a command in the language (i.e. IF (something) THEN (something else)), is the folks who write the compiler. If they really felt like it, that same thing could be (HW (something) DSF (something else)) because they wanted to be unique and write something no one would be able to read and understand. Thankfully, most successful languages use the letters IF instead of the letters HW, because it lets them leverage the fact that you likely already know what the word “IF” means.
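
Just to make that concrete, here’s a minimal C sketch (the HW keyword is invented, exactly as in the example above) showing that a keyword’s spelling is nothing but a choice someone made:

    #include <stdio.h>

    /* Rename a keyword: after this, "HW" means what "if" means.
       The compiler writers could have built the language this way
       from the start -- nothing about the letters "if" is magic. */
    #define HW if

    int main(void)
    {
        int x = 5;
        HW (x > 3)
            printf("HW works just like if\n");
        return 0;
    }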

One thing I’ve always found interesting is that many compilers are actually written in the language they compile, using a method called “bootstrapping” (i.e. hauling yourself up by your own bootstraps). A small piece of the compiler is written in assembly language, just enough to translate some really simple statements; the next part uses those simple statements, which the compiler now knows how to translate, to teach it some slightly more complex statements, and so on up the line.

If I’ve muddied the waters more than clearing them, feel free to ask for clarification.

Not true. Some high-level languages use interpreters. The compiler derives its name from the way it works, looking at the entire piece of source code and collecting and reorganizing the instructions. A compiler differs from an interpreter, which analyzes and executes each line of source code in succession, without looking at the entire program.

Interpreters can execute a program immediately, while compilers require some time before creating an executable program. Programs produced by compilers run much faster than the same programs executed by an interpreter. Compiled programs can also be easier to debug, because in a sense the compiler defines the language, rejecting unacceptable instructions before the program ever runs.

Some interpreted languages are BASIC, in all its flavors, and OCL for IBM’s S/34 and S/36.

Translators is the general term covering both interpreters and compilers. It is true to say that all languages have a translator. Assembly language is translated by an assembler but, functionally, this is a compiler.

Um, yeah, but an interpreter is, in one sense, a compiler (in the sense that it converts high-level code into machine language). BTW, most modern interpreters look at more than one line of code at a time, for optimization purposes, although from the user’s point of view the model still holds (mostly).

Whether a language is interpreted or compiled really isn’t inherent in the language itself, but rather a function of what tools are available. For example, Microsoft Visual BASIC can be compiled into an executable (so can the ancient BASIC I used for the Apple IIe years ago, in fact), although traditionally it is interpreted.

If you’re referring to this IBM OCL, then I don’t think that’s actually a programming language …

The difference between an assembler and a compiler is that an assembler turns assembly language into machine code, while a compiler turns a high-level language into machine code. The difference between assembly code and high-level code is that in assembly code, there is a one-to-one correspondence between an assembly statement and the resulting machine instruction (op-code) on the target CPU (and let’s not bring up a Great Debate about macros ;)). This isn’t necessarily true in a high-level language, where one statement can turn into pages of machine code.
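
To make the one-to-many point concrete, here’s a tiny sketch (the assembly in the comment is one plausible x86 rendering; a real compiler’s output will differ in detail):

    /* A one-line C function... */
    int add_back(int x, int y)
    {
        return x + y;   /* one high-level statement... */
    }

    /* ...might compile, on x86, to something like:
     *
     *     mov eax, [esp+4]   ; load x
     *     add eax, [esp+8]   ; add y to it
     *     ret                ; return the sum in EAX
     *
     * Each assembly line maps to exactly one machine instruction --
     * the assembler's one-to-one correspondence. A bigger C statement
     * can balloon into pages of these. */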

Arjuna34

Here’s my answer for you… I hope that you find it helpful, but I also hope someone will come along and clear up all the mistakes I’m going to make.

The simple answer to your question “how does the computer understand it?” is that by itself, the computer can’t understand it. In order for everything to work, the creator of a programming language must provide some way to translate his language into a form that the computer can understand by itself.

Your computer speaks only one language: machine code. Machine code is almost completely unreadable by human beings; it’s just a series of numbers. These numbers tell the processor and all the other internal hardware of the computer what to do. Machine code is entirely dependent upon the hardware of the computer, so different types of computer have different machine code languages. This is one reason why a Windows program won’t run on a Macintosh, and vice versa.
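
For instance (these particular bytes are for a 32-bit x86 PC; other processors use entirely different numbers):

    /* Machine code really is just numbers. On a 32-bit x86, these
       five bytes tell the processor "put the value 5 into register
       EAX" -- there's nothing human-readable about them: */
    unsigned char code[] = { 0xB8, 0x05, 0x00, 0x00, 0x00 };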

Programming languages, such as C and Java, were invented because trying to write computer programs in machine code is sheer madness. These languages allow programs to be written in a form which is far easier to read than a series of 1s and 0s. This is great for human beings, but not so great for the computer. Remember that by itself the computer can’t understand C, Java, or any other language other than its own machine code, so it needs a little help.

C is a compiled language. Since you are learning C++, C’s close relative, I guess that you know something about this. You write a source file, which is basically a text file containing your C code. Before your program can be run, it has to be sent through the C compiler. The compiler is a program which reads through the source file that you’ve written, figures out what it means, and converts everything to machine code which can then be run on your computer. On Windows, the machine code file usually comes with a “.exe” extension, meaning “executable”.
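
For example (gcc here is just one common C compiler; any other would do the same job):

    /* hello.c -- a complete source file. A command like
     *
     *     gcc hello.c -o hello.exe
     *
     * runs the compiler and turns this text into a machine-code
     * executable. */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello, world!\n");
        return 0;
    }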

I should mention here that another bonus of using a programming language like C is that you can write a single program file which can be compiled on many different types of computer. A Windows .exe file can’t be run on a Macintosh, but if you recompile your .c source code using a mac compiler, you will generate a Macintosh program which functions basically the same as the PC version.

Following me so far? I really hope that I’ve written this up well…

There’s another type of programming language known as an interpreted language. This type of language does not require a compiler, and is never actually converted into machine code. Instead, you need a special utility called an Interpreter in order to run your program. The interpreter is kind of like a compiler: it loads your source file, and figures out what it means, but instead of converting the source into machine code the interpreter acts upon it immediately.

One of the troubles with interpreted languages is that anyone who wants to run your program needs to have the interpreter installed on their machine. Another trouble is that they are notoriously slow; when you run an .exe file generated by a compiled language, all that needs to happen is for the machine code to start doing its thing. When you run an interpreted language, the Interpreter needs to scan your program, figure out what it means, figure out how to implement it, check for errors, and eventually get around to doing whatever it is that you asked it to do.
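
Here’s a toy sketch of that loop in C (the two-statement “language” it interprets is completely made up, just to show the shape of the thing):

    /* A miniature interpreter: read each line of "source", figure out
       what it means, and act on it immediately. No machine code is
       ever produced. */
    #include <stdio.h>

    int main(void)
    {
        char line[128];
        int a, b;

        while (fgets(line, sizeof line, stdin) != NULL) {
            if (sscanf(line, "PRINT %d", &a) == 1)
                printf("%d\n", a);             /* execute a PRINT statement */
            else if (sscanf(line, "ADD %d %d", &a, &b) == 2)
                printf("%d\n", a + b);         /* execute an ADD statement */
            else
                fprintf(stderr, "syntax error: %s", line);
        }
        return 0;
    }

Notice that all the scanning and figuring-out is repeated on every line, every single run, which is exactly where the speed goes.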

Java is kind of an in-between case. It’s a compiled language, but it is compiled into the machine code for a “virtual machine”. This virtual machine isn’t hardware, it’s a software package: the Java runtime engine. So, the JRE is kind of like an interpreter for Java’s virtual machine code.

What makes a programming language a valid language? Why do they need to be so complex? Well, these things depend upon what purpose the language was created to fill. C was created as a general-purpose utility programming language, so it has to be pretty well capable of doing just about everything. Other languages are more specific in their roles, such as PostScript. This was a language developed by Adobe just to handle page formatting for printers and desktop publishing.

Why doesn’t someone write a general purpose language that mimics a spoken language? Well, for one, writing the compiler or interpreter for such a language would be sheer hell… no, seriously. I’m quite sure that this is one of the fifth circle punishments for evil math students…

Also, it seems to me that in a computer language, a designer would want to avoid such things as vagueness and ambiguity, which are inevitable in human language. If I tell the computer to do something, I want to be absolutely sure that it’s gonna do exactly what I said. I also want to be absolutely sure that what I told it to do is what I wanted to tell it to do. And I want the language to be exact enough that, if in doubt, I can look up exactly what the stuff I told the computer to do actually means.
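
A small example of what that exactness buys (the ambiguity lives in the English sentence, not in the C):

    /* "Add two and three times four" could mean 14 or 20 in English.
       C's grammar defines * to bind tighter than +, so there is
       exactly one reading: */
    int x = 2 + 3 * 4;     /* always 14, never 20 */
    int y = (2 + 3) * 4;   /* want the other meaning? say so: 20 */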

IMHO, C++ is a nasty and vile language. It is hideous in the same way as a bulldog… beguiling to some people, perhaps, but repulsive to others. If you’re just doing this for a hobby, I would recommend learning Pascal first. A Pascal program looks much more like spoken English than a C program, and the language itself is less fricklesome. I use Pascal for most of my programming; a good (free) compiler can be found at http://www.freepascal.org.

Java is supposed to be a good choice. If you’re new to programming, you’ll probably have an easier time learning it than I have. I started programming back on the C64, and it’s been tough to bend my old habits around new paradigms, like OOP.

If you want the immediate power to create snazzy looking programs without having to tackle a steep learning curve, either Delphi or Visual Basic might be worth a look. If it’s games you want to program, check out the webpage for Blitz Basic. I’ve been playing around with the demo version for the past few nights. It allows you to write and run programs, but it won’t compile standalone executables.

I hope you’ve found this information either useful or interesting. And now it’s 2:30 in the morning. Good night…

Neat. In the time it took me to write my post, the question was answered and discussed several times over.

Thanks everyone for your great replies. Pyrrho, your explanation made sense; thanks for taking the time. Also, thanks for the sites and references.

  1. The program is written in a high level language. [The language was created because it is so much easier to code a few IF statements or borrow a few functions than it is to code seemingly endless lines of machine language.]

  2. The code is run through a pre-processor (usually built into the compiler program) where it is linked with any needed header files (header files are already-established code functions that you “borrow” to use in your own program… no point in coding a function that calculates the sum or product of two integers if that function already exists. For example, if you plug (X = 4 * 4) into your program, it will return X = 16 because the pre-processor pulled in and used the math.h file which holds the code that explains multiplication to the CPU.)

  3. The compiler then translates the code into machine language. For every character in your program, there is a corresponding ASCII code. For example, A = 00010011, B = 0010001, etc. (Note: these example codes are not accurate. Anyone who has them memorized is totally hard core.)

  4. The program, when executed, is presented to the CPU in machine format. The CPU sees 00010011 0010001 00100110 etc., which it interprets as a series of ons and offs, or more precisely, positives and negatives.

  5. When your program is installed on the PC’s hard drive, magnetically customized iron particles (positive on one end - negative on the other) are aligned on the disk in the proper order. For example, A = 00010011 = three negative iron particles, one positive, two negative, two positive. (This is why it is generally considered a bad idea to run a magnet up and down the outside of your tower.)

Please be aware that this is only my interpretation based on a few articles I read out on the web. I’ve used the C language in this example. If my understanding is way off base, someone please intervene and stomp out some ignorance.

Just a bit of trivia - in the early days, being a computer programmer meant sitting down and manually translating programs into machine code. Can you imagine what that must have been like? Granted, programs of that era were nowhere near as big and complex as they are today, but damn!

When some smart programmer somewhere undoubtedly got sick and tired of all the 0’s and 1’s, he wrote a program that would allow the computer to do the translations. When he asked his boss for computer time to test it, he was reprimanded for suggesting the use of a multi-million dollar (and slow as hell) computer for a mere clerical task.

When you hear your PC churning away, with the little green light spasmodically blinking on and off, that is the read/write arms in the disk drive flying back and forth across the disks, arranging and re-arranging iron particles into the necessary magnetic alignments to correspond to whatever program you happen to be running. Needless to say, all this happens very fast.

Two corrections to Kent4mmy’s posts: First off, the compiler doesn’t need the math library (which is actually stored elsewhere than math.h; that file holds just the function declarations and such) for multiplication, which is a primitive enough operation that it’s already available in machine code. Special math libraries are, however, needed for things like sines and logarithms.
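
A quick sketch of that distinction (the gcc command and -lm flag are the usual Unix-style spelling; details vary by system):

    /* Multiplication needs no library -- the CPU does it directly.
     * sin() lives in the math library, so we include math.h for its
     * declaration and link with something like:
     *
     *     gcc trig.c -o trig -lm                                     */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 4 * 4;                    /* a single multiply instruction */
        printf("%d %f\n", x, sin(1.0));   /* a call into the math library  */
        return 0;
    }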

Secondly, when that smart programmer got sick of the 0s and 1s, he didn’t do anything about it… she did. The person you’re thinking of there is Grace Hopper, who wrote the first compiler and was eventually made a Rear Admiral in the U.S. Navy. Her career began on the computers the military used for ballistics calculations during World War II.

Thanks Chronos. My apologies for assuming the aforementioned programmer was a man. I should have known it would’ve been a woman.

Nope, I’m talking about Operation Control Language, developed in the early ’80s I think, for IBM’s midrange business line of computers. You can use it to do a lot of stuff with files and libraries and printers and such. Essentially it is used to establish system and session environments, control job streams, order data, etc. You also compile other high-level language (HLL) programs on these boxes by using OCL.

Chronos: You’re right about Hopper; she belongs in the Hall of Fame with Hollerith. And didn’t she find the first bug (a real one)?

The only addition I can make to the excellent responses you’ve received is to touch on “tokens.”

There are a few translators out there that don’t actually read your code as a collection of ASCII characters; some use tokens to represent keywords, thus saving disk space. While disk space is hardly an issue anymore, it used to be an extremely important one. With tokens, the translator doesn’t read the keyword “Print”; it just sees a single high-end ASCII character, thus saving several bytes of storage. The problem with tokenized languages was that you couldn’t use third-party editors; they had to be edited in the native environment.
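
Here’s a sketch of the idea in C (the token value and the tokenize() helper are invented for illustration; real tokenized BASICs each had their own tables):

    #include <stdio.h>
    #include <string.h>

    #define TOK_PRINT 0x80   /* hypothetical one-byte token for "PRINT" */

    /* Swap the keyword at the start of a line for its one-byte token. */
    static void tokenize(const char *src, unsigned char *out)
    {
        if (strncmp(src, "PRINT", 5) == 0) {
            *out++ = TOK_PRINT;              /* 1 byte instead of 5 */
            strcpy((char *)out, src + 5);    /* keep the rest verbatim */
        } else {
            strcpy((char *)out, src);
        }
    }

    int main(void)
    {
        unsigned char line[64];
        tokenize("PRINT 42", line);
        printf("first byte: 0x%02X\n", line[0]);   /* prints 0x80 */
        return 0;
    }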

      • Computers hardly understand anything. ~ Suggestion to the OP: get a beginner book on Assembler; even if you don’t want to learn how to program in it, the book will explain the surprisingly limited capabilities of computer processors. - MC