I remember, way back when I had a Spectrum in the 80s, guys were always using assembly language or machine code to write programs. They called it hitting the hardware. I even tried to learn assembly language myself but gave up in the end.
Do people ever still write in really low-level languages, or is it completely unnecessary now given the speed of modern CPUs?
Modern compilers can often do a better job than assembly language programmers, because much of the benefit of modern CPU design comes from instruction pipelining. This allows the processor to begin processing one instruction before it has completed the preceding ones, leading to much greater throughput. To achieve this the processor needs to predict which instruction comes next, and how well it can do that is affected by how the instructions are sequenced. Very often there are several instruction sequences that achieve the same result, but one of them may lead to better predictions by the processor and therefore run faster. It turns out that compiler software is usually better at finding optimal instruction sequences than humans are.
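To give a rough idea of what a "better instruction sequence" means in practice, here is a toy C sketch (the function names are mine): a single running total makes every add wait on the previous one, while independent accumulators give the pipeline work it can overlap. A good compiler will often make this sort of transformation itself.

[code]
#include <stddef.h>

/* One accumulator: each add depends on the previous result,
   so the pipeline spends much of its time waiting. */
double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulators: the adds can overlap in flight.
   Compilers will often do this unrolling for you (for floating
   point it usually takes -ffast-math, since it reorders the sums). */
double sum_unrolled(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)
        s0 += a[i];
    return s0 + s1;
}
[/code]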
I have in recent times. The reasons one needs to are usually of the same ilk.
Modern languages and compilers only produce a subset of a CPU's instructions and only expose part of its capability. If you need the instructions that are reserved for privileged modes, you will need to generate machine code more directly. Also, when implementing low-level language constructs - the language run time itself - you often need to manipulate the machine state directly, in a manner that high-level languages cannot express. (There is a small sketch of what this looks like after the list below.)
Examples include -
Device drivers - where a lot can be done in a higher-level language, but by no means all.
Operating system implementation - where you need to access system state, manage things like exceptions - which may need access to privileged registers as well.
Bootstrap systems.
Implementing concurrency systems - where you need to be able to manipulate the state of the stack, modify things like return addresses and saved parameters, and save and restore entire stack frames.
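To make the privileged-instruction point concrete, here is a minimal sketch using gcc inline assembly on x86-64. The function names are mine and the code only makes sense in kernel mode (ring 0); it is an illustration of the kind of thing a high-level language simply has no way to say, not something you can run from an ordinary program.

[code]
#include <stdint.h>

/* Read the CR3 register (the page-table base) on x86-64. This is a
   privileged instruction: it works only in ring 0, so it belongs in
   kernel or hypervisor code, never in a user program. */
static inline uint64_t read_cr3(void)
{
    uint64_t value;
    __asm__ volatile("mov %%cr3, %0" : "=r"(value));
    return value;
}

/* Disable interrupts around a tiny critical section - another
   operation ordinary C has no way to express. */
static inline void with_interrupts_disabled(void (*fn)(void))
{
    __asm__ volatile("cli" ::: "memory");   /* clear interrupt flag */
    fn();
    __asm__ volatile("sti" ::: "memory");   /* set interrupt flag   */
}
[/code]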
As much as you can, you tend to write systems that abstract over the problem. For one task, the first thing I wrote was essentially a library that, when used, was a stream of function calls that at first sight looked like symbolic assembler source. But it dynamically generated code that could be injected into the running program. This created a dynamic co-routine system that was silly fast.
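Not the actual system, obviously, but the core mechanism looks something like this bare-bones sketch (x86-64 Linux assumed): get an executable buffer, copy machine-code bytes into it, and call it through a function pointer. A real system would generate the bytes from a higher-level description and be far more careful about W^X.

[code]
#include <string.h>
#include <sys/mman.h>

typedef int (*generated_fn)(void);

int main(void)
{
    /* mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    void *buf = mmap(NULL, sizeof code,
                     PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, code, sizeof code);          /* inject the code      */
    generated_fn fn = (generated_fn)buf;     /* and treat it as code */

    int result = fn();                       /* returns 42           */

    munmap(buf, sizeof code);
    return result == 42 ? 0 : 1;
}
[/code]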
Some language implementations (such as gcc) provide hooks that allow you to make the compiler emit specific machine instructions as part of the compiled program. However there are some significant caveats with this, and the support has become spotty.
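For the curious, gcc's hook is its "extended asm" syntax. A small sketch (the rdtsc example is mine, not tied to any particular project); the constraints tell the compiler which registers the instruction uses, so it can schedule and allocate around it:

[code]
#include <stdint.h>

/* Read the x86 time-stamp counter using gcc extended asm.
   "=a" and "=d" say the instruction writes EAX and EDX. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
[/code]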
Many, many years ago I played around with a free assembler. It had the same interactive approach as GW-BASIC: you could type in a few assembler commands and execute them.
A great way for beginners to learn assembly before buying a compiler.
Assembler was easier on the 8086/8088. You could pretty much do anything. Including crashing MS-DOS.
They added protected mode I think in the 286? or 386? You couldn’t move values to certain registers. Basically they were protecting the computer from bad programming practices.
Memory management got more complex. It just got harder with each new microprocessor.
It’s definitely a lot rarer though than in the heyday.
As implied in the above responses, the main reason for getting this low-level now is to create an abstraction layer so no-one else needs to go so low-level.
You wouldn’t develop a whole app this way (unless the device is very simple) and there’s not so much need (or efficiency gain, thanks to better compilers) for writing critical routines in assembler.
I work on device drivers for a living. It’s been a while since I’ve coded anything in assembly.
However, I would say that compilers are still not that great if absolute peak efficiency is the goal. Fortunately, modern compilers support intrinsics, which are a way of accessing low-level instructions from within higher-level languages (like C++). In particular, these give you access to the SSE instruction set, which is mostly vector floating-point math. Compilers are not that great at vectorizing math, but it’s fairly easy to do the right thing with intrinsics. It’s messier than normal high-level code, but not nearly as bad as straight assembly.
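For anyone who hasn't seen them, intrinsics look roughly like this (a simplified sketch of my own: it assumes the length is a multiple of four and the pointers are 16-byte aligned, which real code would have to check):

[code]
#include <immintrin.h>
#include <stddef.h>

/* Add two float arrays four elements at a time with SSE intrinsics. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);    /* load 4 floats       */
        __m128 vb = _mm_load_ps(&b[i]);
        __m128 vr = _mm_add_ps(va, vb);    /* 4 additions at once */
        _mm_store_ps(&dst[i], vr);         /* store 4 results     */
    }
}
[/code]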
Also, while I do not write much assembly, I read and step through it almost daily. It’s still often the only way to really understand what’s going on in a program. Also, crash reports from the field frequently come in the form of memory dumps. There may be no symbols (a way of relating assembly back to the original program code), at least not reliable ones. So debugging in pure assembly is the only option. It’s not too difficult, but does require patience.
It’s very easy to write inefficient assembly code. Memory leaks are a notorious example: a sloppily written program doesn’t release allocated memory, so a device driver may start using more and more memory until it’s shut down and restarted. Badly written TSRs could be really frustrating.
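In C terms the classic pattern looks like this (process() is just a hypothetical stand-in for whatever consumes the buffer):

[code]
#include <stdlib.h>
#include <string.h>

void process(char *buf, size_t len);   /* hypothetical consumer */

/* Every call allocates a buffer and forgets about it. Call this from
   a driver's hot path and memory use grows until a restart. */
void handle_packet(const char *data, size_t len)
{
    char *copy = malloc(len);
    if (copy == NULL)
        return;
    memcpy(copy, data, len);
    process(copy, len);
    /* missing: free(copy); */
}
[/code]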
Modern compilers for C and other languages are much more efficient in most situations. That’s why most serious coding is done in C or another high-level language.
It’s fun sometimes to instruct a C compiler to produce an assembler listing. It’s amazing to see how many instructions are required for various high-level constructs.
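With gcc or clang it is a single flag; something like this (the file and function are just for illustration):

[code]
/* listing_demo.c - compile with:   gcc -S -O2 listing_demo.c
   The compiler writes the generated assembly to listing_demo.s
   instead of producing an object file (clang takes the same flag). */
int scale_and_add(int x, int y)
{
    return 3 * x + y / 7;
}
[/code]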
I would say C or a high level language.
C is IMHO little more than a glorified symbolic assembler. (Did my time writing systems in C - if you need to hack bits and bytes it is the right tool.)
Seeing what compilers generate will often disabuse people of the idea that compilers will always outrun human coders. Compilers optimise the things they have rules for optimising. In general it is a bad idea to do anything other than write straightforward code that captures your intent, and then let the compiler get its teeth into it. But sometimes you are doing something that really isn’t going to give the optimiser much traction, yet where you do have a very good idea of where the real optimisations can be found. That is very rare, though.
A middle ground can be found in compiler hints. Branch prediction hints can sometimes be valuable, but modern CPUs have quite large branch prediction tables, so it isn’t the win it once was. Forcing prefetch instructions into critical bits of code can also help. But you really need to measure carefully. Sometimes the improvements are marginal at best.
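A sketch of what those hints look like with the gcc/clang builtins (the loop is a made-up example, and the prefetch distance of 16 elements is a guess you would have to tune by measuring):

[code]
/* Branch-prediction and prefetch hints.  likely()/unlikely() wrap
   __builtin_expect; __builtin_prefetch asks for a cache line early. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

long sum_positive(const long *data, long n)
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        __builtin_prefetch(&data[i + 16]);  /* fetch ahead of use   */
        if (unlikely(data[i] < 0))          /* hint: negatives rare */
            continue;
        total += data[i];
    }
    return total;
}
[/code]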
Sometimes there are things where you should leave the compiler well alone. My favourite example is Duff’s device. On the face of it this is a really neat loop-unrolling optimisation. The reality is that, depending upon the language, it may actually be a ruinous counter-optimisation. Here you are much better off letting the compiler do its own thing. (In addition to the issues the Wiki article notes, I have seen a compiler save and restore register state for each case target - because it was unable to guarantee that there could not be a jump to the label from elsewhere in the code.)
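For anyone who hasn't met it, Duff's device in lightly adapted form - copying count bytes (count must be positive) to a memory-mapped output register by jumping into the middle of an eight-way unrolled loop:

[code]
/* Duff's device, adapted from Tom Duff's original.  'to' is a
   memory-mapped output register, so it is deliberately not
   incremented.  Assumes count > 0. */
void send(volatile char *to, const char *from, int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
[/code]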
Just because the high level code looks slick does not mean it will turn into fast code. And just because there are a lot of emitted instructions does not mean it is slow. Some instruction streams will just vanish into the pipeline as fast as the CPU can vacuum them up. Yet other times the pipeline will choke on something that looks trivial at first sight. If you don’t measure it, you don’t know.
I spend a lot of time writing code in assembler, but I’m a bit of a dying breed.
The company I work for makes industrial control systems, and the controllers run a proprietary operating system and custom code that is written mostly in x86 assembly code with a bit of C.
In general it is much less common than it was in the 1970s and early 1980s.
In prior eras a fair percentage of programmers were aware of machine architectural details, sometimes acutely so. Many programmers working on a given machine would have the CPU reference manual on their desk. Here is one example, the Motorola 68000 Programmer's Reference Manual: www.nxp.com/files/archives/doc/ref_manual/M68000PRM.pdf
Working at this level was so common that trade press ads would often tout how programmer-friendly one CPU instruction set was vs another. This was one factor which drove increasingly higher-level, complex instruction sets – because so many programmers worked at that level, and the more work each instruction did, the fewer instructions they had to write for a given task.
In that era some end-user programmers (typically on minicomputers) would even work below the assembler level and do microcode programming. At this level they would write their own machine instructions, held in a “writable control store”. This gave full access to the CPU internal architecture and enabled better performance for small “hot” code paths, much like the relationship between hand-written assembler and higher level languages. That isn’t done anymore since modern CPUs don’t support it. However the modern spiritual successor to microprogramming is end-user FPGA programming where end users can add their own logic to a CPU: https://www.ece.cmu.edu/~calcm/carl/lib/exe/fetch.php?media=carl15-gupta.pdf
In prior eras machine resources were more limited, requiring hand-written assembler for performance-critical code regions, not just for device drivers. Software and compiler technology was also less sophisticated, and programmers tended to work at a lower abstraction level. In those days even programmers who never wrote a line of assembler were still often aware of machine architectural details.
Today that is much less necessary – CPUs are vastly faster, compilers are better, and there are many conceptual software layers between the programmer and the hardware. Those layers facilitate productivity, reliability, code reuse, and software functionality. Increasingly, apps are written, packaged, and distributed in machine-independent form so they will run on any CPU with an internet browser. E.g., Google apps (Maps, Docs, Calendar, etc.) are written in JavaScript and their Closure Library: https://developers.google.com/closure/library/
The improvement in C compilers over the decades has been huge. In the 1970's even the worst human-written assembly code was much faster than compiler output. By the 1990's many compilers would outperform the best humans in many cases … but not all.
I’ve been retired for a long while, and my last job was high-level design: my final output was in the form of patent applications rather than source code. But earlier I did a lot of assembly coding (Francis Vaughan’s experience sounds very similar to mine) and even a few bytes of machine code, e.g. to patch a running kernel or to do experiments on an OS for which I lacked source code. Circa 1990 I wrote assembly language code for Huffman decoding that sharply outperformed C compiler output. (To give some idea of how perfectionistic I was, I used very different approaches on Intel and Motorola because of a difference in the way shift instructions handled counts > 31 :smack: )
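(For the curious, the difference: the 386 and later mask a 32-bit shift count to 5 bits, while the 68020/68030 take it modulo 64, so shifting a 32-bit value by 32 gives different answers on the two families - and in C it is formally undefined behaviour. A tiny sketch of the two behaviours:)

[code]
#include <stdio.h>

int main(void)
{
    unsigned x = 0xDEADBEEFu;
    unsigned count = 32;

    /* x << count would be undefined behaviour in C when count == 32.
       In hardware: the 386/486 mask the count to 5 bits, so the value
       comes out unchanged; the 68020/68030 use the count modulo 64,
       so the result is 0.  A Huffman decoder that shifts by "bits
       consumed so far" has to pick a strategy safe on both. */
    printf("x86-style result: %08x\n", x << (count & 31));
    printf("68k-style result: %08x\n", count >= 32 ? 0u : x << count);
    return 0;
}
[/code]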
In the 1980’s I wrote a lot of code for device drivers and kernels. The amount of assembly language coding required was minuscule, and often could be done with a compiler’s inlining facility.
In the 1970’s I wrote code for custom processors where the first step was to write an assembler. And, yes, I also wrote microcode to implement new macroinstructions. On one occasion, techs ignored my instruction: Re-IMPL before returning machine to customer. Great fun!
Sometimes hand-written C or assembly code would outperform standard ops or library functions! For example, coding your own Duff’s device in C was much faster than calling memcpy() on the Sun-3 workstation. (Easily explained when you examine the assembly-language source for memcpy() on the Sun-3: “This loop is optimized for the instruction buffer on the Sun-2.”)
Oh dear, you reminded me of one of my more interesting problems back when I was using a Sun-3. bcopy() on these machines suffered from a very strange bug, one that took me a solid week to find and fix. The bug wasn’t in the routine - which simply used the 68030’s byte copy instruction. Rather there was a really silly error in the kernel’s exception handling code. If the running copy instruction crossed a page boundary and triggered a page fault the kernel incorrectly restored the stack state and the instruction resumed with the running counter out by one. You could copy a slab of memory and mysteriously a single byte would be missing from the middle of the slab. Hilarity would inevitably ensue.
Amusingly the fix was to elide a whole slab of code from the kernel’s exception handler. I do wonder if this code suffered from a similar issue - maybe some of the code was a hangover from the Sun-2 or even earlier. There was still Lucasfilm code in the kernel.
One of my co-workers in 1987 had just left a job programming for a bank. I remember him saying nearly all their code was in assembly on an IBM mainframe. They had to finish their nightly batch runs before the bank opened for business. Assembly saved them enough time that they could meet that deadline.
As recently as about 2010 I was coding in OpenVMS’ assembly language, which is called Macro. My career was spent coding primarily in higher-level, third-generation languages like Cobol and C. But for the job that I had at that time, we were supporting a “legacy” application that ran on VMS. It had originally been designed when the current version of VMS was 5.5, when VMS clustering was avant garde and nobody else had yet figured out how to do that (roughly in the late 1980’s). At that time, the company had sold two versions of the application: one written entirely in VMS Macro and another written in a combination of Macro and Cobol. So just before I left the company I was assigned to a project adding some enhancements to a customer’s Macro system. I actually quite enjoyed it.
Two things that have come up specifically in my work:
- Modern processors run so much faster than their system memory that cache misses are a much larger contributor to execution time than producing slightly faster code. So reorganizing to improve data locality is generally better than switching to assembly to make the code itself faster (there's a small sketch of this below).
- Many places where we would previously resort to assembly for speed, we now use LLVM-IR and let LLVM do the low-level code generation. It probably can't replace all uses of assembly, but it does cover a lot of them.
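A toy sketch of the locality point (sizes and names are arbitrary): the two functions below do identical arithmetic on the same array, but one walks memory sequentially and the other strides across cache lines, which is typically several times slower.

[code]
#include <stddef.h>

#define N 1024
static double m[N][N];

/* Row-major traversal: consecutive accesses are adjacent in memory,
   so every cache line fetched is fully used. */
double sum_rows(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal of the same data: each access lands on a
   different cache line, so most of each fetched line is wasted. */
double sum_cols(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
[/code]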
I write a lot of code for 8051 microcontrollers. I used to do it 100% in ASM. Now the C compilers have gotten good enough, and the 8051s fast enough (and with enough RAM), that I write in C most of the time. I do occasionally code critical pieces in assembler.
I worked on a project where we had a custom CPU we designed integrated with our hardware. Since it was our own design there was no compiler for it so we spent a couple weeks creating a development environment for it with a debugger and programmed it in assembly. It was a fun project.
The 8051 is classically a microcontroller, not a pure microprocessor. It incorporates a lot of hardware functionality that a pure microprocessor wouldn’t have, and would instead depend on other chips in the design to provide. In other words, an embedded controller for not-completely-general-purpose computing. Like the controls on a coffee maker.
Embedded controllers used to be one of the last preserves of hand-coded assembly, just because the processors and their working memory sets were so tiny and the high-order-language compilers were so inefficient and sub-optimal that you couldn’t do as much useful stuff if you wrote your logic in HOL.
Not any more. Embedded controllers are now huge and blazing fast compared to the 8051, and compiler technology has improved to the point that hand-crafted assembler is like pushing a hand-cart down the freeway when you could be driving a Tesla.