How, exactly, does binary code actually become what is shown on a computer screen?

Ah, OK. Maybe it’s just that C has a culture of treating bitwise operators as a basic part of the language that every C programmer should know about, where in most other languages, they’re an advanced topic that wouldn’t make it into an introductory course. There are certainly cultural differences like that between languages, hence the old jokes about how a C programmer can write C code in any language, and a Fortran programmer can write Fortran code in any language.

I had a bug crop up once in one of my programs that I think might have been due to a cosmic ray: One of the variable values was set to 0 while running. The amusing thing was that it was a virtual Rubik’s Cube, and that variable was storing the color of a square, and 0, in that graphics implementation, was the code for black. So the net effect was that one of the stickers peeled off of my cube, and it continued to behave for the rest of that run exactly like a real cube that had lost one of its stickers.

In all seriousness, it probably is a good idea for a specialist in any given layer to learn at least the basics of the layer below and (if applicable) the layer above, because leaks in the abstractions do occur, and you need to be able to recognize and deal with them.

If you want a hilarious language, try JavaScript. You cannot perform bitwise operations on actual locations in memory, even though the language supports all the usual operations defined on 32-bit integers. That’s because the language doesn’t have 32-bit integers; it only has numbers, and all numbers are stored as 64-bit floating point. If you want to perform a 32-bit operation, the runtime converts the 64-bit float into a 32-bit integer, applies the operation, and then converts the result back to a 64-bit float.
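
For the curious, here’s a rough C sketch of roughly what an engine has to do every time you write something like `a | b` in JavaScript. The function names are mine, but the steps (truncate toward zero, reduce modulo 2^32, wrap to a signed 32-bit value, then convert the result straight back to a double) are the ECMAScript ToInt32 dance:

```c
#include <math.h>
#include <stdint.h>

/* Roughly the ECMAScript ToInt32 conversion (names are mine). */
static int32_t to_int32(double x)
{
    if (!isfinite(x))
        return 0;                       /* NaN and ±Infinity map to 0        */
    double t = trunc(x);                /* drop the fractional part          */
    double m = fmod(t, 4294967296.0);   /* reduce modulo 2^32                */
    if (m < 0.0)
        m += 4294967296.0;              /* bring into [0, 2^32)              */
    uint32_t u = (uint32_t)m;
    return (int32_t)u;                  /* two's-complement wrap to signed   */
}

/* What `a | b` amounts to: two conversions in, the OR, one conversion out. */
static double js_bitwise_or(double a, double b)
{
    return (double)(to_int32(a) | to_int32(b));
}
```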

The language cannot safely represent integers above 9007199254740991, aka 2^53 - 1; past that point, not every whole number has an exact representation.
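
C’s `double` has the same 53-bit significand, so you can see the same cliff without any JavaScript at all:

```c
#include <stdio.h>

int main(void)
{
    double max_safe = 9007199254740991.0;              /* 2^53 - 1                       */
    printf("%.0f\n", max_safe + 1.0);                  /* 9007199254740992: still exact  */
    printf("%d\n", max_safe + 2.0 == max_safe + 1.0);  /* prints 1: 2^53 + 1 has no
                                                          double representation, so it
                                                          rounds back down to 2^53       */
    return 0;
}
```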

They added a language extension, BigInt, which can hold arbitrarily large integers (so 64-bit values are no problem). But it isn’t type-compatible with ordinary numbers, and mixing the two requires explicit conversion on every operation.

IMHO JavaScript is only just behind C++ as a language that should have been strangled at birth.

JavaScript actually uses floating-point numbers as array indices and loop counters.

My mistake about the DC. :man_facepalming:

The low voltage differential makes sense. The follow-up posts touching on rationale and consequences of doing that were quite interesting, too.

Brilliant! Like The Longest Day (1962), where the Allies notify multiple French resistance cells, each with a different codebook, by broadcasting all the messages in turn over a single radio frequency. I didn’t realize that was a form of multiplexing. I had previously only thought of multiplexing as separating channels by frequency (like cable TV).

So at a programming level I’m pretty sure the prevailing video standard is VGA, from the IBM PS/2 computers of 1987. Under that standard, I understand there are certain I/O ports reserved for controlling the graphics adapter, for example to put it in 256-color mode, where 128 KB of addressable memory is mapped to the video buffer starting at address 0xA0000. If the BIOS or other software wants to display the manufacturer’s name and logo on the screen, it will instruct the CPU to write to the corresponding I/O ports to put the video adapter in graphics mode, then copy a bitmap into the memory corresponding to the video buffer. I think it is possible to put a picture on the screen through I/O operations alone, but that would be much slower - a limitation born of hardware optimization that bleeds through the layers of abstraction.
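
To make that concrete, here is roughly what I have in mind, in C. I’m assuming the adapter has already been put in the VGA 256-color mode (mode 13h, 320x200), that physical address 0xA0000 is directly addressable as in real-mode DOS, and that `outb()` stands in for whatever port-write primitive the compiler or OS provides - a sketch of the idea, not working code:

```c
#include <stdint.h>

#define VGA_FB   ((volatile uint8_t *)0xA0000)  /* start of the mode 13h framebuffer */
#define SCREEN_W 320

extern void outb(uint16_t port, uint8_t value); /* assumed port-write primitive      */

/* One byte per pixel; each byte is an index into the 256-entry palette. */
static void put_pixel(int x, int y, uint8_t color)
{
    VGA_FB[y * SCREEN_W + x] = color;
}

/* The palette itself is programmed through I/O ports 0x3C8/0x3C9. */
static void set_palette_entry(uint8_t index, uint8_t r, uint8_t g, uint8_t b)
{
    outb(0x3C8, index);   /* select which palette entry to write            */
    outb(0x3C9, r >> 2);  /* classic VGA DAC takes 6-bit color components   */
    outb(0x3C9, g >> 2);
    outb(0x3C9, b >> 2);
}
```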

If I understand you correctly, at the hardware level the video display controller itself will be listening in on the system bus. When the CPU drives the system bus (i.e. the address bus) with a signal representing 0xA0000, and the control lines indicate a write operation, the video display controller (if in the right mode) will come out of tri-state and prepare to receive data into its video buffer at offset 0. On the next clock phase, the CPU drives the system bus (now acting as a data bus) with a signal representing color data. The video display controller latches onto this signal and copies the color data to its internal video buffer, which is in turn polled asynchronously with respect to the CPU to produce video output at the monitor’s refresh rate.

I think my mental model is mostly grounded now, the only question left is about the nature of the system bus. I’m not clear on this - is the system bus an abstract object? I’m thinking of a telephone line hooked up to multiple handsets, and that’s my model for a system bus connected to multiple controllers. But looking at computer guts, I couldn’t point to a likely freestanding cable or otherwise visually identifiable cluster of lines on the PCB.

I think of the telephone line where anybody hears the ring and can physically pick up the handset to listen or start speaking. There are rules to make this work in real life. The ringing ceases when the first person answers the call, said person has the responsibility of routing the call if necessary, and everybody else knows it is rude to listen in.

How would this work as applied to a computer’s system bus? As you said we can’t have the graphics controller and the memory controller driving the bus at the same time when the CPU wants to read from memory-mapped space. Is the controller on the RAM chip programmed not to ‘listen in’ when the CPU attempts to read/write to memory-mapped IO? Or does the bus maybe snake through a pull-down resistor and transistor after connecting with each controller, to prevent the ‘ringing’ from being propagated further if the controller ‘took the call’?

~Max

The idea of tri-state is about driving the bus, not reading. Everything listens all the time unless it is driving, but only one thing should drive the bus at any given time. When not driving the bus, the output of any element goes into a high-impedance state, which is the third state, aka “tri-state”. A tri-state pin can both listen and drive, so even when its driver is in the high-impedance state it can still be listening.

So, this should be partly answered by the above.

Everything listens; a device recognises that it is being addressed by the address on the bus. If the bus has a limited number of wires and needs to use the same wires for both addresses and data, the first part of the bus write cycle delivers the address, and the appropriate device recognises that the address is part of the range it provides. It latches the address, and then on the next part of the cycle, when the data is placed on the bus, that device is listening for the data. The other elements on the bus aren’t doing anything; they just ignore what is going on, with their bus drivers in tri-state.

A read works the same way. The address is driven onto the bus by (say) the CPU, and on the next clock tick the memory or device writes the data onto the bus, and the CPU (which has relinquished drive to the bus and is now listening) receives the data.
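
To make the address-decode part concrete, here’s a toy software model of the idea (not any real bus protocol; the device names and ranges are made up). Every “device” sees every transaction, but only the one whose address range matches latches or supplies data; the rest do nothing, which is the software analogue of keeping their drivers in tri-state:

```c
#include <stdint.h>
#include <stdio.h>

/* Each device claims a range of the address space. */
typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
    uint8_t     storage[0x10000];  /* 64 KB each, just to keep the toy small */
} device_t;

static device_t devices[] = {
    { "system RAM",   0x00000, 0x10000, {0} },
    { "video buffer", 0xA0000, 0x10000, {0} },
};

#define NDEVICES (sizeof devices / sizeof devices[0])

/* Write cycle: address phase, then data phase. Only the matching device latches. */
static void bus_write(uint32_t addr, uint8_t data)
{
    for (size_t i = 0; i < NDEVICES; i++) {
        device_t *d = &devices[i];
        if (addr >= d->base && addr - d->base < d->size) {
            d->storage[addr - d->base] = data;  /* this device latches the data */
            return;
        }
        /* every other device leaves its drivers in tri-state and ignores it */
    }
    printf("nothing decoded address 0x%X\n", (unsigned)addr);
}

/* Read cycle: same decode, but now the selected device drives the data back. */
static uint8_t bus_read(uint32_t addr)
{
    for (size_t i = 0; i < NDEVICES; i++) {
        device_t *d = &devices[i];
        if (addr >= d->base && addr - d->base < d->size)
            return d->storage[addr - d->base];
    }
    return 0xFF;  /* a floating, undriven bus reads back as garbage */
}
```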

This is all a bit simplified, bus protocols can get very involved.

Managing propagation over a bus is a critical part of how they work. Yes, every connection needs to manage this to avoid ringing. Again, it can get complicated. In the simplest form you terminate the ends of the bus proper and drive the bus with a controlled impedance. Listening on the bus is notionally done with a high impedance, so it doesn’t affect the bus, and multiple listeners can co-exist, all listening at once.

Multi-device buses are Considered Harmful these days. I guess that makes them not buses.

SATA replaced IDE for drives. That’s a point-to-point serial connection that replaced a parallel connection with primary/secondary devices.

PCI Express replaced PCI (with AGP somewhere in the middle). Again, a fast point-to-point serial connection replaced a parallel multi-device bus.

USB is also point-to-point, using hubs as splitters. The hubs buffer the incoming packets and shuttle them out on a connection to the computer.

It’s just too difficult to handle multiple devices on anything meant to go fast. Not just for bus sharing reasons, but because the electrical characteristics are guaranteed to be worse. It’ll never work at multi-gigahertz speeds.

Any computer will have a bunch of I2C devices, which do use a shared bus with an addressing system. Those only run at 400 kHz, though. They’re meant for simple things like temperature or voltage monitors.

The CPU is driving the address 0xA0000 onto the bus. It’s a read operation which means one device is supposed to drive the bus in response. The graphics controller is programmed to recognize that address.

0xA0000 is an address in logical memory. If the system has at least 2 MB of physical memory, then there is a portion of physical memory corresponding to 0xA0000. So how does the controller connected to the physical memory know it is not being addressed when 0xA0000 comes over the address bus?

~Max

With no further wires on the bus to distinguish I/O operations from memory operations, the answer is that it won’t know, and your computer won’t work. The term is “address conflict”. A billion years ago, when computers were built something like this, with simple buses, devices, and memory, these issues were potentially something you had to deal with. It depended a lot on the computer architecture. But memory-mapped I/O devices had DIP switches to set the addresses where they lived; it was your job to get it right. Again, there are lots of complications and a lot of this is oversimplified, but the gist is there. (Usually I/O space was, by convention, kept away from memory, but all things are possible.)

As noted above, modern computers don’t expose such things, and the various buses that are seen are much much higher level. But peer inside many SOC (systems on chip) designs and all the mess is still there.

It doesn’t know, so you don’t do that. A memory-mapped device is no different from data-storage memory as far as the CPU is concerned. Computers don’t work if different memory hardware responds to the same physical address for memory operations.

Okay, that answers my question! And the newer scheme is apparently to do away with busing altogether and use serial architecture, as Dr.Strangelove wrote.

Thanks to all of you. It’s been a very informative thread for me.

~Max

Some CPU architectures had distinct I/O and memory maps, including dedicated I/O opcodes. For instance, the Zilog Z-80, circa 1976.

ETA: I just realized this was an architectural choice Zilog inherited from the Intel 8080 it was designed to be compatible with.

AFAIK, that architecture is generally outdated in modern processors except maybe for microcontrollers, which still seem to treat I/O very differently from memory access.

Some buses have an I/O status line and a completely separate set of I/O addresses. Video controllers may map to a shared section of main memory, with random-access ability, to hold the graphic image. The bus has to allow shared access to that memory by coordinating read and write cycles, address by address, to prevent conflicts. The same thing is done with conventional Direct Memory Access devices, which may lock up some range of memory while copying in an entire block of data, or interleave access to individual addresses as in random-access sharing. The use of multiple CPUs, including special processors like FPUs, also required coordinated memory access. Those are still just basic bus-configuration concepts.

Right. It hasn’t been worth it to dedicate the real estate to a separate logical I/O bus. Microcontroller I/O processing is usually more complex than simply another address space so there is much more logic already dedicated to that functionality.

Even in PCs, memory-mapped devices are universal (just not exposed to the user). In fact, there was a recent case of this becoming important.

Graphics chips (over PCI or PCI Express) can map their graphics memory into the CPU address space. As you note, this shows up as an address in physical memory, even though it’s actually a device over a bus. In this particular case, it’s called the “BAR1 mapping”.

20 years ago or so, graphics chips had tens of megabytes of RAM and CPUs were 32 bit (giving 4 GB of address space). So there was no problem squeezing this mapping into the available address space. Somewhere along the way (not sure when, but it might have been in the PCI spec itself), the BAR1 was limited to 256 MB, basically because much more would start to impact systems with gigabytes of memory.

Eventually, GPUs were produced with >256 MB of VRAM. The BAR1 mapping wasn’t enough, but that was sorta ok; you don’t actually need to access all of video memory directly most of the time. The BAR1 could act as a sliding window, and could be put where it was most useful.

Years later, we have multi-gigabyte video cards and 64-bit systems. But the BAR1 window is still 256 MB. There are no address-space limitations any more, and the window feels very tiny compared to the total amount of video memory. Furthermore, low-level APIs like DX12 are giving applications new reasons to write to video memory directly.

So in the past year or so there has been an effort for “resizable BAR”, which allows expanding that window to cover all of video memory. Unfortunately, there’s a long history for that small window, and lots of hardware out there that’s still limited. You need a new CPU, motherboard, OS, and so on to get it working.

Many ancient systems had a region of regular memory that was constantly being scanned by the graphics controller to generate the display. It was common to use a small patch of memory to hold the character codes for a text-based display: the CPU would just change the value of the bytes and the graphics controller would produce different output on its next pass. I seem to recall something about a raster sync signal that helped maintain visual order: the CPU would wait for the graphics controller to finish a line on the screen before changing values on that line, to prevent momentary garbage from appearing on the display.
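
A concrete (if PC-specific) version of that: in the old IBM color text modes, the character buffer lived at physical address 0xB8000, two bytes per cell, and poking a byte was all it took to change a character on the next scan. A rough sketch, assuming a real-mode environment where that address is directly writable:

```c
#include <stdint.h>

#define TEXT_BUF ((volatile uint16_t *)0xB8000)  /* color text-mode buffer */
#define COLS     80

/* Each cell is two bytes: low byte = character code, high byte = attribute
   (foreground/background color). The display controller re-reads this memory
   on every refresh, so the change shows up on its next pass. */
static void put_char(int row, int col, char ch, uint8_t attr)
{
    TEXT_BUF[row * COLS + col] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)ch);
}
```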

The original Macintosh had a bit-map of the screen in memory (most earlier computers would switch between character mode and bitmapping, the latter being used mostly just for games). The Mac CPU was fast enough to draw bitmap images of various fonts onto the screen, which signalled the advent of proportional text (where characters take up appropriately varying widths) on consumer-grade computers.

One advantage to having the graphics controller share memory with the CPU was that it simplified DRAM refresh. Dynamic RAM content is kind of fragile and will start to fail if rows of memory are not accessed regularly, but it is hugely cheaper and more compact than SRAM. Since the screen memory has to be scanned constantly, there you have your graphics controller taking care of that issue as a side effect.

This is still the case. The display is just a region of memory that gets scanned out. Sometimes in dedicated video memory, but most graphics controllers are integrated and share system memory.

In most cases, you use what’s called double buffering. That means you dedicate two memory regions to the display, and render the graphics to one while the other is being scanned out. When the new image is ready to display, you just write a new address to the controller so that it scans from the buffer you just finished. Then you can start rendering graphics to the other buffer.

For best results, you typically wait until you hit what’s called “vblank” (vertical blank) before switching the addresses (called “flipping”). In the old CRT days, this was when the electron beam was physically moving from the bottom back up to the top of the screen. The beam wasn’t drawing anything during that interval, so flipping there produced no artifacts. LCDs don’t have that characteristic, but they still leave a short vblank period so that the flip can take place.

If you don’t wait, you can get an artifact called tearing, where you see a portion of each buffer on the screen. It’s called that because it looks like you have two slightly different photographs stacked, but the top one has been torn off partway.
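
To tie the last few paragraphs together, here’s a sketch of the simplest variant of this on classic VGA: render into a back buffer in ordinary RAM, poll the status port until vertical retrace starts, then copy. (A true “flip” would instead reprogram the controller’s scan-out address, but the timing idea is the same.) As before, `inb()` is an assumed port-read primitive and the environment is assumed to be something like real-mode DOS in mode 13h:

```c
#include <stdint.h>
#include <string.h>

#define VGA_FB       ((volatile uint8_t *)0xA0000)
#define INPUT_STATUS 0x3DA                  /* VGA input status register 1      */
#define VRETRACE     0x08                   /* bit 3: vertical retrace active   */

extern uint8_t inb(uint16_t port);          /* assumed port-read primitive      */

static uint8_t back_buffer[320 * 200];      /* render here, never seen directly */

static void wait_for_vblank(void)
{
    while (inb(INPUT_STATUS) & VRETRACE) { }    /* let any retrace in progress end */
    while (!(inb(INPUT_STATUS) & VRETRACE)) { } /* then wait for the next one      */
}

/* Copy the finished frame to the visible buffer only while the display isn't
   being scanned out, so no partially drawn frame (tearing) is ever visible. */
static void present_frame(void)
{
    wait_for_vblank();
    memcpy((void *)VGA_FB, back_buffer, sizeof back_buffer);
}
```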

On most modern systems, I believe that method is largely passé. The display hardware is embedded in the GPU; the CPU sends it instructions and the GPU renders images to the screen (sometimes bitmaps, sometimes output that it computes itself) using sync logic that is built into the hardware. This frees the CPU from having to do a lot of grunt work. Even copying image data from memory to off-board VRAM is handled by graphics DMA (in non-unified-memory systems).

Having the CPU write directly to visible buffers is largely passé. Double buffering (and triple buffering) is not. The GPU still takes time to render the frame, and you don’t want to see the rendering in progress. So you have one buffer that you are rendering to and another that’s being scanned out.

In fact there will be a large number of buffers that will be rendered to, which get composited down in various ways to the scanout buffer. You then flip to that buffer. But you still need one to scan out from, and one to render to, and those get reversed on a flip.

The sync logic is handled in various ways. Modern GPUs have several small CPUs integrated on them, and those might handle things like flipping. Classically, the method was for the GPU to trigger a CPU interrupt, and the CPU would then write the new registers. That’s still a perfectly fine way of doing things, and still in use for various functions. It’s only a few bytes of data, and it only happens at a low frequency, so it’s not a big deal.