Or as one of my EE colleagues is fond of saying, “When you look at anything close enough, it all becomes analog.”
It’s not that simple. If your input signal rises slowly enough (due to voltage droop or crosstalk, say), a gate’s output may switch late, and in some cases a flop driven by that gate will capture the wrong value. We can (and do) test for gate defects with this behavior, but in certain circumstances crosstalk can look like a gate transition fault.
When I was at Bell Labs there was a famous analog expert down the hall, and he loved to say that.
I’ve wondered if a good way to achieve multiprocessing with a single processor is the way it was done on the CDC 6600 PPU. The single processor ran eight threads concurrently: on cycle (8j+k) it fetched inputs, including the next-instruction counter, from register bank k (imagine each register bank as one of 8 slots in a rotating barrel). Note that this eliminates the need for the register-conflict detection that complicates high-speed processing of a single thread.
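The barrel scheme can be sketched in a few lines. This is a toy model, not the real PPU: the bank count, register layout, and one-instruction "ISA" are all invented for illustration.

```python
# Toy sketch of CDC 6600 PPU-style "barrel" multithreading: a single
# execution pipeline services N register banks in strict rotation.
# Bank count and instruction set here are illustrative only.

class BarrelProcessor:
    def __init__(self, n_threads=8):
        # One register bank (including its own program counter) per thread.
        self.banks = [{"pc": 0, "acc": 0} for _ in range(n_threads)]
        self.n = n_threads

    def step(self, cycle, programs):
        # On cycle (n*j + k) the pipeline fetches state from bank k.
        k = cycle % self.n
        bank = self.banks[k]
        program = programs[k]
        if bank["pc"] < len(program):
            op, arg = program[bank["pc"]]
            if op == "add":
                bank["acc"] += arg
            bank["pc"] += 1
        return k  # which thread ran this cycle

cpu = BarrelProcessor()
progs = [[("add", i)] * 3 for i in range(8)]  # thread i adds i, 3 times
for cycle in range(24):
    cpu.step(cycle, progs)
print([b["acc"] for b in cpu.banks])  # → [0, 3, 6, 9, 12, 15, 18, 21]
```

Because consecutive cycles always belong to different threads, a thread’s result is written back long before its next instruction issues, which is why no register-conflict interlocks are needed.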
Even a good signal can have the “wrong” value if clocked at the wrong time. In some cases it doesn’t matter if it’s 0 or 1 — an action can just be delayed for a cycle — but it must not be in a “metastable” state (e.g. “½”) where, for example, it might appear as 1 to one gate, and at the same time appear as 0 to another gate, or perhaps worse, cause such a gate’s output to also be in a metastable state.
Recall IBM’s 370/158 Multi-Processor, where the two CPUs used independent oscillators, so there was no known timing relationship to let one processor sample signals from the other. With careful design only a small number of signals are of concern, but those signals were clocked through about four D flip-flops in series to reduce the chance of the fourth output still being metastable to about zero.
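The "about zero" intuition can be made quantitative with the standard exponential-settling model of metastability: the probability that a flop is still metastable after resolution time t falls off as exp(-t/τ), so each extra flip-flop in the chain multiplies the mean time between failures. All constants below are invented for illustration; they are not the 370/158’s actual numbers.

```python
import math

# Back-of-envelope synchronizer MTBF, exponential-settling model.
# MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data)
f_clk = 1e7        # sampling clock, Hz (illustrative)
f_data = 1e6       # rate of asynchronous input transitions, Hz
tau = 2e-9         # metastability resolution time constant, s
t_window = 1e-10   # aperture window around the clock edge, s
t_settle = 2e-8    # settling time each stage gets before the next samples, s

def mtbf(n_stages):
    # Each extra flip-flop adds another t_settle of resolution time,
    # multiplying the MTBF by exp(t_settle / tau).
    return math.exp(n_stages * t_settle / tau) / (t_window * f_clk * f_data)

for n in range(1, 5):
    print(f"{n} stage(s): MTBF ~ {mtbf(n):.2g} seconds")
```

With these made-up numbers, one stage fails every few tens of seconds, while four stages push the MTBF past 10^14 seconds (millions of years), which is the point of the chain.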
Such problems can be addressed with asynchronous self-clocking logic. I’ve been away from circuit design for decades; did asynchronous logic ever catch on?
It’s not that different from hyperthreading (as the MTA architecture does it, rather than just two threads like Intel’s offerings). The difference is that you context switch every clock cycle. On the 6600 you had a fixed number of threads, and thus you got essentially virtual processors. The MTA context switched every clock cycle, but there was a variable number of threads (up to the number of register files) ready to run, so your resume time was not fixed. (A big part of the MTA’s design rationale was that there were the same number of thread register files as the number of clock cycles to complete a memory access, so each thread could essentially operate with no memory stalls. As always, it was the lack of compiler technology to take advantage of this that was the Achilles’ heel.)
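The latency-hiding claim is easy to check with a toy model: with round-robin issue and as many hardware threads as memory-latency cycles, a load issued by a thread is always complete by that thread’s next turn. The latency and thread counts below are invented, not the MTA’s real parameters.

```python
# Toy model: every instruction is a load (the worst case); count how
# often a thread would find its previous load still outstanding.

def count_stalls(n_threads, mem_latency, cycles=10_000):
    pending = {}  # thread id -> cycle its outstanding load completes
    stalls = 0
    for cycle in range(cycles):
        t = cycle % n_threads  # round-robin issue, one thread per cycle
        if t in pending:
            if cycle < pending[t]:
                stalls += 1  # previous load not back yet: would stall
            del pending[t]
        pending[t] = cycle + mem_latency  # issue another load
    return stalls

print(count_stalls(n_threads=8, mem_latency=8))  # → 0, latency fully hidden
print(count_stalls(n_threads=4, mem_latency=8))  # > 0, too few threads
```

With threads equal to latency, thread t’s next turn arrives exactly mem_latency cycles after it issued, so the load has just completed; with fewer threads, every turn stalls.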
This style of coroutined threading still exists. For instance, look at the XMOS xCORE-200 series processors. They use it in much the same manner as Seymour Cray did for the 6600. Each core is hardware scheduled across 8 threads. It is amusing how often you see something very much like the 6600 PP architecture in embedded processor designs. It makes a lot of sense, particularly when you have real-time issues to address.
The metastable state problem is a real one, but we have tools to deal with it, so I’ve never seen it as a problem in real designs. There are good static timing analysis tools to look for problems. Long paths have buffers that serve as repeaters to make sure the signal is in good shape when it reaches its destination. Dealing with global signal paths takes a lot of work. These tools look for things like race conditions also.
The transition fault model is done with 1s and 0s, but it also covers cases that would be much too painful to simulate.
SerDes logic, used for most high-speed I/O these days, qualifies, I think. But I’ve seen lots of claims that internal logic would be made asynchronous, and it never pans out.
At an SRC meeting a while ago someone from HP said that they looked at this in depth, but decided that there was no good way of testing it.
Part of our verification protocol was to take some early parts, apply delay tests at increasing clock rates, and make a list of the flip-flops that failed (captured the wrong value) at each increment. Looking at common failures across samples was a very good way of finding timing problems. We found one, missed by static timing analysis, in just a day. It turned out STA had missed a bad layout that slowed down a signal, because the model for that cell was missing.
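The triage step can be sketched simply: collect the set of failing flops per sample, then intersect across samples. Flops that fail on many parts point at a systematic slow path rather than a random defect. The flop names and data here are invented for illustration.

```python
from functools import reduce

# Failing-flop sets per die sample at a given clock increment
# (hypothetical names/data, not from a real test run).
failures = {
    "sample_A": {"u_core/q17", "u_io/q3", "u_core/q42"},
    "sample_B": {"u_core/q17", "u_core/q42", "u_mem/q9"},
    "sample_C": {"u_core/q17", "u_core/q42", "u_io/q8"},
}

# Flops failing on every sample are candidates for a systematic
# timing problem rather than a per-part defect.
common = reduce(set.intersection, failures.values())
print(sorted(common))  # → ['u_core/q17', 'u_core/q42']
```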
I think asynchronous logic introduces more problems than it solves. Except for high speed I/O.
Yeah, the 6600 was an awesome architecture. Thornton’s book on the CDC 6600 was one of the things that got me interested in computer architecture.
Let’s bring back one’s complement and -0s!
I should have mentioned, for young’uns following along, that the 6600 PPU was NOT the blazingly fast 60-bit CPU which ran at the then breathtakingly-fast speed of 0.01 GHz. That CPU did have a complicated register-conflict “scoreboard” to speed up performance.
The PPU was a 15-bit machine (also running at 0.01 GHz) to direct I/O and do certain “kernel” functions. Googling suggests that I misremembered — there were 10 slots in the PPU’s barrel, not 8.
The 6600 was a superb machine.
Another good read is A Few Good Men from Univac.
I’m rather proud to say that I have a rack of logic modules from a 6600 sitting in my lounge room. Discrete transistors, each designed by Seymour Cray. Hand-wired wire-wrap backplane, too.
For a class project my senior year of college I wrote an implementation of the 6600 using PDP15 modules. I think - it was almost 50 years ago. Luckily for me you couldn’t simulate anything in those days, so I never had to worry about bugs in my design.
Yes, Virginia, there was a PDP 15. Though I never used or saw one.