That seems a little misleading. In a modern VLSI design, you can multiply two reasonable-sized quantities in as many or few cycles as you want; it’s just a tradeoff against die area. A typical modern CPU really can multiply two 32-bit numbers in much less than 32 clock cycles; it’s not that they’re doing a really good job of executing other stuff during a 32-cycle delay.
Hardware multipliers typically start with an approach called “Booth multiplication”, which is similar to grade school shift-and-add but more efficient (and, since we’re here, a nice demonstration of two’s complement’s good properties). The partial products may be summed in successive clock cycles, or summed in a single cycle at the cost of lots of die area, or anywhere in between.
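As a rough illustration of the recoding idea (not hardware-accurate; the function name is mine): radix-2 Booth scans the multiplier for runs of 1s, subtracting the multiplicand where a run begins and adding it back one place past where it ends, so a run like 0111 costs two add/subtract operations instead of three adds, and two's complement negatives fall out for free. A Python sketch:

```python
def booth_multiply(m, r, bits=32):
    """Radix-2 Booth multiplication of m by a signed `bits`-wide r.

    Scan r one bit at a time, comparing each bit with the bit to its
    right (an implicit 0 below the LSB). A 0->1 transition starts a run
    of 1s and subtracts m * 2^i; a 1->0 transition ends the run and adds
    m * 2^i. For a negative r the final run never "ends" inside the
    window, which is exactly the two's complement correction.
    """
    product = 0
    prev_bit = 0                      # implicit bit below the LSB
    for i in range(bits):
        cur_bit = (r >> i) & 1        # works for negative Python ints too
        if cur_bit == 1 and prev_bit == 0:
            product -= m << i         # start of a run of 1s
        elif cur_bit == 0 and prev_bit == 1:
            product += m << i         # one past the end of a run
        prev_bit = cur_bit
    return product
```

A hardware multiplier does the same scan in parallel, turning the recoded digits into partial products that a tree of adders then sums.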
Addition is often still cheaper than multiplication, since (a) the CPU designer doesn’t necessarily parallelize the multiplier all the way to single-cycle latency, if that would take unreasonable area and/or add enough levels of logic to slow the whole processor down, and (b) in a processor that can execute multiple instructions simultaneously, even if they have a fast multiplier, some other instruction might be using it. It’s usually a factor of ~2, though, not a factor of the operand width.
Regarding “shift and add” on old electronic calculators (early 70s vintage).
They didn’t do math in binary at first! It was all decimal. (No need to convert bases at all.) The shift-and-add method was like the grade school method, except that to do one step, like multiplying 836554 by 7, you did six additions of 836554 to itself. Slow, but not a lot of transistors. It’s mind-boggling how “badly” you can get away with doing arithmetic if you’re willing to sacrifice speed for circuit simplicity.
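That digit-at-a-time repeated addition can be sketched like so (illustrative Python, not how the calculator chips were actually coded; the only arithmetic primitives the loop leans on are add and a decimal shift, which in BCD hardware is just a nibble shift):

```python
def calculator_multiply(a, b):
    """Multiply the way early decimal calculators did.

    For each decimal digit of the multiplier b, repeatedly add the
    multiplicand (at most nine additions per digit), then shift the
    multiplicand one decimal place left and move to the next digit.
    """
    product = 0
    while b > 0:
        digit = b % 10             # low decimal digit of the multiplier
        for _ in range(digit):     # e.g. digit 7 -> seven additions
            product += a
        a *= 10                    # decimal left shift of the multiplicand
        b //= 10                   # drop the digit just handled
    return product
```

So multiplying an n-digit number by an m-digit number costs up to 9*m additions plus shifts: terribly slow by CPU standards, but fast enough for a human watching a calculator display.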
Ah, yeah, chips like the 4004, the 6502, and the 8088 had BCD maths,
one digit in 4 bits (or two per byte for the 8-bit CPUs, four per word for the 16-bit ones…)…
so they did it much like you do on paper…
The reason to move away from that? Sure, it’s one instruction, but it took 24 slow cycles to do all this! Besides which, you already have a proper binary adder anyway for calculating addresses (relative and indexed addressing modes…).
The calculator on the 4004 could be programmed in purely absolute addressing modes, but that only works for simple, very short algorithms with very little data… it’s not feasible for larger algorithms…
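The “like you do on paper” part is the decimal-adjust step: after a plain binary add of two BCD digits, any nibble above 9 gets 6 added to skip the unused codes 0xA-0xF and push a carry into the next decimal place. A hedged sketch of roughly what a decimal-mode add (6502 ADC in decimal mode, or x86 ADD-then-DAA) does for one packed-BCD byte; the function name and the simplification to valid BCD inputs are mine:

```python
def bcd_add_byte(x, y):
    """Add two packed-BCD bytes (two decimal digits each).

    Returns (result_byte, carry_out). Assumes both inputs are valid BCD;
    real hardware also has to cope with invalid nibbles and flags.
    """
    lo = (x & 0x0F) + (y & 0x0F)      # add the low decimal digits
    carry_lo = 0
    if lo > 9:
        lo += 6                        # decimal adjust: skip 0xA-0xF
        carry_lo = 1
    lo &= 0x0F
    hi = ((x >> 4) & 0x0F) + ((y >> 4) & 0x0F) + carry_lo
    carry_out = 0
    if hi > 9:
        hi += 6
        carry_out = 1
    hi &= 0x0F
    return (hi << 4) | lo, carry_out
```

So 0x38 + 0x45 comes out as 0x83 with no carry, exactly the paper result 38 + 45 = 83; chaining the carry across bytes gives arbitrary-width decimal addition, which is why dropping those slow decimal instructions was a real loss of convenience, if not of speed.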
Interesting. I didn’t know any of this, which goes to show how specialized the field’s become; hardware people and software people don’t interact much, and while each is aware of something of the other group’s doings, a lot of the details get lost.
And back in the days of mainframe computers that were run by professional computer operators, neither hardware nor software people bothered to interact much with the people who actually ran those machines.
We had reel-to-reel magnetic tape in various bit-densities. The programs we ran didn’t know how to set the correct density for the tapes they wrote or read. But the tape drives had buttons on the control panel to select which density to use. It was the duty of the operator to select the correct density every time we hung a tape.
Comes now, one day, a new model of tape drives that didn’t have density selection buttons on the control panel. Oops. Now, we had to run a separate program on one of the other machines (with the older tape drives) to upload files from certain non-standard-density tapes and copy them over to the machine where we needed to use the files.
ETA: One day, I took the executable image file of that program (for uploading files from a tape onto disk), which I didn’t have any source code for, and poked through an octal dump of the code. I found the relevant place and made a binary patch to set the tape density automatically.