Besides higher clock speeds, there are other improvements that make newer chips faster than older ones. For instance, the P4 has a 20-stage pipeline, whereas the PII, if I recall correctly, had a much shorter one (usually cited as around 10 to 12 stages).
The classic analogy for pipelining is doing laundry. One could wash a load, then dry it, then fold it, and only then start the next load. That's the standard sequential, non-pipelined way of doing things. Or, one could wash a load, move it to the dryer, and put another load in the washer. When the first load is dry, it gets folded, the second load moves to the dryer, another load goes in the washer, and so on. Each individual load takes the same amount of time, but three loads finish sooner overall because, once the "pipeline" is full, every stage is busy at the same time.
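To put numbers on the laundry analogy, here's a quick sketch. It assumes every stage (wash, dry, fold) takes the same time, which real pipelines and real dryers don't guarantee; the function names are just made up for illustration. The first load still needs every stage, but after that one more load finishes per stage-time, so the pipelined total is (stages + loads - 1) stage-times instead of stages × loads.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Finish every stage of one load before starting the next load."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Start a new load as soon as the first stage frees up.

    Assumes all stages take the same time. The first load needs all
    `stages` steps; each later load finishes one stage-time after the
    previous one, so the total is (stages + loads - 1) stage-times.
    """
    if loads == 0:
        return 0
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    # Hypothetical: wash, dry, and fold each take 30 minutes.
    for loads in (1, 3, 10):
        print(loads, sequential_minutes(loads, 3, 30), pipelined_minutes(loads, 3, 30))
```

With these made-up numbers, a single load takes 90 minutes either way (latency doesn't improve), but three loads take 150 minutes pipelined versus 270 sequentially, and the gap widens as the number of loads grows.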
It's worth noting that Moore's "Law" (which is a trend, not a law) says that the number of transistors on a chip doubles roughly every two years (Moore's original 1965 estimate was every year), not that processing power or speed does. This matters because doubling the clock speed does not mean twice as much processing gets done. What the extra transistors buy is the ability to do more "stuff" (good technical term, right?) in a single clock cycle.
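One way to see why twice the clock isn't twice the work is the standard CPU performance equation: time = instructions × cycles-per-instruction (CPI) / clock rate. The sketch below adds one simplifying assumption of my own: memory stalls cost a fixed number of nanoseconds per instruction, so a faster clock spends *more* cycles waiting. All the numbers are hypothetical.

```python
def effective_cpi(base_cpi: float, stall_ns_per_instr: float, clock_ghz: float) -> float:
    # Simplifying assumption: memory stalls take a fixed time in nanoseconds,
    # so at a higher clock rate they eat up more cycles per instruction.
    return base_cpi + stall_ns_per_instr * clock_ghz


def run_time_ns(instructions: int, cpi: float, clock_ghz: float) -> float:
    # CPU performance equation: time = instructions * CPI / clock rate.
    # With the clock in GHz (cycles per ns), the result is in nanoseconds.
    return instructions * cpi / clock_ghz


if __name__ == "__main__":
    # Hypothetical workload: one million instructions, base CPI of 1.0,
    # and an average of 0.5 ns of memory stall per instruction.
    n = 1_000_000
    slow = run_time_ns(n, effective_cpi(1.0, 0.5, 1.0), 1.0)  # 1 GHz
    fast = run_time_ns(n, effective_cpi(1.0, 0.5, 2.0), 2.0)  # 2 GHz
    print(f"2x clock -> {slow / fast:.2f}x speedup")  # 1.50x, not 2x

    # Doing more per cycle (halving the base CPI) at the original clock:
    wider = run_time_ns(n, effective_cpi(0.5, 0.5, 1.0), 1.0)
    print(f"half the base CPI -> {slow / wider:.2f}x speedup")  # 1.50x
```

The second case is the "more stuff per clock cycle" route: the extra transistors go into doing more work each cycle rather than into a faster clock.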
For instance, to use the instruction WillGolfForFood brought up: an IMUL instruction may be faster on a particular chip, but if a program doesn't use it, there's no speedup. Also, more die space allows the addition of caches. The difference between the original Celerons and the PIIs was the L2 cache: the first Celerons were essentially the same processor with the L2 cache left out. The cache is really important for processing speed; I can't find a reference, but I remember the difference as being as much as 20%.
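Both points in that paragraph have standard back-of-the-envelope formulas. Amdahl's law gives the overall speedup when only a fraction of the run time gets faster (a faster IMUL in a program that never multiplies gets you nothing), and average memory access time (AMAT = hit time + miss rate × miss penalty) shows why dropping a cache level hurts. This is a sketch only; the cycle counts and miss rates below are hypothetical and not measurements of any real Celeron or PII.

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the run time gets faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # A 2x faster IMUL helps nothing if the program never multiplies...
    print(amdahl_speedup(0.0, 2.0))  # 1.0
    # ...and only a little if multiplies are 10% of the run time.
    print(round(amdahl_speedup(0.1, 2.0), 3))  # 1.053

    # Hypothetical cycle counts: 3-cycle L1 with a 5% miss rate,
    # 10-cycle L2 that misses 20% of the time, 100-cycle main memory.
    l1_miss_cost = amat(10, 0.2, 100)   # an L1 miss is caught by the L2
    with_l2 = amat(3, 0.05, l1_miss_cost)
    without_l2 = amat(3, 0.05, 100)     # every L1 miss goes to main memory
    print(f"with L2: {with_l2:.1f} cycles, without L2: {without_l2:.1f} cycles")
```

How much a slower average access actually slows a program down depends on how often it touches memory, which is why real-world figures vary so much from one benchmark to the next.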
By the way, MMX processors added SIMD ("single instruction, multiple data") instructions and the execution hardware to run them: one MMX instruction operates on several small integers packed into a single 64-bit register at once, which suits the repetitive arithmetic in audio, video, and image processing. On a separate topic, the Pentium Pros were supposedly much faster than the then-new PIIs. I'll try to find a reference and the reasons and post them later…
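To give a feel for what "several integers at once" means: MMX's PADDB instruction adds eight bytes packed into one 64-bit register, with each byte wrapping around independently. The sketch below emulates that in ordinary software using the SWAR ("SIMD within a register") trick. It's an illustration of the idea, not how the hardware is built, and it doesn't model the saturating variants such as PADDUSB. The helper names are made up for this example.

```python
MASK64 = (1 << 64) - 1
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every byte lane
HIGH1 = 0x8080808080808080  # top bit of every byte lane


def pack_bytes(values: list[int]) -> int:
    """Pack eight 0-255 values into one 64-bit integer, lane 0 in the low byte."""
    assert len(values) == 8 and all(0 <= v <= 255 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word


def unpack_bytes(word: int) -> list[int]:
    """Split a 64-bit integer back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add_bytes(a: int, b: int) -> int:
    """Add eight byte lanes at once, each wrapping modulo 256 (like PADDB).

    Adding only the low 7 bits of each lane can never carry into the next
    lane. The top bit of each lane is then fixed up with XOR, since
    sum bit = a bit ^ b bit ^ carry-in; the carry out of a lane is dropped.
    """
    low = (a & LOW7) + (b & LOW7)
    return (low ^ ((a ^ b) & HIGH1)) & MASK64


if __name__ == "__main__":
    a = pack_bytes([250, 1, 2, 3, 4, 5, 6, 7])
    b = pack_bytes([10, 1, 1, 1, 1, 1, 1, 1])
    print(unpack_bytes(packed_add_bytes(a, b)))  # [4, 2, 3, 4, 5, 6, 7, 8]
```

In plain Python this is a handful of operations per call; the point of MMX was that the hardware did the equivalent eight-lane add in a single instruction.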
Kramer