Computer processors... clock cycles?

Okay, so the megahertz (or gigahertz) of a CPU tells you how many cycles it runs per second (1 megahertz is a million cycles per second, yes?). Okay, I have that down.

So, what, exactly, is happening during each of those clock cycles? How come a 1600 megahertz Athlon XP processor can outperform a 2000 megahertz P4?

It’s an extremely complex subject, but I’ll give it an extremely short overview.

Originally, up until the Pentiums or so, CPUs could do at most one operation per cycle - one number added, multiplied, etc.

After that, ‘superscalar’ chips were developed - that is, chips that could do more than one operation per cycle. Since then, we’ve been ramping up both the clock speed and the number and types of operations that can be done per clock.

As a simple example, if you have a 100 MHz CPU that can do one add per cycle, and a 100 MHz CPU that can do 2 adds per cycle (2 “pipelines”), the latter can double the performance of the first, even though they’re at the same clock speed.
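
(A caveat: the doubling only happens when back-to-back operations are independent. Here’s a toy sketch in C - not how any real chip schedules things, just the shape of the dependency problem. A loop where every add depends on the previous one can’t use the second pipeline; a loop with two independent running totals can keep both busy.)

```c
#include <stdio.h>

double sum_one_acc(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];              /* each add depends on the last one */
    return s;
}

double sum_two_accs(const double *a, int n) {
    double s0 = 0.0, s1 = 0.0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];             /* these two adds are independent,  */
        s1 += a[i + 1];         /* so both pipelines can stay busy  */
    }
    if (i < n) s0 += a[i];      /* odd leftover element */
    return s0 + s1;
}

int main(void) {
    double a[5] = {1, 2, 3, 4, 5};
    printf("%g %g\n", sum_one_acc(a, 5), sum_two_accs(a, 5));
    return 0;
}
```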

Added to the mix are special instruction sets made for a series of processors that can assist the CPU in performing multiple calculations in one cycle, such as SSE, 3DNow!, etc.
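
For instance, an SSE instruction can add four floats at once. A minimal sketch using the SSE compiler intrinsics (this assumes GCC or another compiler that ships xmmintrin.h, and an SSE-capable CPU):

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float r[4];

    __m128 va = _mm_loadu_ps(a);     /* load 4 floats into one register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ps(va, vb);  /* 4 additions in a single SSE instruction */
    _mm_storeu_ps(r, vr);

    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
    return 0;
}
```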

If you want a more detailed view of precisely what’s going on under the hood, you’ll have to do a search on “processor architecture” or something similar.

A single instruction takes several clock cycles to execute. A simple ADD instruction, for example, may need to retrieve data from memory, then do the calculation, before finally writing the result back to memory. Different instructions will take more or fewer clock cycles to execute depending on their complexity. The same instruction can also be implemented in different ways by the chip designer, resulting in a different number of cycles needed to complete it. Finally, modern processors use a ‘pipeline’ architecture whereby several instructions can overlap, as each one is at a different stage of its progress.
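
A toy cycle count makes the overlap concrete. Assuming an idealized pipeline with k stages and no stalls (a simplification - real pipelines stall on branches and data hazards), n instructions take n*k cycles without overlapping but only k + (n - 1) cycles with it:

```c
#include <stdio.h>

/* Toy cycle count for an ideal k-stage pipeline with no stalls:
 * without overlap, n instructions take n*k cycles; with overlap,
 * the first takes k cycles and each later one finishes one cycle after. */
int main(void) {
    int k = 5;      /* pipeline stages, e.g. fetch/decode/read/execute/write */
    int n = 100;    /* instructions */

    int unpipelined = n * k;
    int pipelined   = k + (n - 1);

    printf("unpipelined: %d cycles\n", unpipelined);
    printf("pipelined:   %d cycles\n", pipelined);
    return 0;
}
```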

The overall speed of a processor will depend on the average number of cycles required per instruction as well as the clock rate. It is part of the skill of the designer to optimise the instruction set so that the most popular instructions take the fewest cycles.
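
In other words, execution time = instruction count × average cycles per instruction ÷ clock rate. A tiny sketch with invented CPI numbers, just to show how the slower clock can win:

```c
#include <stdio.h>

/* time = instructions * CPI / clock rate.
 * Both CPI values below are invented purely for illustration. */
int main(void) {
    double instructions = 1e9;          /* 1 billion instructions */

    double a_hz = 2.0e9, a_cpi = 2.0;   /* faster clock, more cycles/instr  */
    double b_hz = 1.5e9, b_cpi = 1.2;   /* slower clock, fewer cycles/instr */

    printf("chip A: %.2f s\n", instructions * a_cpi / a_hz);  /* 1.00 s */
    printf("chip B: %.2f s\n", instructions * b_cpi / b_hz);  /* 0.80 s */
    return 0;
}
```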

It also depends on how much work is done by each instruction. Multiplication may be a single instruction on one CPU, while another CPU may lack the “multiply” instruction and need to do it as a series of additions.
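
For illustration, here’s roughly what a software multiply looks like on a chip with no multiply instruction - a shift-and-add loop, so one ‘multiplication’ costs many adds and shifts:

```c
#include <stdio.h>

/* Shift-and-add multiplication, the kind of routine a CPU without a
 * hardware multiply instruction would run as a loop of adds and shifts. */
unsigned mul_by_adds(unsigned a, unsigned b) {
    unsigned result = 0;
    while (b != 0) {
        if (b & 1)          /* if the low bit of b is set,  */
            result += a;    /* add the current shifted a    */
        a <<= 1;            /* double a                     */
        b >>= 1;            /* halve b                      */
    }
    return result;
}

int main(void) {
    printf("%u\n", mul_by_adds(6, 7));  /* prints 42 */
    return 0;
}
```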

As you’ve probably figured out by now, it’s meaningless to compare the clock speeds of different CPU designs. There’s no guarantee that an 800MHz CPU from one company is faster than a 500MHz CPU made by someone else.

Presumably the Athlon and P4 mentioned in the OP have the same instruction set, otherwise they could not run the same programs.

You’re right, I didn’t read the OP carefully and didn’t realize it asked about Intel vs. AMD specifically. Though I think they each have their own extended “multimedia” instructions, which might make a difference in games and video playback.

Well, they can accept the same instructions and produce the same output, thereby running the same programs, but internally one can go about it differently than the other, gaining efficiency.

The P4 was designed around a radical approach that allows for ramping up the clock speed at the expense of a very low IPC (instructions per cycle). The upshot is that a P4 cannot execute the same number of instructions as an Athlon (or P3) at the same clock speed in the same amount of time.
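
To hang some made-up numbers on that: if a 2000 MHz P4 averaged, say, 0.5 instructions per cycle, it would retire about 1.0 billion instructions per second, while a 1600 MHz Athlon averaging 0.75 would retire about 1.2 billion. Those IPC figures are purely illustrative, but that’s the arithmetic behind the OP’s comparison.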

People. Please. Use the abbreviations correctly.
MHz=Mega-Hertz=one million cycles per second.
mHz=milli-Hertz=one cycle per one thousand seconds.
GHz=Giga-Hertz=one billion cycles per second.
mhz=ghz=Garbage.

To make things even more complicated, modern processors “pre-fetch” both instructions and data that they expect will be needed for upcoming operations. These instructions and data are stored in the CPU cache, which can be accessed much more quickly than standard memory. (This might be related to the pipeline architecture that ticker mentioned.)

Pre-fetching can reduce idle time when the processor has to wait for data to be retrieved. But if the predictions are wrong, then the pre-fetching was a waste of time.
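
You can poke at this from software, too: GCC and Clang expose a prefetch hint, __builtin_prefetch, that asks the CPU to start pulling data into cache before it’s needed. A minimal sketch (the look-ahead distance is a guess you’d tune per machine, and the CPU is free to ignore the hint entirely):

```c
#include <stdio.h>

#define N 1024
#define AHEAD 16   /* how far ahead to prefetch; a guess, tune per machine */

/* Sum an array while hinting upcoming elements into cache.
 * __builtin_prefetch is a GCC/Clang hint; a wrong guess just
 * wastes memory traffic, exactly like a mispredicted pre-fetch. */
long sum_with_prefetch(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        if (i + AHEAD < n)
            __builtin_prefetch(&a[i + AHEAD]);
        s += a[i];
    }
    return s;
}

int main(void) {
    static long a[N];
    for (int i = 0; i < N; i++) a[i] = i;
    printf("%ld\n", sum_with_prefetch(a, N));
    return 0;
}
```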

Here’s an interesting article from How Stuff Works.