Well, I for one would certainly like to hear about it because, so far, I haven’t heard anything about a magic transistor-laying technology, and I would say I follow the industry quite closely. You’re not talking about SOI, are you?
Shalmanese, your statement reads like you are mocking my post; I don’t think that helps the debate.
I was referring to dynamic logic circuits. If you do some research you will find the following:
Most circuits today are static circuits, which use a clock “tick” to move processing along to the next state. The clock is NOT connected to every gate, which reduces complexity, but processing is slower because more transistors are needed to implement most logical operations.
Dynamic circuits, on the other hand, connect the clock to almost every gate: one phase of the clock precharges the gate and the opposite phase evaluates its inputs to drive the output.
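Very roughly, the two clock phases work like this. Here is a toy behavioural sketch in C (the function and signal names are my own invention, and a real gate is an analog transistor network, not a function call):

```c
/* Toy behavioural model of a single dynamic (domino) AND gate.
 * Not a circuit simulation; it only shows the precharge/evaluate idea. */
#include <stdio.h>
#include <stdbool.h>

/* One clock period: precharge while clk is low, evaluate while clk is high. */
static bool domino_and(bool clk, bool a, bool b, bool *node)
{
    if (!clk)
        *node = true;          /* precharge: internal node pulled high */
    else if (a && b)
        *node = false;         /* evaluate: pull-down network discharges the node */
    return !(*node);           /* output inverter: goes high only if a AND b */
}

int main(void)
{
    bool node = true;
    for (int cycle = 0; cycle < 2; cycle++) {
        bool a = (cycle == 1), b = true;
        domino_and(false, a, b, &node);            /* clk low: precharge phase */
        bool out = domino_and(true, a, b, &node);  /* clk high: evaluate phase */
        printf("cycle %d: a=%d b=%d -> out=%d\n", cycle, a, b, out);
    }
    return 0;
}
```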
Pros and cons:
Static circuits
- Easier to design, test, and manufacture
- Require significantly more transistors
- Slower than dynamic
- Higher power consumption, which leads to slower clock rates
Dynamic circuits
- Harder to design, test, and manufacture
- Require significantly fewer transistors
- 2 to 3 times faster than static circuits
- Lower power consumption, which allows for faster clock rates
History
Prior to the 1980s, most circuits were dynamic. Static circuits became the norm for economic reasons (easier to design and manufacture), and dynamic circuits are probably only re-appearing because the payoff is now worth the difficulty in design and manufacture.
Sorry, it wasn’t meant to be hostile. I genuinely am interested. Could you provide some links?
I googled and found some stuff.
Here is a decent general description from some other company using this technique:
http://66.102.7.104/search?q=cache:ZYakq-CseJYJ:www.intrinsity.com/technology/docs/public/Intrinsity_Fast14_Technology.pdf+advantages+of+dynamic+logic+circuit&hl=en
Here is an article talking about IBM using dynamic logic in the Cell:
http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318&p=3
So if they are using it in the Cell, and both the Cell and the Xbox 360 are using a 976 variant, it seems possible they are using it in the Xbox proc.
Yes, the only references I found were that realworldtech article and a couple of whitepapers at IBM. From what I can make of it, it seems like a relatively minor optimisation to a specific area of the chip to meet a specific design goal, rather than some overarching major overhaul. I doubt it would have a significant effect on performance; otherwise it would have been mentioned more.
Well, when I google “dynamic logic circuit” I get about 500,000 hits.
Mostly from the following sources:
IBM
HP
Other companies making fast specialized circuits
Research papers from people at universities like MIT, Columbia, Washington, etc.
2 to 3 times faster is minor?
reduction of up to 50% of transistors required for a circuit is minor?
It can be used however they feel like it.
If they use it in the areas that are the bottlenecks, that is significant, is it not?
If you would like to have an accurate understanding of the performance impact of dynamic logic circuits, I would recommend you read up on it. Google has tons of information.
A small random sampling from those 500,000 hits:
Prof. Sechen from MIT
“In 1997 Prof. Sechen’s research work shifted focus to ultra high-speed digital circuit design, as well as low power digital circuit design. In 1996, together with student Gin Yee, he developed a new single-rail dynamic logic family called clock-delayed domino, which was used extensively in the development of the first gigahertz processor, as reported by IBM”
“Prof. Sechen and students Tyler Thorp and Gin Yee recently reported on a new RTL synthesis flow based on the use of alternating dynamic and static gates (DS domino), which has delivered circuits 60% faster than those optimized using static CMOS.”
“Together with graduate student Gregg Hoyer, he demonstrated a new locally clocked dynamic logic family that delivered a 715 MHz multiplier in a .0 micron MOSIS process. This is the fastest multiplier ever reported”
SOCcentral.com
“Dynamic CMOS logic circuits are widely employed in high performance VLSI chips in pursuing very high system performance”
IBM’s Usage as of 2000 and Plans for More
The following article, from the year 2000, talks in detail about IBM’s plans to use circuit design (including dynamic circuits) as a way to increase performance. At the time of the article they had already implemented it in the Northstar Power processor used in servers.
http://www.findarticles.com/p/articles/mi_qa3751/is_200011/ai_n8914764/pg_3
And this paragraph details the extent to which they used it at that time:
“In the Northstar processor, custom circuits comprise approximately 600K transistors of a total of 2M transistors used to implement control logic and dataflow (the remainder of the 12.5M transistors were in instruction and data caches). About 170K transistors are found in dynamic circuits and the balance in custom static circuits. The comparatively small number of transistors used in dynamic circuits reflects the fact that fewer transistors are needed to implement a function in dynamic circuits, because the complementary p-FET network required for static circuits is replaced by a single precharge device in dynamic circuits. The use of custom dynamic circuits in the Northstar processor provided a significant performance advantage over CMOS static circuits.”
So this is where we are:
- IBM’s processors currently outperform all others from AMD/Intel/SUN
- In addition to outperforming them, they do it at a slower clock speed
- IBM is aggressively using circuit design to boost performance, including dynamic logic circuits, which the EE community does not dispute are 2 to 3 times faster than static circuits.
- IBM is 4 years ahead of AMD/Intel in multi-core chip design (Power4 was dual core in 2001)
- IBM is customizing a processor for MS to be optimized for games
Again, I don’t see any data that indicates the new Xbox proc will perform at a 1:1 ratio (on a per-core basis) compared to AMD/Intel at the same clock speed.
There is simply no data to support that position.
When I research AMD I read good things about their multi-core design, memory controller, etc. Sounds like they are doing some good stuff, but even that is only placing them in front of Intel, they still have a ways to go to catch IBM.
When I research Intel I wonder what is going on at that company. Probably the same condition that IBM had prior to the 90’s, they were making lots of money and didn’t have to try very hard.
Final point:
All processors are not the same. You seemed to imply this when you said “it fundamentally all comes down to transistors.” For the same number of transistors and the same general functions, you could get wildly different performance from different companies. Just look at Intel’s HT. Nobody is impressed. AMD doesn’t have any kind of HT-like tech and they outperform Intel. IBM’s SMT not only outperforms HT, the two aren’t even in the same ballpark.
Circuit design makes a difference.
Yes, obviously dynamic circuits are a big thing, I’ve even made a couple of simple ones before. What I’m looking for is the application of dynamic circuits as specifically relating to the Cell/Power5 Processor. As I said, the only things I could find are the realworldtech and some IBM whitepapers.
cite? As specifically relating to the Cell/Power5 Processor? I would think if IBM had a technology that made their processors 3x faster, it would be plastered absolutely everywhere and every techie would have an instant hardon for it. The very dearth of information about it indicates that it’s nothing to get excited about.
From everything I’ve read and the cites you just gave me, dynamic circuits, as used in the Power5, were about optimising critical paths. All this means is that you can increase the clockspeed that the processor can hit. Fortunately, one of the very few things we know about the Xbox is what the clockspeed is.
No, but all processors are very damn similar. The performance of a processor = clockspeed * IPC. To increase the performance of a processor, you either need to increase clockspeed or IPC; there’s no other way around it. The IPC, as a rough estimate, is determined by a combination of the depth of your pipeline and how good your branch predictor is. If the G5 is indicative of the PPC architecture, then all the (non-biased) benchmarks I’ve seen put the G5 at roughly the same IPC as the Athlon. Obviously, IPC depends on application usage, and perhaps a games-optimised processor could do better, but I doubt by very much.
Now, dynamic circuits increase clockspeed, which means they have absolutely no effect on IPC. This means that, assuming the IPC of both the Xbox and the Athlon are roughly equal, an Xbox processor will perform about the same as an Athlon of the same clockspeed (real clockspeed, not PR).
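Just to put the arithmetic in one place (the IPC figures here are invented purely for illustration, not measured numbers):

```c
/* performance ~= clockspeed * IPC; a clockspeed-only improvement
 * (e.g. from faster circuits) moves only the first factor. */
#include <stdio.h>

int main(void)
{
    double athlon_ghz = 3.0, athlon_ipc = 1.5;  /* invented IPC figure */
    double xbox_ghz   = 3.0, xbox_ipc   = 1.5;  /* assumed roughly equal IPC */

    printf("Athlon: %.1f billion instructions/sec\n", athlon_ghz * athlon_ipc);
    printf("Xbox:   %.1f billion instructions/sec\n", xbox_ghz * xbox_ipc);
    return 0;
}
```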
All the most obvious ways of increasing IPC need a lot more transistors: more cache, better branch prediction, etc. And each extra transistor adds to the cost of a chip. This is what I meant when I said that IBM has no magic transistor-laying technology.
Again, I would love to see a cite that proves me wrong. I’m hardly more than an interested amateur with hardware. I’ll check with some EE friends of mine to see if they know any more info.
It is far more complex than what you have written. There are trade-offs everywhere you turn: pipeline depth, number of lookup tables for branch prediction, size of those tables, size of each cache, location of each cache, number of registers, SIMD or not, how many ALUs, multi-core, multi-threading, etc., etc. Each of these will impact the real-world performance for a specific problem space (like games).
There is no static IPC number for a processor, only theoretical maximums and averages.
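To illustrate what I mean: IPC is just instructions retired divided by cycles taken, and it swings wildly with the workload (the counts below are invented, purely to show the arithmetic):

```c
/* Same core, different workloads, different IPC -- which is why quoting
 * a single IPC number for a processor is meaningless. Counts are invented. */
#include <stdio.h>

int main(void)
{
    struct { const char *workload; double instructions; double cycles; } runs[] = {
        { "tight SIMD loop",         4.0e9, 1.6e9 },  /* cache-friendly, few branches */
        { "pointer-chasing AI code", 1.0e9, 2.5e9 },  /* cache misses, mispredicts    */
    };
    for (int i = 0; i < 2; i++)
        printf("%-22s IPC = %.2f\n", runs[i].workload,
               runs[i].instructions / runs[i].cycles);
    return 0;
}
```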
Each company takes a different approach to solving all of these problems, and many more besides, resulting in a unique processor. They do not perform the same.
If IBM is using dynamic logic circuits to reduce the number of transistors, it leaves them more transistors available for other functions, like better branch prediction, or whatever else they feel will improve performance.
P.S. The G5 is based on the Power4 architecture, not the Power5, so that is not a very good comparison.
Yes, obviously, but all those things (branch prediction, lookup tables, cache, etc.) are probably going to have to be taken out of the Cell processors going into the Xbox/PS3 to get anywhere near the $300 price point. I recall a source saying that each P4 Northwood processor cost $53 per unit to make, not including the billions spent on the fab. Obviously, shrinking dies will make this slightly cheaper, but there’s still a fundamental limit to how many transistors you can cram onto a chip for $300.
PS: There’s a PS3 article at Anandtech and, personally, I think it looks much more impressive than the Xbox hardware-wise.
Looks like we will have to wait and see what happens.
Ok, forget all the details, speculation, etc., here are current specs:
These are for CPU only, GPU not included.
- Cell, 3 GHz: 218 GFLOPS
- Xbox2, 3 GHz: 115 GFLOPS
- Dual-core AMD, 3 GHz: 12 GFLOPS
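For what it’s worth, peak figures like these are basically just clock rate times FP operations per cycle; the per-chip factors in this sketch are my own guesses, not official numbers:

```c
/* Where peak GFLOPS claims come from: GHz * units * FLOPs per cycle.
 * The factors below are my guesses, not vendor-published breakdowns. */
#include <stdio.h>

int main(void)
{
    /* dual-core x86: 2 cores * 2 double-precision FLOPs/cycle (guess) */
    printf("Dual-core AMD:          %.0f GFLOPS\n", 3.0 * 2 * 2);
    /* Cell-style design: many SIMD units, each doing several single-precision
     * FLOPs per cycle -- which is how the claimed numbers get so large */
    printf("8 SIMD units * 8 FLOPs: %.0f GFLOPS\n", 3.0 * 8 * 8);
    return 0;
}
```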
GFLOPS have always been a crappy measure of real-world performance.
But I think the more important figure is the PS3 GPU: 1800 GFLOPS. GFLOPS are a rather useless measure of performance because the GPU so massively dominates any CPU that you would be crazy to run any streaming, math-intensive code on the CPU. I think both MS and Sony know this and the GFLOPS thing is pure marketing hype.
In reality, the CPU is probably going to be called on to run very branchy code, AI and the like, which depends more on your branch prediction and pipeline than on raw FLOPS. I would dearly love to see some branchy-code benchmarks, but I doubt Sony/MS would like to release such in-depth info.
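A crude “branchy” microbenchmark along the lines I mean might look like this (entirely my own toy example, nothing Sony or MS have published; a clever compiler might even turn the branch into a conditional move):

```c
/* Lots of data-dependent, hard-to-predict branches and almost no floating
 * point -- closer to AI/game-logic code than to a GFLOPS benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    enum { N = 1 << 24 };
    unsigned *data = malloc(N * sizeof *data);
    for (int i = 0; i < N; i++)
        data[i] = rand();                /* random values -> unpredictable branches */

    clock_t t0 = clock();
    long long sum = 0;
    for (int i = 0; i < N; i++) {
        if (data[i] & 1)                 /* ~50/50 branch the predictor can't learn */
            sum += data[i];
        else
            sum -= data[i] >> 1;
    }
    clock_t t1 = clock();

    printf("sum=%lld time=%.3fs\n", sum, (double)(t1 - t0) / CLOCKS_PER_SEC);
    free(data);
    return 0;
}
```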
The Cell processor was designed with the opposite in mind.
They have given up trying to increase single-threaded performance via branch prediction and the like, in favour of parallel operations and SIMD (Single Instruction, Multiple Data: one instruction operates on multiple pieces of data at the same time, gaining significant throughput).
Read all the stuff out there: they are relying on the compiler to order the code efficiently.
http://www.research.ibm.com/cell/
"The SPU is an in-order dual-issue statically scheduled architecture. Two SIMD instructions can be issued per cycle: one compute instruction and one memory operation. The SPU branch architecture does not include dynamic branch prediction, but instead relies on compiler-generated branch prediction using “prepare-to-branch” instructions to redirect instruction prefetch to branch targets.
The SPU was designed with a compiled code focus from the beginning, and early availability of SIMD-optimized compilers allowed development of high-performance graphics and media libraries for the Broadband Architecture entirely in the C programming language. "
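To give a rough flavour of what “the compiler handles it” means: the sketch below uses GCC’s __builtin_expect as a loose analogue of compiler-directed branch hints (the SPU’s actual prepare-to-branch mechanism is different), and a 4-wide add as a stand-in for one SIMD instruction operating on several values at once:

```c
/* Static, compiler-visible hints instead of hardware branch prediction,
 * plus the SIMD idea of one operation applied to several values at once.
 * GCC/Clang-specific; only an analogue of what the SPU actually does. */
#include <stdio.h>

#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* A vectorising compiler can emit a single 4-wide SIMD add for this loop. */
static void add4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    add4(a, b, c);
    if (UNLIKELY(c[0] < 0))     /* hint: tell the compiler this path is cold */
        printf("unexpected\n");
    else
        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```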
Interesting, it sounds a lot like the EPIC architecture found in the Itanium.
It looks like my original intuitions were correct. From Ars Technica:
(emphasis mine)
Isn’t the Cell processor supposed to be strong in those two particular areas (AI and physics)?
If that’s the case, it should be interesting to see if these advantages are leveraged.
As far as AI goes, the vast increases in CPU power over the last 20 or 30 years have not caused anything close to a proportional improvement in AI quality, so I wouldn’t hold my breath. Realistic physics, on the other hand, is making great progress.
You don’t think AI has improved since DOOM? Rodney Brooks of the MIT robotics lab, one of the most influential AI researchers of this decade, has stated that the roughly 1000-fold increase in computing power has done more for AI than anything else could have, more than the decades of research and clever algorithms humans have put in.
Academic AI research has almost nothing to do with game AI. In fact, it’s misleading to even refer to what game characters do as AI.
Most game AI is dirt simple because it plays better that way. It’s not that hard to program a squad of AI marines to take cover, flank the player, and ambush him from behind. But where’s the fun in that? It’s much more entertaining to have them stay in front of the player and move in and out of cover in a manner that looks realistic but is tactically stupid.
There **are** bottlenecks in AI, but they tend to involve frequently called low-level routines: line-of-sight checks, for example, or path-finding, or collision detection. They don’t tend to involve the high-level strategic code.
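By a low-level routine I mean something like this naive grid-based line-of-sight check, the kind of thing that can get called thousands of times per frame (purely illustrative; no real engine does it exactly this way, and there is no bounds checking):

```c
/* Naive line-of-sight test: sample points along the ray and stop at the
 * first blocked grid cell. Toy code -- no bounds checking, fixed grid. */
#include <stdio.h>
#include <stdbool.h>
#include <math.h>

#define GRID 64
static bool blocked[GRID][GRID];   /* true where a wall tile sits */

static bool has_line_of_sight(float x0, float y0, float x1, float y1)
{
    float dx = x1 - x0, dy = y1 - y0;
    int steps = (int)fmaxf(fabsf(dx), fabsf(dy)) + 1;
    for (int i = 0; i <= steps; i++) {
        float t = (float)i / steps;
        int gx = (int)(x0 + t * dx), gy = (int)(y0 + t * dy);
        if (blocked[gx][gy])
            return false;          /* something solid in the way */
    }
    return true;
}

int main(void)
{
    blocked[10][10] = true;        /* one wall tile for the demo */
    printf("%d\n", has_line_of_sight(2, 2, 20, 20));  /* ray crosses (10,10) -> 0 */
    return 0;
}
```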
If there are problems with the branching on the 360, it may affect the **number** of AI entities that can be instantiated at any given time. But it’s unlikely to affect the **intelligence** of each particular one.
Well, Perfect Dark Zero will support at least 49 simultaneous multiplayer bots (which I assume will have fairly sophisticated AI routines, if PD1 was any indicator), while Kameo will support between 4,000 and 9,000 characters on screen at once (though because of Kameo’s design, those characters surely won’t be that intelligent).
I just wanted to say that the Xbox 360 is no more powerful than my (top-of-the-line) PC.