I have a project that requires lots of processing power and I have some questions regarding a new PC I would like to purchase or have built.
First, a little background:
Project is custom code
All integer, no floating point
CPU is the bottleneck, not memory
Already written for multiple processors (just need to tweak it a little for multi-core to remove some of the communication overhead)
Because of the nature of the problem I could use as many cores as I could fit in a machine, but money and technology will obviously limit my choices.
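A rough sketch of the kind of multi-core tweak described above. It assumes the work is a range of independent integer items, which is not something the post confirms. The range is split into one contiguous chunk per core, and each worker sends back a single partial result instead of exchanging messages per item. `split_range`, `partial_sum`, `run_parallel` and the bit-counting workload are all hypothetical stand-ins for the real routine. Processes are used instead of threads because CPU-bound Python threads don't run in parallel under CPython's GIL. The real project would likely use native threads in whatever language it is written in.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_range(lo, hi, parts):
    """Split [lo, hi) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(hi - lo, parts)
    chunks = []
    start = lo
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        end = start + base + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def partial_sum(bounds):
    """Hypothetical integer-only workload: total set bits over one chunk."""
    lo, hi = bounds
    return sum(bin(n).count("1") for n in range(lo, hi))


def run_parallel(lo, hi, workers=None):
    """One chunk per core. Each chunk comes back as a single integer, so the
    only inter-process traffic is one small result per chunk."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(lo, hi, workers)))


if __name__ == "__main__":
    print(run_parallel(0, 1_000_000))
```

The coarse, contiguous split is the design choice that matters here: with one task per core, the communication overhead stays fixed no matter how many items are processed.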
I was quoted about $5k for a dual-processor, quad-core Xeon server (8 cores), but I want to see if I can do better. It's a server box with ECC memory, etc., which I don't really want if I don't have to.
Questions:
Could I build a box with 4 Q9450s?
If so, where? I've googled for 4-processor motherboards and I don't see much, other than servers.
Is there something about the server motherboards that improves throughput, or do they just have more sockets?
Is there something different between the Xeons and the Q9450s that has an impact on SMP performance?
Which companies are reliable for building these types of boxes? I'm not looking for the cheapest; I would rather pay a little more and be sure it's not going to have problems.
Any other advice on building multi-processor boxes with quad-core processors?
I don't have any specific answers for you, sorry, but I do know that projects which link together many, many PCs to approach supercomputer performance tend not to go for the latest, greatest CPU on the market. They find that using a few more slightly lower-spec, perhaps older, processors gives better bang for the buck.
I'm not an enterprise guy, but IIRC this is something that clusters are good for.
It also lets you use less expensive hardware, and it allows tasks to proceed even in the face of a machine failure, since the other nodes of the cluster are still running. The other nice thing is that clusters are scalable: if you get more customers and need 40% more processing, just drop in a couple more machines and away they go.
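The scaling and failure points can be put into rough numbers. This sketch assumes identical nodes and throughput that scales linearly with node count. That assumption is optimistic, since communication overhead grows with the cluster. `nodes_needed` and `capacity_after_failures` are hypothetical helpers. The extra load is given as an integer percentage so the rounding-up is exact.

```python
def nodes_needed(current_nodes, extra_percent):
    """Nodes required to absorb `extra_percent` more work (40 means 40%),
    assuming linear scaling across identical nodes. Integer ceiling division."""
    return (current_nodes * (100 + extra_percent) + 99) // 100


def capacity_after_failures(nodes, failed):
    """Fraction of throughput left when `failed` of `nodes` identical machines are down."""
    return (nodes - failed) / nodes


print(nodes_needed(4, 40))            # 6
print(capacity_after_failures(4, 1))  # 0.75
```

With four nodes, the 40% example works out to 4 × 1.4 = 5.6, rounded up to 6, i.e. two extra machines. Losing one of the four leaves 75% of the throughput rather than none.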
Exotic hardware solutions are bound to result in tears. If a monster server board cannot be replaced a few years down the line, you are screwed. Three or four clustered quad-core machines built from common parts by reputable manufacturers can carry you a long way.
Server boards do often have greater potential throughput.
Maybe the answer is for me to purchase multiple boxes; it just seems like I'm paying for all of the components multiple times when I really only want the processors.
My original plan was to use 10 PS3s, but after looking at the SPE instruction set and googling around for examples, I just don't see how I can convert my slowest routine to vector code. So now I'm back to looking at PCs, and I was assuming I could put a lot of cores in a box now that quad-cores are out.
Out of curiosity, and because you mention integer-only, have you looked into ARM-based system-on-a-board setups? Perhaps something with a PC-104 stack?
While I know of these things, I have to profess ignorance as to their [limits|constraints|capabilities]. I'm just thinking that $5K would buy a whole lot of ARMs; the PC-104 stack would be easily extensible, letting you add any peripherals you might need or want (and only those). There's also the low power consumption, which is always nice. And I would think that, if you went the cluster route, hooking them together would be of similar complexity.
Thanks. I saw the Apple machines with 8 cores and mostly ignored them, but you have me rethinking that. I saw an HP with one Q9450 for $1,500 and figured I could fairly cheaply add more, since the Q9450 is only a few hundred dollars; that's what prompted the thread. But if that can't happen and I need to spend $5k on a Xeon server, then Apple is looking good.
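The comparison being weighed here is essentially price per core. This is a quick sketch using the two quotes from this thread: $5k for the 8-core dual Xeon server and $1,500 for the HP with one quad-core Q9450. `cost_per_core` is a hypothetical helper. It ignores differences in clock speed and per-core performance, which would still need to be benchmarked on the actual code.

```python
def cost_per_core(price_usd, cores):
    """Dollars per core for a whole machine; ignores clock speed and per-core performance."""
    return price_usd / cores


quotes = {
    "dual quad-core Xeon server": (5000, 8),
    "HP with one Q9450": (1500, 4),
}

for name, (price, cores) in quotes.items():
    print(f"{name}: ${cost_per_core(price, cores):.0f} per core")
# dual quad-core Xeon server: $625 per core
# HP with one Q9450: $375 per core
```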
I haven't looked into that at all, but it sounds interesting. Because it's just a personal project and I have limited time, I was even worried about the time it would take to rewrite my worst-offending routine for the PS3 (I would only have ported that one routine; the PC can do the rest). So I probably wouldn't go the ARM route if I had to build the hardware myself, but maybe someone sells something pre-built. Thanks.
I feel your pain about the time. I know there are a few embedded developers 'round here, so maybe they'd be able to give an off-the-cuff feasibility assessment (and some leads?).
As far as the Mac goes, if you purchase the “Server” package (I think it was $500), it comes with software to manage distributed computing. This is another thing that I know of but have no experience with. Just another point to ponder…
A cluster also gives you more redundancy: if you have a cluster of four inexpensive machines and one goes belly up, you lose speed, but you are not shut down awaiting tech support or parts for your 4-socket bad boy. If this thing is going to require extensive processing time, job interruption due to a glitch or hardware failure will be a big concern.
With standard off-the-shelf consumer boards you will also be running integrated video, LAN, etc., so the duplicated cost is minimal, except for the licences if you're running MS products; there's plenty of free *nix server software out there if it suits the app.
Clusters aren't necessarily cheaper: you've got all the extra hardware for each board, of course, but don't forget the running costs and the ancillary equipment (network switches, KVMs, racks, etc.). It's probably beyond your budget, but you might look into a halfway house: blades.
I've looked at the T1 some, and I've gone back and forth between excited and not so excited.
I have seen statements from people that the threads are not executed concurrently, and I don't know whether those people work for competitors, so that is an open question for me.
I've also seen one paper that took a compute-intensive algorithm and tuned it for Cell, x86, Power and T1. The T1 performed horribly compared to the others. This may have been entirely due to floating point; I don't remember. It's also possible the researchers were affiliated with IBM, because I've seen that in some papers. Again, it's an open question for me.
So, I like the idea, but I would need to confirm performance first.