I need a few computers to run some calculations. Let’s say I need between 24 and 48 processor cores, current-generation Xeons with decent cache sizes (1-2 MB per core).
What is the most cost-effective way to get these? I do not need high availability, and I do not need massive amounts of storage, so paying for a big "enterprise" Dell or the like, with 24 drive bays and so on, seems like a waste of money and space.
Dell, HP and IBM do make small "blade" servers with a CPU and a small disk, but these require a blade chassis and do not appear cost-effective for just a few blades.
How long do you need them for? Is this a long-term, constant need to do calculations, or a one-off project? Blade servers are not cost-effective; they are made for low machine-room footprint and easy maintenance (hot swap, etc.). Simple whitebox computers usually beat anything for pure compute per dollar, but they are not always the right answer.
My first-cut answer would be to look at Amazon and avoid buying anything, but one needs to know more about your particular compute needs.
Have you looked at Amazon EC2 yet? You can get a crapton of cores for a couple hundred bucks a month, and if you don’t really need much storage or any HA, it’ll probably be even cheaper.
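To sanity-check the "couple hundred bucks" claim against your own numbers, here's a back-of-the-envelope sketch; the hourly rate and instance size are hypothetical placeholders, so plug in current EC2 pricing for whatever instance type you actually pick:

```python
# Back-of-the-envelope EC2 cost estimate. The rate and instance
# size below are hypothetical placeholders, not real EC2 pricing.
HOURLY_RATE_USD = 0.50   # per instance-hour (placeholder)
CORES_PER_INSTANCE = 8   # vCPUs per instance (placeholder)
CORES_NEEDED = 48
HOURS_PER_MONTH = 24 * 30

instances = -(-CORES_NEEDED // CORES_PER_INSTANCE)  # ceiling division
monthly = instances * HOURLY_RATE_USD * HOURS_PER_MONTH
print(f"{instances} instances, about ${monthly:,.0f}/month if run 24/7")
```

And if the job is bursty rather than constant, you only pay for the hours you actually run, which is exactly where EC2 beats owned hardware.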
Be careful. If your dataset doesn't fit inside each core's cache, you may find that performance does not scale linearly and that you would be better off with separate machines: the more cores sharing one memory system, the higher the contention and the lower the per-core performance. In that case you might find something like this of interest.
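You can check this before buying anything: run the same fixed-size workload on an increasing number of worker processes on a machine you already have and watch whether the wall-clock time stays flat. A minimal sketch, where `workload` is a stand-in for your real calculation:

```python
import time
from multiprocessing import Pool

WORKING_SET = 2_000_000  # elements; size this like your real dataset

def workload(_):
    # Stand-in for the real calculation: touches a working set
    # large enough to fall out of per-core cache.
    data = list(range(WORKING_SET))
    return sum(x * x for x in data)

if __name__ == "__main__":
    for nprocs in (1, 2, 4, 8):
        start = time.perf_counter()
        with Pool(nprocs) as pool:
            pool.map(workload, range(nprocs))  # one task per worker
        elapsed = time.perf_counter() - start
        # Ideal scaling: elapsed stays roughly flat as nprocs grows.
        print(f"{nprocs} workers: {elapsed:.2f}s")
```

If the times climb steeply as you add workers, the cores are fighting over cache and memory bandwidth, and separate machines start to look better than one big box.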
As is alluded to above, without a clear understanding of the nature of the computation it is impossible to suggest a good, cost-effective solution. If it were simple embarrassingly parallel task farming, low-end whitebox is usually the sweet spot. But you need to understand the amount of data locality and the communication patterns to get a grip on what might be the right answer for more interesting computations. Choice of language, whether you even have source code, memory footprint: all of it comes into play. If it fits into one of the well-understood HPC paradigms, there are often good off-the-shelf solutions it can be slotted into. Signal processing and linear algebra can make good use of GPU systems; other workloads can't even begin to gain traction with them.
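For reference, "embarrassingly parallel task farming" means independent inputs with no communication between workers, which is the shape of problem where cheap nodes win. A minimal sketch, where `simulate` is a hypothetical stand-in for one unit of your real work:

```python
import random
from multiprocessing import Pool

def simulate(seed):
    # Hypothetical stand-in for one independent unit of work.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1_000_000))

if __name__ == "__main__":
    with Pool() as pool:  # defaults to one worker per core
        results = pool.map(simulate, range(100))  # 100 independent tasks
    print(sum(results) / len(results))
```

If your problem decomposes like this, buy (or rent) the cheapest cores you can find. If the workers need to talk to each other, the interconnect and data layout start to dominate the decision.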
When I first embarked on something similar, I hadn't worked much with parallel systems beyond some basic few-core or multi-threaded stuff, and I naively thought I could just throw some processors at the problem. I looked at the following solutions, and none of them fit my problem well enough:
Multiple PCs - Too much data needed to be transferred; not enough gain to make it worth it (with the money I could spend)
PS3/Cell - The low-level details of the SPUs prevented the kind of parallelism I needed (depending on the approach it's either sequential reads with random writes or the opposite, but in all cases the next set of reads or writes could include any address; you can't just process blocks of memory)
NVIDIA GPU - Worse than the Cell: great performance for parallel single-pass calculations, but again, random writes that affect other calculations got in the way (see the sketch after this list)
Tilera (TILE64, a 64-core CPU) - I talked to one of their tech guys about the nature of my app and how it would map onto their processor, and he said it wasn't a great fit; it wasn't going to give me the order-of-magnitude boost over a multi-core Intel that I wanted
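To make the random-write problem above concrete, here is a toy sketch of the access pattern that defeated these platforms: every worker may write anywhere in a shared output, so updates have to be synchronized, and the lock serializes exactly the part you hoped to parallelize. All names here are illustrative, not the original code:

```python
import random
from multiprocessing import Array, Lock, Process

BINS = 1024

def scatter_writes(hist, lock, seed, n):
    rng = random.Random(seed)
    for _ in range(n):
        i = rng.randrange(BINS)  # the next write can land anywhere
        with lock:               # needed for correctness...
            hist[i] += 1         # ...but serializes all workers

if __name__ == "__main__":
    hist = Array("i", BINS, lock=False)  # shared output buffer
    lock = Lock()
    procs = [Process(target=scatter_writes, args=(hist, lock, s, 100_000))
             for s in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(sum(hist[:]))  # 400000 updates, executed mostly one at a time
```

GPUs and the Cell are great when each output location is owned by one worker; once any worker can touch any address, the synchronization eats the parallel speedup.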
If you want to stay with the latest generation of Xeons, those top out at two sockets per board. The latest and greatest Xeons have ten cores each, so that gets you to twenty cores per node/server.
If you can tolerate one interconnect in the mix, then you can go with two Intel S2600G-series boards and add dual quad data rate (QDR) InfiniBand expansion modules, so you can skip the nutty expensive InfiniBand switches.
According to Intel, that board only supports the 8-core Xeons, but in total you would have 32 processing cores across the two boards, which you could load up with as much as 1.5 terabytes of RAM.
That should solve any locality/speed issues; QDR InfiniBand is very good at that, especially point-to-point.
That should give you a pretty insane amount of x86 computing power on the cheap.
When I say on the cheap, I mean under about $15,000, as long as you do not max out the RAM; that much RAM would be a significant part of your expense.
All of that could conceivably run next to your desk off regular 110 V outlet power, so you can skip the cost of keeping it in a server room you may or may not have.