I need a few computers to run some calculations. Let’s say I need between 24 and 48 processor cores, current-generation Xeons with decent cache sizes (1-2 MB per core).
What is the most cost-effective way to get these? I do not need high availability, and I do not need massive amounts of storage, so paying for a big "enterprise" Dell or the like, with 24 drive bays and so on, seems like a waste of money and space.
Dell, HP and IBM do make small "blade" servers with a CPU and a small disk, but these require a blade chassis and do not appear cost-effective for just a few blades.
How long do you need them for? Is this a long-term, constant need to do calculations, or a one-off project? Blade servers are not cost-effective; they are made for low machine-room footprint and easy maintenance (hot swap, etc.). Simple whitebox computers usually beat anything for pure compute per dollar, but they are not always the right answer.
My first-cut answer would be to look at Amazon and avoid buying anything, but one needs to know more about your particular compute needs.
Have you looked at Amazon EC2 yet? You can get a crapton of cores for a couple hundred bucks a month, and if you don’t really need much storage or any HA, it’ll probably be even cheaper.
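To sanity-check the "couple hundred bucks" claim against your own numbers, here's a back-of-the-envelope sketch; the hourly rate and instance size are hypothetical placeholders, so plug in current EC2 pricing for whatever instance type you actually pick:

```python
# Back-of-the-envelope EC2 cost estimate. The rate and instance
# size below are hypothetical placeholders, not real EC2 pricing.
HOURLY_RATE_USD = 0.50   # per instance-hour (placeholder)
CORES_PER_INSTANCE = 8   # vCPUs per instance (placeholder)
CORES_NEEDED = 48
HOURS_PER_MONTH = 24 * 30

instances = -(-CORES_NEEDED // CORES_PER_INSTANCE)  # ceiling division
monthly = instances * HOURLY_RATE_USD * HOURS_PER_MONTH
print(f"{instances} instances, about ${monthly:,.0f}/month if run 24/7")
```

And if the job is bursty rather than constant, you only pay for the hours you actually run, which is exactly where EC2 beats owned hardware.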
Be careful. If your dataset doesn't fit inside each core's cache, you may find that performance does not scale linearly and that you would be better off with separate machines: the more cores sharing one memory system, the higher the contention and the lower the per-core performance. In that case you might find something like this of interest.
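You can check this before buying anything: run the same fixed-size workload on an increasing number of worker processes on a machine you already have and watch whether the wall-clock time stays flat. A minimal sketch, where `workload` is a stand-in for your real calculation:

```python
import time
from multiprocessing import Pool

WORKING_SET = 2_000_000  # elements; size this like your real dataset

def workload(_):
    # Stand-in for the real calculation: touches a working set
    # large enough to fall out of per-core cache.
    data = list(range(WORKING_SET))
    return sum(x * x for x in data)

if __name__ == "__main__":
    for nprocs in (1, 2, 4, 8):
        start = time.perf_counter()
        with Pool(nprocs) as pool:
            pool.map(workload, range(nprocs))  # one task per worker
        elapsed = time.perf_counter() - start
        # Ideal scaling: elapsed stays roughly flat as nprocs grows.
        print(f"{nprocs} workers: {elapsed:.2f}s")
```

If the times climb steeply as you add workers, the cores are fighting over cache and memory bandwidth, and separate machines start to look better than one big box.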
As is alluded to above, without a clear understanding of the nature of the computation it is impossible to suggest a good, cost-effective solution. If it were simple embarrassingly parallel task farming, low-end whitebox is usually the sweet spot. But you need to understand the amount of data locality and the communication patterns to get a grip on what might be the right answer for more interesting computations. Choice of language, whether you even have source code, memory footprint: all of it comes into play. If it fits into one of the well-understood HPC paradigms, there are often good off-the-shelf solutions it can be slotted into. Signal processing and linear algebra can make good use of GPU systems; other workloads can't even begin to gain traction with them.
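For reference, "embarrassingly parallel task farming" means independent inputs with no communication between workers, which is the shape of problem where cheap nodes win. A minimal sketch, where `simulate` is a hypothetical stand-in for one unit of your real work:

```python
import random
from multiprocessing import Pool

def simulate(seed):
    # Hypothetical stand-in for one independent unit of work.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1_000_000))

if __name__ == "__main__":
    with Pool() as pool:  # defaults to one worker per core
        results = pool.map(simulate, range(100))  # 100 independent tasks
    print(sum(results) / len(results))
```

If your problem decomposes like this, buy (or rent) the cheapest cores you can find. If the workers need to talk to each other, the interconnect and data layout start to dominate the decision.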
When I first embarked on something similar, I hadn't worked much with parallel systems beyond some basic few-core or multi-threaded stuff, and I naively thought I could just throw some processors at the problem. I looked at the following solutions, and none of them fit my problem well enough:
Multiple PCs - Too much data needed to be transferred; not enough gain to make it worth it (with the money I could spend)
PS3/Cell - The low-level details of the SPUs prevented the kind of parallelism I needed (depending on the approach it's either sequential reads with random writes or the opposite, but in all cases the next set of reads or writes could include any address; you can't just process blocks of memory)
NVIDIA GPU - Worse than the Cell: great performance for parallel single-pass calculations, but again, random writes that affect other calculations got in the way (see the sketch after this list)
Tilera (TILE64, a 64-core CPU) - I talked to one of their tech guys about the nature of my app and how it would map onto their processor, and he said it wasn't a great fit; it wasn't going to give me the order-of-magnitude boost over a multi-core Intel that I wanted
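To make the random-write problem above concrete, here is a toy sketch of the access pattern that defeated these platforms: every worker may write anywhere in a shared output, so updates have to be synchronized, and the lock serializes exactly the part you hoped to parallelize. All names here are illustrative, not the original code:

```python
import random
from multiprocessing import Array, Lock, Process

BINS = 1024

def scatter_writes(hist, lock, seed, n):
    rng = random.Random(seed)
    for _ in range(n):
        i = rng.randrange(BINS)  # the next write can land anywhere
        with lock:               # needed for correctness...
            hist[i] += 1         # ...but serializes all workers

if __name__ == "__main__":
    hist = Array("i", BINS, lock=False)  # shared output buffer
    lock = Lock()
    procs = [Process(target=scatter_writes, args=(hist, lock, s, 100_000))
             for s in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(sum(hist[:]))  # 400000 updates, executed mostly one at a time
```

GPUs and the Cell are great when each output location is owned by one worker; once any worker can touch any address, the synchronization eats the parallel speedup.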
If you want to stay with the latest generation of Xeons, those top out at two sockets per board. The latest and greatest Xeons have ten cores each, so that gets you to twenty cores per node/server.
If you can tolerate one interconnect in the mix, then you can go with two Intel S2600G-series boards and add dual quad data rate (QDR) InfiniBand expansion modules, so you can skip the nutty expensive InfiniBand switches.
According to Intel, that board only supports the 8-core Xeons, but in total you would have 32 processing cores across the two boards, which you could load up with as much as 1.5 terabytes of RAM.
That should solve any locality/speed issues; QDR InfiniBand is very good at that, especially point-to-point.
That should give you a pretty insane amount of x86 computing power on the cheap.
When I say on the cheap, I mean under about $15,000, as long as you do not max out the RAM; that much RAM would be a significant part of your expense.
All of that could conceivably run next to your desk off regular 110 V outlet power, so you can skip the cost of keeping it in a server room you may or may not have.