How much for a domestic supercomputer?

As seen here:

The guy has:

Ballpark, how much money are we talking about here?

Is there any commercial software that could take advantage of it? Are there any games that could?

Not really. That could change in the future as concurrent programming languages become more popular, but for now the major market for machines like that consists of scientists and their lackeys who need to do massive number crunching.

I have no idea what a machine like that would run you, but people faced with similar problems are using networked Playstations as their supercomputers, so that should give you a ballpark.

Quad core is not exotic at all now. Say $1,000 for the base computer; it's only that expensive because you need an exotic motherboard for the 3 GPUs. Add $100 to get the extra memory.

The video cards are silly expensive.
The FX 5800 seems to go for about $2,500 or so.
The Nvidias are about $1,500 each.

Apple sells eight-core machines from $3,300.
Starting with OS X 10.6, applications will be able to use all cores and GPUs for computation.

Bad link. (Your final bracket is outside the url)

Rats! I should have checked it. Try again. A lot of what was fine and preferable in KDE3 is a heap of **** in KDE4!

The system mentioned in that article sounds like it would be great for running distributed computing applications like Folding@Home. Lots of people do do this, and they do it in teams, so it is kind of a recreational thing to do, but it is pretty much a geek-only activity. In the strictest sense, I’m not sure it’s really a domestic application; it's more of a specialist or hobbyist thing to do.

But it looks like this isn’t aimed at domestic use. I think that the target market is labs and businesses that need more computing power than an average desktop, but lack the resources for a supercomputer, mainframe or midrange system.

How is this KDE’s fault, out of curiosity?

(Second link has the same problem. Here: Beowulf cluster - Wikipedia)

This isn’t exactly “commercial” software in the traditional sense, but I work on medical imaging software that is heavily multithreaded and otherwise parallelized, and for some of our more CPU-intensive processes we can definitely use all the horsepower you can get. I personally have an 8-core machine with 12GB RAM as my primary development machine, and the speed and power definitely make a big difference, both for running and testing this type of software, as well as just normal compiling and code-editing.

The machine described is certainly not a “supercomputer” by any means - multicore machines aren’t exotic at all any more, and quite likely most workstations will have 4 cores within a few years.

The “supercomputerness” of the machine in the OP is not related to the CPU cores but rather the GPUs. The Tesla C1060 has 240 streaming processor cores, and if the problem you have fits the GPU model of computing, then you can get a lot done pretty fast.

My machine has a GTX280, which has 240 streaming processors (I think it has less memory than the Teslas). For some of the stuff I’ve converted it works great; for some other pieces of code the problem doesn’t match the GPU computing model well enough to get more than just modest gains.

When you fire off a program on one of these (the GPU, that is), you typically create a boatload of threads (16,000 or 32,000) so the GPU can be performing work on 240 of them while bringing in data from memory for the others that are waiting. If you have millions of particles that need physics calculations, then this works great. If, on the other hand, you have neural networks with all kinds of branchy code and completely random memory access, then it’s going to be tough to get any gains.
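To make the "boatload of threads" concrete, here's a minimal CUDA sketch of the particle case. Everything in it (the names, the particle count, the time step) is made up for illustration, not taken from anybody's real code; the point is just that you launch one thread per particle, vastly more threads than there are streaming processors:

```cpp
// Illustrative CUDA sketch: one thread per particle.
#include <cuda_runtime.h>

__global__ void updateParticles(float* pos, const float* vel, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        pos[i] += vel[i] * dt;  // regular, branch-free work: a good fit for the GPU
}

int main()
{
    const int n = 1000000;  // a million particles (arbitrary for the example)
    float *pos, *vel;
    cudaMalloc(&pos, n * sizeof(float));
    cudaMalloc(&vel, n * sizeof(float));

    // Launch far more threads than the 240 cores; the hardware scheduler
    // keeps the cores busy while other threads are waiting on memory.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    updateParticles<<<blocks, threadsPerBlock>>>(pos, vel, n, 0.01f);
    cudaDeviceSynchronize();

    cudaFree(pos);
    cudaFree(vel);
    return 0;
}
```

With a million particles and 256 threads per block that's roughly 4,000 blocks in flight, which is why the "millions of particles" case maps onto the GPU so well.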

Computers just don't magically scale up. A lot of supercomputers have specialized applications like weather or nuclear modeling. They are good at doing that type of math only and would perform horribly at tasks like playing games or running your office suite. I knew a guy with an old Cray, and it was slow at everyday computing, but it was great at whatever modeling it was built for.

The computer you mention is really just a collection of high-end video cards. Video cards run typical CPU tasks very slowly. They are specialized to do the math that generates 3D graphics, and they do it with a great deal of parallelism. If you ran an OS on that, you would be wasting a lot of cycles and heat for less performance than a Pentium II from 10 years ago.

Yes, we’re seeing applications written for video cards like folding@home and whatnot, but they also take advantage of the parallelism here. These things have limited applications. If they didn't, then we would get rid of the CPU altogether.

About the only things that are really going to take advantage of that rig are 3D renderers, such as LightWave 3D, 3ds Max, Maya, etc., or lots of video conversion work.

Almost no games could take advantage of that hardware, nor could most other commercial software.

Like CutterJohn said, 3d software could make good use of the parallel processing.

I have plans to start building my own Linux-based render farm out of old machines soon, for rendering animations from Blender, POV-Ray and DAZ Studio.

So will 3D software always make use of it? Or only the versions specifically made for parallel processing? Will all graphics-intensive games know to use all the video cards? (Not that I am planning on buying one; this is purely academic.)

ETA: Will Matlab take advantage of it? (Of all programs, you would expect this one to.) Will MS Excel?

The Teslas don’t have video output ports like normal GPUs; they are designed for number crunching.

If you wanted similar power for video, use a GTX280 or GTX295 (which, if I remember right, is two GTX280s connected with SLI).

No, it’s not faster for the design phase, for instance. There, a single fast machine with a decent GPU is what you need. It’s *specifically* useful for rendering the result of your design, which usually consists of multiple frames each differing only slightly, or else smaller pieces of a larger single frame. All of that is defined by a bunch of plain text files and interpreted by the rendering engine. For that kind of grind work, parallel processing really is faster. Last year I saw a six-member cluster of circa-2001-vintage Pentium IIIs running Blender seriously kick the ass of a high-end quad-core machine with an expensive GPU when it came to animating a short film - yet I couldn’t give a Pentium III away to a street person today, I think.

Not really - the actual batch processing is handled by the queue manager. All that’s required is a command-line interface for the manager to send batch jobs to.
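For what it's worth, the "queue manager" end of this can be almost trivially simple. Here's a toy host-side sketch of the idea - the host names, the .blend file, and the round-robin scheme are all invented for illustration; only Blender's -b (background) and -f (render one frame) batch flags are real:

```cpp
// Toy frame dispatcher: the "queue manager" just builds command lines
// and hands them out. Hosts, file name, and frame range are placeholders.
#include <cstdlib>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> nodes = {"node01", "node02", "node03"};  // hypothetical render nodes
    const int firstFrame = 1, lastFrame = 240;

    for (int f = firstFrame; f <= lastFrame; ++f) {
        const std::string& node = nodes[f % nodes.size()];
        // Blender batch mode: -b = run without the GUI, -f = render a single frame.
        std::string cmd = "ssh " + node + " blender -b scene.blend -f " + std::to_string(f);
        std::system(cmd.c_str());  // a real manager would dispatch these concurrently
    }
    return 0;
}
```

A real farm would run the jobs in parallel and track failures, but the interface really is just "run this command with this frame number".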

Out of curiosity, are you talking about the standard threads provided by the OS, or is this some kind of special “GPU thread” that you use GPU-specific libraries to create and manage? And if it’s the former, what platform are you talking about? I have no experience creating massive numbers of threads like that, but my impression has always been that most native thread implementations start to fall down way before you could create 16,000 threads.

It’s on the GPU. With CUDA on NVIDIA hardware, when you launch a “kernel” (a GPU program), you tell it how many threads to launch. To get the most out of it, you need to launch enough threads to make sure that each set of processors (they are in groups of 16 or 32) has a thread available to be processed for each streaming processor in the set.

For example, if you only launched 240 threads, then the processors sit idle while data is coming in from memory (which takes many, many cycles). If you launch 32,000, then there will always be a set of threads not waiting on memory; they get processed while other threads are in varying stages of waiting for memory.

Additionally, if the threads are not executing in lockstep (due to branches, etc.), having more threads allows the GPU to find ones that are at the same instruction. This is required because within a set of processors (16 or 32 or whatever, depending on the specific GPU) the same instruction is getting executed, just on different data.
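To put that lockstep point in code form - this is just an illustrative CUDA kernel, not anything from the thread - a data-dependent branch inside a group of threads gets serialized, which is exactly why branchy, random-access code sees only modest gains:

```cpp
// Illustrative CUDA kernel showing warp divergence.
__global__ void branchy(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Threads in the same group (warp) that disagree on this condition
    // cannot run at the same time: the hardware executes the "if" side,
    // then the "else" side, masking off the non-matching threads each pass.
    if (in[i] > 0.0f)
        out[i] = in[i] * in[i];    // some threads take this path...
    else
        out[i] = -0.5f * in[i];    // ...the rest take this one, serialized
}
```

Extra threads help here too: while one group is stuck working through both sides of a branch (or waiting on memory), the scheduler can swap in another group that is ready to go.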