Distributed Networking Question

Well, I've come to my annual Big Computer Upgrade, and I was thinking of something new this year. I finally have the resources to build a REALLY good one.

How's this: 20 rack-mounted single-board PCs, each with an onboard gigabit (or faster) network card, and one single top-of-the-line PC as the interface. I'm talking distributed networking here.

I figure I can keep adding more PCs indefinitely, so I won't ever need to upgrade again.

The problem is that I know NOTHING about distributed networks. Is there some type of software I could install on all the PCs to make them look like one PC to Windows? Or am I stuck with Linux?

Links please.

I forgot to say that this computer is also going to be a web server.

Distributed networking is only a good solution for problems that can easily be broken down into small, similar, independent sub-problems, with a minimum of communication between nodes. Other problems, which require a large degree of inter-node communication and/or depend on sub-problems being executed in a particular order, will benefit from a single-image massively parallel machine instead of a distributed one.

As for distributed stuff, by far the most common technology used is Beowulf clustering.

Making a giant web server isn't really a job for a purely distributed system; rather, you'd want a front-end load balancer that distributes requests evenly among several back-end nodes for processing. Even that isn't particularly useful unless a lot of processing has to be done for each request.
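The load-balancing idea is simple enough to sketch. This is a minimal illustration of round-robin dispatch only, not a working proxy (the back-end addresses are made up, and a real front end would actually forward each HTTP request to the node it picks):

```python
from itertools import cycle

# Hypothetical back-end node addresses.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

class RoundRobinBalancer:
    """Hand each incoming request to the next back end in turn."""
    def __init__(self, backends):
        self._next = cycle(backends)

    def pick(self):
        return next(self._next)

balancer = RoundRobinBalancer(BACKENDS)
# Six requests get spread evenly across the three nodes.
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

In practice you wouldn't write this yourself; a dedicated load balancer (hardware, or software such as a reverse proxy) does this, often with smarter policies than plain round-robin, such as sending requests to the least-loaded node.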

And where can I look into this? As for communication, the boards each have a 1 or 2 Gb network card on them.

It should be noted that most off-the-shelf PC software will not take advantage of multiple processors; software needs to be specifically written for them. Unreal Tournament will not run any faster on the system you propose than it would on a high-end "normal" (single-processor) PC.
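A small Python sketch of why. The plain loop below runs in a single process on a single CPU no matter how many processors exist; only the second version, which has been explicitly rewritten to split the job across worker processes, can use more than one (here via `multiprocessing` as an illustration, since the OS won't split the work on its own):

```python
from multiprocessing import Pool

def busywork(n):
    # Unmodified, single-threaded code: one process, one CPU.
    return sum(i * i for i in range(n))

def partial_sum(lo, hi):
    return sum(i * i for i in range(lo, hi))

def parallel_busywork(n, workers=4):
    # The same job, explicitly split into per-worker ranges.
    step = n // workers
    ranges = [(i * step, (i + 1) * step) for i in range(workers)]
    ranges[-1] = (ranges[-1][0], n)  # cover any remainder
    with Pool(workers) as pool:
        return sum(pool.starmap(partial_sum, ranges))

if __name__ == "__main__":
    # Same answer either way; only the second can occupy several CPUs.
    print(busywork(1_000_000) == parallel_busywork(1_000_000))
```

A game like Unreal Tournament is in the first category: nobody has rewritten its inner loops to split the work, so extra processors (or extra cluster nodes) sit idle.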

You also need to determine what your bottlenecks are. Is your problem processor-intensive? If you're trying to calculate the trajectories of all the particles in a nuclear blast, then a massively parallel system is what you want. If your issue is a massive database, then your storage subsystem is what needs looking at (i.e. not much for the processor to do, but a lot of data being shovelled about). A gigabit NIC is all well and fine, but it may be overkill; you have to decide how much data will be entering and leaving the system.

In short, computing is not a one-size-fits-all proposition. Think of it like a vehicle: you can spend $250,000 on a Ferrari or on a large truck. Which one is right? It depends on what you want to use it for. The Ferrari will get from point A to point B much quicker than the truck, but if you want to haul 10 tons of goods along with you, the truck will beat the Ferrari. In the end, your application (what you will use it for) will determine the design of your system.

So if Linux can run on all the computers as one image or whatever, Unreal won't run faster?

Nope… one processor will be designated to do all the work. The program needs to tell the operating system how to distribute the work; the OS doesn't know what makes the most sense on its own. Without direct instruction, the OS can do no more than assign a single processor to the task at hand. I'd actually bet you might find it runs slower, as the distributed computing system comes with its own overhead that'll suck up some system resources.

I thought the OS ran one thread on each processor.

BioHazard: True, but the app would need to be multithreaded to take advantage of several processors.

BioHazard: A cluster isn't a cost-effective general-purpose computing machine. Hell, even multiprocessor machines aren't cost-effective.

In addition, I don't know of any Windows clustering software like Beowulf. I've seen a few apps that can use a workgroup for, say, distributed compilation from Visual Studio (which must be hell on the network in the first place), but nothing like Beowulf.