Why Couldn't Someone Create an Online "Distributed" Supercomputer?

SETI pioneered this concept with their “SETI@home” campaign a few years ago. Couldn’t someone form an NPO or for-profit corporation that, in effect, created a “supercomputer” by distributing the workload over thousands of home systems? The individual computer owners could be awarded “shares” of the corporation (or simply paid) based upon their contribution to the network. Here are some statistics that I copied from a SETI website: http://setiathome.ssl.berkeley.edu/totals.html
                                 Total                    Last 24 Hours
Users                            4979107                  1087
Results received                 1362946153               1269186
Total CPU time                   1909887.762 years        1049.104 years
Floating Point Operations        4.899675e+21             4.949825e+18 (57.29 TeraFLOPs/sec)
Average CPU time per work unit   12 hr 16 min 31.1 sec    7 hr 14 min 27.5 sec
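
For anyone wondering where the “57.29 TeraFLOPs/sec” figure comes from, it appears to be just the last-24-hours operation count divided by the seconds in a day; here is a one-line check using only the numbers quoted above:

```python
# Quick check of the "57.29 TeraFLOPs/sec" figure using the stats above.
flops_last_24_hours = 4.949825e18      # floating point operations, last 24 hours
seconds_per_day = 24 * 60 * 60         # 86,400 seconds
print(flops_last_24_hours / seconds_per_day / 1e12, "TeraFLOPs/sec")
# -> about 57.3 TeraFLOPs/sec, which matches the quoted rate.
```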

How do these stats compare with a good, run-of-the-mill Cray X1 or “Red Storm” supercomputer?

If they do compare (or if they are good enough even if they don’t), then such a system could “compete” for industry and governmental (paying) projects.

Sorry, I had two nice columns of data, one for Total and one for Last 24 Hours. However, when I posted, the system somehow crammed everything together.

Google “grid computing” and you should find some good stuff. Scientific American had an article about it maybe a year ago, but I don’t think they keep free online archives.

http://www.gridcomputing.com/gridfaq.html is one site…

Wow, I just thought of this yesterday as well. Like, why don’t we all have terminals that connect to one big machine, since I rarely use a high % of my CPU (and I’m on the computer a lot)? Probably not feasible yet because people don’t want their private information exposed to the world; maybe one day when encryption is perfect we’ll see this.

There are other efforts along the same lines as SETI currently in progress. You can use your computer to research cancer drugs, look for large prime numbers, and do all sorts of other cool stuff. The list of the fastest supercomputers can be found at Top 500. As you can see, SETI beats out even the fastest ones, as measured in FLOPS. While linking together tremendous numbers of personal computers is one active area of research for getting even faster performance, there are some barriers:

1.) Not every computing task can be divided up nicely. SETI at Home works by giving each individual user a chunk of data that they work on filtering, processing, and analyzing. For other tasks, however, computation on one chunk of data might require results from some other chunk of data, so you can’t divide them up nicely. (See the sketch after this list.)

2.) Privacy. If the problem is something where security is a concern, sending data over the internet might be too dangerous. Some companies are quite paranoid about this. Also, they wouldn’t want people from rival companies to masquerade as ordinary computer users in order to steal their data.

3.) Portability. People have different processors, different operating systems, different configurations. It’s tough to write programs that everybody can use.

4.) Reliability. Networks can go down, and so forth.
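
To make barrier 1 concrete, here is a minimal Python sketch of the “embarrassingly parallel” pattern that makes SETI at Home possible; the chunk size, the toy filtering step, and the function names are all invented for illustration, not taken from SETI’s actual client:

```python
from multiprocessing import Pool

def analyze_work_unit(chunk):
    """Toy stand-in for SETI-style processing: each chunk is filtered
    and scored on its own, with no data needed from any other chunk."""
    filtered = [x for x in chunk if abs(x) > 0.5]   # crude "noise filter"
    return sum(x * x for x in filtered)             # crude "signal score"

if __name__ == "__main__":
    # Split the raw data into independent work units ("chunks").
    raw_data = [i / 1000.0 for i in range(100_000)]
    chunks = [raw_data[i:i + 10_000] for i in range(0, len(raw_data), 10_000)]

    # Each chunk could go to a different volunteer machine; here a local
    # process pool stands in for the network of home PCs.
    with Pool() as pool:
        scores = pool.map(analyze_work_unit, chunks)

    print("combined result:", sum(scores))
```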

There are tons of distributed computing projects out there (see http://www.distributed.net/ for example), but you have to remember that not all problems can be solved by parallel computation. For example, a problem where each step depends on the result of the previous step (a long sequential chain of dependencies) would gain little from a parallel system, because the steps cannot run at the same time. For those types of problems, you just need a single really, really big computer.
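
By contrast, here is a toy example (again invented purely for illustration, not any real supercomputing workload) of the kind of sequential dependency that resists this treatment; no number of volunteer PCs can start step 1,000,000 before step 999,999 is finished:

```python
def iterate(x0, steps):
    """Toy recurrence: x_{n+1} = 3.9 * x_n * (1 - x_n) (logistic map).
    Step n+1 cannot be computed until step n is known, so the work
    cannot be split into independent chunks the way SETI@home's can."""
    x = x0
    for _ in range(steps):
        x = 3.9 * x * (1 - x)
    return x

print(iterate(0.2, 1_000_000))
```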

There is a comprehensive list of ongoing distributed computing projects. Some are non-profit, some for-profit; there are life sciences, financial, and mathematical projects… It’s not all “searching for little green men.”

There are probably logistical and legal reasons why a company can’t pay or award its volunteers. Government-type projects could involve highly classified information that the government would not want sent out to thousands of people.

Fundamentally, the reason grid computing just hasn’t taken off is that bandwidth continues to be much, much more expensive than CPU time. The range of suitable problems is limited, and any system you set up to pay people to grid compute usually works out to be more expensive than just buying some off-the-shelf supercomputers and using those.
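
A rough back-of-the-envelope way to see that trade-off, with every price and hardware figure below an assumed placeholder rather than a real quote:

```python
# All figures below are assumed placeholders, purely to illustrate the trade-off.
flops_per_work_unit   = 3e12    # useful computation done on one downloaded chunk
bytes_per_work_unit   = 350e3   # size of that chunk on the wire
cost_per_gb_moved     = 0.50    # assumed $/GB of internet transfer
cost_per_cpu_hour     = 0.10    # assumed $/hour of (paid) volunteer CPU time
volunteer_flops_per_s = 2e9     # assumed speed of a typical home PC

cpu_hours     = flops_per_work_unit / volunteer_flops_per_s / 3600
compute_cost  = cpu_hours * cost_per_cpu_hour
transfer_cost = (bytes_per_work_unit / 1e9) * cost_per_gb_moved

print(f"compute cost per work unit:  ${compute_cost:.4f}")
print(f"transfer cost per work unit: ${transfer_cost:.4f}")
# A SETI-like task does an enormous amount of arithmetic per byte shipped, so
# the transfer cost is negligible.  Shrink flops_per_work_unit (or grow the
# chunk size) and the bandwidth bill quickly dominates, which is why only a
# narrow class of problems is worth farming out over the internet.
```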

What apparently does have potential is the massive clusters of computers that sit unused from 5 pm to 9 am all through the office blocks of America. They are all on fast LANs with access to hefty pipes and are generally fairly homogeneous, so if grid computing does come about, it will most likely start there rather than on your home PC.

I have the following observations in reply:

  1. The observation about the unused LAN processing power is profound and represents a huge opportunity for exploitation (in a positive sense).

  2. Perhaps the “chunking” problem could be addressed by a “grid” which made a distinction between the high-bandwidth and dial-up participants. Those with cable, T-1, etc. would participate more heavily in the parts of the “problem” which required more communication. Central servers (perhaps from idea #1) could play a role in coordination and integration (see the sketch after this list).

  3. Keep in mind that security might not be as big an issue as some might think. Each “user” would see only a very small piece of the whole “puzzle.” In addition, perhaps encryption techniques could be utilized for especially sensitive projects (albeit at the expense of CPU power).

  4. What if this approach were used in combination with something like a CRAY X1? The “supercomputer” would function as the “server” and would not only integrate the “grid” data but also handle those parts of a “problem” less suited to distribution on the network grid (or we could have servers that integrated the grid data and then submitted the information to the X1, relieving the supercomputer of that burden). Under this scenario the “grid” would simply augment the processing power of the X1, rather than replacing it entirely. This should represent a quantum leap over every other “supercomputer” in existence today.

  5. People would probably not receive a great deal of money (or any at all) for their participation. However, if they knew that they were participating in a worthy cause, many would probably volunteer their excess CPU power (as many have for SETI). What if we got a million people worldwide to serve as our “grid” integrated with a CRAY X1; just how powerful would such a computer be?
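
Purely as a sketch of observation 2, and with every class name, threshold, and number below invented for the example (this is not how any existing grid scheduler actually works), a coordinating server could route communication-heavy pieces to broadband participants and compute-only pieces to anyone:

```python
# Hypothetical sketch of bandwidth-aware task assignment; not real grid software.
from dataclasses import dataclass

@dataclass
class Volunteer:
    name: str
    bandwidth_kbps: int        # reported connection speed

@dataclass
class Task:
    task_id: int
    megabytes_to_move: float   # how much data the task must send/receive

BROADBAND_THRESHOLD_KBPS = 512  # assumed cutoff between "cable/T-1" and dial-up

def assign(task: Task, volunteers: list[Volunteer]) -> Volunteer:
    """Send data-heavy tasks to broadband users; compute-only tasks can go to anyone."""
    if task.megabytes_to_move > 1.0:
        candidates = [v for v in volunteers
                      if v.bandwidth_kbps >= BROADBAND_THRESHOLD_KBPS]
    else:
        candidates = volunteers
    # Naive choice: the fastest link among the eligible candidates.
    return max(candidates, key=lambda v: v.bandwidth_kbps)

volunteers = [Volunteer("dialup-1", 56), Volunteer("cable-1", 3000),
              Volunteer("t1-1", 1544)]
print(assign(Task(1, megabytes_to_move=25.0), volunteers).name)  # cable-1
print(assign(Task(2, megabytes_to_move=0.3), volunteers).name)   # fastest of all eligible
```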

This part wouldn’t work very well.

Supercomputers like the Cray X1 are designed to be very high-speed computational processors, but they are generally NOT designed for high-speed input/output. In fact, standard mainframe computers usually perform better than supercomputers at I/O processing; that is one of the things they are designed for.

So early buyers of the first Crays were told not to sell off their current big computers (usually Control Data 7600s); instead, those were reconfigured to handle the input/output work for the Cray-1. You needed a machine that big just to keep the data coming for a Cray!

So a modern supercomputer would not function very well as a “server”; that is quite the opposite of what it was designed for. This would be a better task for a machine specifically designed for high-bandwidth I/O processing.

The point of the follow-up question was to explore the possibility of combining the best in supercomputer technology (such as the CRAY X1) with a grid/distributed system. Could this be done? Perhaps it would require a series of “servers” to feed the “grid” results back to the CRAY. If it could be done, how powerful would such a system actually be? Consider the SETI numbers above, integrated with a CRAY, as an example.