As part of my new job responsibilities I am going to be, well, responsible for maintaining, upgrading, and specifying a couple of Beowulf clusters. Fortunately, this isn’t quite a blank slate. We’re inheriting an existing older cluster, although it needs some hardware upgrades and a complete reload of the OS and software (most likely Fedora). I also have the existing Responsible Party’s help, guidance, and, most importantly, cell phone number for spec’ing out a new cluster. However, aside from some modest experience with Linux desktop administration and very rudimentary knowledge of networking and DNS/BIND/Samba, I’m fairly uncallused with regard to dealing with such a system.
A few questions:[ul]
[li]Preferred operating system: The current large cluster (which we’re not inheriting) is running Red Hat Enterprise Linux, but we have an older Fedora configuration that will probably be loaded on the old cluster. The new one, however, is unspec’ed. Someone suggested that we consider a cluster of G5 Xserves running Mac OS X Server. Or we could install Red Hat, any number of cluster-oriented Linux distros, a *BSD, or whatever else on an x86 architecture, et cetera. Recommendations?[/li][li]Best general basic references: I’ll check out the O’Reilly Bookshelf; any other recommendations?[/li][li]Favored message boards or other sources of online information?[/li][li]Advice, complaints, anecdotes, sympathy, insults, abuse, disdain, et cetera.[/li][li]Help?[/li][/ul]
I suspect your post got lost in the weekend shuffle; I hope there’s someone here who can help you. I did have the opportunity to discuss this very topic with a friend this weekend. I don’t know if what I learned is helpful, but here goes, just in case:
[ul][li]You don’t have to worry about the Beowulf cluster becoming self-aware and taking over the world. Although that scenario is pretty common in the movies, it apparently doesn’t happen in the real world. Which is, I think, good to know. It certainly would make me feel better.[/li][li]It’s unclear why it’s called a “Beowulf” cluster, but I’m pretty sure that there isn’t a “Grendel’s Mother Cluster” out there looking for revenge.[/li][li]I may be the only one who thinks these jokes are funny, but if you want, you could try them out on your co-workers. No charge.[/li][/ul](Bump.)
The scenario of one of these machines “coming alive” and taking over the world is somewhat less plausible than you getting a witness to confess to a 20-year-old unsolved murder while cross-examining him about questionable accounting practices. (I’m assuming that this doesn’t happen in real life with the same frequency it does in Perry Mason mysteries.) We do have a couple of Unix boxes named hal and skynet, but they have thus far shown no proclivity to take over the world or eliminate mankind to resolve system conflicts. They just flash their lights and make “whir-whir” sounds whenever they get upset.
But should it occur, I’ll call you and let you know before the missiles are launched, so you can duck and cover.
I’m hesitant to say this and end up looking like an ass, but this cluster of questions might be a bit too technical for the GQ folk to answer. You might do better by finding a Beowulf-cluster-specific newsgroup, either on the Web or on Usenet. Accessing Usenet via Google Groups is more or less acceptable, but you have to learn to reply properly to avoid pissing off the people most likely to help you. Also, top-posting is universally considered rude.
The people who use the clusters probably know a lot more about them than you do; why don’t you ask them? And once you’ve read up a bit, book yourself onto a course. You’ll learn quite a bit, but more importantly, it will help you pull together all the bits you’ve learned by yourself.
And just as you’re asking here, don’t be afraid to have someone knowledgeable on retainer.
Just a thought, but if it won’t do you out of a job and usage is sporadic, look into renting time on someone else’s cluster on an as-needed basis.
I work on a 28-node, dual-CPU (2.4 GHz) Beowulf cluster. We should be getting a 128-node dual-CPU cluster this spring.
We use Fedora Core 4 (uname -r == 2.6.13-1.1532_FC4smp), with the Maui scheduler running on top of PBS for job scheduling. We also use mpiexec to launch parallel jobs. A good book is “Beowulf Cluster Computing with Linux.”
I’m guessing, from the content of the previous replies, that we’re not talking about chunks of Beowulf mixed with caramel and covered with chocolate. Pity, that.