Can a lone computer engineer/technician assemble a supercomputer?

Question above, qualified by relying only on off-the-shelf chips and components (presumably intended for personal computers).

If not, what is the most “powerful” configuration he can assemble? A minicomputer? A mainframe?

Google “Beowulf cluster.”

Certainly. As a reference, let’s pick the 500th fastest computer on the TOP500 list as the minimum we can call a “supercomputer” (an arbitrary distinction, but fair I think).

It is a cluster using gigabit ethernet, and so doesn’t have any kind of fancy interconnect. That limits the kinds of problems it can work on, but it’s still useful.

It has a peak floating-point rate of 236.3 TFlop/s. That rate can be achieved by about 180 NVIDIA GTX Titan cards (1.3 TFlop/s double-precision each). If we can stuff 4 cards in a single computer, we only need 45 computers (and all stuff you can buy at Newegg).
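Just to spell the arithmetic out (taking the 1.3 TFlop/s per card and 4 cards per box above as given; rounding up lands you a node or so over the ~45 estimate):

[code]
# Back-of-envelope sizing against the #500 machine's peak rate.
# All figures are the assumptions from the post above, not measured numbers.
import math

target_tflops = 236.3      # peak of the 500th TOP500 entry, TFlop/s
tflops_per_card = 1.3      # double-precision peak of one GTX Titan
cards_per_node = 4         # cards we assume fit in one box

cards = math.ceil(target_tflops / tflops_per_card)   # ~182 cards
nodes = math.ceil(cards / cards_per_node)            # ~46 boxes

print(f"{cards} cards across {nodes} boxes")
[/code]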

A lone technician could easily put together and test 45 systems in a few weeks. Even if you needed more systems to make up for the inefficiencies in GPU processing and the slow interconnect, it wouldn’t be a dealbreaker.

Obviously, 99% of supercomputers run Linux. If you want to build a cluster and are comfortable with the terminal command line (which is barely needed in modern desktop Linux distros), what would stop you wouldn't be the hardware but the cost of power and cooling.
Actually, most distros come with clustering software free in their repositories.
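As one example of what's sitting in those repositories: an MPI implementation (Open MPI, say) plus the mpi4py bindings gets you a working cluster “hello” in a handful of lines. The hostfile name here is made up, and check your own distro's packaging for the exact launch command.

[code]
# Minimal MPI hello across the nodes, using mpi4py (packaged by most distros
# alongside an MPI implementation such as Open MPI).
# Typical launch:  mpirun -np 8 --hostfile nodes.txt python hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()} "
      f"on {MPI.Get_processor_name()}")
[/code]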

Wiki-How - How to Build a Supercomputer

For that matter, most of what get called “supercomputers” these days get most of their speed just from being massively parallel. The individual components might be slightly faster than what’s in a top-of-the-line desktop, but not an order of magnitude faster.

Of course, this sort of brute-force approach on the hardware end only gets you so far. The flip side is that, in order to make use of that brute force power, you need to design the software for your problem in such a way that it can be run efficiently in parallel. This is easy for some problems, but harder for others.
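A toy sketch of the easy end of that spectrum, where the work splits into chunks that never need to talk to each other (the simulate_chunk function is a made-up stand-in for real science code):

[code]
# "Embarrassingly parallel": independent chunks, no communication between them.
from multiprocessing import Pool

def simulate_chunk(seed: int) -> float:
    # stand-in for an expensive, independent piece of the computation
    total = 0.0
    for i in range(1, 100_000):
        total += ((seed * i) % 7919) ** 0.5
    return total

if __name__ == "__main__":
    with Pool() as pool:                       # one worker per local core
        results = pool.map(simulate_chunk, range(32))
    print(sum(results))
[/code]

The hard cases are the ones where every chunk needs its neighbours' results at each step, which is exactly where the interconnect starts to matter.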

Often not even that. If you look at the top entries on the TOP500 list, the CPUs are all quite modestly clocked: 2.2 GHz for the #1 and #2 spots, and 1.6 GHz for #3 (of course, clock speed does not directly correlate to performance, but these are fairly conventional CPUs here and so the relation mostly holds).

The reason is that power consumption is pretty much the dominating factor these days, and two CPUs at 1.6 GHz consume less power than a single one at 3.2 GHz. On a single-user desktop, this isn’t always a great tradeoff since so many apps are still single-threaded, but on a supercomputer the problem space is already guaranteed to be massively parallel and hence it’s a reasonable idea (as long as the other costs don’t go up too much).
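A rough illustration of that tradeoff, using the usual approximation that dynamic power scales with voltage squared times frequency, and that lower clocks let you drop the voltage (the voltage figures below are illustrative guesses, not datasheet values):

[code]
# Why two slow chips can beat one fast one on power.
# Dynamic power ~ capacitance * V^2 * f; slower clocks permit lower voltage.
def dynamic_power(freq_ghz: float, volts: float, cap: float = 1.0) -> float:
    return cap * volts ** 2 * freq_ghz

one_fast = dynamic_power(3.2, volts=1.20)        # a single 3.2 GHz part
two_slow = 2 * dynamic_power(1.6, volts=0.95)    # two 1.6 GHz parts
print(f"one fast: {one_fast:.2f}, two slow: {two_slow:.2f} (arbitrary units)")
[/code]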

Minicomputers and mainframes really don’t exist in the modern world in the way they did. Nowadays a minicomputer is a Raspberry Pi, and a mainframe is an IBM Z series.

Beowulfs get into the Top 500 easily, partly because the benchmark for entry (Linpack) works very well on them. So even simple commodity networking (i.e. Ethernet) works.

That said, supercomputers, in the main, ceased being a special sort of computer well over a decade ago and became some form of cluster, based in whole or in part on commodity parts. The difficulty in assembling one comes down to logistics and money. How you define a supercomputer in modern times is also an open question.

If you want something faster than Ethernet, go for InfiniBand. Still (expensive) off-the-shelf commodity components. Storage can be a range of options, from local per-node disks right across to huge object-server arrays - you can still build these with commodity bits; just run GPFS or Lustre on them.

More advanced machines, like the SGI UV range, have proprietary interconnects to create a single system image, but are still basically commodity motherboards with a few custom parts added. Only when you get to the specialist machines from IBM and Cray do you really get into the realm of truly custom components, and even then most of the parts are still off the shelf.

In a previous life I have personally (with the help of an enthusiastic student) physically rebuilt 2 clusters that rated in the Top 500. Mostly the effort is schlepping boxes onto shelves, making up hundreds of Ethernet cables, running them, testing everything, sorting out power, cooling, and network switches, and eventually sorting out the auto-install process across all the nodes and configuring the whole shebang. One cluster used Myrinet, in addition to Ethernet, as the main network backbone, and that required a whole set of other physical cabling, interface cards, switches, and configuration and debugging. But there is nothing difficult about it.

GPUs are the big trick of the moment. Some jobs work on them really well, some are almost agnostic.

The reality is that many Top500 supercomputers are used as general compute servers and run many simultaneous jobs. If you have a task that needs very large-scale compute, you need to do some serious analysis of the task to determine the appropriate partitioning, and how that relates to the money available and the balance of the system design. It is common to need as much money in communications or memory as in the rest of the system. Some jobs need a huge shared memory image, some need very low-latency communications, some need high-bandwidth communications but are less sensitive to latency, some just need insane amounts of memory, some need huge IO bandwidth, some need a mix. Getting it right can mean huge differences in the grunt for the dollar. Indeed, getting it wrong can mean that no matter how much money you spent, it won't go fast.

Just building a Beowulf that gets big Linpack numbers is not the mark of a real supercomputer. Real supercomputers get the job in hand done really fast, and it is just luck that they also get good Linpack numbers. (Of course there are a lot of tasks for which high-speed linear algebra is the core problem; these do have high Linpack numbers, by design.)
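To make “good Linpack numbers” concrete, here's the benchmark in miniature: time a dense solve and convert it to Flop/s. The (2/3)n^3 operation count is the standard one used for HPL; the problem size here is tiny and single-node, purely for illustration.

[code]
# What Linpack measures, in miniature: a timed dense solve turned into Flop/s.
import time
import numpy as np

n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n)

start = time.perf_counter()
x = np.linalg.solve(a, b)          # LU factorisation + triangular solves
elapsed = time.perf_counter() - start

gflops = (2 / 3) * n ** 3 / elapsed / 1e9
print(f"{gflops:.1f} GFlop/s (residual {np.linalg.norm(a @ x - b):.2e})")
[/code]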

ETA - the choice of the exact CPU is part of this balance. We used to have jobs for which it was insane to go for the top-bin part. (Unless you had Intel in bed with the supplier giving a big discount - which happened once.) Very often the bang-for-the-dollar sweet spot was the second bin down. But there are job mixes for which fewer, higher-spec CPUs are the right answer, just as there are jobs for which insane memory and as high-speed a CPU as possible is the right answer.

The number of units sold is certainly small, but IBM still does somewhere around $4 billion in minis (AS/400) and somewhere around $6 billion in mainframes.

Although Unix servers aren’t typically called “minis”, they occupy the same spot in the market (smaller than mainframes, larger than x86 servers), so you could kind of/maybe call those minis also.

For me a mini was a PDP-11, which usually meant a machine used for very basic computing, or often as a laboratory or industrial control system; hence a Pi is the logical replacement. (DEC first marketed their machines as such.) The AS/400 is a much more recent thing, and about the only thing it has in common with a PDP is the physical size. Not that the AS/400 isn’t a really nice box - it is one of the few different ideas in computer architecture that persists[sup]*[/sup] to this day, in the face of ubiquitous x86 abominations.

  • There is a joke here for those that care.