It wasn’t a 19-day project. It was 19 days from the moment the first racks started being delivered to the already up-and-running facility to the point where some computing ran.
From Nvidia’s newsroom
The press release says 19 days until it started training, but omits to note whether that was training on test data to validate the system operation, or actually running production code on production data. I know which way I would bet. Systems of that size will always take time to sort out.
The infrastructure took 122 days. In parallel with that, Nvidia would have been building up racks of compute. We can assume this 122 days was from the day the spades hit the dirt, not the time taken to design, find contractors, sign contracts, get approvals, and order long lead time equipment. The large scale power and cooling systems are not off the shelf components and lead times can be significant.
The entire project is claimed to have taken 16 months. Even that will be on the back of significant early work. 16 months will be the time from signing the master contract.
Supercomputer is not a term that describes one defined thing. The really big systems are carefully designed around the problem space they are intended to address and that takes a lot of work before anyone is ordering any hardware. Raw compute is usually the easy bit. Balancing compute, memory, storage, and especially communication bandwidth and latency requires a deep understanding of the problem and how you get the best bang for buck.
An AI supercomputer helps in that we can assume a lot of that work has already been done. Nvidia are working hard to be a one stop shop. It isn’t just about the GPUs. Nvidia’s acquisition of Mellanox made them a dominant competitor in communications infrastructure. The Nvidia article trumpets their RDMA (remote direct memory access) system, something that has been a mainstay of HPC for decades. It goes back to interconnects such as Myrinet and InfiniBand, and Mellanox had long been the leading InfiniBand vendor.
It isn’t unreasonable to imagine the system was configured from a known scalable design template. That would save a huge amount of time, but the design would still take significant time. Nobody is going to sign off on a design costing many hundreds of millions to billions of dollars without being clear that it isn’t a career limiting move. (Whilst these sums sound insane, none of this touches the oil industry. A single well can cost significantly more than 100 million. And there is an exploration manager that gets to decide where it goes. That focuses the mind.)
19 days is just a vanity number and one that conveys little to nothing about the actual project.
Nvidia might like to point to it as a way of advertising their one stop shop advantage. But nobody is going to be fooled into thinking that the system was created in 19 days. Closer to two years would be my bet. That is from the genesis of an idea, to serious negotiations, contract signing, design of system, design of facility, start of logistics planning, ordering, sub contracts, work starting on construction, and so on. Wheeling the racks in is the last tiny step, but one that makes for good press and photo opportunities.
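To put the numbers above in proportion, here is a rough back-of-envelope sketch. The 19 days and 16 months come from the figures quoted above, and the two years is only my own guess. The month length (365.25 / 12 days) and the helper name `share_of_total` are assumptions made purely for illustration.

```python
# Rough proportions only. Assumption: an average month is 365.25 / 12 days.
DAYS_PER_MONTH = 365.25 / 12


def share_of_total(part_days: float, total_days: float) -> float:
    """Return part_days as a percentage of total_days (hypothetical helper)."""
    return 100.0 * part_days / total_days


rack_install_days = 19                       # the headline figure
claimed_project_days = 16 * DAYS_PER_MONTH   # the claimed 16 month project
guessed_project_days = 2 * 365.25            # my "closer to two years" guess

for label, total in [
    ("16 months", claimed_project_days),
    ("two years", guessed_project_days),
]:
    pct = share_of_total(rack_install_days, total)
    print(f"19 days is {pct:.1f}% of {label}")
```

On those assumptions the 19 days comes out at roughly 4% of the claimed 16 months, and under 3% of a two year effort.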