A couple of years ago would be too early for 2.5D. Notice how buying the module means you get to buy their GPU and memory also? Another advantage from their point of view.
Quantum computer coding in silicon now possible
It is generally believed that P-complete problems (problems complete for polynomial time) are inherently hard to parallelize: if any one of them had an efficient parallel algorithm, every problem in P would. The article gives a list of just a few of the more famous ones.
One of the more interesting ones, for the purpose of this thread, is the Circuit Value Problem (CVP): given a circuit (a computer chip) and a set of inputs, figure out its output. The seemingly inherent difficulty of significantly parallelizing the simulation of a computer chip limits the speed of designing/analyzing/debugging said chips.
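To make CVP concrete, here's a minimal toy sketch in Python (the gate format and the little example circuit are made up for illustration): the circuit is evaluated gate by gate in topological order, and each gate has to wait for the values it depends on, which is exactly the sequential dependency that resists parallelization.

```python
# Toy Circuit Value Problem (CVP): given a circuit and its inputs,
# compute the outputs by evaluating gates in topological order.
# The gate-list format and the example circuit are invented for illustration.

OPS = {
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOT": lambda a: 1 - a,
}

def eval_circuit(gates, inputs):
    """gates: list of (name, op, operand_names) in topological order.
    inputs: dict mapping primary input name -> 0/1."""
    values = dict(inputs)
    for name, op, operands in gates:          # each gate needs its fan-in values first,
        args = [values[o] for o in operands]  # so evaluation order follows circuit depth
        values[name] = OPS[op](*args)
    return values

# A tiny adder-like example, purely illustrative.
gates = [
    ("s0",  "XOR", ["a0", "b0"]),
    ("c0",  "AND", ["a0", "b0"]),
    ("s1",  "XOR", ["a1", "b1"]),
    ("s1c", "XOR", ["s1", "c0"]),   # sum bit 1 including the carry from bit 0
]

print(eval_circuit(gates, {"a0": 1, "b0": 1, "a1": 0, "b1": 1}))
```

The best you can do in parallel is evaluate all the gates at the same depth together; the depth of the circuit still dictates how many sequential rounds you need, which is the intuition behind CVP being P-complete.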
OTOH, in such situations, there’s a really cheap way to parallelize: just do a lot of runs in parallel for different inputs. (Assuming you are going to be doing several tests.)
Which, as I mentioned above, is exactly what we do, with a ranch of thousands of processors. It works even better because the best way of verifying a design is to throw lots of pseudo-random instruction sequences at it, which is easy to parallelize. These find cases that humans tend not to consider.
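The reason the processor-ranch approach scales so well is that each pseudo-random test is completely independent. A rough sketch of the shape of it, with made-up stand-ins (run_dut, run_reference, the stimulus format) for the real simulator and golden model:

```python
# Embarrassingly parallel verification sketch: each worker generates a
# pseudo-random stimulus from its own seed, runs the device model and a
# reference model, and reports any mismatch. The two models below are
# trivial placeholders; a real flow drives an RTL simulator instead.
import random
from concurrent.futures import ProcessPoolExecutor

def run_dut(stimulus):
    return sum(stimulus) & 0xFF          # placeholder for the design under test

def run_reference(stimulus):
    return sum(stimulus) % 256           # placeholder golden model

def run_one_test(seed):
    rng = random.Random(seed)            # independent PRNG per test => reproducible
    stimulus = [rng.randrange(256) for _ in range(1000)]
    return seed, run_dut(stimulus) == run_reference(stimulus)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_one_test, range(10_000)))
    failures = [seed for seed, ok in results if not ok]
    print(f"{len(failures)} failing seeds: {failures[:10]}")
```

Because the runs share nothing, N machines give you close to N times the test throughput, and any failing seed can be replayed on a single machine for debug.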
As I also mentioned above, communications overhead is the real killer for parallelizing circuit simulation. Processors are designed in parallel, with each team doing a cluster and then modules inside a cluster. Global signals are kept to a minimum since they are long and can slow down the chip. Simulating clusters by themselves could be efficient, but in practice the communications overhead and the need to keep the simulations synchronized mean that we don't parallelize in this way.
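For contrast, here's a deliberately simplified sketch of what partitioning a single cycle-accurate simulation looks like (all names invented, only two clusters): every simulated cycle, each partition has to exchange its boundary signals with its neighbour before either can advance, and that per-cycle synchronization point is where the time goes.

```python
# Sketch of partitioned, cycle-accurate simulation: two "clusters" each own
# part of the design but must swap boundary signals every simulated cycle.
# The cluster behaviour is a made-up toy; the point is the per-cycle exchange.
from multiprocessing import Process, Pipe

def simulate_cluster(name, conn, cycles):
    state = 0
    incoming = 0
    for cycle in range(cycles):
        state = (state + incoming + 1) & 0xFFFF   # "compute" this cluster's next state
        conn.send(state)                          # boundary signals out...
        incoming = conn.recv()                    # ...then wait for the neighbour's;
        # neither side can start the next cycle until this exchange completes.
    print(f"{name}: final state {state:#06x} after {cycles} cycles")

if __name__ == "__main__":
    a_end, b_end = Pipe()
    cycles = 1000
    a = Process(target=simulate_cluster, args=("cluster_a", a_end, cycles))
    b = Process(target=simulate_cluster, args=("cluster_b", b_end, cycles))
    a.start(); b.start()
    a.join(); b.join()
```

When the work each cluster does per cycle is small relative to the cost of that exchange, the "parallel" simulation can easily run slower than simulating the whole chip in one process, which is why it isn't split up that way in practice.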
Pre-silicon debug, by the way, is mostly done locally, with full-chip debug done only after integration and then pushed back down. Post-silicon debug of interesting problems doesn't involve much simulation at all.