The ever-improving design, complexity, and accuracy of test rigs is not, viewed in isolation, the key to design success. In the old days, engineers "overdesigned" to cover any conceivable failure mode, and future engineers will probably say the same about us.
Not optimal, but if the bridge stood or the cannon didn't explode, the project was a success.
I would think test rigs for aircraft have got to be up there for "most complex": you have to test independent component design, fit and function, large-scale structural behavior, wiring harnesses and electrical design, landing gear mechanisms, control surface mechanisms, engines, avionics, human factors (cockpit design and operation), and so on, all before you even build a plane in the first place. Then you get to test it all again on the ground, and finally in the air. There are some things you can do now with predictive modeling (Bombardier's CIASTA), but others you just have to actually test. It's fascinating.
There are two types of tests for digital stuff. Functional tests try to exercise the function of the chip, and in that way detect manufacturing defects. They are usually written by the designers, and they try to exercise a chip in every way they can think of. As you guessed, they are very hard to write.
The second is a structural test, which doesn't try to exercise the chip's function but instead tries to test each gate for defects. You do that by connecting all the flip-flops into long shift registers, which makes the chip look like it has no memory state elements, and that makes it much easier to test. (Even random patterns work fairly well.) This is the mode where we can generate tests automatically. I can testify from painful personal experience that leaving too many non-scan flops in causes problems.
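The scan idea above can be sketched in a few lines. This is a toy model, not real ATPG: the three-flop circuit, its logic functions, and the injected stuck-at-0 fault are all invented for illustration. The point is the flow: shift a pattern into the chain, pulse one functional clock to capture the logic's response, then shift the result out where every flop is directly observable.

```python
# Sketch of scan-chain structural testing on a toy 3-flop circuit.
# Real structural tests run on gate-level netlists via ATPG tools;
# all names here are illustrative.

def good_logic(a, b, c):
    """Combinational next-state logic of the toy circuit."""
    return (a ^ b, b & c, a | c)

def faulty_logic(a, b, c):
    """Same logic, but with a stuck-at-0 defect on the first gate output."""
    return (0, b & c, a | c)

def scan_test(logic, pattern):
    """Shift a pattern in, pulse one functional clock, shift the result out."""
    flops = list(pattern)        # scan-in: the chain now holds the pattern
    flops = list(logic(*flops))  # one capture clock in functional mode
    return tuple(flops)          # scan-out: observe every flop directly

pattern = (1, 0, 1)
expected = scan_test(good_logic, pattern)   # golden response: (1, 0, 1)
observed = scan_test(faulty_logic, pattern) # defective chip: (0, 0, 1)
print(expected != observed)                 # responses differ: fault detected
```

Because the flops are directly controllable and observable in scan mode, even a randomly chosen `pattern` has a decent chance of exposing the fault, which is why random patterns work fairly well here.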
There have been decades of effort on automatically generating functional tests, with only very slight success, and only in universities. However, in verifying really complex chips like microprocessors, we don't rely on hand-written tests alone, but instead generate more or less random sequences of instructions. You can generate a lot of these, and they often exercise things no one would think to exercise. It is harder for designers to put themselves in the mindset where things go wrong in odd ways; they mostly think about exercising the functions they were designing for.
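The random-instruction idea can be sketched as below. The tiny ISA, register names, and opcode mix are all invented for illustration; production generators are constrained-random tools that bias toward corner cases (dependent registers, boundary addresses, exceptions) rather than picking uniformly.

```python
import random

# Toy random instruction generator, in the spirit of processor-verification
# random test generators. The instruction set here is made up.

OPS = ["add", "sub", "and", "or", "xor", "load", "store"]
REGS = [f"r{i}" for i in range(8)]

def random_program(n, seed=None):
    """Return n random instructions as assembly-like strings (reproducible by seed)."""
    rng = random.Random(seed)
    prog = []
    for _ in range(n):
        op = rng.choice(OPS)
        if op in ("load", "store"):
            prog.append(f"{op} {rng.choice(REGS)}, [{rng.choice(REGS)}]")
        else:
            prog.append(f"{op} {rng.choice(REGS)}, {rng.choice(REGS)}, {rng.choice(REGS)}")
    return prog

for line in random_program(5, seed=42):
    print(line)
```

Each generated program is then run on both the design and a reference model, and any divergence flags a bug; seeding the generator makes failures reproducible.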
Memories, by the way, get tested by defined sets of patterns usually generated on the chip.
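One standard family of such defined patterns is the "march" algorithms; the sketch below implements March C-, a classic one, against a plain Python list standing in for the memory array. A real memory BIST engine generates these address/read/write sequences in on-chip hardware; the fault injection here is just to show detection.

```python
# Sketch of the March C- memory test pattern, the kind a BIST engine
# generates on-chip. The "memory" is a Python list; names are illustrative.

def march_c_minus(mem_read, mem_write, n):
    """Run March C- over addresses 0..n-1. Returns True if the memory passes."""
    up, down = range(n), range(n - 1, -1, -1)
    elements = [            # (address order, [operations per address])
        (up,   [("w", 0)]),
        (up,   [("r", 0), ("w", 1)]),
        (up,   [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up,   [("r", 0)]),
    ]
    for order, ops in elements:
        for addr in order:
            for op, val in ops:
                if op == "w":
                    mem_write(addr, val)
                elif mem_read(addr) != val:
                    return False        # read mismatch: defect detected
    return True

mem = [0] * 16
good = march_c_minus(lambda a: mem[a], lambda a, v: mem.__setitem__(a, v), 16)

def stuck_write(a, v):
    """Writes through, except cell 5 is stuck at 0."""
    mem[a] = 0 if a == 5 else v

bad = march_c_minus(lambda a: mem[a], stuck_write, 16)
print(good, bad)  # the stuck-at cell fails the test
```

March tests are popular for memory BIST because they need only a counter and a small state machine to generate, yet cover stuck-at, transition, and many coupling faults.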
The two biggest companies left are Advantest and Teradyne. I'm not associated with either, though I know lots of people at both. There have been a lot of mergers over the past ten or fifteen years; there used to be many more companies out there.
The reason the high-end ones are so expensive is that processors have hundreds of signal pins, and the tester has to send and receive signals on those pins at speeds of hundreds of megahertz with extremely high precision, and do it for hundreds of thousands of stored patterns. ATE companies have to design circuitry faster than the state of the art (because that is what they are testing), and they have low volumes, so they can't afford big development efforts. It is a tough business.