Is it theoretically impossible to engineer a 100% reliable solution out of unreliable parts?

ICs get tested very thoroughly - more so than software - and still fail when put into a system.
Your problem is a test coverage problem. A component testing out okay either means it is okay or, more likely, that the test isn't good enough to find all failures. In hardware we have fault coverage metrics, which measure what percentage of modeled defects a test can detect. When those metrics were first introduced, they showed that tests the IC designers thought were good in fact stank. There are some test metrics for software, but they are not nearly as useful.
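As a rough illustration of what a fault coverage number is (not any particular ATPG tool's output - the fault names and detection set below are invented), it is just detected faults over total modeled faults:

```python
# Minimal sketch of a stuck-at fault coverage calculation.
# Fault names and the detected set are hypothetical, not real tool output.

# Every node in the netlist contributes a stuck-at-0 and a stuck-at-1 fault.
modeled_faults = {
    "u1.a/SA0", "u1.a/SA1",
    "u1.b/SA0", "u1.b/SA1",
    "u2.y/SA0", "u2.y/SA1",
}

# Faults the test pattern set actually detects (found by fault simulation).
detected_faults = {
    "u1.a/SA0", "u1.a/SA1",
    "u1.b/SA0",
    "u2.y/SA1",
}

coverage = len(detected_faults & modeled_faults) / len(modeled_faults)
print(f"Stuck-at fault coverage: {coverage:.1%}")  # 66.7% - a test that "looked fine" but isn't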

For testing designs, which is more like software testing than screening for manufacturing defects, we found that random tests are very effective. Hand-written tests tend to mirror the designer's assumptions ("oh, this will never happen" - until it does) and leave big holes. I know there are tools out there, but I'm not an expert. It is a big area…
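A toy version of the idea, in plain Python rather than a real verification environment (the DUT function, golden model, and value ranges are invented for the example): random stimulus keeps hitting the corners a directed test author never thought to write.

```python
import random

def dut_divide(a, b):
    """Stand-in for the design under test: a divider the
    designer 'knows' never sees b == 0."""
    return a // b

def reference_divide(a, b):
    """Golden model used to check the DUT's result."""
    return a // b if b != 0 else None

random.seed(1234)  # reproducible, so a failure can be rerun
for i in range(10_000):
    a = random.randint(-2**15, 2**15 - 1)
    b = random.randint(-2**15, 2**15 - 1)  # note: zero IS in range
    try:
        assert dut_divide(a, b) == reference_divide(a, b)
    except Exception as err:
        print(f"test {i}: a={a} b={b} -> {err!r}")
        break
```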

Our parts go into big systems, and the voltage for each part is set individually, based on the measured performance of that part. If something is run out of spec I don't count it as a reliability failure - like a chip dying because some guy is overclocking it.
We’re pretty conservative, and our FIT rate (failures per billion device-hours) is always better than target unless there is some specific issue. A few years into one product our reliability guy sent me an email saying that, according to his equations, we should be seeing field returns by now. I looked at the return and repair data and told him: nope.
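For anyone unfamiliar with the units, the arithmetic behind "we should be seeing returns" is just FIT × installed units × hours in service / 10^9. The numbers below are made up to show the calculation, not our actual data:

```python
# Back-of-the-envelope expected field failures from a FIT rate.
# All numbers here are invented for illustration.

fit_rate = 50           # failures per 1e9 device-hours (hypothetical)
units    = 200_000      # parts in the field
years    = 3
hours    = years * 365 * 24

expected_failures = fit_rate * units * hours / 1e9
print(f"Expected field returns after {years} years: {expected_failures:.1f}")
# ~262.8 predicted; if the repair data shows essentially zero, either the FIT
# estimate is pessimistic or field conditions are milder than the model assumes.
```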
But it depends on your industry. In some industries, too much reliability margin means you are leaving performance on the table.
At system level fans and power supplies are much bigger problems than ICs.

This is very true. As feature sizes get smaller, SEUs (single-event upsets, e.g. from cosmic rays) get more important. Memories already take this into account. Though some solutions have been proposed for logic, I’m not aware of any in general use. Of course, things going into hostile environments pay special attention to this, with hardened processes and the well-known checking and redundancy techniques.
Luckily SEUs in random logic are not likely to cause a problem, because the transient only affects the system if it happens to be present when the signal is clocked into a storage element. SEUs in flip-flops are going to be worse.
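The "well-known redundancy techniques" include things like triple modular redundancy. A minimal sketch, assuming a single stored bit triplicated with a majority voter (plain Python standing in for what would be three flip-flops plus a voting gate in RTL):

```python
# Minimal sketch of triple modular redundancy (TMR) for one stored bit.

def majority(a, b, c):
    """Two-out-of-three vote: a single upset copy is outvoted."""
    return (a & b) | (a & c) | (b & c)

# Three copies of the same stored value.
copies = [1, 1, 1]

# A single-event upset flips one copy...
copies[1] ^= 1

# ...but the voted output is still correct.
print(majority(*copies))  # prints 1
```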
We used to keep data on the elevation at which our systems were installed. You never know.
I had actually expected to see problems before now, but it looks like I get to retire before I have to worry about it. But I hired someone who did this for his PhD work, so if it does happen we have a resource.

It can be simpler than that: the measurement of reliability doesn’t need to be terribly precise on the first pass. The next thing to measure is the capacity for change - can the solution actually be made better at all?