There are several misapprehensions in the o.p. which lead to an inappropriate apples-to-oranges comparison of performance characteristics. The first is that thrust-to-weight ratio (TWR) is a critical parameter in assessing vehicle-level performance. TWR is a good metric for comparing thrust performance at the engine level, but the engine is actually a small part of the dry mass of the vehicle, and an even smaller part of the total gross liftoff weight (GLOW) of the loaded vehicle, so the weight of the engine(s) by themselves is not strictly germane. As the o.p. notes, maximum thrust is typically only used at liftoff, and once the vehicle is moving at an appreciable speed the engines are throttled down so as to reduce aerodynamic loading and flight vibration environments at the point of highest dynamic pressure (max-Q alpha). The engines are then typically throttled back up once the atmosphere has thinned out, to reduce gravity drag losses, but there may be additional restrictions on the acceleration load that the payload can tolerate. So, high thrust is nice to get moving (hence why solid propellant boosters are used as “Stage 0” strap-ons despite their relatively poor mass-specific performance), but isn’t a critical measure provided that some threshold level of thrust over unity can be delivered. Since most of that thrust is used to lift propellant that will be used later in flight, a lower TWR with a higher specific impulse (the normalized measure of thrust per unit of propellant flow) may give better overall performance in terms of payload to a given orbit, which is exactly what the RD-180 with its 3800 psi chamber pressure and sea level I[SUB]sp[/SUB] of 311 s offers versus the Merlin-1D with a chamber pressure of ~1400 psi and sea level I[SUB]sp[/SUB] of 282 s.
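As a rough illustration of why specific impulse matters more than engine TWR, here is a minimal sketch using the ideal (Tsiolkovsky) rocket equation. The mass ratio of 10 is a made-up figure chosen only for illustration, not a property of either vehicle, and applying the sea level I[SUB]sp[/SUB] to the whole burn ignores the rise in I[SUB]sp[/SUB] with altitude as well as drag and gravity losses, so treat the absolute numbers as meaningless; only the relative difference is of interest.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def ideal_delta_v(isp_seconds: float, mass_ratio: float) -> float:
    """Ideal rocket equation: delta-v = Isp * g0 * ln(m_initial / m_final)."""
    return isp_seconds * G0 * math.log(mass_ratio)


# Hypothetical stage mass ratio (initial mass / burnout mass), assumed for illustration.
ASSUMED_MASS_RATIO = 10.0

# Sea level Isp figures quoted in the post.
dv_rd180 = ideal_delta_v(311.0, ASSUMED_MASS_RATIO)
dv_merlin = ideal_delta_v(282.0, ASSUMED_MASS_RATIO)

print(f"RD-180 (Isp 311 s):    {dv_rd180:7.0f} m/s")
print(f"Merlin-1D (Isp 282 s): {dv_merlin:7.0f} m/s")
print(f"Ratio: {dv_rd180 / dv_merlin:.3f}")
```

For a fixed mass ratio the ideal delta-v scales directly with I[SUB]sp[/SUB], so the ratio is simply 311/282, or roughly 10% more velocity from the same propellant fraction, regardless of which mass ratio is assumed.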
Second, the Atlas V uses one NPO Energomash RD-180 engine, which is a dual combustion chamber, dual nozzle engine (perhaps giving the impression that it is multiple engines). The RD-180 is derived from the four chamber RD-170 engine used on the Energia launch vehicle and (in modified form) the Zenit rocket. Given the incredibly high performance of this engine and its extended and problematic development, it has performed with remarkable reliability, with only one propulsive failure (debris in turbopump). In ground testing the engine has endured up to twenty full-duration static firings without measurable degradation. It is basically considered the crown jewel of Russian rocket propulsion system development, and a version of it (RD-190 and -191) is intended to be used on the Angara system, the Russian replacement for the existing Proton vehicle, expected for operation through 2065. BTW, Pratt & Whitney attempted to build this engine under license from Energomash but was unable to build it to the required quality standards and repeatedly failed qualification testing. Unfortunately, that also leaves EELV dependent upon a foreign-sourced engine to power its RP-1/LOX launch vehicle. (The Delta IV is all cryogenic, with the attendant problems that come along with that.)
Third, when we talk of reliability, it has to be looked at in a system-level context. Yes, if one or even possibly two of the Merlin-1D engines on the Falcon 9v1.1 experienced a non-catastrophic propulsive failure which allows controlled shutdown, the vehicle may still be able to deliver the primary payloads to orbit (depending on payload mass and orbital parameters). But despite efforts to isolate and protect the engines from one another, there are still failure modes which could result in multiple engine failures or loss of control of the vehicle, and of course all of the additional plumbing, controls, mounting hardware, filters, turbopumps, et cetera associated with each engine add their own opportunities for failure. For instance, a water hammer event that feeds back into the propellant manifolds, or loss of the thrust vector control system, or any of a number of failures which are not simple shutdowns could result in a loss-of-vehicle criticality for which there is no recovery. Propulsive failure or even a significant thrust imbalance in a side core of a triple core system (Falcon Heavy or Delta IV-H) would probably result in adverse bending modes or structural imbalance resulting in loss of guidance and control. All things being equal, the potential for failure compounds with the number of engines: a vehicle with n engines, any one of which can cause loss of the vehicle, will have a composite probability of failure of P[SUB]f[/SUB]=1-R[SUP]n[/SUP], where R is the per-engine reliability. So, for R=0.997 (the nominal “three sigma” level of reliability), a vehicle with a single engine (n=1) such as the Atlas V or Delta IV will have a P[SUB]f[/SUB]=0.3% for propulsive failure, while the Falcon 9 with n=9 will have a P[SUB]f[/SUB]=2.7% chance of propulsive failure, nearly an order of magnitude greater. On the same basis, a triple core single engine vehicle like the Delta IV-H with n=3 will have a P[SUB]f[/SUB]=0.9%, while the Falcon Heavy with n=27 is P[SUB]f[/SUB]~8%.
(All of these numbers are for illustration only, and these are not the predicted reliability of the Merlin-1D engine and Falcon 9v1.1/Heavy propellant feed systems.)
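The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the simple series model used in the text: it assumes every engine has the same reliability R, that failures are independent, and that any single engine failure loses the vehicle (i.e. no credit for engine-out capability). The function name is mine, and R=0.997 is the illustrative figure from above, not a real engine reliability.

```python
def propulsive_failure_probability(per_engine_reliability: float, n_engines: int) -> float:
    """Series model: the vehicle succeeds only if all n engines work, so P_f = 1 - R**n."""
    return 1.0 - per_engine_reliability ** n_engines


R = 0.997  # illustrative "three sigma" per-engine reliability from the text

# Engine counts as given in the text.
vehicles = [
    ("Atlas V / Delta IV", 1),
    ("Delta IV-H", 3),
    ("Falcon 9", 9),
    ("Falcon Heavy", 27),
]

for name, n in vehicles:
    p_f = propulsive_failure_probability(R, n)
    print(f"{name:20s} n={n:2d}  P_f = {p_f:.1%}")
```

For small failure rates the result is close to n×(1-R), which is why nine engines come out at roughly nine times the single-engine figure. Any real engine-out capability would pull the multi-engine numbers back down, while common-cause failures would push them up.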
Of course, we can’t just assume that the reliability of all engines and propellant feed systems is the same; each has to be considered in the context of the system in terms of design robustness, redundancy, complexity, and ultimately the demonstrated reliability of the overall system, which cannot be assessed except by a significant body of flight history. It may be that the Falcon 9v1.1 and Falcon Heavy are as reliable as or more reliable than the Atlas V (which has a nearly spotless record to date), but we do not have enough data to make that assessment. Handwaving that having multiple engines automagically offers additional redundancy, when that is actually only true in some portion of some trajectories for some types of failure, glosses over the additional complexity that having so many engines requires.
Fourth, there is a dramatic performance benefit to using all-cryogenic propellants on an upper stage, as the mass carried in the second stage is close to a 1:1 trade for payload mass on a two stage vehicle, which is why the Atlas V and Delta IV use cryogenic upper stages. However, this comes with all the attendant problems of LH2, e.g. propellant expansion, thermal stresses, leakage, embrittlement, high dynamic slosh, low propellant density, et cetera. SpaceX decided to trade the performance of LH2/LOX for the greater simplicity and higher volumetric energy density of RP-1/LOX propellants. Of course, this also adds some other issues that aren’t typically experienced with LH2/LOX engines, such as fuel gelling, manual purging of lines, payload contamination, et cetera, so there are tradeoffs beyond performance to be considered.
As for launch costs, I don’t know that we can really make a head-to-head comparison. On one hand, it is pretty clear that SpaceX is still eating a lot of costs as part of their development efforts, and what they are currently charging customers for launches does not reflect their actual costs including unplanned labor, design and process modifications, et cetera. On the other hand, the costs of the ULA vehicles are ridiculously higher than the original cost targets for the EELV and are effectively subsidized by the Air Force bearing the cost to maintain launch facilities and other sundries to support the program, whereas SpaceX currently maintains their own launch facilities (albeit with sweetheart lease deals). If SpaceX can launch for even half the cost per payload kilogram to orbit that ULA can, it will still be a significant improvement over current costs, and would make SpaceX a viable and badly needed competitor to ULA.
The real question is whether SpaceX can build up enough of a commercial customer base to justify the F9v1.1 core vehicle. Right now it has way more capability than nearly any commercial satellite operator needs for a single payload; in order to keep costs per payload mass reasonable, SpaceX will have to demonstrate reliable multi-payload deployment capability, which is a highly challenging endeavor that ULA has mostly avoided with their vehicles. If they can do that, and hold launch costs to a reasonable level, and demonstrate good (~98%) reliability, and get commercial providers to buy into building satellites that can be integrated horizontally into a multiple payload stack, then there is a good chance that they can foster nascent elements of the commercial spacecraft industry which have been waiting for significant reductions in launch costs. This would be a good thing for the entire aerospace industry, which is largely dependent upon government (primarily Department of Defense) expenditures to sustain space access, and would potentially open up multi-billion dollar sectors of new space applications in Earth surveillance, satellite communications, space resource utilization, et cetera. But despite the marketing glossies from the SpaceX PR department, this is still far from certain. Space launch and access to space is still a very challenging problem, as SpaceX themselves have discovered, and from liftoff all the way up to orbit a rocket launch vehicle is always a fraction of a second from literally going sideways and burning up hundreds of millions of dollars of payload and hundreds of thousands of hours of person-effort.
Stranger