How sophisticated are computer models of human cells?

How well can computer models replicate the biochemical inner workings of a human cell?


I’m no expert, but I do know we can’t even predict how the proteins a cell creates fold into their proper shape. Stanford uses spare CPU time on people’s PCs and PS3s to try to crack this one problem.

They may be able to measure reaction rates for known processes and go from there to do very, very basic simulations, but if you’re thinking of modeling the cell by modeling the chemicals in it, there is no chance. As RaftPeople mentioned, they can’t even predict how a single protein folds; an entire cell is out of the question. Even the Stanford calculations are using major approximations. A protein is way too huge for ab initio calculations.

We can model the Hydrogen atom perfectly though.

Currently we are limited to the hundreds of atoms for “perfect” modeling using quantum mechanics (and as WarmNPrickly said ab initio computations, meaning “from scratch”). Those techniques take the electrons into account, and lots of things that only quantum mechanics guys understand (like eigenstates). We can model very large proteins decently though using standard molecular calculations, and can even design drugs that work off of these calculations. They aren’t perfect, but they work. I’ve never seen anything larger than the ribosome actually modeled though, which is still very small in relation to the entire cell.

There are some very nice videos on the inner life of the cell though, see, and click on “Watch the video” on the left a few paragraphs down.

It’s really simplistic right now. I think some of the best models can account for maybe a few dozen interactions in a single process. Probably not even that, if you want to really believe the model. On the neurobiology side of things, people have created pretty decent quantitative models at the very lowest levels – the behavior of a particular neurotransmitter and its receptor, or the behavior of a nerve impulse as it travels down a cell. I think the more biochemical modelers have focused on bacteria, where again you have some very simple models. Every model I’m aware of, however, treats a given protein or interaction as just a handful of kinetic coefficients, completely ignoring all of the complexity that other posters have mentioned.
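To give a feel for what "a handful of kinetic coefficients" means, here is a minimal sketch of one such model: a single enzyme reaction following Michaelis-Menten kinetics, integrated with a crude forward Euler step. The function name, the parameter values (`vmax`, `km`, `s0`) and the step size are all made up for illustration; this is not taken from any particular published cell model.

```python
def simulate_michaelis_menten(s0, vmax, km, dt=0.01, steps=1000):
    """Integrate dS/dt = -vmax * S / (km + S) with forward Euler.

    The whole enzyme is reduced to two numbers (vmax and km); nothing
    about its structure or folding appears anywhere in the model.
    Returns the substrate concentration at every time step.
    """
    s = s0
    trajectory = [s]
    for _ in range(steps):
        rate = vmax * s / (km + s)      # Michaelis-Menten rate law
        s = max(s - rate * dt, 0.0)     # substrate cannot go negative
        trajectory.append(s)
    return trajectory


# Hypothetical parameters, arbitrary units.
traj = simulate_michaelis_menten(s0=10.0, vmax=1.0, km=2.0)
print(f"substrate left after {len(traj) - 1} steps: {traj[-1]:.3f}")
```

A whole-pathway model of this kind is basically a pile of equations like this one, coupled together, with the coefficients fitted to measurements.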

These models have been useful enough to make interesting predictions, or show gaps in our understanding. Compared to the overall complexity of a cell, however, we know very little about how it all interacts. Just bits and pieces, mostly without connection to one another. To make a crude analogy, if we compare a cell to something like a computer (which is much simpler), our best models cover bits and pieces – how a resistor behaves, how a transistor works, and possibly how the simplest logic gates might be built.

(Excuse my generalization here… this is outside my area of expertise, mostly from stuff I encountered tangentially in undergrad biology classes).

Regarding protein folding, what exactly are they trying to do? As I understand it, a ribosome spits out a chain of amino acids which then spontaneously folds up in some way to form some enzyme. How it folds is dependent on its environment, correct? Furthermore, is it in this way that a gene can encode more than one enzyme? How do the models work? I gather from you that proteins are too complex to model atom by atom. Assuming they had sufficient processing power, could they do that if they wanted to, or is there basic science yet to uncover at work here?

Thanks for your help,

That’s several complicated questions mashed together.

Generally speaking, the shape of the protein is specified entirely by its primary amino acid sequence. There are some exceptions - disulfide bonds, chaperonins, etc, but theoretically, if you know the amino acid sequence (which is easily decoded from the genetic sequence), you should be able to predict the shape. The environment can have some minor effect in the exceptions listed above, but generally your protein will have the same overall shape regardless of its environment. I’m leaving out a LOT of caveats and exceptions here. Theoretically. Folding is driven by several forces. There are electrical charges on some residues that attract and repel each other, there are partial dipoles that are created transiently as different electron clouds interact, and there is interaction with the solvent. This last is primarily important as hydrophobic residues seek to interact with each other and exclude water.

So we know all the players involved and we know the forces that act on them. Should be easy, right? The problem is there are hundreds or thousands of atoms involved, and they all interact with each other. It’s just way too much to calculate. There have been some attempts made to do it using either massive supercomputers or massively distributed computing, a la SETI@Home, but we don’t have a good handle on it yet. The basic science, though, is pretty well understood.

Forgot to address the part about getting multiple enzymes from one gene (it’d be more accurate to say multiple protein isoforms, but hey): it’s not done through folding (excepting prions). It’s usually done through alternate RNA splicing, which gives more than one amino acid sequence from one gene. If you have one protein that can fold in different shapes (which many of them do), both shapes are still considered the same protein. Usually, the different shapes are related to its function.

They used to (and for all I know still do) model fluid dynamics problems on multi-processor machines by having each processor represent a particle in the fluid. Is the problem of folding approached this way?


Right, and as I alluded to in my previous post, we can precisely calculate the energy states of all atoms up to and including a single Hydrogen atom. Of course the only reason we can precisely calculate the Hydrogen atom, is that it is only composed of two particles, so the solution is easily reduced to solvable equations. The instant you try to include a third particle such as another electron, that electron interacts with the other electron as well as the nucleus, which is of course interacting with both electrons.
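For what it's worth, the hydrogen result really is compact enough to fit on one line. A small sketch, assuming the textbook non-relativistic formula E_n = -R/n² with the Rydberg energy R of about 13.6057 eV (infinitely heavy nucleus, no fine-structure corrections); the function name is just something I picked:

```python
RYDBERG_EV = 13.605693  # Rydberg energy in eV, infinite-nuclear-mass value


def hydrogen_energy_ev(n):
    """Energy of hydrogen level n in eV from the closed-form two-body solution."""
    if n < 1:
        raise ValueError("principal quantum number n must be >= 1")
    return -RYDBERG_EV / n ** 2


for n in (1, 2, 3):
    print(n, round(hydrogen_energy_ev(n), 4))
```

Nothing like this exists for helium or anything heavier; from the third particle on, you are approximating.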

Typically, these problems are now solved with Monte Carlo methods that invoke random sampling of possible outcomes. The more possible outcomes you calculate, the more likely your answer is correct, but you can see where things get nasty quickly. If your simulation always predicts one outcome, then the 1/1000 outcome is ignored, but in a large system, that outcome can’t be ignored.
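As a rough illustration of the random-sampling idea, here is a toy Metropolis Monte Carlo run on a one-dimensional double-well energy. This is only a sketch: the energy function, the temperature `kT`, the step size and the seed are all invented for the example, and real calculations sample vastly larger configuration spaces.

```python
import math
import random


def energy(x):
    """Toy double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def metropolis(n_steps, kT=0.3, step=0.5, x0=0.0, seed=0):
    """Sample positions with the Metropolis acceptance rule."""
    rng = random.Random(seed)   # local generator so runs are repeatable
    x = x0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        delta = energy(trial) - energy(x)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x = trial
        samples.append(x)
    return samples


samples = metropolis(20000)
left = sum(1 for x in samples if x < 0) / len(samples)
print(f"fraction of samples in the left well: {left:.2f}")
```

Run it for too few steps and the walker may never leave the well it started near, which is the small-scale version of ignoring the rare outcome.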

Bob55 suggested they could do pure ab initio calculations on hundreds of atoms, but I think that is extremely optimistic. I had calculations done on a series of molecules I made, and the first thing the computational chemist did was essentially chop off everything that wasn’t relevant. So my two PPh₃'s became PH₃'s. And the ring Ph’s also became H’s. Even at that, the calculations took weeks to do. Unless you have massive resources and an unending budget (because that type of computer time costs big bucks), I think even one hundred atoms is out of the question.

There are semi-empirical methods that may be more practical for hundreds of atoms, and often these methods are nearly as good as the ab initio ones, but with these methods come more assumptions and a higher probability of error.

For calculating something as large as a protein, they are almost definitely using simple molecular mechanics. This is the sort of calculation that smeghead is referring to. Like he said, even with all those assumptions, you have a huge number of calculations to do.
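To make "simple molecular mechanics" a bit more concrete, here is a stripped-down sketch of the non-bonded part of such an energy function: a Lennard-Jones term plus a Coulomb term summed over every pair of atoms. Real force fields also have bond, angle and torsion terms and per-atom-type parameters; the single `epsilon` and `sigma` used here and the Coulomb constant of roughly 332.06 kcal·Å/(mol·e²) are assumptions for the example, not values from any specific package.

```python
import itertools
import math

COULOMB_K = 332.06  # approx. Coulomb constant in kcal*Angstrom/(mol*e^2)


def nonbonded_energy(atoms, epsilon=0.1, sigma=3.4):
    """Sum Lennard-Jones + Coulomb energy over all atom pairs.

    atoms is a list of (x, y, z, charge) tuples, coordinates in Angstrom.
    epsilon and sigma are illustrative values shared by every pair.
    """
    total = 0.0
    for (x1, y1, z1, q1), (x2, y2, z2, q2) in itertools.combinations(atoms, 2):
        r = math.dist((x1, y1, z1), (x2, y2, z2))
        lj = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
        coulomb = COULOMB_K * q1 * q2 / r
        total += lj + coulomb
    return total


def pair_count(n_atoms):
    """Number of distinct atom pairs the double loop has to visit."""
    return n_atoms * (n_atoms - 1) // 2


print(pair_count(1000), "pairs for 1000 atoms")
```

Every one of those pairs has to be re-evaluated each time the atoms move, and a folding run needs an enormous number of moves, which is where the computing time goes.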

One of the problems is simply trying to figure out how a protein is folded in the first place. We can sequence a protein fairly easily. I’m not a biochemist, but I believe they essentially break it into pieces, then try to fit the pieces together. This doesn’t tell you how it is folded in its natural state though. You can’t just get a microscope and look at it, you know.

If you are lucky, you can get a protein to crystallize out in its natural environment. Then you take the crystal to your local crystallographer, who hits it with X-rays to see how it diffracts. They do a bunch of Fourier transforming and get you a pretty good arrangement of the protein. On the other hand, the protein can fold differently in the crystal from the way it folds in vivo, so even that might be wrong.

And that’s what happens when you’re lucky. If you are unlucky, your protein never crystallizes out. Then you have to look at all sorts of data and try to figure out what each piece of information means. For example, you can take a spectrum where every singly bonded carbon near a nitrogen gives a signal, then get one of those carbons to make all the carbons near it jump. It’s a complicated process, and there is a lot of guesswork. All of this time you might have a computer trying to calculate what possible configurations are consistent with all the data you have acquired.

This isn’t really the way modern computational fluid solvers work. Tracing each individual particle and its intermolecular coupling forces directly is computationally prohibitive for even the smallest practical system; what you describe is only useful for developing and validating continuum or discrete approximation models. There are also dramatic differences between computational fluid analysis and protein folding behavior. Although a fluid system is complicated and may have many different regimes, the equations to accurately simulate behavior on a practical level are actually fairly simple and, for the most part, readily linearized (or at least can be approximated as linear equations with straightforward adjustments). Proteins, however, are highly complicated in that a small change to the position and orientation of one part of the protein can cause a dramatic change in the forces on a different area of the protein. Modeling the behavior means that you have to calculate these effects, which then propagate to other self-interactions in the protein chain, et cetera, including the original locus of action. Protein behavior models, then, are closer in complexity to the self-interacting charged plasma codes used in magnetic confinement fusion, and the difficulty of accurately simulating and controlling such interactions is similar (and well beyond current capability). The interactions are highly non-linear and perturbative, and require far more complex mathematical simulation than relatively straightforward (if computationally intensive) matrix algebra and a bit of linear calculus.

As an analogy, computational fluids analysis (or structural stress and thermal analysis) is like a length of string with a lot of knots in it; it might take you a long time to untangle the individual knots, but for the most part each knot can be discretized into its own individual problem. Protein folding, however, is like a big, tangled ball of string in which you can’t find the ends, and every time you untangle one bundle of it you find you’ve made a tangle worse somewhere else. It is an extremely tough problem to which we don’t even have a good avenue toward an exact solution; only successively better rough approximations to the problem.