# How many gigabytes is human DNA?

I was daydreaming while I was watching Tron the other night. The scene where the MCP zaps Flynn with lasers got me thinking: the laser took him apart one molecule at a time and rebuilt him inside the System.

So, assuming the MCP actually had enough space back in the early '80s, how much space would one human’s DNA take up? I don’t mean how many bytes a whole human body would take up, just the genetic code itself.

Show-off!

Okay back to the first question: How much space would 1 human body take up molecule-per-digital-molecule?

I guess that depends on how you encode it. There are about 176 trillion molecules in a single human cell. Multiply that by the number of bytes you need to describe each molecule and by the number of cells in your body (some sources say 10 trillion).
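The multiplication in that post can be written out directly. This is a minimal sketch using only the post's own figures (176 trillion molecules per cell, 10 trillion cells); the per-molecule record size is a hypothetical input, not a known value, and the estimate ignores where each molecule sits.

```python
# Rough estimate of storing a body molecule by molecule, using the thread's figures.
# bytes_per_molecule is a hypothetical placeholder, not a measured value.
MOLECULES_PER_CELL = 176e12  # "about 176 trillion molecules in a single human cell"
CELLS_PER_BODY = 10e12       # "some sources say 10 trillion"


def body_storage_bytes(bytes_per_molecule: float,
                       molecules_per_cell: float = MOLECULES_PER_CELL,
                       cells: float = CELLS_PER_BODY) -> float:
    """Total bytes = molecules per cell x cells x bytes per molecule."""
    return molecules_per_cell * cells * bytes_per_molecule


if __name__ == "__main__":
    for b in (1, 100, 10_000):  # hypothetical per-molecule record sizes
        total = body_storage_bytes(b)
        print(f"{b:>6} B/molecule -> {total:.2e} bytes ({total / 1e9:.2e} GB)")
```

Whatever record size you pick, the total scales linearly with it, so the molecule count (about 1.76 × 10^27 with these figures) dominates the answer.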

I was reading something recently that said the physical location of the genes in the folded molecule is also important, which means the total information is probably more than the straightforward calculation suggests.

You’re also going to need to record where those molecules are in relation to each other. It would suck if you did all that work and then it rebuilt you with every molecule lying end to end in a straight line.

You’d probably need to capture the various electrical currents as well, or it would rebuild you dead.

I have no idea what the answer would be, but I’m guessing that Sergey Brin has it in a spreadsheet somewhere.

Unless there are extra degrees of freedom (i.e., more than one way for a given strand to fold), the sequence includes all the information required. We may not understand it well enough to interpret it fully in, say, a computer simulation, but it’s all there.

For bonus points, calculate the average bandwidth of a typical male orgasm.

Well, that’s an image I never expected to encounter…

Possibly more than a station wagon full of tapes. Those stuffy old computer science textbooks should try to keep up with the times.

There are various epigenetic markers as well. For example, all of your DNA is compacted by histone proteins – every ~140 base pairs is wrapped around a histone core. That core can be modified in many important ways; IIRC there are eight different arms that extend from the histone core. Each of those, in turn, has maybe a half dozen possible modifications (some more, some less).
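The figures in that post allow a very crude per-cell count. This sketch assumes a diploid genome of about 6.2 billion base pairs (twice the commonly cited ~3.1 billion haploid size), one nucleosome per ~140 base pairs, and a single present/absent bit for each of 8 arms × 6 modifications. It ignores DNA methylation, marks with more than two states, and the fact that different cells carry different patterns, so treat it as an illustration of the counting, not a measurement.

```python
# Crude per-cell count of histone-tail marks, using the figures quoted in the post.
# All constants are assumptions; each modification is treated as simply on or off.
GENOME_BP_HAPLOID = 3.1e9  # commonly cited approximate haploid genome size
BP_PER_NUCLEOSOME = 140    # "every ~140 base pairs"
ARMS_PER_CORE = 8          # "eight different arms" (IIRC, per the post)
MODS_PER_ARM = 6           # "maybe a half dozen possible modifications"


def histone_mark_bits(genome_bp: float = 2 * GENOME_BP_HAPLOID) -> float:
    """Upper bound on on/off histone-tail marks for one diploid cell, in bits."""
    nucleosomes = genome_bp / BP_PER_NUCLEOSOME
    bits_per_nucleosome = ARMS_PER_CORE * MODS_PER_ARM  # one bit per site
    return nucleosomes * bits_per_nucleosome


if __name__ == "__main__":
    bits = histone_mark_bits()
    print(f"~{bits / 8 / 1e6:.0f} MB of on/off histone marks per diploid cell")
```

With these inputs it comes out to roughly 2.1 billion bits, about 266 MB per cell; every one of the assumptions above moves that number.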

Those epigenetic markers are a big part of what determines how DNA folds up. Some histone markers promote DNA folding into a compact form that silences the genes contained within, and other markers promote unfolding to allow genes to be transcribed.

To complicate this even further, these epigenetic markers can be very dynamic, changing on a fast time scale. Other times they mark relatively permanent changes during development. In other cases, epigenetic information can be heritable, sometimes even carrying a record of the environment of your ancestors.

It’s hard to put a number on how much epigenetic information there is, but I’d guess that there’s at least as much information there as there is stored in DNA.

If I remember the article correctly, the location of the genes after folding changes with the age of the cell/body/organism. Some genes start out close to each other because of the folding, and they can move towards the outside of the structure with age.

You could probably store all the information of a human body in a finite amount of space. The amount of memory required would depend on the precision you want. If you divide the human body into a grid with femtometer divisions, then place each isotope of each element with its nominal charge at its approximate location, the information needed to “solve” that monstrosity of a Hamiltonian is pretty much all there. I guess you’d also need the nominal electron configuration of each atom, since otherwise the Hamiltonian will assume the ground state.
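A dense femtometer grid would be hopeless (a 2 m cube has about 8 × 10^45 cells), so a more practical encoding is a list of atoms, each with a grid position plus an identity payload. This is a minimal sketch under loudly stated assumptions: roughly 7 × 10^27 atoms in an adult body, a hypothetical 2 m bounding cube, and a hypothetical fixed `payload_bits` budget for isotope, charge and electron configuration.

```python
# Sketch: store each atom as (position on a femtometer grid, identity payload)
# instead of storing every grid cell. All constants below are assumptions.
ATOMS = 7e27          # commonly quoted order-of-magnitude atom count for an adult
EXTENT_M = 2.0        # hypothetical bounding cube edge, in meters
RESOLUTION_M = 1e-15  # femtometer divisions, as in the post


def cells_per_axis(extent_m: float = EXTENT_M,
                   resolution_m: float = RESOLUTION_M) -> int:
    return round(extent_m / resolution_m)


def bits_per_axis(extent_m: float = EXTENT_M,
                  resolution_m: float = RESOLUTION_M) -> int:
    # Bits needed to index grid positions 0 .. cells-1 along one axis.
    return (cells_per_axis(extent_m, resolution_m) - 1).bit_length()


def body_bytes(payload_bits: int, atoms: float = ATOMS) -> float:
    # payload_bits is a hypothetical budget for isotope, charge and
    # electron configuration; position takes one index per axis.
    position_bits = 3 * bits_per_axis()
    return atoms * (position_bits + payload_bits) / 8


if __name__ == "__main__":
    dense = cells_per_axis() ** 3
    print(f"dense grid: {dense:.1e} cells")
    print(f"{bits_per_axis()} bits per axis")
    print(f"with a 64-bit payload: {body_bytes(64):.2e} bytes")
```

Each coordinate needs 51 bits at that resolution, so position alone costs 153 bits per atom before any payload.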

Of course, if you want to compute the behavior of this simulated entity through time, you will need magic.

You can absolutely store all the information of a human body in a finite amount of space. In fact, you can store all the information in 5 human bodies in a Geo Metro, although it might be a tight squeeze.

There is a mistake in that wiki table. I have no idea how to correct it, but the total MB for XX should be larger than that for XY, given that X > Y in MB size. Whoever made the table swapped the data for males and females. I assume someone here knows how to modify a wiki entry?

They didn’t swap the data. They considered only one copy of each pair, and for men, they counted both the X and the Y. It doesn’t seem the right approach to me, but that’s what they did.
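The explanation above is easy to check with arithmetic. This sketch uses approximate GRCh38 lengths for X (~156 million bp) and Y (~57 million bp) and a hypothetical round figure for the 22 autosomes, which cancels out of the comparison; sizes assume 2 bits per base.

```python
# Why the table's XY total comes out larger: it counts one copy of each
# autosome plus X (and Y for males). Lengths are approximate, in base pairs.
X_BP = 156_000_000
Y_BP = 57_000_000
AUTOSOMES_BP = 2_900_000_000  # hypothetical round figure for chromosomes 1-22


def mb(bp: int) -> float:
    return bp * 2 / 8 / 1e6  # 2 bits per base, 8 bits per byte


def totals() -> dict:
    return {
        "XX one copy each": mb(AUTOSOMES_BP + X_BP),
        "XY one copy each": mb(AUTOSOMES_BP + X_BP + Y_BP),
        "XX diploid": mb(2 * AUTOSOMES_BP + 2 * X_BP),
        "XY diploid": mb(2 * AUTOSOMES_BP + X_BP + Y_BP),
    }


if __name__ == "__main__":
    for name, size in totals().items():
        print(f"{name:>18}: {size:,.1f} MB")
```

Counting one copy of each chromosome, XY is larger by exactly the Y (about 14 MB); counting both copies, XX is larger by the difference between X and Y (about 25 MB).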

I always wondered when I’d find a practical use for the Dirac delta function…

I see. Thanks! I was wondering if I had missed something obvious, but wasn’t inclined to start adding things up. I suppose it makes sense if you consider only the minimum raw data: men do have an “extra” chromosome in that case! It does lend itself to confusion, though!

As if I needed further humbling. I was hoping to at least require a DVD-ROM.
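Whether the genome needs a CD or a DVD is back-of-envelope arithmetic: with only four bases, each fits in 2 bits, so four bases pack into one byte. This is a minimal sketch of that packing, assuming a sequence of only A, C, G and T (real data also has unknown bases such as N, which this ignores) and a haploid size of about 3.1 billion base pairs.

```python
# 2-bit packing: four bases per byte. Assumes the input contains only A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
GENOME_BP = 3_100_000_000  # approximate haploid genome size (assumption)


def pack(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # KeyError on anything but ACGT
        byte <<= 2 * (4 - len(chunk))        # pad a short final chunk with zeros
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    seq = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            seq.append(BASES[(byte >> shift) & 0b11])
    return "".join(seq[:length])  # drop padding bases


if __name__ == "__main__":
    print(f"haploid: {GENOME_BP / 4 / 1e6:,.0f} MB")      # 775 MB
    print(f"diploid: {2 * GENOME_BP / 4 / 1e6:,.0f} MB")  # 1,550 MB
    print(pack("GATTACA").hex())
```

A single copy of each chromosome comes to roughly 775 MB; both copies of every chromosome double that.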

Not really. The fact is, the human genome turned out to be much, much smaller than we expected, and about 30% of it is shared by most life forms. You’re not that much different from a banana.

What is important is not the genome (which is the software needed for building proteins), but the hardware that is needed to control it, and that hardware is embedded inside the cell. Equally important are the non-coding areas in the chromosome that many people don’t end up counting since they don’t encode proteins.

Sometimes this is referred to as junk DNA, and junk DNA makes up the vast majority of our DNA. For example, our DNA contains the genes for almost the entire vitamin C synthesis pathway. However, the gene for the last step is broken, so unlike most mammals, we need to eat foods that contain vitamin C. That broken vitamin C gene is junk DNA.

However, some of that so-called junk contains instructions on how to read and build from the coding parts of our DNA. Some of it holds start and stop signals, so we’re not synthesizing proteins unless we need them.

So the genome itself is surprisingly small (I think there are about 20,000 genes). But the non-coding DNA is much bigger (about 10x to 20x as big, by some estimates). Much of that non-coding DNA is junk, but some of it apparently contains important assembly instructions.

And then there’s the structure of the cell itself, which is needed to read the genome and actually do the building.

Think of it this way: the genome is like the part of the OS that we see, the user interface. It’s the Windows in Windows. The non-coding DNA is the underlying part of the OS that builds the structures for the user interface to run on. The cell itself is the hardware and built-in ROM needed to actually run Windows. Without it, that Windows CD-ROM is just an expensive coaster.