# Information contained in code

I saw a TV show once, a long time ago, that explained why DNA, which has four bases, contains more information than an integer sequence that has 10 bases. In other words, in terms of information content, it is better to have fewer bases than more. I loved the show but I don’t remember the reason why this works. Can anyone help?

Do you remember what they mean by an integer sequence with ten bases?

Clearly a sequence of n base 10 digits has more information than a sequence of n base 4 digits. So they can’t mean that. Perhaps they mean that some base 10 codings have don’t-care values? For instance, if you code decimal digits in binary simplistically, the bit values 1010 to 1111 don’t have meaning and are wasted, so better encodings were found.
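To put a number on the wasted bit patterns, here is a minimal Python sketch. It assumes the "simplistic" coding is plain binary-coded decimal, with one 4-bit group per decimal digit; the variable names are just for illustration.

```python
import math

# Plain binary-coded decimal (assumed): each decimal digit gets its own 4-bit group.
BITS_PER_DIGIT = 4
patterns = [format(i, "04b") for i in range(2 ** BITS_PER_DIGIT)]
used = patterns[:10]     # 0000..1001 encode the digits 0..9
unused = patterns[10:]   # 1010..1111 carry no meaning

# Information actually carried by one decimal digit, in bits.
info_per_digit = math.log2(10)
efficiency = info_per_digit / BITS_PER_DIGIT

print("unused patterns:", unused)
print(f"{info_per_digit:.3f} bits of information stored in {BITS_PER_DIGIT} bits")
print(f"efficiency: {efficiency:.1%}")
```

Six of the sixteen patterns go unused, so each 4-bit group carries only about 3.32 bits of information.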

I suspect that you or they were missing something.

As Voyager said, unless a bunch of values would be meaningless, a higher base will always be able to contain more information per digit, by its very definition even.
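A quick way to check the "more information per digit" point: a symbol drawn from `base` equally likely values carries log2(base) bits. The sketch below is only an illustration; the function names and the 64-bit message size are arbitrary choices, not anything from the show.

```python
import math

def bits_per_symbol(base: int) -> float:
    """Information carried by one symbol when all `base` values are usable."""
    return math.log2(base)

def symbols_needed(total_bits: int, base: int) -> int:
    """Fewest base-`base` symbols that can distinguish 2**total_bits messages."""
    count = 0
    capacity = 1
    # Integer arithmetic avoids floating-point rounding at exact powers.
    while capacity < 2 ** total_bits:
        capacity *= base
        count += 1
    return count

for base in (2, 4, 10, 100):
    print(f"base {base:>3}: {bits_per_symbol(base):.2f} bits/symbol, "
          f"{symbols_needed(64, base)} symbols for a 64-bit message")
```

For a 64-bit message this gives 64 symbols in binary, 32 in base 4, 20 in base 10 and 10 in base 100, so a bigger alphabet always means a shorter sequence as long as every symbol is usable.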

Maybe I’m on the wrong track but it was pretty clear that it was easier to encode information in binary or in DNA with 4 bases than it is with a decimal system. Maybe it wasn’t that more information could be coded but there was some other advantage. When you think about it, wouldn’t it be easier to have 100 different DNA bases than 4? Wouldn’t you need less DNA? Maybe I will ask this question in GQ.

The only thing I can think of is that the more symbols you have, the larger the molecules you need to encode them all. Other than that, I got nothing.

Less DNA, sure, but fancier DNA transcription complexes. With a hundred different bases you’d probably also mess up that sweet double helix structure. You’d certainly make packing it into chromatin more difficult, and engender a need for whole new types of histone-like proteins.
All things considered, I’d bet the transcriptional and translational error rates would go way up.

That is a confusing statement. Going strictly based off of what you said, it doesn’t make any sense. What show was this?

Perhaps what they meant was that 4 bases are enough to code all the information needed in the genome to define a cat or an aloe or an E. coli cell or a human? That having 10 bases creates too many possibilities, and that much more room for error? The current system of DNA is highly efficient, with relatively low rates of error, and enough redundancy and “junk” space to keep genes intact throughout lifetimes of replication. Keeping in mind that evolution is not an intelligent force, once primitive life developed the 4 bases for DNA, it was “good enough” and everything else just came from that. No need to consider advantages of 6 or 8 or 10 bases.

I’m not sure if this is related to your question, but one reason DNA works so well is that, in many cases, it can mutate without changing the information it holds. For example, if the sequence CAA somehow mutated to become CAG, the code would still be interpreted the same way: both codons specify glutamine.
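The CAA/CAG case can be sketched in a few lines of Python. The table is only a small excerpt of the standard genetic code (DNA codons mapped to three-letter amino acid abbreviations), and `is_silent` is a hypothetical helper name.

```python
# Small excerpt of the standard genetic code (DNA codon -> amino acid).
CODON_TABLE = {
    "CAA": "Gln", "CAG": "Gln",                              # glutamine
    "CAT": "His", "CAC": "His",                              # histidine
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGT": "Gly",  # glycine
}

def is_silent(before: str, after: str) -> bool:
    """True if mutating codon `before` into `after` leaves the amino acid unchanged."""
    return CODON_TABLE[before] == CODON_TABLE[after]

print(is_silent("CAA", "CAG"))  # True: both codons mean glutamine
print(is_silent("CAA", "CAT"))  # False: glutamine becomes histidine
print(is_silent("GGA", "GGT"))  # True: any third base still gives glycine
```

This is redundancy traded for robustness: several codons share one meaning, so some single-base mutations change nothing.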

Ah. Well, that statement is true. I’m not a biologist, and I managed to evade organic chem in college, but I suspect that a 10 base molecule would be overly complex, too large, and might never have evolved.

Computers use binary for simplicity. There have been many attempts at developing hardware that is inherently base four, say, but none have been acceptable. The chance of error, from noise in the system making a 0 look like a 1, say, is too great. You also lose speed, since a multivalue system would require a greater range of voltages, which takes more time to swing from one to another and to settle.
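As a rough illustration of the noise argument, suppose the levels are evenly spaced across a fixed voltage swing. That is an assumption for the sake of the sketch, not a description of any real circuit, and `noise_margin` is a made-up helper.

```python
def noise_margin(swing_volts: float, levels: int) -> float:
    """Largest noise a level can absorb before it is read as a neighbouring level.

    Assumes `levels` evenly spaced values across a fixed swing (hypothetical model).
    """
    spacing = swing_volts / (levels - 1)
    return spacing / 2

for levels in (2, 4, 10):
    print(f"{levels:>2} levels over 1 V: margin {noise_margin(1.0, levels):.3f} V")
```

Under that model, going from 2 to 4 levels cuts the margin to a third, and 10 levels leaves a ninth of it, which is the "a 0 looks like a 1" problem in numbers.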

For DNA, I’d suspect cell division would take longer, since if there are more potential base partners there is less chance the right one will wander by half of a split DNA molecule and bond to the right place.

Maximum compaction is not always the best way to go!