Amino acids and DNA

In DNA, the various amino acids are encoded by triplets of nucleotides. Some amino acids have more than one encoding. It is my understanding that the encodings are almost identical across all the species we know, which is taken as evidence that all life here now is related.

My question is, is there something mechanical (for want of a better word) about this encoding so that a certain encoding had to be related to a particular amino acid, or is it completely arbitrary and essentially just a look up table? In the latter case doesn’t there have to be someplace that information is stored?

The lookup table is stored in the DNA that codes for the different varieties of tRNA.

Transcription and translation are complicated processes that all have to work together to do anything useful, so saying any one thing is more important than another is a bit silly. However, in the process of translation, the amino acids are carried to the ribosome by tRNA. Unique tRNA molecules will match the different codons, and carry the appropriate amino acid for that codon. Because it is all one big interlocked system, if the tRNA was different then the protein coding DNA would have to be different, and on down the line.

So if I understand your answer it says there is a “mechanical” fit between a specific tRNA and the codons and also a mechanical fit between the tRNA and the correct amino acid. It is not an arbitrary code that could have been different (except that there are multiple codings for some of the amino acids).

There are mechanical fits on both sides, but it’s still arbitrary, because the tRNA pieces are made from instructions in the DNA, and could have been made to pair different amino acids with those codons. In other words, yes, it is basically a lookup table.
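To make the lookup-table idea concrete, here is a minimal Python sketch. The table is the standard genetic code written in DNA letters; the names CODON_TABLE, translate and variant are illustrative choices, not anything from this thread. The optional table argument is only there to show that the reading procedure does not care which amino acid a codon maps to.

```python
from itertools import product

# Standard genetic code, DNA alphabet, codons enumerated in TCAG order
# (third base varying fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, table=CODON_TABLE):
    """Read dna three bases at a time and look each codon up in table."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)


# A variant table with a single reassignment: in vertebrate mitochondria,
# TGA is read as tryptophan (W) rather than as a stop signal.
variant = dict(CODON_TABLE, TGA="W")

print(translate("ATGTGAGCC"))           # standard table
print(translate("ATGTGAGCC", variant))  # same DNA, different table
```

With the standard table the example stops at TGA and yields "M"; with the variant table the same DNA yields "MWA". Only the dictionary changed, which is the sense in which the code behaves like a lookup table.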

Douglas Hofstadter wrote an interesting article about this topic, called The Genetic Code: Arbitrary? (Note the initial letters of the title, har har.) The answer is, the code is probably arbitrary but the whole story is more complicated than it may initially appear. Surprisingly, the tRNA itself does not distinguish between amino acids and any tRNA can connect with any amino acid. But there is an enzyme (aminoacyl-tRNA synthetase) that acts as an intermediary between the tRNA and the amino acid to ensure that the right AA is selected. The enzyme recognizes the tertiary DHU loop in the tRNA to determine which amino acid should be connected. Hofstadter’s article is quite detailed and interesting and I recommend it. It was published in the March 1982 issue of Scientific American and also in Hofstadter’s book Metamagical Themas.

Here’s an older [1998] article that some may find interesting:

The Invention of the Genetic Code
Some ingenious hypothetical codes of the 1950s

In 1953 no one had yet read the sequence of bases in any DNA molecule—not one scrap of one gene. For proteins the situation was only a little better. Frederick Sanger was finishing his work on the amino acid sequence of insulin, and a few other fragmentary protein sequences had been published. But the very idea that every protein has a precisely defined sequence, the same in all copies of the molecule, was not yet universally accepted. Even the set of amino acids from which proteins are assembled was still subject to dispute (although Watson and Crick would soon sit down at the Eagle to write out the canonical list of 20). And all the biochemical apparatus for translating DNA into protein awaited discovery. Messenger RNA and transfer RNA were unknown. Ribosomes had been glimpsed in electron micrographs, but their function was unclear.

One area that was not quite so murky was the replication of DNA. From the moment Watson and Crick saw that the four nucleotide bases fit together in specific pairs—adenine with thymine, guanine with cytosine—the mechanism of replication seemed obvious: Unzip the double helix and form two new strands complementary to the original ones. One reason this process was so much easier to fathom was that the replication machinery does not have to consider the meaning of a base sequence in order to duplicate it, any more than a Xerox machine has to understand the documents it copies.

Translation, in contrast, cannot avoid semantics—and yet no one had a clue about how to interpret a sequence of bases. Even the most fundamental questions remained open. For example, since DNA is a double helix, should you look for information on both strands? If only one strand carries the message, how do you know which one it is? And which direction do you read in? Trying to make sense of the genome was like being given a book in a language so unfamiliar you couldn’t be sure you were holding it right side up.

If one understands how it works, one can apply mods:

They say “ontogeny recapitulates phylogeny”; and today’s life recapitulates the much more primitive mechanisms of early proto-life. Chemiosmosis is such an example – a complicated and seemingly useless step in energy harnessing but which was the energy source in the earliest proto-life. The many different Fe–S proteins reflect the origin of proto-life in a (catalyzing) matrix of iron-sulfide.

And translation of RNA sequences into amino acid sequences must have begun before the elaborate genetic code developed.

The following comes from Nick Lane’s Life Ascending, a 2009 book, so it won’t be up to date.
“[The] first letter of the triplet codon [corresponds with one of the] simple precursors [for amino acids.] Thus all amino acids formed from the precursor pyruvate [have T as the] first letter in the codon.”

As for the 2nd letter in a codon, “Five of the six most hydrophobic amino acids have T as the middle base, while all the most hydrophilic have A. The intermediates have a G or a C.”
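The middle-base pattern can at least be looked at directly in the standard table. The short Python sketch below groups the one-letter amino acid codes by the second base of their codons. It does not test the hydrophobicity claim itself, since that would need a hydrophobicity scale, which is not included here; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Collect the set of amino acids seen for each middle base.
by_middle = defaultdict(set)
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    by_middle[codon[1]].add(aa)

for base in BASES:
    print(base, "".join(sorted(by_middle[base])))
```

The T row comes out as F, I, L, M and V (phenylalanine, isoleucine, leucine, methionine, valine), and the A row as D, E, H, K, N, Q and Y plus stop.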

But whatever properties were exploited in protein synthesis before the genetic code was “invented” are probably irrelevant now, though preserved in the code’s detail.

That may be the same book I read that suggested, based on redundant codons, that the original codons may have been only 2 letters long, for 16 possible combinations.
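The redundancy behind that suggestion is easy to count. As a rough sketch in Python (names are illustrative), the snippet below lists the 16 possible two-letter prefixes and finds those for which the third letter makes no difference in the standard code.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# All two-letter prefixes: 4 * 4 = 16.
doublets = ["".join(p) for p in product(BASES, repeat=2)]

# A doublet "decides" the amino acid if all four third bases give the same result.
decided = {
    d: table[d + "T"]
    for d in doublets
    if len({table[d + b] for b in BASES}) == 1
}

print(len(doublets))
print(len(decided), decided)
```

In the standard code, 8 of the 16 doublets already fix the amino acid regardless of the third base. That is consistent with the two-letter idea but does not by itself prove it.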

A MUST read, along with his The Vital Question

Better yet, obtain Lane’s more recent books and post reviews!

IANAB, but I think the crucial word here is ribosome. The ribosomes are where the protein synthesis takes place. A messenger RNA carries the codons (three-letter sequences of bases) to the ribosome, where each codon is matched with a tRNA carrying the corresponding amino acid, which is then added to the protein being assembled. So the place where a codon is translated into an amino acid is there.

I also saw a paper once that claimed some mechanism by which some genetic codings would be better than others, and concluded that, by that metric, the coding that we have now is in the 99.9th percentile (that is, better than 999 out of 1000 other possible codings). The implication is that there were once many different encodings used by life, but the one we have now outcompeted all of the others.

Another historical review:

Origin and evolution of the genetic code: the universal enigma
Eugene V. Koonin and Artem S. Novozhilov
IUBMB Life. 2009 Feb; 61(2): 99–111.

From the introduction:

The fundamental question is how these regularities of the standard code came into being, considering that there are more than 10^84 possible alternative code tables if each of the 20 amino acids and the stop signal are to be assigned to at least one codon. More specifically, the question is, what kind of interplay of chemical constraints, historical accidents, and evolutionary forces could have produced the standard amino acid assignment, which displays many remarkable properties.
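One way to arrive at a number of that size, assuming a "code table" means any assignment of the 64 codons to 21 meanings (20 amino acids plus stop) in which every meaning is used at least once, is to count such assignments by inclusion-exclusion. This is a sketch of that reading of the sentence, not necessarily the authors' own calculation; count_code_tables is a hypothetical name.

```python
from math import comb


def count_code_tables(codons=64, meanings=21):
    """Count assignments of codons to meanings in which every meaning is used.

    Inclusion-exclusion over the k meanings left unused:
    sum over k of (-1)^k * C(meanings, k) * (meanings - k)^codons.
    """
    return sum(
        (-1) ** k * comb(meanings, k) * (meanings - k) ** codons
        for k in range(meanings + 1)
    )


n = count_code_tables()
print(f"{n:.3e}")
```

Under that assumption the count comes out a little above 10^84, in line with the figure quoted. Without the "at least one codon each" requirement the count would simply be 21^64, which is somewhat larger.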

As a bit of trivia, the lead author of this review, EV Koonin, just missed out on being a lot more famous:

Entering the 2000s, Koonin and his colleagues used COGs and comparison of the order of genes in a genome to identify gene neighborhoods of interest and reconstruct conserved neighborhoods in various species. One of the neighborhoods Koonin and his colleague Kira Makarova identified (7) is now known as the CRISPR-Cas system, which is at the heart of current gene editing research (8). “Very regrettably, we did not quite correctly predict the biological function,” he says. At the time, in 2002, he and his colleagues hypothesized that the Cas genes were involved in a DNA repair system. “In a sense, not quite wrong,” he says, “And yet, off the mark.”