How much of our DNA is "meaningful"?

For our purposes, I’m using “meaningful” to roughly mean “if we were to overwrite this section of DNA with another sequence, the observable differences would be negligible or non-existent*”. I suppose another way to phrase it would be “how much of our DNA is actually expressed”?

I’m curious about when we say things like “We’re 97% similar to <species> genetically”. When we say that, how much of that similarity is “junk” DNA? Not necessarily “junk” in that the DNA serves no purpose in any organism, just “junk” in the sense that “these genes are expressed in whales, but the sequence being present is superfluous in land animals” (or whatever).

Are there any notable diseases where, due to a flaw (hormonal, genetic, whatever) these sequences are expressed where they normally shouldn’t/wouldn’t be?

I suppose a programmer sort of way to say what I mean is “how much of our genes are used in the application, and how much is just the standard library?”

Assume I know literally nothing. I do know some stuff here and there about Hox genes and how DNA is used to form proteins, but I don’t want misremembered knowledge getting in the way.

* Note that I’m not using comic book genetics here. I don’t mean “if you changed the DNA in one of my cells RIGHT NOW”; I’m talking about if the organism developed from a zygote with the change.

With some thought, it occurs to me that logically the amount that’s junk is probably a relatively small amount. Because of the high degree of genetic similarity between organisms, and given the fact that genetic reproduction is random, one would expect unimportant DNA to begin to vary much more quickly between generations than “good” DNA because unused DNA would have no effects on the organism’s survival (and thus there’s no impetus for it to stay the same).

Though this is assuming there’s no arbitrary mechanism that inhibits certain segments of DNA from changing even when there’s no really good reason to prevent it (other than perhaps that it was at one time important that it didn’t change, or that it sits close to DNA that must not change). I think I read an article once about how certain parts of chromosomes are “wrapped more tightly” (or something to that effect) which makes it harder for mutations to occur, so that may well be the case.

Still, I’d appreciate answers, especially if my reasoning here is flawed.

A couple of things:

-I remember reading about an experiment where a team started deleting bits of the E. coli genome to see how much of it was really essential. I forget the exact number, but it was somewhere around half that they finally got rid of. One important caveat is that this was in laboratory settings - there could have been lots of genes that would be essential in the “wild” that they don’t need on a nice comfy Petri dish.

-You have to be careful about what you mean by “negligible or non-existent” changes. I could easily imagine a lot of changes that wouldn’t be immediately obvious, but would have long-term evolutionary impact. This is important, because those long-term evolutionary effects are how those sequences got there in the first place.

-I hate the term “junk” DNA, which was coined before we learned a lot of what we now know about how DNA functions. That’s not to say that there’s no truly useless DNA in the genome, but a lot of what we once thought was junk is now understood to have important uses.

-True “junk” DNA would be expected to have more variation between species, not less. I’m not sure what it was, but you said something that made me think that needs to be clarified.

-You’re using the words “genes” and “DNA” interchangeably, and they’re not equivalent. Genes are regions of the DNA that code for either a protein or functional RNA. By definition, NO GENE IS JUNK. ALL “junk” DNA is between genes.

-There are lots of genes in the genome, and we’re a long way from figuring out what they all do. There are lots of genes that we can live without if we have to. There are lots of genes where we simply don’t know what they do. I’m not aware of any where we know for certain that they do literally nothing - like a protein is made, and it just sits there until it gets degraded. However, the way we look for genes and gene functions virtually guarantees that if such a gene were to exist, we wouldn’t notice it unless we got very very lucky.

-There are lots of genes that should be expressed only in certain times and places, and there are lots of diseases - cancer comes to mind - caused by those genes being expressed incorrectly.

-The numbers you see comparing our DNA to some other species’ DNA are generated in a variety of ways. The ones I’ve seen most often are specifically comparing gene sequences, and not intergenic sequences. I’ve also seen comparisons of all non-repetitive sequences, which would include all genes and a pretty good chunk of intergenic regions. The devil’s in the details - you really need to find out what they’re comparing.
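To make the “what are they comparing” point concrete, here is a minimal sketch in Python. The two sequences, the region boundaries, and the function name `percent_identity` are all made up for illustration, and the sequences are assumed to be already aligned; real comparisons involve alignment and much longer sequences.

```python
def percent_identity(seq_a, seq_b, regions=None):
    """Percent of aligned positions that match, optionally limited to regions.

    regions is a list of (start, end) half-open index pairs; None means
    compare the whole alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if regions is None:
        regions = [(0, len(seq_a))]
    matches = total = 0
    for start, end in regions:
        for a, b in zip(seq_a[start:end], seq_b[start:end]):
            total += 1
            matches += a == b
    return 100.0 * matches / total


# Toy alignment (hypothetical data): positions 0-9 stand in for a "gene",
# positions 10-19 for intergenic DNA.
species_1 = "ATGGCCAAGT" + "TTACGGATCA"
species_2 = "ATGGCCAAGA" + "TAACGCTTGA"
gene_regions = [(0, 10)]

gene_only = percent_identity(species_1, species_2, gene_regions)
whole = percent_identity(species_1, species_2)
print(f"gene regions only: {gene_only:.0f}%")   # 90%
print(f"whole alignment:   {whole:.0f}%")       # 75%
```

Same two “species”, two different headline numbers, depending only on which regions went into the comparison.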

-If I may get on my soapbox for a moment, there is a LOT of highly repetitive DNA in the genome, and it gets largely ignored. It’s generally considered the best candidate for truly “junk” DNA. All these genomes that have been sequenced and put together just leave out these regions because it’s basically impossible to put the sequences together. And yet, my own research is looking at how some of these regions in fruit flies are vital for holding chromosomes together during meiosis.

-Which raises the larger question of how do we decide if a sequence is “functional”? I won’t get into it here, but suffice it to say that it’s a much trickier question than it seems at first.

What do you mean? Like the sequence isn’t actually expressed, but is where it is because it’s a convenient place for translocation mutations or something like that?

That’s what I meant in my second post. You’d expect junk DNA to quickly vary between generations because there’s no impetus for it not to change (since it has no side effects).

Sorry, I tried my hardest to be careful about that and not do it, I guess I missed it. I know they’re not the same, but it’s a habit.

But serious question: isn’t it theoretically possible for a gene to not be expressible in practice (though it was expressed at one point in that family’s evolutionary history)? In that case wouldn’t an entire gene be “junk-like” in that it’s an artifact and changing it wouldn’t change anything? Or is a “non-functional” gene no longer a gene? I suppose the question then becomes how one determines a dummied out gene was once an important gene and isn’t just a random meaningless filler sequence (which you get at later in your post). I guess in some ways it’s ultimately a philosophical matter rather than a meaningful scientific one, since it’s as good as a random sequence if it can’t be expressed.

There are some examples of this. Most famously, perhaps, and probably the most along the lines of what you’re thinking of, was the experiment in which researchers “turned on” dormant genes for tooth buds in a chicken. The chicken genome still had non-functional genes required for making teeth that normally aren’t expressed. If you look up “pseudogenes”, a lot of them fall in the same category - sequences that look like genes, but are never used for one reason or another.

The thing is, once these genes stop being used, they tend to degenerate pretty quickly, since there’s no longer any selective pressure to keep their sequence conserved. The chicken tooth thing was a bit surprising for that reason. For some reason, those genes haven’t degraded all that much yet.
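The expectation that unused sequence drifts faster than sequence under selection can be shown with a toy simulation. This is only a sketch under loud assumptions: the sequence length, the number of mutation attempts, and the 5% acceptance rate for mutations in the constrained region are arbitrary numbers chosen for illustration, not measured values, and real selection is far messier than a coin flip.

```python
import random

BASES = "ACGT"

def evolve(ancestor, constrained, attempts, accept_prob, rng):
    """Apply random point mutations; most hits to constrained sites are rejected."""
    seq = list(ancestor)
    for _ in range(attempts):
        pos = rng.randrange(len(seq))
        if pos in constrained and rng.random() > accept_prob:
            continue  # stands in for selection removing a harmful mutation
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return seq

def divergence(seq_a, seq_b, positions):
    """Fraction of the given positions at which the two sequences differ."""
    return sum(seq_a[i] != seq_b[i] for i in positions) / len(positions)

rng = random.Random(0)                   # fixed seed so runs are repeatable
length = 2000
constrained = set(range(0, 1000))        # stands in for a gene still in use
unconstrained = range(1000, length)      # stands in for unused sequence

# Two lineages descend from one ancestor and mutate independently.
ancestor = [rng.choice(BASES) for _ in range(length)]
lineage_a = evolve(ancestor, constrained, 2000, 0.05, rng)
lineage_b = evolve(ancestor, constrained, 2000, 0.05, rng)

d_constrained = divergence(lineage_a, lineage_b, sorted(constrained))
d_unconstrained = divergence(lineage_a, lineage_b, unconstrained)
print(f"constrained region differs at   {d_constrained:.1%} of sites")
print(f"unconstrained region differs at {d_unconstrained:.1%} of sites")
```

In this toy setup the unconstrained half ends up far more different between the two lineages than the constrained half, which is why a pseudogene that still looks largely intact is a bit of a surprise.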

More recent conclusions indicate that much of the junk DNA is not necessary. The carnivorous bladderwort plant has a particularly lean genome, with very little junk DNA, and it functions just fine.

You have the classic genetic position, which says that around 2% of DNA is functional.
You have the ENCODE result which was summed up to mean that 80% of the DNA has some sort of function.
You have the middlin’ view, which says that the 2% is too low, but the 80% is way too high.
And then you have the practical view, which says that we really don’t have any friggin’ idea.

My WAG is that the 80% is probably closer to the mark than the mainstream genetics community is prepared to accept. Even if we do not need the DNA now, that does not mean that it has no function. For instance, some DNA might need a particular circumstance to pop up to become activated.

The idea of simply removing DNA to see at what point you no longer have a functioning organism is misguided. At least, it’s misguided if you then call the removed DNA “Junk.” It misses possible role duplication. It misses possible regulatory pathways. It misses special cases.

Imagine doing the same thing with a car. We will remove stuff until it no longer works, and call the removed stuff “Junk.” Remove the seats. Ok, the car still runs (and is kind of uncomfortable to drive), but I’ll give in on this one. What about removing the wipers? The car still runs…until you have a heavy downpour: it might run, but not for long. So if you did all your experiments in sunny weather, you might throw the wipers into the “Junk” category.
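The car analogy can be written out as a tiny knockout experiment. This is just a sketch of the logic: the parts, the two weather conditions, and the rule for whether the car “runs” are all invented for illustration and say nothing about how real deletion studies score viability.

```python
# Hypothetical parts, each mapped to the conditions under which it is needed.
REQUIRED_IN = {
    "engine": {"sunny", "rain"},
    "wheels": {"sunny", "rain"},
    "wipers": {"rain"},
    "seats": set(),          # never strictly required for the car to run
}

def runs(parts, condition):
    """The car runs if every part required under this condition is present."""
    return all(part in parts
               for part, conditions in REQUIRED_IN.items()
               if condition in conditions)

def dispensable(conditions_tested):
    """Parts whose removal goes unnoticed under every tested condition."""
    all_parts = set(REQUIRED_IN)
    return {part for part in all_parts
            if all(runs(all_parts - {part}, c) for c in conditions_tested)}

junk_if_sunny_only = dispensable(["sunny"])
junk_if_both = dispensable(["sunny", "rain"])
print(sorted(junk_if_sunny_only))   # ['seats', 'wipers']
print(sorted(junk_if_both))         # ['seats']
```

The list of “junk” shrinks as soon as a second test condition is added, which is the whole objection: the answer depends on which conditions the experiment happened to include.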

And of course, we are just really starting to understand how RNA, DNA, and exo-DNA (or whatever all the single-celled organisms and viruses that we depend on to function correctly are called) all interact with each other. So who knows? Maybe some of that “Junk” DNA might help us get along with a wider set of one-celled helpers.

I’m not sure this would be the universally accepted definition of “gene”. It’s not the dictionary definition or the common-use definition, but if you read that humans have 20,000 or 30,000 genes then that would be referring to protein-coding sequences.

I second what the others have been saying: “junk DNA” is a fading concept because research is turning up more and more mechanisms by which DNA affects the phenotype. My sense is that we will never be able to draw a line between functional DNA and junk. A single base change may cause you to die in the womb or it may slightly alter the shape of your earlobe.

A lot of this sort of debate, IMO, is really just semantics. What exactly do you mean by “function”, “expression”, or “gene”?

Here’s one such example. The definition of “gene” as something that codes for proteins has been the dominant definition since the discovery of the central dogma (DNA is transcribed as RNA which is translated into proteins). Over the last couple of decades, however, molecular biology has discovered multitudes of RNA molecules that do not code proteins, but still have very interesting biological functions. So many biologists now include “DNA that encodes functional RNAs” in their working definition of “gene”.

As a practical matter, it’s a lot easier to refer to the “microRNA-43 gene” than to invent a new term for “the bit of DNA that is transcribed to produce microRNA-43 precursor RNA but is not a gene because it doesn’t encode a protein”.

Exactly. Especially because some “junk” may only turn on in very particular circumstances and suddenly it’s functional.