What percentage of introns is present in linker DNA in chromatin? Is it possible they have a structural function, so that the introns can stay in the core of the nucleosome?
I’m an ecologist, this isn’t my area.
I respectfully beg to disagree. According to Dermitzakis et al. (Science, Vol. 302, Issue 5647, 1033-1035, November 7, 2003), the highly conserved nature of non-coding DNA regions would seem to imply that they do indeed serve a purpose. Surprisingly, some of these non-coding regions are more highly conserved across species than are genes which actually code for proteins. I’d link to the article, but it’s subscription based. A brief quote, however.
In the scientific community, there are many schools of thought on the relevance of junk DNA. It may very well be that much of it is just that; junk, but it may also turn out that much of it is vital. It’s too soon to unequivocally state that it serves no purpose, however.
Amberlei
Dermitzakis, eh? He’s on my list of people to contact for collaboration. This is nearly exactly what my PhD is on. I’ve done a lot of reading on this field, trust me.
He is talking about conserved non-genic sequences (CNGs). Most non-genic sequences are not conserved, which means that they have changed significantly during evolution. This study analyzes around 300 of these CNGs, narrowed down from 3000+ found by comparing the sequence of human chromosome 21 to similar regions in mouse (his previous Nature paper). This sounds like a lot, but considering that each one is around 100 bp, we are talking about only a few percent of chromosome 21’s sequence. His previous figure whittles 3000 down to a lower limit of 700+ using a number of different analyses. Some of these are noise – just because things evolve doesn’t necessarily mean they have to evolve, even with no functional constraint, and he reduces this down to around 700 bits. Anyway, this study looks closely at 220 of them, and he finds that about 0.3% to 1% of the noncoding DNA of chromosome 21 consists of new, highly conserved bits found in all mammals. It will take me a while to crunch the numbers, but I can’t imagine much more than a few percent of noncoding sequence is conserved, and candidates for some kind of function.
edwino, it’s pretty cool to talk to an expert on stuff like this! Here’s my question to you, then: what conclusions can we draw from junk DNA being, well, junk?
Sorry, my last post was mangled and unintelligible. Let me clarify.
Dermitzakis, Nature 2002, compared human chromosome 21 sequence to mouse syntenic sequence (syntenic means it is homologous by gene order; it doesn’t all need to be on the same chromosome). He used a simple alignment algorithm based on the gold-standard BLAST routine to find 3491 blocks of greater than or equal to 100 base pairs and greater than or equal to 70% conservation. 1,229 correspond to exons (coding regions) and pseudogenes (which he doesn’t care about). The remaining 2262 are not internally similar (they are not repetitive) and are single copy throughout the genome. Some are unknown parts of known genes, some are unknown genes, some are noncoding RNAs, and the rest are what we are interested in – the conserved non-genic regions with unknown functions. They run some more programs and determine that around 40% of these are not what they want (parts of genes or noncoding RNAs). To get rid of random noise, they look in rabbit sequence at a sampling of the other 60%, and find that around 50% of these are conserved there (another data point about as evolutionarily distant from humans). So he presumes around 30% are the conserved non-genic regions with unknown function. This is around 700 bits, for a fairly conservative estimate based on his assumptions.
Next, in this Science paper, he takes 191 of those 700 and he tests their conservation in 14 different mammals. 29% of them are conserved throughout mammals. So he uses this to come to the figure of 0.3% to 1% of the total genome consists of these bits. This is an important finding but it doesn’t say anything about the other 39% of the genome consisting of our other “junk DNA” (40% of the genome is “junk,” now we have identified 1% that appears to have strong evolutionary conservation). Remember that he masked all pseudogenes to start with as well, so he doesn’t even factor in this large class of sequence, so this drops his estimate. Granted, this is very conservative and he is missing a bunch. But I don’t think he is missing 39 times the amount that he found.
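For anyone who wants to check the filtering arithmetic from the Nature paper as summarized above, here is a rough Python sketch. The block counts and percentages are the ones quoted in this post; the variable names are just illustrative, and the real pipeline is of course far more involved than two multiplications.

```python
# Back-of-the-envelope version of the filtering steps quoted above.
# Numbers come from the post; names are illustrative only.
total_blocks = 3491        # conserved human/mouse blocks (>= 100 bp, >= 70% conserved)
exon_or_pseudogene = 1229  # blocks that match exons or pseudogenes

remaining = total_blocks - exon_or_pseudogene

fraction_kept = 0.60       # ~40% turn out to be gene parts or noncoding RNAs
fraction_in_rabbit = 0.50  # ~50% of the sampled remainder is also conserved in rabbit

cng_fraction = fraction_kept * fraction_in_rabbit
cng_estimate = remaining * cng_fraction

print("blocks left after removing exons/pseudogenes:", remaining)
print("fraction presumed to be CNGs:", round(cng_fraction, 2))
print("estimated number of CNGs:", round(cng_estimate))
```

That lands just under 700, which is consistent with the "around 700 bits" figure.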
Ilsa_Lund
I am not an expert on chromatin structure. I know my basic stuff, but I don’t know a lot about the specifics of nucleosomal organization. I can tell you that there appear to be no motifs, degenerate or otherwise, consistent in introns. Such motifs are what we would expect to find if there were DNA binding elements in the introns, and they are what we see in insulators, enhancers, repressors, promoters, and other sites which bind proteins. Introns have a handful (less than 10) total conserved nucleotides which distinguish the splice acceptor, donor, and junction sites, even though intron size can often be upwards of ten kilobases in humans. Those few nucleotides apparently are sufficient for correct splicing, and therefore localization to the nucleosome (as I have read it).
LaurAnge
One can make many suppositions about the genome structure like this, but unfortunately we don’t have good ways of testing these right now. You could call them “selfish” – old integrated viruses (proviruses) along for the ride. Dawkins proposes that the purpose of life (and DNA and genes and everything) is to replicate itself. In this respect, those retroviruses and other pieces of DNA have been quite successful. One could also suppose a nebulous structural role for all of this stuff, but given that there are plenty of eukaryotes with compact genomes that are quite successful, that this DNA can tolerate massive changes, and that there is no apparent evolutionary constraint on it, that becomes doubtful. One could suppose that on some large scale, lots of repetitive elements in the genome facilitate large scale genome change and domain reorganization to allow more rapid evolution: genome shuffling happens between homologous bits, and having 1.4 million Alu repeats (only one of many such elements) means that there are homologous bits nearly everywhere you look. But then one has to take into account the presumably huge negative hit you take for the much more common genome rearrangement and domain reorganization we see in cancer. I could suppose lots of other things, but the evidence is sparse and as I say, the experiments are difficult to design.
Let me also just clarify (sorry, been a long weekend). We are looking for CNGs because, right now, there is no good way to find regulatory regions, the bits of DNA which control gene expression. Conservation appears to be a pretty good way of doing it. How good varies between systems, between species selected for conservation, and between methodologies.
Introns do allow for different splice variants of proteins to be produced.
I’ve also wondered if the extra intron DNA would help protect against radiation damage, carcinogens, etc, and could even be packed on the outside of the chromosome to protect important genes inside. The more DNA you’ve got, the less chance that carcinogen ends up harming a good gene.
On point 2, this appears to be a good site, with useful references.
It summarises a whole range of hypotheses on the purpose of junk DNA. Amongst them, it refers to papers supporting (and refuting) the proposal that intron DNA might be similar to natural languages, since it follows statistical/cryptographical patterns such as Zipf’s law.
Also refutes theory that junk absorbs harmful chemicals.
Looks like we have to keep up the computational analysis.
ghady – to sum up answer #1: In a theoretical sense, this could work, although not if ‘imprinted’ genes were on the X chromosome, (though none are currently known). However, there are doubts about whether the sperm or egg would develop normally with the aberrant chromosomes. Remember, after meiosis, the divided cells still need to develop into eggs or sperm, respectively.
#2: Some DNA seems to indeed be junk; some seems that it may have a purpose we don’t understand yet; some has a structural purpose; some provides information about when to express other parts of the DNA sequence; and some is expressed and codes for proteins.
My whole work is devoted entirely to “non-coding” DNA. We are not eubacteria. Our intergenic regions and introns seem to do a great deal of work. Track down the following article for more information:
Levine M. & Tjian R. 2003. Nature. 424:147-151.
Dogface
I know that review. Is that your group? That paper estimates only 5-10% of the genome is devoted to regulatory regions. Since around 2% of the human genome is protein coding, and let’s say another 5% is repetitive structural DNA, that still leaves somewhere between 83% and 88% of the genome with no proposed function.
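The subtraction behind that leftover figure, as a quick Python sketch. The percentages are the rough ones used in this post (2% coding, 5% structural repeats, 5-10% regulatory); the function name is made up for illustration.

```python
def unexplained_percent(coding, structural_repeats, regulatory):
    """Percent of the genome left over once the listed categories are subtracted."""
    return 100 - coding - structural_repeats - regulatory

# Regulatory estimate at the top (10%) and bottom (5%) of the quoted range.
with_high_regulatory = unexplained_percent(2, 5, 10)
with_low_regulatory = unexplained_percent(2, 5, 5)

print(with_high_regulatory, "to", with_low_regulatory, "percent with no proposed function")
```

The 88% figure corresponds to the low end of the regulatory estimate; taking the full 10% still leaves 83%.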
I’m not saying there is nothing in there, I’m just saying that the vast majority of it has no apparent function.
Oh, I got it wrong in my previous post. 40% of the human genome is that one specific Alu repeat. I don’t deal with human sequence usually.
I don’t think ‘refute’ is the correct term here - this author cites Hsu, TS “Bioessays”, which does not turn up on PubMed. If you can find this Bioessay please let me know, I’d like to see how and why TS Hsu disproves the “junk DNA absorbs harmful chemicals” theory. The link you provide only uses this one source to prove his statement that junk DNA offers no protection.
If 97% of the genome is non-coding, how can it not be protective?
lemme see if I have this non-coding DNA down.
(1) DNA (of a certain creature)(the genotype) does not exist in order to code for anything; however,
(1a) if a certain array of DNA (“of the genotype”) does code for a useful protein / structure, then it is likely to be reproduced.
(2) one of the ways that new niches* come to be explored is through random mutations of DNA
(2b) random DNA mutations have unintended, but sometimes related, effects
(2c) a lot of the time, the effects of random mutations mentioned in (2b) make a resultant organism die, however;
(2d) on the off chance that the creature with one specific randomly mutated gene (and maybe many mutated genes, but it doesn’t really matter, because even if only one gene is mutated, there’s still a chance (which chance increases the more genes are mutated) that the resulting proteins will react with each other in a totally different way than they did in the progenitor) survives, the effects (“the effects of that one changed gene”) probably will not affect only one area of the genome
(3) if there is a change in the DNA (a.k.a. the genome) (a.k.a. the coding into or reading of an important protein (or other nonspecified cellular machinery)), then said change will probably impact nonuseful genetic material, as well as the useful.
therefore
(4) it is likely that random genetic change will cause crap to accrue in the genome…
{a bad example- the crow Charlie had boring parents, who collected nothing. However, Charlie got some weird ass genes, and as a result collected anything shiny he saw. As a result, he became the richest crow who ever lived (because of all the serious coinage he collected)- however, he also collected a lot of tassels and mylar balloon shreds and Doublemint wrappers, which are crap.
but his kids (oops- assuming crows use money, and (oops- double shitters!) assuming crows pass money down to their heirs) became rich, and so it looks like the collection of shinies is somehow responsible for their success.
but really, the collection of shinies is an unintended side effect of the urge to collect coins.}
that’s not really clear, huh.
jb
*very vague, I know- but there are a crapload of niches, and to explore even one (fully) would entail a shitter more knowledge than I have got, even shakily
dammit!
must work on parentheses…
Bob55
Bioessays is a reputable review journal. The cite given in that site appears to be to this article, which is by TC Hsu not TS Hsu. I’m not exactly sure how using an 11 year old autobiographical historical article proves this genome size/mutation point, though.
My biggest issue for the whole “absorption” role is that it is not a scientific hypothesis. I can’t think up any experiments that would test this. Animals with larger genomes also have more damage sensors and repair machinery. I can’t think of an easy way to assay mutations while only changing genome size. Therefore, IMHO it is just a guess and doesn’t even rise to the level of hypothesis. I could be wrong though, I haven’t thought about it very carefully.
jb_farley
The genome doesn’t usually work like that. Mutations sometimes have many effects, but sometimes they are incredibly specific. Mutations can affect only one region of the genome. Also, evolution doesn’t really work that way either. We usually don’t think about one person with one advantageous mutation taking over a whole population. We usually think about a diverse population, with the advantageous mutation at some frequency. Moving into a new niche now selects, and enriches, for that allele. Imagine a clan or tribe of a few thousand animals (implying loose relation) moving into a warmer climate. Let’s say 10% carry an allele which allows them to survive and therefore breed 20% better in the warmer climate. One can calculate how prevalent the advantageous allele will be in a number of generations (this is actually a very strong selection in evolutionary terms). That’s a more usual route of evolution.
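To put rough numbers on that example, here is a minimal Python sketch. It makes a big simplifying assumption that is not in the post: it treats the population as just "carriers" and "non-carriers" that breed true, with carriers leaving 20% more offspring each generation, and it ignores diploid genetics, drift, and population size. The function name is hypothetical.

```python
def carrier_frequency(p0, advantage, generations):
    """Track the fraction of carriers over time under constant selection.

    p0         -- starting fraction of carriers (0.10 in the example)
    advantage  -- relative breeding advantage of carriers (0.20 = 20% better)
    Simplified carrier/non-carrier model; assumes carriers breed true.
    """
    w = 1.0 + advantage          # relative fitness of carriers; non-carriers = 1.0
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Carriers contribute p*w, non-carriers (1 - p); renormalise to a fraction.
        p = p * w / (p * w + (1.0 - p))
        freqs.append(p)
    return freqs

freqs = carrier_frequency(0.10, 0.20, 30)
for gen in (0, 10, 20, 30):
    print("generation", gen, "carrier fraction", round(freqs[gen], 3))
```

Under these assumptions the carriers go from 10% to roughly 40% of the population in 10 generations and to well over 90% in 30, which gives a feel for how strong a 20% advantage is.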
Inheriting an advantageous allele by no means has to affect any other part of the genome apart from the exact gene changed. Not to get too deep into it here, but this is exactly what we see in something like race. Just because some people have advantageous alleles for warm, sunny climes (dark skin, broad nose, little body hair) doesn’t mean that they have other ones unrelated to climate (the ability to rap or to dance, etc). In fact, we see the greatest genetic diversity amongst people with these advantageous alleles. While new alleles can result from duplications and rearrangements, this is not the predominant way in which our genomes expand.
I agree it is a guess, but a guess that in my opinion has not been disproven. So while it doesn’t have the facts to grant it hypothesis status, it could still have some merit.
Let’s say a carcinogen makes it all the way to the nucleus and intercalates a strand of DNA. Statistically speaking it has only a 3% chance of landing in a coding gene. Without introns that carcinogen would have a 100% chance of causing major problems. So while it may never be proven that introns offer protection, the statistics do support my guess.
Bob55 - Are there that many intercalating natural carcinogens in the environment to warrant diluting the DNA with 30 times as much inactive genetic material? Benzo[a]pyrenes etc. from fires? Maybe it offers protection from oxidising carcinogens - but doesn’t glutathione help out there?
Following on what jb_farley said:
Can an intron be mutated to become coding-DNA and vice versa?
antechinus
Yes and yes.
First, let me clarify something. All noncoding DNA is not introns. Introns are defined as specific parts of the genome which are transcribed (made into messenger RNA or mRNA) and later spliced out during mRNA maturation. This process, logically, is called splicing. Most of the genome is never transcribed. These parts include regulatory regions (areas which tell genes when, where, and at what level to be transcribed), structural components like telomeres and centromeres, and other nonfunctional DNA. All estimates place this last category of nonfunctional DNA at well over 90% of the human genome.
Two splice sites bracket the intron, and there is another important site found in the middle of the intron. Many mutations kill these splice sites, and mutations which create new splice sites (either in the intron or exon) are not unusual.
Splice failures are pretty common in disease. This leads to a functional beginning of the protein followed by a lot of gibberish where the mRNA was not spliced. This can cause a variety of different things to go on, but it usually just kills the protein’s activity. Sometimes, if an mRNA is not properly made, it will not be translated (made into protein) at all, so this just looks like a complete loss of function.
Another thing which happens regularly is the formation of a new splice site. This is usually a mutation in an intron which forms a new splice site, leading to part of an intron being spliced into an mRNA rather than left out. This also can have a number of effects. The protein usually starts normally and can terminate in gibberish as above. Occasionally, it will read into the next exon, so it leads to an inclusion of a bit of gibberish in the middle of a protein. New splice sites in the exon can be formed as well. This can lead to part of the protein being deleted by part of an exon becoming an intron. It can lead to aberrant splicing around other exons, or sometimes just protein termination.
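A toy Python sketch of the splice-failure case described above. It assumes the common convention that introns begin with GT and end with AG (written here in DNA letters), and it uses made-up sequences and a made-up `splice` function; real splice-site recognition involves more sequence context than two dinucleotides.

```python
def splice(transcript, introns):
    """Remove each (start, end) intron whose ends look like GT...AG.

    An intron whose donor or acceptor dinucleotide has been mutated is
    retained in the output (shown in lowercase), which is the
    'gibberish after a normal start' case.
    """
    out = []
    pos = 0
    for start, end in sorted(introns):
        intron = transcript[start:end]
        out.append(transcript[pos:start])        # exon sequence before this intron
        if not (intron.startswith("GT") and intron.endswith("AG")):
            out.append(intron.lower())           # splice site lost: intron stays in
        pos = end
    out.append(transcript[pos:])                 # final exon
    return "".join(out)

# Hypothetical two-exon gene (T used instead of U for simplicity).
exon1, intron, exon2 = "ATGGCC", "GTAAGTCCTTAG", "AAATGA"
introns = [(len(exon1), len(exon1) + len(intron))]

normal = splice(exon1 + intron + exon2, introns)

# Single G->A change at the first base of the intron kills the donor site.
mutant = splice(exon1 + "AT" + intron[2:] + exon2, introns)

print(normal)
print(mutant)
```

The normal transcript comes out as the two exons joined together; the mutant keeps the whole intron in the middle of the message.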
Edwino, thanks for the clarification on introns. Splice sites sound sorta like tags in computing code, e.g. <% in ASP.
So if coding DNA can become non-coding, then during evolution why can’t DNA that collects into a genome become non-coding (through mutation) yet its absence selected for (through inheritance), and collect like historical detritus*?
Can junk be just the results of bad transcriptions over time (e.g. repeat sequences like stutters on a scratched CD) that survived (as above) yet were removed from use with splice sites?
antechinus
Programming tags as seen by the compiler are a useful way to imagine splicing. It may be misleading, though, because tags (hopefully) contain some kind of useful information (albeit not to the compiler), while introns don’t.
DNA can collect as you posit. For instance, primates can’t synthesize their own vitamin C. This is because at some time during primate evolution, one of the enzymes responsible for vitamin C synthesis was mutated and made nonfunctional. That gene is still there. Obviously, dietary vitamin C was sufficient through our evolution that primates no longer gained significant advantage by synthesizing it themselves. This is called a pseudogene. A significant proportion of that nonfunctional DNA is pseudogenes. I don’t have an exact figure here, but I will try to find it.
The thing is that once a gene becomes nonfunctional, it loses what we call evolutionary constraint. This principle of neo-Darwinism states that if something is needed (confers fitness), it won’t change much in evolution. So pseudogenes mutate at the same frequency as the rest of nonfunctional DNA, so they become pretty unrecognizable with time. There are specific base pairs which we know are not needed in evolution (these are in coding regions, at degenerate or unrequired sites in the genetic code called wobble bases), and we can measure a rate of change. Introns and pseudogenes change at this same rate.
Is it possible for a stretch of a functional pseudogene to be spliced so that it is partially or even fully reactivated? Sure, in theory, but I don’t know of any examples. Functional genes are transposed and cross-spliced quite frequently, which may lead to advantages through “domain shuffling” – new proteins are created by fusions of old proteins. But I don’t know of this happening with pseudogenes.
A lot of “junk” DNA can be classified as pseudogenic, but the catch is that they are not human pseudogenes. There are shells of old retroviruses scattered throughout the human genome, like the infamous Alu repeat, which is present in 1.4 million copies. These contain viral pseudogenes – coat proteins, viral polymerases, integrases, proteases, other things necessary for viral lifecycle – that are no longer functional. Of course, there are exceptions. Retroviral-like intracellular particles have been noted by pathologists and cell biologists for years. Whether they are the result of gene expression from some of these viral genes (they wouldn’t be pseudogenes because they would be working) remains to be shown. There is one case which I know of where humans actually use a viral protein. Human placental cells fuse to form multicellular syncytia using a protein called syncytin. This protein is derived from a recent human retrovirus infection, and is the old env gene of that retrovirus.
The technology we used to sequence the genome is not conducive to the analysis of repeats like the Alu repeats and other high-copy elements. Each read is around 500-800 base pairs, and depends on unique sequence within the read to accurately “assemble” the read onto a “scaffold” of other reads laid out on a chromosome. Any stretch of repeats longer than that cannot be assembled, because all repeat DNA obviously looks pretty similar. So while we have sequenced the genome, what we really mean is we have sequenced the non-repetitive DNA of the genome that can be assembled onto chromosome scaffolds. So we are limited by current technology as to how much of this nonfunctional DNA we can actually sequence usefully. You may say that this limits our knowledge, but to me 20 kilobases of simple repeat sequence which appears to be freely evolving is remarkably uninteresting.
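Here is a small Python sketch of why reads shorter than a repeat can't be placed. The "genome" is a made-up string with a 20 bp simple repeat between two unique flanks, and the function name is hypothetical; real assemblers work very differently, this only illustrates the uniqueness problem.

```python
def ambiguous_read_starts(genome, read_len):
    """Return the start positions whose read of length read_len occurs
    more than once in the genome, i.e. reads that cannot be placed uniquely."""
    last_start = len(genome) - read_len
    counts = {}
    for i in range(last_start + 1):
        read = genome[i:i + read_len]
        counts[read] = counts.get(read, 0) + 1
    return [i for i in range(last_start + 1)
            if counts[genome[i:i + read_len]] > 1]

# Toy genome: unique flank, 20 bp simple repeat, unique flank.
left, repeat, right = "ACGTTGCA", "AC" * 10, "TTAGGCAT"
genome = left + repeat + right

short_reads = ambiguous_read_starts(genome, 6)    # reads shorter than the repeat
long_reads = ambiguous_read_starts(genome, 24)    # reads longer than the repeat

print("ambiguous 6 bp reads:", len(short_reads))
print("ambiguous 24 bp reads:", len(long_reads))
```

Every 6 bp read that falls entirely inside the repeat matches several positions, while 24 bp reads always reach into a unique flank and can be placed. Scale the repeat up to 20 kilobases with 500-800 bp reads and you get the situation described above.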