This technique is used to magnify DNA and is often used in forensics when trace amounts of blood are found. I understand how the DNA is magnified but I have a few questions about how DNA is used to ID a suspect.
This technique copies only a small segment of DNA. Knowing that, as individuals, we have 99.9% of the same DNA, wouldn’t it be very likely that this segment is the same for a large part of the population?
Is there a generic primer that is used or are specific primers used in specific situations?
How exactly is PCR DNA compared to a suspect DNA? Is it run through a gel and separated? I’ve always wanted to know why snipping 3 billion base pairs produces such distinct bands and not a blur of DNA.
Not a scientist, but I do have a vague understanding of this.
The forensic lab gets a sample from the suspect, and designs a primer that will uniquely identify that specific guy. There are parts in the DNA that have great variability among people so it’s not hard. They then mix the primer and the sample from the crime scene and PCR it. Once that is done they detect the “copy numbers”, aka how many copies of the DNA got created. Usually by shining a special light on it. If the suspect primer and the crime sample matched, there will be millions of copies of his DNA created and it will be very easy to spot under light that makes DNA glow. If the PCR output is glowing brightly - chances are good the sample came from the suspect. If not, the sample might not be from him, or something got messed up in the process.
Certain places in the genome have a high degree of variability, with varying number of repeats of short sequences of DNA. These loci may have dozens of different possibilities for the number of short tandem repeats.
One means of forensic testing of DNA is to use primers that are designed to amplify these loci. No need to customize the primer to the patient. The DNA is copied using PCR resulting in a great amplification of the DNA of just these STR loci.
You can then run the resulting DNA through a gel electrophoresis which separates DNA fragments by size. This allows the examiner to determine how many repeats the tested sample has.
Repeat this whole process for several different STR loci and the examiner can see how many repeats the sample has at each locus. If a sample from a crime scene has the same number of repeats at each of about 14 different loci as a known sample taken from a suspect, then a match is inferred.
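To make the size-to-repeat-count step concrete, here is a minimal Python sketch. It assumes the amplified fragment is just a fixed stretch of flanking sequence (primer sites included) plus a whole number of four-base repeats. The function name `repeats_from_fragment` and the 200 bp flank are made up for illustration and are not taken from any real kit or locus.

```python
def repeats_from_fragment(fragment_bp, flank_bp, motif_bp=4):
    """Estimate the repeat count from a fragment length measured on a gel.

    fragment_bp: total length of the amplified fragment, in base pairs
    flank_bp:    non-repeat sequence between the primers (hypothetical value)
    motif_bp:    length of one repeat unit (4 for a tetranucleotide STR)

    Returns (full_repeats, leftover_bp); a non-zero leftover means the
    fragment does not fit the simple "flank + whole repeats" assumption.
    """
    repeat_region = fragment_bp - flank_bp
    if repeat_region < 0:
        raise ValueError("fragment is shorter than the assumed flanking sequence")
    return divmod(repeat_region, motif_bp)


# Hypothetical fragment sizes read off a gel, with an assumed 200 bp flank.
for size in (216, 272):
    repeats, leftover = repeats_from_fragment(size, flank_bp=200)
    print(size, "bp ->", repeats, "repeats, leftover", leftover, "bp")
```

The point is only that fragments differing by one repeat differ in length by one motif, which is why size separation can tell the alleles apart.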
Thanks, iggy, that makes much more sense. I suspected that sequencing the suspect’s DNA to design a primer sounded a bit too expensive even today, but didn’t know how they’d get around it.
The process for the crime scene sample and a sample taken from a suspect is essentially the same, with one important proviso… crime scene samples may be adulterated, a mixture of DNA from more than one source. Imagine a sample from a sexual assault case where a rapist’s DNA is mixed with the DNA of the victim and possibly another consensual partner. STR testing is a good choice since those areas do not mutate or degrade as fast.
The number of repeats is just like it sounds. At a particular locus in the genome a short sequence of base pairs repeats. We are testing to see how many times the sequence repeats. The more repeats, the heavier the DNA fragment, and the shorter the distance it moves in a gel electrophoresis.
Example: on chromosome 2 at the locus named TPOX there are repeats of the four-base-pair -AATG- sequence. Some chromosomes have, say, 5 repeats, some have 6, some have 22, etc…
An individual has two chromosome 2’s. Suppose a suspect is tested and he has 4 repeats on one of his chromosome 2’s and 18 repeats on the other.
If the crime scene sample is tested and it has 6 and 18 repeats then it is not a match. The suspect is thus excluded with essentially 100% certainty. The crime scene sample needs to match both chromosomes.
But if the crime scene sample also has 4 and 18 repeats like the suspect then the question arises as to what the chances are that the match was just pure chance. So if only 5% of the population would also match 4 and 18 repeats, then by testing this one locus we have narrowed the odds of a false match to 5%.
Then we test another locus. And if it does not match then the suspect is excluded. If the samples do match then we start multiplying probabilities. So it was a 5% chance of a random match at the first locus. And supposing a 5% chance of a random match at the second locus, then the odds of a random person matching both sites are .05 * .05 = .0025 = 0.25%.
Keep testing more loci. The standard for the CODIS database is 13 loci. If a suspect’s sample matches a crime scene sample then you have multiplied the probabilities of a random match on all 13 loci. The odds of a false match become one in billions.
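The exclude-or-multiply logic above can be sketched in a few lines of Python. This is only an illustration: the 5% per-locus figure is the made-up number from the example, the loci are assumed to be independent, and the names `same_genotype`, `compare_profiles` and `locus_B` are hypothetical.

```python
from math import prod


def same_genotype(a, b):
    """Compare two genotypes as unordered pairs of repeat counts."""
    return sorted(a) == sorted(b)


def compare_profiles(evidence, suspect, match_chance):
    """Return None if any locus excludes the suspect.

    Otherwise return the chance that a random person would match at every
    tested locus, found by multiplying the per-locus chances together.
    """
    for locus, genotype in evidence.items():
        if not same_genotype(genotype, suspect[locus]):
            return None  # one mismatch is enough to exclude
    return prod(match_chance[locus] for locus in evidence)


# Repeat counts and 5% chances are illustrative only.
evidence = {"TPOX": (4, 18), "locus_B": (7, 9)}
suspect = {"TPOX": (18, 4), "locus_B": (9, 7)}
chance = {"TPOX": 0.05, "locus_B": 0.05}

print(compare_profiles(evidence, suspect, chance))            # two loci
print(compare_profiles({"TPOX": (6, 18)}, suspect, chance))   # excluded
print(0.05 ** 13)                                             # 13 loci at 5% each
```

With the illustrative 5% at every one of 13 loci, the product is about 1.2 × 10^-17, so "one in billions" is, if anything, an understatement for those particular made-up numbers.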
That’s great. Thanks.
So each of my homologous chromosomes may have a different number of repeats because they came from different parents?
To test a different locus, do they just use a different primer?
Iggy has covered the principles, but just to expand on his last point:
As stated, the loci that are used for traditional DNA profiling contain STR (short tandem repeats). The number of repeats at each STR locus is highly polymorphic (variable) within the population, because it’s fairly easy for the DNA copying mechanism (in the germline, not the lab) to make an error when you have these short repeated sequences.
At each locus, there may only be perhaps ~10 alleles prevalent in the population. The power comes from the fact that loci are independent. So, if 20 loci are analyzed, the probability that two unrelated individuals share the same allele at all loci is of the order of 1/10^20.
As **Riemann** noted, the chosen loci are highly variable. Often they are from non-coding (previously considered “junk”) sections of DNA. The example I used of the TPOX locus on chromosome 2 is one that comes from a non-coding intron in a gene that makes the thyroid peroxidase enzyme (TPO).
Since the STR is in a non-coding section of DNA, natural selection does not tend to eliminate variants, at least not as readily as natural selection might tend to eliminate variants in a coding section. This means that a new variant might be more likely to survive and get duplicated and passed on to future generations.
The net result is a lot of different possible alleles out there in the population. This is true even though the STR loci chosen tend to actually have a slightly lower chance of mutating in any given generation as compared to a randomly chosen locus. So it is *theoretically* possible that a mutation might occur in any given generation in the tested loci, but it is quite unlikely.
The variability arises in the population not so much because these loci mutate a lot but rather because the rare mutations tend to survive and not be eliminated from the population.
This gives us confidence to use the same methodology in a paternity test. So suppose the tested loci include 2 and 12 repeats in the known mother and 2 and 8 repeats in the tested child. The child got a chromosome with 2 repeats from the mother so we know the child must have received the chromosome with 8 repeats from the father. If the suspected father does not have a chromosome with 8 repeats then we eliminate the possibility that he is the genetic father.
We then keep testing different loci, and if the putative father has the same number of repeats at each locus as the child has on the chromosome inherited from the father, then paternity can be inferred. Random mutations causing a false positive or false negative are so rare as to generally be considered unimportant.
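For a single locus, the paternity reasoning above can be sketched as follows. This is a simplified illustration that ignores mutation and assumes the mother is known. The function names and the alleged fathers' repeat counts are hypothetical.

```python
def possible_paternal_alleles(mother, child):
    """Repeat counts in the child that could have come from the father.

    If one of the child's alleles can be explained by the mother, the other
    one must have come from the father.
    """
    first, second = child
    candidates = set()
    if first in mother:
        candidates.add(second)
    if second in mother:
        candidates.add(first)
    if not candidates:
        raise ValueError("child shares no allele with the stated mother at this locus")
    return candidates


def father_excluded(mother, child, alleged_father):
    """True if the alleged father carries none of the possible paternal alleles."""
    candidates = possible_paternal_alleles(mother, child)
    return not any(allele in alleged_father for allele in candidates)


mother = (2, 12)
child = (2, 8)
print(possible_paternal_alleles(mother, child))   # the child's 8 must be paternal
print(father_excluded(mother, child, (8, 10)))    # carries an 8: not excluded
print(father_excluded(mother, child, (5, 12)))    # no 8: excluded at this locus
```

A "not excluded" result at one locus proves little by itself; as described above, the strength comes from repeating the check across many loci.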
I agree with everything else that you said here, but I believe the average STR mutation rate is far higher than other kinds of mutation. It’s of the order of 10^-3 per generation, although highly variable among loci. http://www.cell.com/ajhg/abstract/S0002-9297(07)62782-7?cc=y=
That’s great. I learned a lot from this and also watched some YouTube videos.
I do wonder how these STRs are copied.
In my vial, I place the DNA, the primer, nucleotides and polymerase. The reaction starts at the primer and moves along the DNA, so the fluorescent primer becomes part of the final strand. But how does the polymerase know when to stop and only include the STR and nothing else?
There is nothing unusual about the PCR amplification of STR regions; it proceeds just like any other PCR. Unless the PCR is multiplex (simultaneous amplification of multiple loci), primers are not usually fluorescent. PCR-amplified fragments are routinely visualized after the reaction is complete by the addition of ethidium bromide, a fluorescent dye that binds nonspecifically to double-stranded DNA.
For the first round of amplification, the genomic DNA is the template. As you realize, for this first round of synthesis the polymerase does not “know when to stop”, so it produces a first fragment that starts at (say) the forward primer and extends an indeterminate distance beyond the binding site of the reverse primer until it falls off. However, when the second round of synthesis uses this first-round single-stranded fragment as a template, it proceeds in the reverse direction. The reverse primer binds to its target site, and synthesis proceeds in the reverse direction toward the binding site of the forward primer. And that’s obviously as far as it can go, because the fragment produced in the first round started at the forward primer binding site.
So, throughout the reaction, the genomic DNA does act as a template that keeps producing fragments of indeterminate length at a linear rate. But these are soon vastly outnumbered by fragments that begin and end at the forward and reverse primer binding sites, and that increase in number geometrically with each cycle of synthesis.
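A rough way to see the linear versus geometric growth is to count strands cycle by cycle. The sketch below is a toy model, not a lab protocol: it assumes a single starting copy of the genome and that every strand present is copied exactly once per cycle, and the name `pcr_strand_counts` is made up for illustration.

```python
def pcr_strand_counts(cycles, templates=1):
    """Count single strands of each kind after a number of ideal PCR cycles.

    originals:       genomic strands (never increase)
    long_products:   copies of genomic strands; they start at a primer but
                     run on for an indeterminate distance
    target_products: copies of long or target strands; they begin and end
                     at the two primer binding sites
    """
    originals = 2 * templates  # two strands per double-stranded template
    long_products = 0
    target_products = 0
    for _ in range(cycles):
        new_long = originals                          # linear growth
        new_target = long_products + target_products  # geometric growth
        long_products += new_long
        target_products += new_target
    return originals, long_products, target_products


for n in (1, 2, 3, 10, 30):
    print(n, pcr_strand_counts(n))
```

In this idealized model, 30 cycles leave 60 strands of indeterminate length against roughly two billion target-length strands, which is why the longer products do not show up in the result.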
Heat the sample and the polymerase enzyme and/or primer loses its grip on the strand it is copying. But that is only an issue for the first copying.
You heat the mixture to separate the double-stranded DNA into two single strands. You add two primers to the reaction, so that the denatured single-stranded DNA is copied in each direction.
The products of the first copying will be longer than you want eventually, but shorter than the initial source. Remember, these products start where your primer latched on (more or less).
Cycle the PCR and start again. You still have the long original strands but it is mixed with short strands, the products of the first round of copying.
Since templates in the second and succeeding rounds are derived from earlier copying that started at a defined primer, the products over many cycles exponentially converge on only the target sequence.
Figure 6.9 from this National Institutes of Health webpage diagrams what happens in the initial copying and subsequent copying.
Sure, there are a few longer strands in the overall mix, but such a small proportion as to not cause problems reading the results.