Can DNA tests be done digitally?

I’m talking about the tests used to match crime scene evidence, and paternity/family ties tests.

I mean, can we convert the DNA information into a number and store it in a computer? Can we compare samples from different times or in different continents without transporting them to the same lab? How do they handle incomplete/damaged DNA? Or is DNA testing like film photography, where everything has to be present?

I was thinking about this because I was wondering whether you could do a test if both people (for paternity tests) couldn’t be present at the same time. I suspect it’s possible, otherwise how would they test old evidence?

I don’t know much about the legal side of things, but I worked in a genetics lab for a few years.

Yes, it can be stored digitally. But there are two different processes.

The most typical one just cuts the DNA up in a standard way and measures the lengths of the fragments as a sort of fingerprint. This is cheaper and easier, but the results aren’t 100% unique. Also, there are different ways to cut up the DNA, so if you processed two samples with different types of kit, you couldn’t compare the results. I suspect that by now there’s probably a national standard, but it’s possible older samples would have to be reprocessed, and of course it’s possible they were never digitized.
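To make the "cut it up and measure the lengths" idea concrete, here is a minimal Python sketch. It assumes a single enzyme, EcoRI, which cuts after the G in GAATTC; the sequence itself is made up, and the function name is just for illustration.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Lengths of the pieces left after cutting at every occurrence of `site`.

    cut_offset is how many bases into the site the cut falls.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Made-up sequence containing two EcoRI sites (GAATTC, cut after the G).
seq = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
lengths = fragment_lengths(seq, "GAATTC", 1)
print(lengths)  # [5, 14, 7]
```

The list of lengths is the "fingerprint". A different enzyme cuts at different sites and gives a different list, which is why results from different kits can’t be compared directly.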

Then there’s sequencing, which actually determines every base pair in the DNA, so it’s 100% unique except for identical twins, but a lot more expensive. It’s slowly getting cheaper though; I think you can do it for about $1000 now. I don’t think they bother with this unless there is a matter of contention at trial, or if multiple suspects are related to each other.

I don’t think damaged DNA is usually a problem. We can amplify DNA millions of times from a single good strand, and most samples will have many many strands to work from.

Yes, absolutely, paternity tests do not have to be conducted at the same time or location.

That’s how DNA testing is done when matching crime scene evidence and other samples. They test a small number of specific locations in our genome that are known to have high variability and store the results digitally, to compare with specific other results or to search through databases of such results.

Some understanding of DNA evidence is necessary. Each typical human has 23 pairs of chromosomes, each of which can be conceived of in essence as a giant lengthy molecule of the polymer DNA. (This is not strictly right, but bear with me).

Certain points on the DNA that can be identified by technical means act as the equivalent of addresses (known as loci), where genes (shorter subsequences on the longer molecule) can be found. For forensic purposes, DNA analysis is not conducted on the whole genome. Rather, a number of loci are identified which fit specific criteria. The current standard number is 16, IIRC.

The basic criteria are:

First, they are non-coding sections, so that they are not subject to evolutionary pressure. One benefit of this is that there is wider variation found at the locus, meaning greater discrimination power. Another benefit is that risk of covariation among different loci is reduced (more about this later).

Second, they are characterised by “Short Tandem Repeats”, which means that at that particular locus, a simple unit of DNA (say, the series AGG) is repeated multiple times. The number of different multiples actually found in practice varies according to the contingent nature of our species’ biological history. For example, analysis may show that the only variants that exist are 3, 5, 9, 11, and 15 units long. Each person has two versions of our example gene, one from each parent. Thus, when the analysis is done, a person will typically have two different expressions of the gene from the list of possibles. For example, they might have inherited a gene with 3 repeating units on one chromosome from Mum and 11 repeating units on the paired chromosome from Dad at the same locus. Such a person is called a 3,11 at that locus. This process is simultaneously done at 15 other loci, at each of which it will be possible to find another pair of STR numbers, such as 4,9 or whatever (each locus has a different range of possible numbers of repeats).

Assembling all 16 pairs of numbers then provides a profile for an individual. Population studies are then conducted to see how often the pairing of 3,11 appears at our first example locus. It might occur in 10% of the population. One then moves to the second example, to see how frequently 4,9 appears, which might be in 9% of the population. The various loci are independent statistically, or near as dammit; they were chosen that way. That is the value of the lack of covariation I referred to above. As a result, one can apply the multiplication rule across all 16 loci, which typically results in a determination that the chance of the same profile appearing in another unrelated person is one in bazillions.
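The multiplication rule can be sketched in a few lines of Python. The locus names are made-up placeholders, and only two loci are shown; the 10% and 9% figures are the example numbers above, not real population data.

```python
from math import prod

# Hypothetical genotype frequencies for one profile, keyed by made-up locus names.
# Real values would come from population studies.
genotype_freqs = {
    "locus_1": 0.10,  # how often the pairing 3,11 appears
    "locus_2": 0.09,  # how often the pairing 4,9 appears
}

def random_match_probability(freqs):
    """Multiply per-locus genotype frequencies, assuming the loci are independent."""
    return prod(freqs.values())

rmp = random_match_probability(genotype_freqs)
print(f"about 1 in {1 / rmp:,.0f}")
```

With only two loci the answer is about 1 in 111. Each extra independent locus multiplies in another small number, which is how 16 loci get you to "one in bazillions".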

There are some complications in all this, and parental tests are slightly different, but this is sufficient for present purposes.

You can see that it is entirely possible for the DNA of a suspect and the DNA from a crime scene in principle to be analysed by completely different labs. And it is relatively easy to reduce a profile to a series of computer entries and load any number of profiles into a database to allow cold comparisons to be done. And indeed, this is done at a level that allows comparisons from different labs across the country and internationally.
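As a rough illustration of what "a series of computer entries" might look like, here is a minimal Python sketch. The profile IDs, locus names, and two-locus profiles are all invented for the example; a real database holds many more loci and has its own matching rules.

```python
# A profile maps each locus name to a pair of repeat counts.
# Sorting each pair makes (11, 3) and (3, 11) compare equal.
def normalise(profile):
    return {locus: tuple(sorted(pair)) for locus, pair in profile.items()}

def find_matches(evidence, database):
    """Return the IDs of stored profiles identical to the evidence profile."""
    target = normalise(evidence)
    return [pid for pid, prof in database.items() if normalise(prof) == target]

# Made-up data: profiles entered by two different labs.
database = {
    "lab_A_0001": {"locus_1": (3, 11), "locus_2": (4, 9)},
    "lab_B_0417": {"locus_1": (5, 9), "locus_2": (4, 9)},
}
evidence = {"locus_1": (11, 3), "locus_2": (9, 4)}

matches = find_matches(evidence, database)
print(matches)  # ['lab_A_0001']
```

Since a profile is just a handful of small integers, it doesn’t matter where or when each sample was processed, as long as the same loci were typed.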

In practice, a cold hit is rechecked by referring back to the original reference samples very carefully, but the answer to your question is that DNA profiles are computerised, and comparisons are done by computer all the time, subject to close checking.

I should add that while samples from the suspect are typically perfect in the sense that you get a complete profile at all 16 loci, field samples from crime scenes are often not. They might be small or degraded, so that even the massive amplification process that is part of the analysis cannot reliably recover good data from each locus. And sometimes field samples contain mixtures, which messes with the mathematics a bit. In those cases, statistical tools are applied to the information that is there to give a reduced random match probability.
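A minimal Python sketch of the partial-profile case, under simplifying assumptions: loci that could not be recovered are stored as None and skipped, and the match probability is just the product over the loci that were compared. The locus names and frequencies are made up, mixtures are not handled, and real casework uses more elaborate statistics than this.

```python
from math import prod

def compare_partial(evidence, reference):
    """Compare only the loci recovered from the evidence sample.

    Returns (consistent, loci_compared). A locus stored as None, or absent
    from the reference, is skipped rather than counted as a mismatch.
    """
    compared = []
    for locus, pair in evidence.items():
        if pair is None or locus not in reference:
            continue
        compared.append(locus)
        if sorted(pair) != sorted(reference[locus]):
            return False, compared
    return True, compared

# Made-up data: the field sample gave no usable result at locus_3.
reference = {"locus_1": (3, 11), "locus_2": (4, 9), "locus_3": (6, 8)}
evidence = {"locus_1": (11, 3), "locus_2": (4, 9), "locus_3": None}
freqs = {"locus_1": 0.10, "locus_2": 0.09, "locus_3": 0.05}

consistent, compared = compare_partial(evidence, reference)
partial_rmp = prod(freqs[locus] for locus in compared)
full_rmp = prod(freqs.values())
print(consistent, compared)
print(f"partial: 1 in {1 / partial_rmp:,.0f}, full: 1 in {1 / full_rmp:,.0f}")
```

The partial profile still matches, but with fewer loci in the product the random match probability is less impressive than a full profile would give.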

Yes, you can do paternity tests, forensic tests, and such without samples from the various sources being present at the same place and time.

For forensic criminal matters a sample can be tested and information about its profile entered into the CODIS database. This can allow for matching to a profile already on record or one that is entered later.

A particularly powerful tool for such testing is STR (Short Tandem Repeat) Analysis. It lends itself well to converting the information to a number for information storage and comparison.

Here’s an example of how that might work, if you are interested. It is long and possibly boring so I’ll spoiler it out.

STRs are areas of the DNA where a particular short sequence of bases is repeated several times. So suppose in one particular stretch of DNA on chromosome #5 there is an area where the four base pair sequence CCAG is repeated several times in a stretch of DNA outside of a coding gene. One chromosome may have 3 copies. Another chromosome may have 60 copies, or any number in between.

Suppose that a sample collected from a child in a paternity test is tested and shown to have chromosome #5’s with 7 and 16 copies of this CCAG repeat at this particular location. That means that the child providing the sample has two chromosome #5’s, one from each parent, and one has 7 repeats and the other 16. The father must have one chromosome #5 with either 7 or 16 repeats.

If I test a suspected father and find he has 12 and 41 copies of that repeat on his chromosomes then he is not the father.

If I then test a sample from another suspected father and find he has 7 and 22 repeats, he might be the father.
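The check at this one location boils down to "does the suspected father carry at least one of the child’s repeat counts?". A minimal Python sketch, with a function name of my own choosing:

```python
def shares_allele(child, candidate):
    """True if the candidate carries at least one of the child's repeat counts."""
    return bool(set(child) & set(candidate))

child = (7, 16)
print(shares_allele(child, (12, 41)))  # False: excluded at this location
print(shares_allele(child, (7, 22)))   # True: cannot be excluded here
```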

I then try to calculate the chance that a match is all just a big coincidence. Suppose 30% of all chromosomes have 7 copies of this repeat. A match at this one locus would hardly be conclusive. But the loci are chosen to be highly variable, so normally the odds of any given number of repeats are fairly low, say less than 5%.

And I would continue testing DNA from the child and suspected father at other similar stretches of DNA to see if he has at least one chromosome with the same number of repeats at each locus as the child.

Suppose test results looked a bit like this:



Chromosome    Child (# repeats)    Suspected father (# repeats)
 5             7, 16                7, 22
 7             8, 23               11, 23
15            18, 31                2, 18
21            10, 14               10, 18
22             4, 26               15, 26

Suppose that the frequency distributions are like this:
Chr #5 3% chance of 7 repeats
Chr #7 6% chance of 23 repeats
Chr #15 2% chance of 18 repeats
Chr #21 12% chance of 10 repeats
Chr #22 4% chance of 26 repeats

To get the overall odds of matching all five of these sites you would multiply the frequencies, so (0.03 * 0.06 * 0.02 * 0.12 * 0.04) = 0.0000001728, or about a 1 in 5.79 million chance that this is just a random match. Test more loci and you can be even more certain.
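The same arithmetic as a short Python sketch, using the example table and frequencies above. The variable names are mine, and treating the frequency of the shared repeat count as the whole story is the same simplification the worked example makes.

```python
from math import prod

# Rows from the example table: chromosome -> (child's repeats, suspected father's repeats)
results = {
    5: ((7, 16), (7, 22)),
    7: ((8, 23), (11, 23)),
    15: ((18, 31), (2, 18)),
    21: ((10, 14), (10, 18)),
    22: ((4, 26), (15, 26)),
}
# Frequency of the shared repeat count at each site, as listed above.
shared_freq = {5: 0.03, 7: 0.06, 15: 0.02, 21: 0.12, 22: 0.04}

# Every site must show at least one repeat count in common.
all_shared = all(set(child) & set(father) for child, father in results.values())

chance = prod(shared_freq.values())
print(all_shared)                      # True
print(f"{chance:.4g}")                 # 1.728e-07
print(f"1 in {1 / chance:,.0f}")       # a little under 5.79 million
```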

The exact number of loci tested may vary. In criminal matters 13 loci are commonly tested. Paternity testing may test fewer.

Oh, because the DNA sequencing I’ve seen looks like the picture in “The Ultimate Self-Portrait: DNA Portrait from ThinkGeek” (The Gadgeteer), so I thought they printed those out on photo negatives and tried to align them against the light. So paternity testing is quite similar to criminal testing?

Even when they used gel electrophoresis plates like that, they didn’t hold them up to the light. The way gels worked was that one end of the plate was positive and one was negative. DNA fragments of a particular molecular weight would migrate across the gel together under the influence of the electric charge and stop together after a predetermined time, when the charge on the gel was disconnected. You would then read the shadow where the clump of DNA came to a stop. In order to be able to read the value of the bit of DNA you were interested in, known quantities of DNA of different molecular weights were run on the same gel at the same time. When the experiment stopped, the position reached by the bit you were interested in was compared to the positions of the bits of known weight on the same gel, and voila, you could figure out the weight of the bit you wanted.
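Reading a band against the known standards is essentially interpolation. Here is a minimal Python sketch, assuming the usual approximation that migration distance is roughly linear in the logarithm of fragment size; the distances and sizes are invented, and I use size in base pairs as a stand-in for molecular weight.

```python
import math

def estimate_size(distance, ladder):
    """Estimate fragment size (base pairs) from migration distance.

    ladder: list of (distance, size_bp) pairs for the known fragments.
    Interpolates linearly in log10(size) between the two bracketing bands.
    """
    pts = sorted(ladder)
    for (d1, s1), (d2, s2) in zip(pts, pts[1:]):
        if d1 <= distance <= d2:
            t = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + t * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the ladder")

# Made-up ladder: (distance travelled in mm, known size in bp).
ladder = [(10.0, 1000), (20.0, 500), (30.0, 250), (40.0, 125)]
unknown = estimate_size(25.0, ladder)
print(round(unknown))  # 354
```

Smaller fragments travel further, so a band halfway between the 500 bp and 250 bp markers comes out at roughly 354 bp rather than the arithmetic midpoint of 375.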

Now it’s all done with fluorescent dyes in tubes and computers read the results, so you don’t get those sheets anymore. The results document looks like a printout of a graph.

Paternity testing involves the same principles but different statistical tools. Thus, if at one locus Dad is a 3,11 and mum is a 5,9, daughter must be one of the first pair and one of the second pair, perhaps a 3,5. If she is a 7,9, then she cannot be the daughter of the supposed father. But if the possibilities all align so that parenthood is not absolutely excluded in the way the example I just gave illustrates, then in essence the chances of there not being an outsider gene cropping up can be calculated, resulting in a figure that expresses the confidence (usually of a very high order) that paternity is established.
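The exclusion logic at a single locus, when both parents are typed, can be sketched like this in Python. The function name is made up, and it only says whether parenthood is excluded, not how confident one can be otherwise.

```python
def consistent_with_parents(child, mother, father):
    """True if one of the child's alleles could come from each parent."""
    a, b = child
    return (a in mother and b in father) or (a in father and b in mother)

dad, mum = (3, 11), (5, 9)
print(consistent_with_parents((3, 5), mum, dad))  # True
print(consistent_with_parents((7, 9), mum, dad))  # False: the 7 cannot come from Dad
```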

Just chiming in to add that as a general…thing, storing genetic data digitally is done all the time. I’d say almost exclusively. I’ve been sequencing stuff like crazy lately, and all the results are nestling happily on my hard drive. There are huge numbers of terabytes of genetic sequencing data of various types freely available online in various databases.

Here’s an example of what the data looks like from STR analysis:

That’s actually restriction mapping, not sequencing: the DNA is cut into fragments of different sizes. What’s probably missing from that picture is that typically there is also a standard DNA sample run on the same gel, with known standard fragment sizes, which acts as a “ruler”. All the computer has to do is use that as a reference to put a numerical value on the test sample lines.

Thanks, one more thing:

I assume most DNA evidence is stored as STR numbers. What if one day, we decide on another analysis? Or gene sequencing becomes common? Do we still have “raw” data that we can analyse? e.g. for unsolved crimes, do they still keep the evidence, or just get the STR data and throw the evidence away?

I’m sorry, but when I read the title the first thing that hit me was the idea of doing a DNA test by sticking your finger up someone’s arse.

The physical evidence is often retained* and only disposed of in accordance with appropriate procedures. Law enforcement agencies should have a policy governing evidence retention. Some jurisdictions have huge backlogs of untested rape evidence kits still sitting in storage, awaiting the funding needed to run the tests.

BUT… sometimes the sample collected from a crime scene is so small that the original sample is effectively destroyed in the analysis process. In such cases with DNA evidence a process called PCR amplification is done to duplicate the tiny amount of DNA from the original sample prior to other testing. Some of this duplicated DNA can then be stored.

* The Innocence Project has assisted in exonerating many men who were wrongfully convicted. In many of their cases the most convincing proof of innocence they can present is DNA analysis from testing long-stored evidence. As of this writing they have assisted in exonerating 309 people who were wrongly convicted.

They keep the original material as long as possible, bearing in mind the tests are to some degree destructive. Improvements in technique have meant that an unsuccessful test one year might be followed by a successful test some years later. And keeping the original means you can go back to it if someone later tries to say there was an error in the analysis.

If you have the full sequence, you can recover any other measurement you’d like from that.
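For instance, an STR repeat count can be read straight off a sequence. A minimal Python sketch, using a made-up sequence and the CCAG repeat from the earlier example; real pipelines locate the repeat by its flanking regions and deal with read errors, which this ignores.

```python
import re

def longest_repeat_run(sequence, unit):
    """Largest number of back-to-back copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Made-up sequence: 7 copies of CCAG with arbitrary flanking bases.
seq = "TTGA" + "CCAG" * 7 + "GATC"
print(longest_repeat_run(seq, "CCAG"))  # 7
```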