Police use info from genealogy site to ID the Golden State Killer suspect. Does that bother you?

False positive rates are tricky to determine, because there are multiple types. On the one hand, there’s the possibility that someone else out there has DNA similar enough to yours that it would look the same, within the margin of error of whatever test was used. That’s fairly easy to determine, and the odds against it are typically very long. If the samples are degraded or incomplete, the odds will be shorter, but they’ll still be fairly easy to calculate if the expert witnesses are honest, so you can tell the jury “1 in 10 million” instead of “1 in 100 billion”, or whatever, and let them decide whether that’s good enough.
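For the “fairly easy to calculate” part, the usual textbook approach is the product rule: estimate how common the person’s genotype is at each locus, then multiply across loci. Here’s a rough Python sketch, assuming independent loci, Hardy-Weinberg equilibrium, and completely made-up allele frequencies. It also shows why a degraded sample, where fewer loci can be typed, gives shorter odds.

```python
# Minimal sketch of a random match probability (RMP) under the product rule.
# Assumptions (hypothetical, for illustration only): loci are independent,
# the population is in Hardy-Weinberg equilibrium, and the allele
# frequencies below are invented numbers, not real population data.

def genotype_frequency(p: float, q: float, homozygous: bool) -> float:
    """Expected genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    return p * p if homozygous else 2 * p * q

def random_match_probability(loci: list[tuple[float, float, bool]]) -> float:
    """Multiply the per-locus genotype frequencies (assumes independence)."""
    rmp = 1.0
    for p, q, homozygous in loci:
        rmp *= genotype_frequency(p, q, homozygous)
    return rmp

# Hypothetical profile: (freq of allele 1, freq of allele 2, homozygous?)
full_profile = [(0.10, 0.20, False)] * 13
degraded_profile = full_profile[:6]  # pretend only 6 loci could be typed

full = random_match_probability(full_profile)
partial = random_match_probability(degraded_profile)
print(f"full profile:     1 in {1 / full:,.0f}")
print(f"degraded profile: 1 in {1 / partial:,.0f}")
```

The fewer loci that survive, the fewer factors get multiplied together, so the odds shrink quickly even though the arithmetic stays simple.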

But that won’t tell you about the possibility of a tech sneezing on something, mixing up the labels on two tubes, or things like that. I don’t know how common those are, but they’re a heck of a lot more common than 1 in 100 billion. So if an expert witness tells you that the test is that reliable, they’re wrong.
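A quick sketch of why the lab-error rate matters more than the headline odds. Both numbers are hypothetical placeholders (nobody here knows the real mix-up rate), and it makes the pessimistic simplifying assumption that any lab error produces a false match.

```python
# Minimal sketch: a modest lab-error rate swamps a tiny random match probability.
# Hypothetical numbers; assumes a lab error always yields a false match and
# is independent of coincidental (genetic) matches.

def false_match_probability(rmp: float, lab_error: float) -> float:
    """P(reported match | person is innocent)."""
    # Either the lab makes an error, or it doesn't and the profiles coincide.
    return lab_error + (1 - lab_error) * rmp

rmp = 1e-11        # "1 in 100 billion"
lab_error = 1e-4   # hypothetical: 1 mix-up per 10,000 cases

total = false_match_probability(rmp, lab_error)
print(f"1 in {1 / total:,.0f}")  # driven almost entirely by lab_error
```

Whatever the true error rate is, it sets a floor that no amount of extra loci can get you under.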

Someone on the first page mentioned that a company would never want to risk the lawsuit, loss of reputation, etc. that would result from fudging data. That might be true, but companies don’t do things; people do. Don’t picture some moustache-twirling fatcat in a boardroom somewhere who’s setting a company policy of fudging results. Picture a clumsy lab tech who’s already dropped too many samples, and whose boss has told him that if he drops any more, he’s fired. And then he drops another one. What does the cost-benefit analysis look like for him? If he reports it according to company procedures, he’s sure to be out of a job, but if he fudges the results, there’s a chance (but far from a certainty) that someone who’s probably not him is going to get some heat for it. I’m sure we’ve all worked with people who would do the wrong thing there. Such people are out there. How common are they? I have no clue, and probably neither do you.

It’s not easy to accurately predict the false positive rate. I don’t even trust professional statisticians to necessarily get it right. (See Sam Wang’s poor model of the 2016 election.) To do it right, one needs to know the correlations among the different SNPs for each locus evaluated. Even then, that’ll only let you estimate the probability of a coincidental false match.
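To illustrate why the correlations matter: the simple product rule assumes markers are independent, and when they aren’t, it can make a combination look rarer than it really is. This sketch works at the single-chromosome (haplotype) level with two hypothetical markers and invented frequencies in which allele A tends to be inherited together with allele B.

```python
# Minimal sketch of why correlated markers break the simple product rule.
# Two hypothetical biallelic markers (A/a and B/b) with made-up haplotype
# frequencies showing linkage disequilibrium: A tends to travel with B.
haplotypes = {("A", "B"): 0.40, ("A", "b"): 0.05,
              ("a", "B"): 0.05, ("a", "b"): 0.50}

def marginal(allele: str, position: int) -> float:
    """Frequency of one allele at one marker, summed over haplotypes."""
    return sum(f for h, f in haplotypes.items() if h[position] == allele)

p_A = marginal("A", 0)           # ~0.45
p_B = marginal("B", 1)           # ~0.45
naive = p_A * p_B                # product rule, assumes independence
actual = haplotypes[("A", "B")]  # what the (hypothetical) data says

print(f"naive P(A and B) = {naive:.4f}, actual = {actual:.4f}")
# Here the naive estimate makes A-B look about 2x rarer than it is.
```

Across many correlated markers, errors like this multiply, which is why the independence assumption has to be checked rather than taken for granted.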

And it gets worse: the more DNA profiles are compared, the higher the chance of a false match. Obligatory XKCD: “Significant”. Sweeps through a large database greatly reduce confidence in any matches.
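The database-sweep effect is the same multiple-comparisons arithmetic as the comic. A rough sketch, assuming each comparison is independent (a simplification) and using a hypothetical per-comparison false-match probability:

```python
import math

def p_any_false_match(p_single: float, n_comparisons: int) -> float:
    """P(at least one false match in N comparisons) = 1 - (1 - p)^N.
    Computed as -expm1(N * log1p(-p)) so it stays accurate for tiny p."""
    return -math.expm1(n_comparisons * math.log1p(-p_single))

p = 1e-6  # hypothetical per-comparison false-match probability
for n in (1, 1_000, 1_000_000):
    print(f"{n:>9,} comparisons: {p_any_false_match(p, n):.4f}")
```

A one-in-a-million chance per comparison becomes better than even odds of at least one spurious hit once you sweep a million profiles.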

What if they use your brother’s DNA, freely given, as a starting point? That’s what makes DNA different than fingerprints.

Do companies like 23&me or Ancestry make any meaningful attempt to confirm that you are who you say you are?

Or can I use a prepaid Visa to buy a test claiming to be Jasmine?

Interesting thought, but the prosecutors wouldn’t be able to use the DNA results from a third-party company, or from your sister, to convict you of a crime. What they could do is what they did in this case: use relatives’ DNA to greatly narrow down the list of suspects. I heard the story this week on The Daily podcast, and the investigator was able to use the DNA to narrow it down to something like two suspects, one of whom they were able to quickly rule out.

But to convict him, they’ll need actual, traditional evidence.

They might rule you out of an investigation even if you’re guilty, though, if you send in an unrelated person’s sample under your own name.

I don’t have a problem with it, nor do I see an easy way to stop it under our system of laws, nor do I think we should try.

I’m talking very specifically about this case and what happened, and I’ll explain why.

  1. The DNA database used was a “public open source one”, basically intended to be the wild west. You upload your DNA and it can be used by anyone for anything. It’s not as big as the DNA databases that paid companies like 23andme and Ancestry.com have; I think it has only around 1m entries. But interestingly, if the police are able to go back 200 years in a family tree, even those 1m entries are probably enough to find a relative of a very high % of the 330m Americans. The big thing about this database is that it makes no guarantees about how the data will be used; people submit to it to make genealogical connections and out of other personal interest. While people on your family tree may not like it, I can’t imagine any legal grounds they could have to stop you from putting your DNA on the site.

  2. Using the DNA database, they took the suspect’s DNA, ran a comparison, and found his relatives who were on the site. These turned out to be really distant cousins, so distant that to map out the full family tree and come up with a list of living or recently deceased persons (they weren’t sure if the killer was still alive), they had to go back to the early 1800s in the family tree of the distant relative on the DNA database. Then they had to meticulously fill out every branch of the tree to get to a list of modern-day persons. This took months: 1 person 200 years ago who has multiple kids, who then have multiple kids… ends up with a LOT of branches and a lot of living persons. My understanding is this process basically created a list of a few thousand people. What’s amazing, though, is that before this they had so little clue as to who the killer was that there was no meaningful list at all to start marking names off of.
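Two back-of-the-envelope numbers behind the family-tree approach can be sketched like this. Every parameter is a hypothetical assumption, not data from the case: 25 years per generation (so about 8 generations in 200 years), 3 children per couple on every line, and database members being a random sample of the population.

```python
# Minimal sketch of family-tree fan-out and database coverage.
# Hypothetical assumptions: 25-year generations, 3 children per couple,
# no marriages between branches, no childless lines, and database members
# drawn at random from the population.

def descendants(children_per_couple: int, generations: int) -> int:
    """People in the youngest generation descended from one ancestral couple."""
    return children_per_couple ** generations

def p_relative_in_db(n_relatives: int, db_size: int, population: int) -> float:
    """P(at least one of n relatives is in the database), treating each
    relative as independently included with probability db_size/population."""
    f = db_size / population
    return 1 - (1 - f) ** n_relatives

generations = 200 // 25  # 8
print(f"{descendants(3, generations):,} descendants in generation {generations}")

# With ~1m profiles out of ~330m Americans, how many relatives does it take?
for n in (100, 500, 1000):
    print(n, round(p_relative_in_db(n, 1_000_000, 330_000_000), 3))
```

Under these assumptions one couple fans out to a few thousand descendants, and because everyone has hundreds of distant cousins, even a database covering well under 1% of the population is likely to contain at least one of them.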

With a list of a few thousand you can start marking off huge batches of names FAST:

-Anyone who is female
-Anyone too young or too old to have committed the crimes
-Anyone who had never lived in California
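The elimination step above is essentially a set of cheap filters over a list. A minimal sketch; the record fields, sample names, and birth-year window are all hypothetical, since the thread doesn’t say what records investigators actually used:

```python
# Minimal sketch of cutting a candidate list down with cheap filters.
# The Candidate fields, names, and birth-year window are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    sex: str                   # "M" or "F"
    birth_year: int
    lived_in_california: bool

def plausible(c: Candidate, earliest: int, latest: int) -> bool:
    """Keep only males of the right age who have lived in California."""
    return (c.sex == "M"
            and earliest <= c.birth_year <= latest
            and c.lived_in_california)

candidates = [
    Candidate("Person A", "F", 1948, True),   # ruled out: female
    Candidate("Person B", "M", 1990, True),   # ruled out: too young
    Candidate("Person C", "M", 1946, False),  # ruled out: never in California
    Candidate("Person D", "M", 1950, True),   # survives all filters
]

# Hypothetical window for someone old enough to offend in the mid-1970s.
survivors = [c.name for c in candidates if plausible(c, 1935, 1960)]
print(survivors)
```

Each filter needs only basic records, which is why thousands of names can shrink to a short list before anyone does detailed biographical work.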

That right away probably got it very small; from there they probably had to find biographical information on each of the remaining people. Bonus points for anyone left who had lived most of their life in Sacramento, the area where the vast majority of the crimes were committed. The real suspect also turned out to have worked as a police officer in the 1970s and to have been fired from that job near the end of the East Area Rapist crime spree (for shoplifting dog repellent and a hammer). Interestingly, the serial rapist, who had killed a few people in the past (in a pattern that suggests he was willing to kill to escape capture but that his primary motivation was rape), transitioned to being an outright serial killer not long after he was fired from the police force. I bet the timing of that stuff set off some pretty big alarm bells for the investigators.

  3. None of the actual DNA or records from the DNA database were used directly to arrest Joseph James DeAngelo. What they did was use a mixture of traditional and modern investigation tools to develop a suspect, and then use perfectly constitutional means to collect enough evidence to know that this man had murdered the 10 people who were victims of the “Original Night Stalker”; in all 10 of those murders he had left semen at the scene. I’m still unclear on how they’ve linked him to the 2 additional murders committed during his run as the “East Area Rapist”, since he shot those two people on the street for unclear reasons and did not commit any sexual assault.

Specifically, to link JJD to the crimes in question via DNA, they followed him around until he “discarded” DNA (probably threw away a cup or something from a fast food place, a common way of getting discarded DNA). Since there’s strong constitutional precedent that something you discard or throw away in a public place has no Fourth Amendment protection against search and seizure, it’s basically just “too bad” for this guy that he didn’t realize the police were going to dig his McDonald’s cup out of the trash and get his DNA from it to match him to a crime spree he committed 40 years ago.
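The comparison itself boils down to checking that the two profiles agree at every locus typed in both samples. A minimal sketch: the locus names are real CODIS markers, but the allele values are invented, and real labs use validated software plus match statistics rather than anything this simple.

```python
# Minimal sketch of comparing a crime-scene STR profile with a profile from a
# discarded item. Locus names are real CODIS markers; allele values are made up.

Profile = dict[str, tuple[int, int]]

def normalize(profile: Profile) -> Profile:
    """Order the two alleles at each locus so (14, 12) equals (12, 14)."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}

def compare(evidence: Profile, reference: Profile) -> tuple[bool, int]:
    """Return (all shared loci agree, number of loci compared).
    Only loci typed in both samples count, as with degraded evidence;
    the count matters because agreeing on 0 loci proves nothing."""
    ev, ref = normalize(evidence), normalize(reference)
    shared = ev.keys() & ref.keys()
    return all(ev[locus] == ref[locus] for locus in shared), len(shared)

crime_scene = {"D8S1179": (12, 14), "TH01": (6, 9), "FGA": (21, 22)}
discarded_cup = {"D8S1179": (14, 12), "TH01": (6, 9), "FGA": (21, 22),
                 "vWA": (16, 17)}

print(compare(crime_scene, discarded_cup))
```

A match on many shared loci is what the random-match-probability arithmetic then gets applied to; a match on only a few loci is much weaker evidence.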

All that being said, I’d probably feel differently if the police had used a DNA database of, say, medical records (like those of people who submit DNA to companies to see if they have any indicators of hereditary diseases, etc.).

That is remarkable detective work.

They didn’t with me. I used an alias.

Yep. Here’s my story.

About 3 years ago my Sister decided that she just must have some information about her family origins, which oddly included both me and our Mother. So there had to be a 23&me saliva sample involved.

A month later she called to say that she was bothered because none of my Dad’s family was showing up, but there were all kinds of indications that our Mother was right there in both samples. I tried to explain to her that there was more to it (at her age, 62 or so then, she should have caught the implication pretty quickly). Then she had to take it further.

Okay, she got an answer: she was genetically related to an entirely different family, and our Mother was involved as well. So then she had to follow up on the whole thing, and she soon found the man who was in fact her Father. She met him and most of his family, and discovered an entirely new relationship pattern, involving her with a whole lot of half-siblings and leaving me as a “half-step sibling” or something.

As she indicated a problem with her understanding of the whole thing, I simply told her, “You’ve been my Sister all your life. Nothing is changing that. Get some rest.”

I hope that was good advice. Peace.

Great. We’ve narrowed down the suspects to 30 probables.

Where were you on the night of ________.

We tested 20 of the other people. You’re now one of the remaining 10 suspects. We have evidence from (pick an item you throw away in public).

IMO it would be incredibly easy to pin a crime on any random schmuck by harvesting their bio signature. If it’s a serious crime then it’s a serious penalty if convicted.

If investigators and prosecutors are willingly fabricating evidence you are doomed no matter what.

That sequence wouldn’t have worked in this case. They used the familial DNA to narrow down the suspects, but they had DeAngelo’s actual DNA from one of his 1980 rape-murders, and they compared that to DNA from the actual man today: an unequivocal match. Only one person on earth would match the DNA recovered from the killer’s semen, and that person is the Golden State Killer.