genetic testing math probability of relationship

I’ve had my Y DNA (male line) tested to only 12 markers. The matches they reveal are around 2800 individuals who have the same result on all 12 basic ‘markers’. None of the surnames seem to be predominant and none of them are close to being mine.

With the results, if you click on any individual, you get the probability that the two of you share a common male ancestor within a range of generations.

For example, if I click on any of these folks, I get a chart that says the probability of that person and myself having a common male ancestor in the last 4 generations is about 34%; in 8 generations, 56%; in 12, 71%. The percentage goes up as the number of generations increases, up to over 90% for 24 generations.

So, can I say that if I compare against any 5 people at random on the list of 2800, the chance of not having a common male ancestor in 4 generations is:

(34^5) / (100^5) = 45.4 million / 10 billion = less than 1/2 of 1% chance that one of them and myself did not have a common male ancestor within the last 4 generations? I can tell you, even not knowing for sure who my father is (no one does) that such a result is ridiculous. Are my probability calculations not by the book?

I’ve also experienced, on a broader test, being told that many ‘matches’ are 3rd to 5th cousins. On the few I have been able to confirm, the relationship was more like 6th cousins. I think the interpretation of results by the testing company is skewed toward telling the customer their relationships are closer than the testing actually shows.

???

This may help:

If you’re not of European descent, try googling [probability, descent, attila] and see if that’s enlightening.

My question is about the assertion that I have a one in three chance of having a common gggrandfather with every one of the 2800 people that are ‘matched’ with me. Doesn’t that, mathematically, mean that if I pick any 5 of them at random, I am almost assuredly picking one with whom that statement about our gggrandfather is true?

Even their assertion of over 90% if you go back 24 generations only spans around 600-650 years in genealogical time (back to when folks didn’t have surnames). Charlemagne, et al., were another 20 generations or so further back.

???

Another way to put it is that there are 2800 folks with whom I share a common male ancestor (among 3 million they have tested?) and if I am related within 4 generations to 1 of every three of them, then there are about 900 such folks, all with different surnames from mine, who are my 4th, or closer, cousins. I assure you, I know many of the names of the folks with whom I have that close a relationship, and there is no correlation. And, since there is little in common among those 2800 people so far as their surnames are concerned (let alone the place they live) I doubt the conclusion that any of them, in the numbers of 1 of 3, are that closely related.

The Y chromosome is inherited only from the father. In much of the world, the surname is also usually so inherited. Now one of your male ancestors might have been adopted, illegitimate, or otherwise taken a name different from that of his biological father from whom he received the Y chromosome. So you could be biologically related to people with different surnames, but it seems unlikely there would be a lot of different surnames. And it seems quite unlikely there would be no predominance of one or a few surnames if all these males were related within a few generations.

The other information we need, I’d think, is how many different values each of these 12 markers can take on and what the frequency is. It makes a lot of difference if you have a marker value that half the people in the world have or only 1% of them have.

You are making two errors with your probability calculation.

(1) For a moment, let’s assume independence so that just multiplying probabilities is valid:
You are calculating 0.34^5. Under the assumption of independence, this is the probability that you share a common ancestor with all of them. The probability that you share a common ancestor with none of them would be 1-0.66^5.

(2) However, I don’t think the assumption of independence is valid because many people can share the same common ancestor. So, in fact, you cannot just multiply the probabilities at all.

I don’t think you have been given enough information to work out the probability that you seek.

ETA: I need to think about this more carefully. On reflection, maybe the assumption of independence is ok.

I think the claims made by the genetic testing company are nonsense. You can tell, from looking at two men’s Y chromosomes, how far back their most recent common male-line ancestor was… but you can’t tell to anywhere near enough precision to say “four generations”. Especially not when you’re only looking at 12 markers.

No. I think you had it right the first time. Simple independence might seem likely but with what they are looking at there is a much greater than random chance of kinship among any two tested individuals. It’s a sort of pedigree limitation.

OldGuy has a good point about the frequency of the tested markers. I would suppose that the chosen loci do have some significant variability, but the difference between an allele with 3% frequency versus one with 1% can really change the math a bit.

Unless some of these markers are in the pseudoautosomal region then I am doubtful that they can establish a common ancestor range of just a couple generations in the recent past. Just nowhere near enough time to count on a mutation clock showing any results in just a few markers.

Everyone’s thoughts are appreciated.

They might be looking at the pseudoautosomal region, in which case they might be looking for linkage rather than mutations, which would resolve over a much faster timescale. But many genetic testing services look at short tandem repeats, which are highly variable; if my math is correct, with a mutation rate of around 0.3% per generation per locus, with 5 generations x 12 loci you would expect about a 25% chance of at least one difference.
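To check that arithmetic, here is a rough Python sketch. It assumes every marker mutates independently at the same 0.3% per generation (a simplification, since real rates differ by locus), and the function name is made up for illustration. How many father-to-son transmissions separate two men is also an assumption, so a few counts are shown.

```python
def p_at_least_one_mutation(rate, transmissions):
    """Chance that at least one of `transmissions` independent marker copies mutates."""
    return 1.0 - (1.0 - rate) ** transmissions

RATE = 0.003  # 0.3% per generation per locus, the figure quoted above
LOCI = 12     # size of the basic marker panel

# Assumption: count transmissions along one line only (5 generations x 12 loci).
one_line = p_at_least_one_mutation(RATE, 5 * LOCI)

# Assumption: count both lines down from the common ancestor (2 x 5 x 12).
two_lines = p_at_least_one_mutation(RATE, 2 * 5 * LOCI)

# Assumption: 4 generations down each of two lines (2 x 4 x 12).
four_each = p_at_least_one_mutation(RATE, 2 * 4 * LOCI)

print(f"60 transmissions:  {one_line:.1%}")
print(f"120 transmissions: {two_lines:.1%}")
print(f"96 transmissions:  {four_each:.1%}")
```

On these assumptions, 60 transmissions give about 16%, 120 give about 30%, and 96 give about 25%, so the exact figure depends on how the generations are counted. Either way, most close relatives would still match on all 12 markers.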

The “common male ancestor in the last 4 generations is about 34% …” seems very wrong to me, given just a 12-marker test, especially with the large number of matchees. Can you quote exactly what the site states?

(I have a 12-marker result in R1b-M269. Thanks and kudos to anyone who provides a good link to predict SNP from 12 STRs. BTW, did the magnificent charts at Yfull.com suddenly become subscriber-only?)

For a 6th cousin to appear to be 5th cousin is not unusual. If you have two lines of descent from a pair of g-g-g-g-g grandparents — quite common in many communities — and so does the target, this quadruple-6th cousin will have consanguinity of 5th cousin.

For that matter, for anything other than parent-child, the consanguinity will only tell you how related you’ll be on average. It’s possible for two people to be more or less genetically similar than they “should” be, just from the luck of the genetic dice.

===The “common male ancestor in the last 4 generations is about 34% …” seems very wrong to me, given just a 12-marker test, especially with the large number of matchees. Can you quote exactly what the site states?===

"In comparing Y-DNA 12 marker results, the probability that Mr.----- and (me) shared a common ancestor within the last…

COMPARISON CHART
Generations ---- Percentage
 4 ------------- 33.57%
 8 ------------- 55.88%
12 ------------- 70.69%
16 ------------- 80.53%
20 ------------- 87.07%
24 ------------- 91.41%"

I just can’t get past the clear statement that there is a 1 in 3 chance that any person I click on will have a common ancestor in the last 4 generations.

I can’t see how that means anything other than what it says.

Yes, and not just any common ancestor, but specifically the agnatic ancestor. (You have 16 g-g-grandparents but only one of those sixteen is the agnatic ancestor, the one who donated his Y-chromosome to you and your putative cousin.) And we’re speaking of the standard 12-STR Y-chromosome panel? Something is very fishy.

BTW, the numbers in the table follow the formula
1 - 0.90277^G
where G is the number of generations. I’ve not had my morning coffee yet, but this formula even seems wrong, showing P(A|B) where P(B|A) is asked.
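For anyone who wants to check that fit, here is a small Python sketch comparing the formula 1 - 0.90277^G against the percentages the OP quoted. The chart values are copied from the post above; the function name and variable names are my own, chosen for illustration.

```python
# Percentages copied from the company's comparison chart quoted above.
chart = {4: 33.57, 8: 55.88, 12: 70.69, 16: 80.53, 20: 87.07, 24: 91.41}

def formula_percent(generations, base=0.90277):
    """Percentage given by 1 - base^G, with G the number of generations."""
    return 100.0 * (1.0 - base ** generations)

for g, quoted in chart.items():
    print(f"{g:2d} generations: formula {formula_percent(g):6.2f}%   chart {quoted:6.2f}%")

# Largest disagreement between formula and chart, in percentage points.
max_gap = max(abs(formula_percent(g) - quoted) for g, quoted in chart.items())
print(f"largest gap: {max_gap:.3f} percentage points")
```

The two columns agree to within about a hundredth of a percentage point, which is consistent with the chart being one fixed curve rather than something worked out per pair of men.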

I am considering asking them either to refund my money or to provide, free of charge, such testing as would actually support their claim that a match is reason to believe we had a 1 out of 3 chance of a common ‘agnate’ ancestor within the last 4 generations.

done. I suspect for their claims to hold water, my father, his father, and his father would have had to race around impregnating women at a frantic pace their whole lives. The male results of those pregnancies would also have to be producing male progeny at a rapid pace. Even then, I doubt the claim of 1 in 3 on a list of 2800 men can be supported.

I just noticed a typo at the end there, it should have read
(1-0.34)^5 or just 0.66^5

(just addressing the probability math, I have no idea if the raw probabilities are correct or reasonable)
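Spelled out as a quick Python sketch, assuming the five comparisons are independent (which was questioned upthread) and taking the 34% at face value; the variable names are mine:

```python
p = 0.34  # quoted chance of a common male ancestor within 4 generations, per match
n = 5     # number of matches picked at random from the list

p_all = p ** n               # shares such an ancestor with all five
p_none = (1 - p) ** n        # shares such an ancestor with none of the five
p_at_least_one = 1 - p_none  # shares such an ancestor with one or more

print(f"all five:     {p_all:.4f}")
print(f"none of five: {p_none:.4f}")
print(f"at least one: {p_at_least_one:.4f}")
```

So 0.34^5 (under half a percent) is the chance of a match with all five, while the chance of no match at all is 0.66^5, roughly 12.5%, leaving about 87.5% for at least one.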

Yes, I’m just looking back at your original post, and it seems to me that what they have said is prima facie ridiculous, isn’t it?

Take one male 4 generations ago. If he sires two sons in each generation, he has 16 male descendants after 4 generations. Even with 4 sons each generation, that’s still only 256 descendants after 4 generations.

Yet they suggest that among the tiny proportion of the population for whom they actually have data, they have identified 2800 people, all of whom have a ~34% probability of sharing a 4th-gen ancestor with you? That doesn’t make any sense at all.
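The mismatch is easy to put in numbers. A rough Python sketch, assuming every man in the line has the same number of sons and reading the 33.57% as if it applied independently to each of the 2800 matches (both are simplifications, and the function name is made up):

```python
def male_line_descendants(sons_per_father, generations):
    """Men in the last generation if every father has `sons_per_father` sons."""
    return sons_per_father ** generations

matches = 2800       # size of the OP's 12-marker match list
p_within_4 = 0.3357  # quoted chance of a common ancestor within 4 generations

# Number of close relatives the quoted percentage would imply on the list.
implied_close_relatives = matches * p_within_4

print("2 sons each, 4 generations:", male_line_descendants(2, 4))
print("4 sons each, 4 generations:", male_line_descendants(4, 4))
print(f"implied by the chart: about {implied_close_relatives:.0f}")
```

Even the generous 4-sons case gives 256 men, well short of the roughly 940 the chart would imply, and that is before accounting for the fact that only a small share of them would have been tested.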

The Y is rather odd, in that most of it doesn’t recombine at all, whereas the pseudoautosomal region recombines very rapidly. I suspect that maybe they are looking at the non-recombining part, which may well be shared with a very large number of other males due to common distant ancestry, and then making invalid assumptions that lead to incorrect interpolation of recent ancestry.