What is the probability of an identical human?

wibble · April 21, 2008, 2:56pm

According to the theory of evolution, the human gene pool is finite, and is constantly being ‘shuffled around’ in the population.

My question is in two parts:

1. What is the probability of this random gene shuffling resulting in an identical copy of another human?
2. Assuming that the probability in (1) is sufficiently greater than zero as to make it worth considering, what is the probability that this event occurs in the same generation - i.e. such that two genetically identical humans are alive at the same time?

Giles · April 21, 2008, 3:05pm

Identical twins are genetically the same, so in real life the probability is around 0.2%. However, even they are not completely identical.

puppygod · April 21, 2008, 3:19pm

Twins don’t count?

For hard numbers, I’m afraid there is no way to get an answer - we simply don’t know enough to calculate that probability. We know the approximate number of genes, but the thing is, many genes are present in different variants with different frequencies. Then there is mitochondrial DNA. And on one more note, genetically identical doesn’t mean physically identical. There are differences even between twins - and they have the same genes and the same conditions during fetal development. When you calculate things you have to include the chances of mutation and many other factors in later development…

So, if you allow monozygotic twins, then chances are around 0.2% for the global population.

If not, then with modern DNA testing methods the chance of two random people showing up as the same can theoretically get as low as 0.000 000 000 001%. Most probably, if more advanced methods were developed, the samples could be analyzed further until some differences were found.

So, bottom line - my WAG would be that it’s, after all, so close to zero as to be not worth bothering about.

pulykamell · April 21, 2008, 3:38pm

[QUOTE=puppygod]

If not, then with modern DNA testing methods the chance of two random people showing up as the same can theoretically get as low as 0.000 000 000 001%. Most probably, if more advanced methods were developed, the samples could be analyzed further until some differences were found.

So, bottom line - my WAG would be that it’s, after all, so close to zero as to be not worth bothering about.
[/QUOTE]

Even with that minuscule number, I think question 2 (basically a variant on the birthday problem) would be interesting, considering a world population of 6 650 000 000. I don’t have enough of a math background to figure out how to plug such big numbers into those equations, so if anyone wants to have a crack at it, please do.
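For what it’s worth, numbers this big are manageable with the usual birthday-problem approximation. Here is a rough sketch in Python. It assumes the 0.000 000 000 001% figure quoted above is the chance that one given pair of people matches, and that pairs can be treated as independent; both are assumptions, and the function name is only illustrative.

```python
import math


def prob_any_match(n_people: int, p_pair: float) -> float:
    """Approximate chance that at least one pair in the group matches.

    Treats every pair as an independent trial with probability p_pair,
    which is the standard birthday-problem shortcut (an assumption).
    """
    pairs = n_people * (n_people - 1) // 2
    # 1 - (1 - p)^pairs, computed via log1p/expm1 to avoid rounding loss.
    return -math.expm1(pairs * math.log1p(-p_pair))


world = 6_650_000_000   # population figure from the post
p_pair = 1e-14          # 0.000 000 000 001 % written as a fraction

expected_pairs = world * (world - 1) / 2 * p_pair
print(f"expected matching pairs: {expected_pairs:,.0f}")
print(f"chance of at least one match: {prob_any_match(world, p_pair):.6f}")

# Sanity check against the classic case: 23 people, 365 birthdays.
print(f"birthday check: {prob_any_match(23, 1 / 365):.3f}")
```

Under those assumptions the expected number of matching pairs is in the hundreds of thousands, so a match somewhere in the world would be all but certain. That suggests the figure is about matching on a limited test, not about whole genomes.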

Exapno_Mapcase · April 21, 2008, 3:59pm

I think that estimate is wildly low.

The estimate of the number of human genes keeps changing practically by the hour, but 30,000 is a good midpoint estimate.

That would mean the chance of two identical genomes, not from a split zygote, is 1 in 2^30,000. That number is large beyond any terms that can be used to express it. The entire universe, filled with nothing but DNA changing every second for its entire existence, would be as zero next to it.
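To get a feel for the size of that figure, here is a small Python sketch. It takes the post’s own model at face value (30,000 independent two-way choices, which is an assumption) and compares the result with 10^80, a commonly cited rough estimate for the number of atoms in the observable universe.

```python
import math

genes = 30_000  # gene count used in the post

# Number of decimal digits in 2**genes, via logarithms so that the
# huge integer never has to be printed.
digits = math.floor(genes * math.log10(2)) + 1

atoms_exponent = 80  # rough, commonly cited order of magnitude (assumption)

print(f"2^{genes} has {digits} decimal digits")
print(f"that is about 10^{digits - 1 - atoms_exponent} times 10^{atoms_exponent}")
```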

Humans are unique except for identical siblings. There is no possibility of ever having two become identical by chance.

Leaffan · April 21, 2008, 4:18pm

So, why is it that DNA evidence provides statistics such as “The odds it could have come from anyone but Simpson were about one in 170 million.”

Quoted from Wikipedia, but you get the idea.

Q.E.D · April 21, 2008, 4:26pm

[QUOTE=Exapno Mapcase]
That would mean the chance of two identical genomes, not from a split zygote, is 1 in 2^30,000.
[/QUOTE]

I don’t think so. A human isn’t a collection of 30,000 randomly-assigned genes from a total pool of 30,000 possible genes. There are about 30,000 or so genes in the human genome, yes, but every human has every one of them. The differences occur because each gene has some number of variants. I don’t know how to work out the numbers, but it’s not anywhere near as simple as you’ve stated it.

mr.jp · April 21, 2008, 4:27pm

Ok, some back of the envelope calculations:

I will only look at SNPs. (These are basically positions in the genome where different people have a different base pair.) There are also other sources of variation, such as Copy Number Variation, but SNPs should be the major one. (I will not consider whether the SNP lies in the “junk DNA” or whether it changes the codon.)

The actual number is not known yet, but a low estimate of the number of SNPs in the human population is 10 million.

I will call the average fraction of people who have the uncommon base pair at a SNP “f”.

Now we take an average individual A. He will have the common base pair in 10^7 * (1 - f) cases, and the uncommon base pair in 10^7 * f cases. If a random individual B is to be identical to A, he has to match at every one of those positions. At each position the chance of a match is (1 - f) where A has the common base pair and f where A has the uncommon one.

Thus the chance of them being identical is (1 - f)^(10^7 * (1 - f)) * f^(10^7 * f)

I tried inserting the theoretically lowest variation (1%, by definition), so that f is 0.01, into the Windows calculator, and I think the number was too low for it to display.
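The number can still be pinned down by working with its logarithm instead of the number itself. A minimal Python sketch of the formula above, with 10^7 SNPs and f = 0.01 as in the post (the function name is just illustrative):

```python
import math


def log10_identical(n_snps: float, f: float) -> float:
    """log10 of (1 - f)^(n * (1 - f)) * f^(n * f).

    Working in log space avoids underflow: the smallest positive
    double is around 5e-324, far larger than the value computed here.
    """
    return n_snps * ((1 - f) * math.log10(1 - f) + f * math.log10(f))


exponent = log10_identical(1e7, 0.01)
print(f"chance of two identical people: about 10^{exponent:.0f}")
```

With these inputs the exponent is around -243,000, so the probability has well over two hundred thousand zeroes after the decimal point.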

The_Controvert · April 21, 2008, 4:33pm

[QUOTE=Leaffan]
So, why is it that DNA evidence provides statistics such as “The odds it could have come from anyone but Simpson were about one in 170 million.”

Quoted from Wikipedia, but you get the idea.
[/QUOTE]
Since they currently cannot compare two complete sets of DNA, they just match a set of genes or protein markers so as to be reasonably certain and have results in a relatively quick fashion.

mr.jp · April 21, 2008, 4:35pm

[QUOTE=Leaffan]
So, why is it that DNA evidence provides statistics such as “The odds it could have come from anyone but Simpson were about one in 170 million.”

Quoted from Wikipedia, but you get the idea.
[/QUOTE]

Because they don’t sequence the whole genome. In fact they use only a few, highly variable regions.

Leaffan · April 21, 2008, 4:44pm

Thanks, and thanks.

Anaglyph · April 21, 2008, 4:50pm

DNA evidence usually relies on restriction fragment length polymorphism of highly polymorphic regions that contain short repeated sequences of DNA. Typically each polymorphism will be shared by around 5 - 20% of individuals. When looking at multiple loci, it is the unique combination of these polymorphisms in an individual that makes this method discriminating as an identification tool. The more STR (short tandem repeat) regions that are tested in an individual, the more discriminating the test becomes. (DNA profiling - Wikipedia)

Thus genetic fingerprinting looks at a limited number of sites within the DNA that are the most likely to produce a different pattern when the DNA is digested with a particular enzyme. One could decrease the probability even further by analyzing more sites, but this would take more time and cost more.
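As a rough illustration of why a handful of sites is enough, here is a Python sketch that multiplies the per-site sharing fractions together. It assumes the sites are independent of each other, and the count of 13 sites is only an illustrative choice; the 5%, 10% and 20% values come from the range given above.

```python
from math import prod


def random_match_probability(shared_fractions):
    """Chance that a random person matches at every tested site.

    Assumes the sites are statistically independent, so the per-site
    fractions simply multiply.
    """
    return prod(shared_fractions)


n_sites = 13  # illustrative number of tested sites (assumption)

for fraction in (0.20, 0.10, 0.05):
    rmp = random_match_probability([fraction] * n_sites)
    print(f"{fraction:.0%} shared per site -> 1 in {1 / rmp:.3g}")
```

Each extra site multiplies the odds against a coincidental match by somewhere between 5 and 20, which is why adding sites makes the test more discriminating so quickly.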

John_Mace · April 21, 2008, 5:01pm

[QUOTE=Exapno Mapcase]
I think that estimate is wildly low.

The estimate of the number of human genes keeps changing practically by the hour, but 30,000 is a good midpoint estimate.

That would mean the chance of two identical genomes, not from a split zygote, is 1 in 2^30,000.
[/QUOTE]

But humans are about 99.9% identical in their DNA, so the starting point is probably about 2^30.

Stranger_On_A_Train · April 21, 2008, 6:44pm

[QUOTE=The Controvert]
Since they currently cannot compare two complete sets of DNA, they just match a set of genes or protein markers so as to be reasonably certain and have results in a relatively quick fashion.
[/QUOTE]
To expand on that, commercial DNA typing is done based upon nucleotide sequences known as Variable Number Tandem Repeats (VNTR or “vinters”) which occur within an allele. Because these are multiple sequences of the same segment, they can occur in different numbers (hence “variable number”) in different people owing to inherited errors in replication but still be viable. These occur in different frequencies in different ethnic groups of people, and given a sufficient number of markers you can identify the relationship between two people in an attached lineage to a certain probability. This can be thought of as similar to fingerprinting, where capturing five or six points shows a good likelihood of positive identification, but ten or more points makes it virtually assured (although there can still be questions about the accuracy of the measurements and always some uncertainty, however remote, about the absolute uniqueness of the print). More accurate methods of sequencing and identifying genomes can be used but they are more labor intensive by several orders of magnitude. PCR amplification and VNTR marking are adequate for any normal forensic purposes, the O.J. Simpson trial notwithstanding.

As Q.E.D. notes, genes are hardly random combinations, which significantly limits the number of combinations below the stated figure. Nonetheless, the odds against even two related people who are not derived from the same zygote having a completely identical genome (even limiting the analysis exclusively to functional nuclear genes) are literally astronomical, even though the actual variation of functional codons between two random people is about 0.1%. Even considering people who are reproductively related doesn’t bump up the odds of coincidentally identical genomes dramatically.

Stranger

pulykamell · April 21, 2008, 7:48pm

[QUOTE=Stranger On A Train]

As Q.E.D. notes, genes are hardly random combinations, which significantly limits the number of combinations below the stated figure.
[/QUOTE]

So, about what number are we looking at, and how many human beings would we need to have an even chance that two in that bunch share a common sequence? Going back to the birthday problem, it only takes 23 people in a room to have a 50% chance that two share a birthday. What’s the equivalent to this 23 number in the DNA sequencing question?

mr.jp · April 21, 2008, 7:58pm

[QUOTE=pulykamell]
So, about what number are we looking at, and how many human beings would we need to have an even chance that two in that bunch share a common sequence? Going back to the birthday problem, it only takes 23 people in a room to have a 50% chance that two share a birthday. What’s the equivalent to this 23 number in the DNA sequencing question?
[/QUOTE]

According to my calculations, we are looking at a number with many thousands of zeroes for the chance that two people are identical. It won’t happen even if you fill up the universe with people.
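For the “equivalent of 23” question, the usual birthday approximation says the group size for an even chance is roughly sqrt(2 * ln 2 / p), where p is the chance that one given pair matches. A Python sketch in log space, assuming the SNP estimate from earlier in the thread (10^7 SNPs, f = 0.01) as the per-pair probability; the function name is only illustrative:

```python
import math


def log10_group_size_for_even_odds(log10_p_pair: float) -> float:
    """log10 of n, where n ~ sqrt(2 * ln 2 / p) gives about a 50% chance
    that some pair in a group of n matches (birthday approximation)."""
    return 0.5 * (math.log10(2 * math.log(2)) - log10_p_pair)


# Sanity check with real birthdays: the approximation lands just under 23.
birthday_n = 10 ** log10_group_size_for_even_odds(math.log10(1 / 365))
print(f"birthdays: about {birthday_n:.1f} people")

# Per-pair chance of identical genomes from the SNP formula, in log10.
n_snps, f = 1e7, 0.01
log10_p = n_snps * ((1 - f) * math.log10(1 - f) + f * math.log10(f))

dna_log10_n = log10_group_size_for_even_odds(log10_p)
print(f"genomes: about 10^{dna_log10_n:.0f} people")
```

So the “23” for whole genomes under these assumptions is a number with over a hundred thousand digits, compared with a world population of about 10^10.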

pulykamell · April 21, 2008, 8:02pm

[QUOTE=mr. jp]
According to my calculations, we are looking at a number with several thousand zeroes.
[/QUOTE]

Yes, that would be rather unlikely, then.

mr.jp · April 21, 2008, 8:29pm

Lol, I guess that exposed my rather unnecessary edit.

wibble · April 21, 2008, 8:36pm

Thanks for the input, all.

Bang goes another great science fiction plot. Can’t be science fiction if it ain’t based on science…

… maybe fantasy? Hmmmm…

Nice to have it confirmed that I’m as truly unique as I thought I was, though
