 # Probability of a trait spreading through a population

Is there any way to calculate the probability that a trait present in x members of a population containing y individuals will come to be present in all individuals eventually, regardless of the number of generations it takes? Assume that the size of the population remains constant. Then, what is the probability if the trait has an advantage/disadvantage due to natural selection?

I made a program to simulate this and I would like to check my results since they seem high, and because I’m not sure if I’m simulating the problem properly. For instance, with a population size of 1000 and an advantage of 2% conferred by a trait possessed by just one individual I get a 5.9% success rate. That is, 5.9% of the time the trait spreads to all members of the population.

The discipline is called population genetics. It is so intensely mathematical that I can’t get through an article without making my head hurt, so I can’t comment much on it. But if you search for the term you’ll find lots of information and academic journals regarding it.

However, your simulation is probably not off. From my reading I’ve learned that even small advantages will sweep through a population very rapidly.

I’m quite sure equations have been developed to model just this sort of thing - I remember going through them in my evolution classes in college. But what those equations are, I couldn’t tell you off the top of my head. Go buy yourself a college-level population genetics textbook, and you should be able to find them.

OK, I’m no expert and I will check my notes and books when I get to work tomorrow, but here’s what I remember.

Given allele frequencies p and q with p + q = 1, Hardy-Weinberg equilibrium will be reached in one generation of random mating if a certain number of conditions are met (allele frequencies being the same in both sexes is the only condition I remember off-hand, but there are others). Allele frequencies will remain constant from that point on. H-W equilibrium is easy to calculate. If there are A and a alleles of a gene, p is the frequency of A alleles in the gene pool, and q is the frequency of a alleles (so obviously q = 1 - p), then
freq(AA) = $p^2$
freq(Aa) = $2pq$
freq(aa) = $q^2$
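For anyone who wants to play with these numbers, here is a minimal Python sketch of the Hardy-Weinberg proportions. The function name `hardy_weinberg` is just something I made up for illustration; it only assumes p is the frequency of A and q = 1 - p.

```python
def hardy_weinberg(p):
    """Return (freq_AA, freq_Aa, freq_aa) for a frequency p of allele A.

    Hypothetical helper for illustration; assumes random mating and the
    other Hardy-Weinberg conditions hold.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # frequency of the a allele
    return p * p, 2.0 * p * q, q * q


if __name__ == "__main__":
    freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.7)
    # The three genotype frequencies should sum to 1 (up to rounding).
    print(freq_AA, freq_Aa, freq_aa, freq_AA + freq_Aa + freq_aa)
```

With p = 0.7 the three values come out at roughly 0.49, 0.42 and 0.09.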

This stuff is elementary and I’ve had a handle on it since high school. The college and graduate level stuff slips away with every year I’ve not used it. That’s what you are asking about.

Obviously with an evolutionary selection conferred, you will deviate from Hardy-Weinberg in each generation. This page gives a pretty good background on selection; the next chapters deal with selection for heterozygotes and against recessives.

The math is pretty straightforward, though. If you are giving allele A an advantage, then at each generation one calculates a new p (signified as p’) based on the advantage. One counts the alleles available after random mating. If you are selecting against the recessive a, for instance if a gives a 2% disadvantage (s = 0.02) as per your example, then both AA and Aa have a relative fitness of 1 and aa has a relative fitness of (1 - s), or 0.98.

One then counts alleles to calculate the q’. Multiplying through by the fitness coefficient (this is copied pretty much directly from the linked page but I’ll try to explain it in my own words):
Original H-W: $p^2 + 2pq + q^2 = 1$
aa has a 2% disadvantage, so its term needs to be multiplied by 0.98. There are now fewer alleles surviving in the gene pool: 2% of aas have not survived, so the total doesn’t add up to 100%.
$p^2 + 2pq + 0.98q^2 = 1 - 0.02q^2$

One gets a new q’ by counting the number of a alleles now in the population: 1 is contributed by each of the heterozygotes ($2pq$) and 2 by each of the surviving aa homozygotes ($2 \times 0.98q^2$), over the total number of alleles. Divide these by two because each parent only contributes one gamete to the progeny; only 50% of heterozygotes’ progeny will inherit that a. Remember that 2% of aas were lost, though!
$q' = \frac{2pq + 2(0.98)q^2}{2(1 - 0.02q^2)}$

That’s one generation. Rinse, lather, repeat. There are some nice simplifications when the allele is lethal (i.e. the fitness of aa is 0, so s = 1); these are listed on the page.
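The "rinse, lather, repeat" part is easy to hand to a computer. Below is a rough Python sketch of the one-generation update for selection against the recessive, iterated over many generations. The names `next_q` and `iterate` are my own, and the sketch assumes an effectively infinite population (no drift) with fitnesses 1, 1 and 1 - s for AA, Aa and aa.

```python
def next_q(q, s=0.02):
    """One generation of selection against aa (fitness 1 - s).

    Assumes Hardy-Weinberg proportions before selection and an
    infinitely large population, so there is no random drift.
    Undefined when q == 1 and s == 1 (nobody survives).
    """
    p = 1.0 - q
    mean_fitness = 1.0 - s * q * q  # p^2 + 2pq + (1 - s) q^2
    # a alleles: half of the Aa alleles plus all of the surviving aa alleles
    return (p * q + (1.0 - s) * q * q) / mean_fitness


def iterate(q0, s=0.02, generations=100):
    """Return the list of q values from generation 0 to `generations`."""
    history = [q0]
    for _ in range(generations):
        history.append(next_q(history[-1], s))
    return history


if __name__ == "__main__":
    history = iterate(0.5, s=0.02, generations=100)
    print("start:", history[0], "after 100 generations:", history[-1])
```

Setting `s=1.0` gives the lethal case, where the update collapses to q’ = q / (1 + q).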

It is all dependent on your initial allelic frequencies and your selection coefficient. Population size shouldn’t matter that much if it is of significant size, greater than around 200 or so. The smaller the population, the more the variance of the allele frequencies will affect the mating. The example given in chapter 3.5 of the above link is for a population of 2. One very quickly sees alleles become lost or fixed. This starts to get beyond my statistical remembrance so I will leave it to you to work through that page.
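To see the small-population effect for yourself, here is a quick-and-dirty Python simulation of pure drift (no selection). It is only a sketch under my own assumptions: each generation is built by drawing every gene copy at random from the previous generation, and the function names and default numbers are made up for illustration.

```python
import random


def generations_until_fixed_or_lost(n_copies, q0, rng, max_generations=100_000):
    """Count generations until the a allele is lost or fixed by drift alone.

    n_copies is the number of gene copies in the population (2 per
    diploid individual). Each new generation is a random draw with
    replacement from the previous one.
    """
    count = round(q0 * n_copies)
    for generation in range(max_generations):
        if count == 0 or count == n_copies:
            return generation
        q = count / n_copies
        # Binomial sampling: each new copy is an a allele with probability q.
        count = sum(rng.random() < q for _ in range(n_copies))
    return max_generations


def mean_time(n_copies, q0=0.5, trials=200, seed=1):
    """Average number of generations to loss or fixation over many runs."""
    rng = random.Random(seed)
    times = [generations_until_fixed_or_lost(n_copies, q0, rng) for _ in range(trials)]
    return sum(times) / len(times)


if __name__ == "__main__":
    for n_copies in (4, 40, 400):
        print(n_copies, "gene copies:", mean_time(n_copies), "generations on average")
```

The population of 2 (4 gene copies) should lose or fix the allele within a handful of generations, while the larger populations hang on to both alleles for much longer.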

Hope that’s of some help…

Let me just clarify that. Preview is our friend.
q’ is the new recessive allele frequency. It is the fraction of alleles, out of the total, that are now a alleles after selection.
aa homozygotes had a 2% disadvantage, so only 98% of aa homozygotes survive into the next generation. This reduces the total by $0.02q^2$. The frequency of a alleles after selection (q’) is then calculated by counting alleles: 1 in each Aa heterozygote and 2 in each aa homozygote, out of 2 per surviving individual. Since we are talking about frequency, this translates to half of the alleles contributed by Aa and all of the alleles contributed by aa being a. Since the population isn’t at 100% anymore and is instead $1 - 0.02q^2$, then:
$q' = \frac{\text{freq}(Aa) + 2(0.98)\,\text{freq}(aa)}{2(1 - 0.02\,\text{freq}(aa))}$
$q' = \frac{2pq + 2(0.98)q^2}{2(1 - 0.02q^2)}$

Hope that makes more sense.

Erk, dammit. Typo:
$q' = \frac{2pq + 2(0.98)q^2}{2(1 - 0.02q^2)}$ is correct.

This is a standard result in population genetics. The probability of ultimate fixation of an allele is approximately

(1 - exp(-2Nps)) / (1 - exp(-2Ns))

in a haploid population, where s is the selection coefficient (2% in your case), N is the population size, and p is the starting allele frequency. For a diploid population those twos become fours. I get answers in the neighborhood of yours, but it’s not clear to me whether you’re assuming haploidy or diploidy, and if the latter whether you’re assuming that the single individual is a homozygote or a heterozygote. There are simpler approximations for the case of a single copy, but I’ll wait to hear the answers to those questions. Also, this depends on a particular model (no overlap of generations, etc.; this is called the Wright-Fisher model). You can get an exact numerical answer based on the eigenvalues and eigenvectors of a matrix whose columns are binomial distributions.
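If it helps, the approximation above is a one-liner to evaluate. Here is a small Python sketch; the function name and the `ploidy` switch are my own inventions (ploidy 1 uses the twos, ploidy 2 uses the fours), and for s = 0 it falls back on the neutral result that the fixation probability equals the starting frequency p.

```python
import math


def fixation_probability(N, p, s, ploidy=1):
    """Approximate probability of ultimate fixation.

    Evaluates (1 - exp(-k*N*p*s)) / (1 - exp(-k*N*s)) with k = 2 for a
    haploid population and k = 4 for a diploid one. Assumes N is
    reasonably large and |s| is small. Very large negative N*s can
    overflow the exponential.
    """
    if s == 0:
        return p  # neutral limit of the formula
    k = 2.0 * ploidy
    # -expm1(-x) is 1 - exp(-x), computed accurately for small x.
    return -math.expm1(-k * N * p * s) / -math.expm1(-k * N * s)


if __name__ == "__main__":
    # One copy in a haploid population of 1000 with a 2% advantage.
    print(fixation_probability(N=1000, p=1 / 1000, s=0.02))
    # One copy (a heterozygote) in a diploid population of 1000: p = 1/(2N).
    print(fixation_probability(N=1000, p=1 / 2000, s=0.02, ploidy=2))
```

Both calls give about 0.039 for the numbers in the original question, since a single copy works out to the same exponent either way.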

Excellent, that’s what I was looking for. It’s worth noting some simplifications for an initial single copy. For a haploid population, in this case p = 1/N and the probability is approximately

2Ns / (1 - exp(-2Ns))

(all of this assumes things like N reasonably large and |s| much smaller than one). For an advantageous allele with s large enough (but still small compared to one), this is approximately 2s (i.e., it’s approximately independent of population size). So a reasonable guess at your question, assuming haploidy, would be about 4%, and the more complicated formula gives 0.0392.

Minor correction: that 2Ns / (1 - exp(-2Ns)) should be 2s / (1 - exp(-2Ns)). The first expression is the probability relative to that of an allele with no selective advantage or disadvantage, which is often of interest. Also I should have said that under diploidy, things are more complicated if there’s dominance, and the form I gave assumed no dominance and that s is the selective advantage of a heterozygote.
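For checking a simulation against this, here is a rough Python sketch that puts the corrected single-copy approximation, 2s / (1 - exp(-2Ns)), next to a brute-force haploid Wright-Fisher run. The selection step is my own assumption about how to set it up: carriers are weighted by (1 + s) before the next generation of N individuals is drawn at random. The function names are made up too.

```python
import math
import random


def single_copy_fixation(N, s):
    """2s / (1 - exp(-2Ns)), the haploid single-copy approximation."""
    if s == 0:
        return 1.0 / N  # neutral limit
    return 2.0 * s / -math.expm1(-2.0 * N * s)


def simulate_fixation(N, s, trials, seed=0):
    """Fraction of runs in which a single initial copy reaches all N.

    Assumed model: haploid, non-overlapping generations, carriers have
    relative fitness 1 + s, offspring drawn binomially each generation.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1
        while 0 < count < N:
            p = count / N
            p_sel = p * (1.0 + s) / (1.0 + p * s)  # frequency after selection
            count = sum(rng.random() < p_sel for _ in range(N))
        if count == N:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    N, s = 100, 0.02
    print("approximation:", single_copy_fixation(N, s))
    print("simulation:   ", simulate_fixation(N, s, trials=2000))
```

I kept N small here so the pure-Python loop finishes quickly; the two numbers should land in the same neighborhood, though the simulated value will wobble from seed to seed.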