Interpreting Population Genetics Formulae

Whether it’s a simple equation involving heritability (e.g., R = h²S, the ‘breeder’s equation’) or a more complicated one like Nei and Chesser’s (1983) unbiased estimator of HS, population genetics papers are filled with math. Early in my career I found the latter sort of equation intimidating. Actually, I still find them intimidating, but less so now that I’ve been working hard to learn how to interpret them.

For example, let’s break down what is going on in the equation to calculate ĤS:
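The estimator is commonly written in the form below. This is a reconstruction from the walkthrough in this post and the usual statement of the Nei and Chesser (1983) estimator, so treat the index letter k as an assumption and check the paper for the authors’ exact notation.

```latex
\hat{H}_S = \frac{\tilde{n}}{\tilde{n} - 1}
            \left[ 1 - \sum_{k} \overline{x_k^2} - \frac{H_O}{2\tilde{n}} \right]
```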

The hat symbol on top of ĤS indicates that the variable is an estimator. In this case ĤS estimates the gene diversity, or expected heterozygosity, within a sub-population of interbreeding organisms. ĤS improves upon the HS formula of Nei (1977) by taking the sample sizes, the number of individuals sampled by the investigator from each sub-population, into account. In other words, because genetic measurements are not taken from every individual within the population, HS must be corrected so that it estimates the gene diversity of the whole sub-population rather than just the individuals sampled. The subscript S simply abbreviates sub-population. HS is typically calculated in conjunction with HO and HT, which are the observed heterozygosity and the expected heterozygosity across all sampled sub-populations, respectively. The O, S, and T subscripts (observed, sub-population, and total) denote these respective values.

After the equals sign (=), which I’ll assume the reader knows, comes a fraction containing ñ. ñ is the harmonic mean of the sample sizes across sub-populations. The variable name n is conventionally used when the variable is a count of objects or, in this case, individuals. A tilde on top of a variable is sometimes used to mark a median, so the notation alone won’t tell you what to do here: to determine that you’re supposed to calculate the harmonic mean, you must read the text of Nei and Chesser (1983).
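To make the harmonic mean concrete, here is a minimal sketch using `statistics.harmonic_mean` from Python’s standard library. The sample sizes are made up for illustration and do not come from any real dataset.

```python
from statistics import harmonic_mean, mean

# Hypothetical sample sizes: individuals genotyped in three sub-populations.
sample_sizes = [10, 20, 40]

# Harmonic mean: the number of sub-populations divided by the sum of 1/n.
n_tilde = harmonic_mean(sample_sizes)

# The same quantity written out by hand, to show what the function computes.
n_tilde_by_hand = len(sample_sizes) / sum(1 / n for n in sample_sizes)

print(n_tilde)             # about 17.14
print(mean(sample_sizes))  # arithmetic mean, about 23.33
```

The harmonic mean is pulled toward the smallest sample, so one poorly sampled sub-population lowers ñ more than it would lower the ordinary average.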

The brackets indicate that the fraction built from the harmonic mean should be multiplied by the result of the expression within the brackets. In this context they’re identical to parentheses, but take up less space on the page. Within the brackets, ‘1 −’ is straightforward, but the summation sign (Σ) may be tricky if you haven’t encountered one since high school algebra. The subscript k beneath it indicates that you’re to sum over every value of k. The letters k, i, and j, when used as subscripts, commonly denote indices. In this case you have to refer to the text of the manuscript to learn that k is the current sub-population of a set of sub-populations.

The overbar (or vinculum) that spans xₖ² indicates that the values are grouped together mathematically and that you’re to take their mean. In plain English, this means you square the allele frequencies before averaging them. The last fraction indicates that you’re to divide the observed heterozygosity by two times the harmonic mean (ñ).
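Putting the pieces together, the whole calculation for a single locus can be sketched in plain Python. This is an illustration of the formula as walked through above, not code from Nei and Chesser (1983). The function name `estimate_hs`, the input layout (one list of allele frequencies per sub-population), and the example numbers are all assumptions made for illustration. The sketch also assumes that the mean of the squared frequencies is taken across sub-populations and that the results are then added up across alleles.

```python
from statistics import harmonic_mean


def estimate_hs(allele_freqs, sample_sizes, h_obs):
    """Sketch of the sample-size-corrected gene diversity for one locus.

    allele_freqs: one list per sub-population, holding the frequency of
                  each allele in that sub-population (hypothetical layout).
    sample_sizes: individuals sampled from each sub-population.
    h_obs:        observed heterozygosity (H_O), supplied by the caller.
    """
    n_tilde = harmonic_mean(sample_sizes)
    n_pops = len(allele_freqs)
    n_alleles = len(allele_freqs[0])

    # Overbar: for each allele, square its frequency in every
    # sub-population, then average those squares across sub-populations.
    mean_squares = [
        sum(pop[i] ** 2 for pop in allele_freqs) / n_pops
        for i in range(n_alleles)
    ]

    # Summation sign: add the averaged squares together.
    sum_mean_squares = sum(mean_squares)

    # Bracketed term, then the leading n~ / (n~ - 1) correction.
    inside = 1 - sum_mean_squares - h_obs / (2 * n_tilde)
    return n_tilde / (n_tilde - 1) * inside


# Made-up example: two sub-populations, two alleles, ten individuals each.
freqs = [[0.5, 0.5], [0.5, 0.5]]
hs_hat = estimate_hs(freqs, sample_sizes=[10, 10], h_obs=0.5)
print(round(hs_hat, 4))  # 0.5278
```

In this toy case the uncorrected value would be 1 − 0.5 = 0.5, and the correction for sampling only ten individuals per sub-population nudges the estimate up slightly.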

I hope this post gives the budding population geneticist a good start at working out population genetics formulae. In my next post I’ll tackle how to implement this math in Python, making use of the matrix operations available in the NumPy mathematics module. Also, if you should catch an error, please post it in the comments and I’ll fix it.


Nei, M. (1977). F-statistics and analysis of gene diversity in subdivided populations. Annals of Human Genetics, 41, 225-233.

Nei, M., & Chesser, R. K. (1983). Estimation of fixation indices and gene diversities. Annals of Human Genetics, 47(3), 253-259.


About Nicholas Crawford

I'm a computational genomics postdoctoral fellow at the California Academy of Sciences. I'm working on a number of projects including vertebrate systematics and the genomics of adaptation in lizards, Heliconius butterflies, and Hawaiian Drosophila.