Calculating genetic differentiation with R

As molecular ecologists, we often need to calculate some measure of genetic differentiation. This is usually done with metrics such as Wright’s Fst or an unbiased analog (e.g., Weir & Cockerham’s Fst, G’st, etc.). In addition to global estimates, we often want estimates of genetic differentiation between pairs of populations. Furthermore, we often need comparable measures of differentiation across data sets that use different marker types and that differ in levels of polymorphism, amounts of missing data, sampling schemes, and the questions being asked. Despite all of these differences, many useful metrics for measuring genetic differentiation already exist, and we probably don’t want to write the code for each estimator from scratch. Many of these estimators have been implemented in packages available for R, but each package typically has its own syntax and can take some time to work through (particularly without examples).
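To make the global-versus-pairwise distinction concrete, here is a minimal sketch of one of the simplest differentiation measures, Nei’s G_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled allele frequencies. It is written in Python only so that it stands alone; the function names and toy frequencies are hypothetical. It assumes allele frequencies are already known, weights populations equally, and applies no sample-size correction, so it is a naive estimator and not an unbiased one such as Weir & Cockerham’s.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

# One locus: a list of populations, each a list of allele frequencies
# (same alleles in the same order, each list summing to 1).
Locus = Sequence[Sequence[float]]


def expected_het(freqs: Sequence[float]) -> float:
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def het_components(locus: Locus) -> Tuple[float, float]:
    """Return (H_S, H_T) for one locus.

    Assumption: populations are weighted equally (no sample-size weighting).
    """
    n_pops = len(locus)
    # H_S: mean expected heterozygosity within populations
    h_s = sum(expected_het(pop) for pop in locus) / n_pops
    # H_T: expected heterozygosity of the pooled (averaged) allele frequencies
    pooled = [sum(allele) / n_pops for allele in zip(*locus)]
    h_t = expected_het(pooled)
    return h_s, h_t


def global_gst(loci: Sequence[Locus]) -> float:
    """Naive multilocus G_ST: (sum H_T - sum H_S) / sum H_T across loci."""
    comps = [het_components(locus) for locus in loci]
    h_s = sum(s for s, _ in comps)
    h_t = sum(t for _, t in comps)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def pairwise_gst(loci: Sequence[Locus]) -> Dict[Tuple[int, int], float]:
    """G_ST for every pair of populations, using only those two populations."""
    n_pops = len(loci[0])
    return {
        (i, j): global_gst([[locus[i], locus[j]] for locus in loci])
        for i, j in combinations(range(n_pops), 2)
    }


if __name__ == "__main__":
    # Made-up toy data: 3 populations x 2 biallelic loci
    loci = [
        [[0.8, 0.2], [0.2, 0.8], [0.5, 0.5]],
        [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    ]
    print("global:", round(global_gst(loci), 4))
    for pair, value in pairwise_gst(loci).items():
        print(pair, round(value, 4))
```

Summing H_S and H_T across loci before taking the ratio means that loci with little diversity contribute little to the multilocus value. For real data with unequal sample sizes or missing genotypes, a bias-corrected estimator from an R package such as Hierfstat is the better choice.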

Thus, we thought it would be useful for the molecular ecology community to share a communal GitHub repository focused on calculating genetic differentiation in R. Other languages will, of course, be accepted too, but R seems to offer the greatest diversity of metrics. Over time, this will hopefully become a valuable resource. As an example, I have posted an R script that calculates unbiased global Fst using the package Hierfstat. The repository is available at The Molecular Ecologist organization on GitHub under the repository ‘Genetic Differentiation’. When testing your code, it would be best to try it on the two standardized input files (one for SNPs and one for microsatellites). The example data sets are simulated and currently contain individuals from 4 populations genotyped at 100 SNPs and 30 microsatellites. Ideal submissions are scripts that are: 1.) short and simple, 2.) heavily commented (using ‘#’ in R), and 3.) produce useful output for both of the example data sets.

Because this is a work in progress, please feel free to suggest or submit other ‘standard’ example data sets and other ideas (e.g., the current test data sets do not have missing data). Please comment below (I can easily update this text). You can also email me your code snippets, and I will curate, test, and upload them, or you should be able to add scripts directly to the repository.

Below is a log of all the current entries:

Global FST: Hierfstat


About Mark Christie

Mark Christie is an assistant professor in the Department of Biological Sciences and Department of Forestry & Natural Resources at Purdue University.
This entry was posted in population genetics, R, software.