Calculating pair-wise, unbiased Fst with R:

Calculating Weir and Cockerham’s FST is very useful because it is unbiased with respect to sample size (Weir and Cockerham 1984).  Without adjusting allele frequency estimates with respect to sample sizes, estimates of FST can be upwardly biased (see Waples 1998; note that precision may still be low).  Of course there has been lively debate about the utility and appropriate usage of FST (see Meirmans and Hedrick 2011 and Whitlock 2011 as just a few recent examples).  Nevertheless, FST still remains a commonly used metric that will likely be used for years to come (if only for comparative purposes).  There are a large number of stand-alone programs that can calculate unbiased F statistics (too many to mention), and they all do an excellent job.  However, if you are working with genetic data in R, then it can be useful to calculate this metric directly, without having to export your data to a standalone program.  This can be particularly useful if you are employing a large number of iterations/simulations etc.
The way that I calculate pair-wise theta is by using the R library Geneclust (Ancelet and Guillot).  Geneclust does much more than calculate FST, but I found this library to be particularly useful at achieving these ends.  I have run across a few other packages that calculate FST in R, but not Weir and Cockerham’s unbiased formulation (Please correct me in the comments if you know of other libraries).  To install Geneclust in R, simply type “install.packages(c(“Geneclust”,”deldir”,”fields”,”spatial”)”.  Below is the code I use to calculate pair-wise Fst values.  As with the allele frequency example, I have again used the Beech data as a nice example (here modified as a tab-delimited text file: DataMEC-11-0816.R1).   I have checked these values against FSTAT and they give identical results, but more double-checking couldn’t hurt.   I hope you find this useful – and please feel free to proffer suggestions, recommendations, opinions etc.
Script posted here:  Fst_script.txt

This entry was posted in methods, population genetics, software. Bookmark the permalink.