Calculating pair-wise, unbiased Fst with R:

Calculating Weir and Cockerham’s FST is very useful because it is unbiased with respect to sample size (Weir and Cockerham 1984).  Without adjusting allele frequency estimates with respect to sample sizes, estimates of FST can be upwardly biased (see Waples 1998; note that precision may still be low).  Of course there has been lively debate about the utility and appropriate usage of FST (see Meirmans and Hedrick 2011 and Whitlock 2011 as just a few recent examples).  Nevertheless, FST still remains a commonly used metric that will likely be used for years to come (if only for comparative purposes).  There are a large number of stand-alone programs that can calculate unbiased F statistics (too many to mention), and they all do an excellent job.  However, if you are working with genetic data in R, then it can be useful to calculate this metric directly, without having to export your data to a standalone program.  This can be particularly useful if you are employing a large number of iterations/simulations etc.

The way that I calculate pair-wise theta is by using the R library Geneclust (Ancelet and Guillot).  Geneclust does much more than calculate FST, but I found this library to be particularly useful at achieving these ends.  I have run across a few other packages that calculate FST in R, but not Weir and Cockerham’s unbiased formulation (Please correct me in the comments if you know of other libraries).  To install Geneclust in R, simply type “install.packages(c(“Geneclust”,”deldir”,”fields”,”spatial”)”.  Below is the code I use to calculate pair-wise Fst values.  As with the allele frequency example, I have again used the Beech data as a nice example (here modified as a tab-delimited text file: DataMEC-11-0816.R1).   I have checked these values against FSTAT and they give identical results, but more double-checking couldn’t hurt.   I hope you find this useful – and please feel free to proffer suggestions, recommendations, opinions etc.

Script posted here:  Fst_script.txt

RedditDiggMendeleyPocketShare

About Mark Christie

Mark Christie is a post-doctoral fellow in the Department of Zoology at Oregon State University.
This entry was posted in methods, population genetics, software. Bookmark the permalink.
  • http://www2.imm.dtu.dk/~gigu/Geneland/ Gilles Guillot

    Weir and Cockeram’s estimator is also implemented in the Geneland package,
    adapted from an earlier program by J.M. Cornuet.

  • Rose Andrew

    The ‘wc’ in the hierfstat packages seems to do the same calculation. It also provides estimates for each locus separately.

  • Rose Andrew

    By the way, ‘wc’ produces global estimates, so I’ve had to script it to consider pairs of populations. It’s still the best option for me because I’m really after the per-locus estimates. The ‘pp.fst’ function in the same package gives a pairwise matrix of multilocus estimates (equivalent to the Geneclust version).

  • Viktor

    Is it possible to calculate pairwise Fst between different populations using allele frequencies only instead of genotype information? How?

  • Alicia

    The package Geneclust was removed from the CRAN repository. Are there any other R packages where pairwise Fst is calculated without having to script it?

    • http://www2.imm.dtu.dk/~gigu/Geneland/ Gilles Guillot

      Geneclust is not removed it is *archived*
      The source is still available from
      here

      The code their to compute Fst was actually borrowed from
      Geneland

      Geneland does global and locus-specific Fst’s.
      It can be used either via the command-line or the graphical user interface.
      See manual <a href="http://www2.imm.dtu.dk/~gigu/Geneland/Geneland-Doc.pdf&quot; manual on program web page.

    • http://diversityinlife.weebly.com Kevin Keenan

      I have an R package called ‘diveRsity’ which will do this for you.
      The version which is currently available on CRAN (v1.2.2) does not calculate Weir and Cockerham’s Fst, but version 1.2.3 does. It is available:

      http://diversityinlife.weebly.com/software.html

      I am still carrying out some speed tests on v1.2.3, but early results look fine when compared to alternative “stand alone” software.
      The package (specifically, function ‘div.part’) also calculates pairwise D(Jost, 2008, Chao et al, 2008), G’st (Hedrick, 2005), Gst (Nei & Chesser 1983) among others.
      Locus and global stats are also returned. All results are written in a convenient format, both to the R workspace and to an .xlsx workbook (providing the package ‘xlsx’ is installed, or.txt files if not). Pairwise and locus results can also be plotted to .html files with interactive info boxes, if the package ‘sendplot’ is also installed.

      There is a (‘slightly out dated’) help manual @ which will provide further information on function usage etc.

      If you have any questions or problems please email me at the address given in the package documentation.

      Thanks
      Kevin

  • Boyd Collier

    I’m in the midst of debugging a program written for the Mac in Objective-C to carry out several analyses commonly used in population genetics, including Weir & Cockerham’s (1984) theta, etc. But as a non-population geneticist, some of their notation is confusing to me; in particular, in their equations 1-4, they use l (“el”) to index several quantities, but they do not explain what l refers to (in contrast to their use of i as an index). Can anyone either give me a short explanation or refer me to a worked example that make clear what is being done?
    Thanks,
    Boyd

  • Sebastien renaut

    The function varcomp in Hierfstat also calculates F statistics (among other things) and produces the same estimates as the wc function.

  • Philippe Henry

    I was wondering if anyone knew of an R package that can calculate population specific Fst (sensu Weir and Hill 2002)? I have done this using GESTE in previous years, but I have had no luck getting that software to run recently.

    • Mark Christie

      Hi Phillipe,

      I haven’t been able to find an R package that can do this. For now, it looks like GESTE might still be your best bet, as was recently used in this paper:

      http://onlinelibrary.wiley.com/doi/10.1111/mec.12341/full

      Please add a comment if you find an additional solution.

      Cheers,

      Mark

  • Chris H

    Hi,
    Are there thoughts to update this R package to work with the newest version of R? Unfortunately, it is not compatible with version 3.0.2