Last Friday, Molecular Ecology released an interesting new systematic review online ahead of print. Colin Ahrens and coauthors at a number of Australian research institutions compiled results from 66 papers reporting tests for locally adapted loci based on either FST or genotype-environment associations, and they found some interesting trends. The one that raised some eyebrows on Twitter, though, is presented in the paper’s Figure 3:
That’s right, there are papers in the dataset that identify almost 1 in 4 SNPs as FST outliers, and up to 8 in 10 SNPs as significantly associated with some environmental gradient. In fact, from the contents of the Dryad repository supporting the paper, it looks to me as though fully 24% of the compiled studies found that at least 10% of tested SNPs were FST outliers, and 15% found that 10% or more tested SNPs had significant environmental associations. That seems like a lot of SNPs coming up locally adapted — my first reaction was to snark that our field has forgotten what the word “outlier” means. To wit: an outlier is a data point that falls well beyond the range of values seen in the rest of the dataset. If ten percent of your SNPs are “outliers”, then it’s kind of odd to call them outliers.
However, as the text of the paper makes clear, that isn’t really the sense in which Ahrens et al. are using the term. They’re calling loci “FST outliers” if they show FST greater than what’s expected from a demographic model of the populations sampled. It’s not that evolutionary geneticists are pulling a Lake Wobegon and claiming that up to 25% of their SNPs are outside the 99% range — they’re saying up to 25% of SNPs show greater differentiation than expected if they were evolving without the influence of natural selection.
But is that a reasonable conclusion? How often do we really expect one-tenth of SNPs in a sample to show signs of adaptation to different environments? And how often should we expect one-tenth of SNPs in a sample to show significant genotype-environment associations? Well, maybe more often than you’d think. Here are a few possible explanations for these observations, off the top of my head:
Local adaptation is ubiquitous, and it often involves a substantial portion of the genome. Local adaptation is a very common phenomenon, and if it often has a polygenic basis, you might expect that a large-ish portion of any random sample of SNPs would show greater-than-neutral differentiation and associations with environmental variation. But then again, as more genome regions are involved in local adaptation, the selection experienced by individual loci, and thereby the differentiation created at any individual locus, should decrease.
Molecular ecologists aren’t sampling genomes at random. If you identify SNPs using a method that preferentially targets genes, like RNAseq or a capture array based on a transcriptome, you’re already looking at a part of the genome that may be more likely to be under selection. Indeed, some of the studies Ahrens et al. compiled targeted regions with hypothesized roles in local adaptation prior to any population genomic analysis — like this study of sea urchins, which tested eight candidate SNPs and found that all eight were significantly differentiated, and five had significant environmental associations. (That one is, apparently, not represented in Figure 3a.) In those cases, the results are quite reasonable, but they hardly reflect a fair survey of local adaptation’s genome-wide effects. However, Ahrens et al. note that the proportion of SNPs identified as candidates based on FST or G-E associations did not differ between studies using targeted datasets and those using randomly sampled SNPs — more on that below.
Molecular ecologists are using the wrong null models, or the wrong significance thresholds. As Graham Coop pointed out in an initial reaction to the paper, if the demographic model used to identify a neutral range of FST is wrong, it’s not hard to get false positives. As population histories are almost always more complex than we can easily model, that’s always a danger. In the case of G-E associations, too, a significant association may arise because of isolation by distance rather than locally varying selection — so finding lots of SNPs with G-E associations is not the same thing as finding lots of locally adapted SNPs. This is why triangulation with multiple other methods and datasets is advisable in genome-scan studies.
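To make the null-model problem concrete, here is a minimal simulation sketch of my own; it is not the method used in any of the compiled studies. It assumes the classic Lewontin-Krakauer chi-square approximation for neutral FST, and it assumes, purely for illustration, that the null model treats ten populations as independent while shared history leaves only about three effectively independent degrees of freedom. Every number in it (mean FST, degrees of freedom, locus counts) is hypothetical.

```python
import random

def simulate_fst(n_loci, mean_fst, df, rng):
    """Draw neutral FST-like values under the Lewontin-Krakauer
    approximation: df * FST / mean_fst ~ chi-square(df)."""
    # A chi-square(df) variate is a Gamma(shape=df/2, scale=2) variate.
    return [mean_fst * rng.gammavariate(df / 2, 2) / df for _ in range(n_loci)]

def outlier_fraction(observed, null_sample, alpha):
    """Fraction of observed values above the (1 - alpha) quantile of the null."""
    cutoff = sorted(null_sample)[int((1 - alpha) * len(null_sample))]
    return sum(x > cutoff for x in observed) / len(observed)

rng = random.Random(1)
mean_fst = 0.05   # hypothetical genome-wide mean FST
assumed_df = 9    # null model: 10 independent populations
actual_df = 3     # hypothetical truth: shared history, fewer independent lineages

# Null distribution as the (misspecified) demographic model would predict it.
null_sample = simulate_fst(200_000, mean_fst, assumed_df, rng)

# Two sets of strictly neutral loci: one from the "true" history,
# one from a history that matches the null model.
neutral_loci = simulate_fst(50_000, mean_fst, actual_df, rng)
well_specified = simulate_fst(50_000, mean_fst, assumed_df, rng)

frac_wrong = outlier_fraction(neutral_loci, null_sample, alpha=0.01)
frac_right = outlier_fraction(well_specified, null_sample, alpha=0.01)

print(f"'outliers' with a correct null:      {frac_right:.3f}")
print(f"'outliers' with a misspecified null: {frac_wrong:.3f}")
```

No locus in this sketch is under selection, so every "outlier" it reports is a false positive. How much the misspecified null inflates the count depends entirely on the assumed gap between the modeled and actual population history.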
One more odd trend in the papers compiled by Ahrens et al. is that there’s a negative relationship between the number of SNPs genotyped and the proportion of SNPs identified as strongly differentiated or significantly associated with environmental variation. This makes sense if, again, targeted datasets are often targeted in ways that make them more likely to catch SNPs in regions under selection. From the supporting data, I see that the mean (and median) targeted dataset contained about a tenth as many SNPs as the mean (and median) dataset of randomly selected regions. (About 5,200 SNPs for the mean targeted dataset vs. 46,400 for the mean random dataset.) Indeed, one of the ways you can justify a “genome scan” with a set of SNPs that can’t possibly sample the whole genome is to make sure you hit likely targets of selection.
But then again, the proportion of SNPs identified as locally adapted didn’t differ between targeted and random studies.
Ahrens et al. also suggest that authors may be more likely to use methods yielding broader samples of the genome in cases where they suspect polygenic local adaptation, which is more or less the converse of the targeting hypothesis. Finally, they point out the more worrying possibility that studies sampling fewer SNPs have higher rates of false discovery. That seems quite plausible, particularly if authors usually use a subset of their data to parameterize the null model that identifies loci as unusually differentiated — having too few loci for that “control set” could reduce the accuracy of the null model.
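Here is a rough sketch of that last point, again my own illustration rather than anything from the paper. It assumes the outlier cutoff is simply the empirical 99th percentile of FST in a neutral "control set", and that neutral FST follows the same hypothetical chi-square approximation in every case. The control-set sizes (100 and 5,000 loci) and all other parameters are made up for the example.

```python
import random
import statistics
from bisect import bisect_right

def neutral_fst(n, rng, mean_fst=0.05, df=9):
    """Neutral FST-like values: df * FST / mean_fst ~ chi-square(df)."""
    return [mean_fst * rng.gammavariate(df / 2, 2) / df for _ in range(n)]

def realized_fpr(control_size, reference, rng, alpha=0.01, reps=100):
    """Repeatedly set a cutoff from a control set of the given size and
    record the false-positive rate that cutoff actually delivers."""
    rates = []
    for _ in range(reps):
        control = sorted(neutral_fst(control_size, rng))
        idx = min(int((1 - alpha) * control_size), control_size - 1)
        cutoff = control[idx]
        # Share of a large neutral reference sample lying above the cutoff.
        rates.append(1 - bisect_right(reference, cutoff) / len(reference))
    return statistics.mean(rates), statistics.stdev(rates)

rng = random.Random(2)
reference = sorted(neutral_fst(200_000, rng))  # stands in for the true neutral distribution

small_mean, small_sd = realized_fpr(100, reference, rng)
large_mean, large_sd = realized_fpr(5_000, reference, rng)

print(f"control set of   100 loci: mean FPR {small_mean:.4f}, sd {small_sd:.4f}")
print(f"control set of 5,000 loci: mean FPR {large_mean:.4f}, sd {large_sd:.4f}")
```

In this setup the nominal false-positive rate is 1% in both cases, but the rate a given study actually realizes is much less predictable when the cutoff comes from only 100 loci. A real analysis would fit a demographic model rather than read off a percentile, so treat this as a cartoon of the problem and not an estimate of its size.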
This synthesis doesn’t really provide a broader picture of how local adaptation shapes population genetic diversity so much as it provides a look at how molecular ecologists are going about studying local adaptation using modern sequencing data. It seems like we’re finding a lot of loci that might be locally adapted — but the patterns Ahrens et al. find in the data make me wonder how many of these local adaptation candidates will support more detailed examination.
Ahrens CW, PD Rymer, A Stow, J Bragg, S Dillon, KDL Umbers, and RY Dudaniec. 2018. The search for loci under selection: trends, biases and progress. Molecular Ecology. Accepted Author Manuscript. doi: 10.1111/mec.14549
Hereford J. 2009. A quantitative survey of local adaptation and fitness trade-offs. The American Naturalist, 173(5):579-588. doi: 10.1086/597611
Pespeni MH and SR Palumbi. 2013. Signals of selection in outlier loci in a widely dispersing species across an environmental mosaic. Molecular Ecology, 22(13):3580-3597. doi: 10.1111/mec.12337
Yeaman S. 2015. Local adaptation by alleles of small effect. The American Naturalist, 186(S1):S74-S89. doi: 10.1086/682405