Guest contributor K.E. Lotterhos is a marine biologist at Wake Forest University who studies evolutionary responses to fishing and climate change. You can find her on Twitter under the handle @dr_k_lo.
A major goal of evolutionary biology is to understand the genetic basis for adaptation to heterogeneous environments. Rapid advances in technology are allowing a large amount of sequence data to be collected (mostly in the form of single nucleotide polymorphisms: SNPs), presenting us with an unprecedented opportunity to address this question in non-model species on a genome scale.
A major challenge for genome scans is to determine whether patterns of genetic variation are due to the effects of selection versus neutral processes such as genetic drift and demography.
In this post, I will introduce the concept of triangulation* in genome scans: the process of gathering more than one independent source of evidence for the inference of loci under selection. (Disclaimer: I’m thinking about long-lived, non-model organisms here, where recombinant inbred lines, gene knockouts, or complementation tests would not be feasible.) Although recent reviews have highlighted the importance of integrating multiple types of data, analyses, and experiments to uncover the loci responsible for adaptation (Barrett and Hoekstra 2011, Scheinfeldt and Tishkoff 2013), relatively few studies have achieved this integration.
How can one plan a study such that genome-scan analyses can be considered independent?
First, let’s consider the two most common types of genome scans for single-nucleotide polymorphisms (SNPs) in non-model organisms:
The FST outlier test: FST is a measure of genetic differentiation among populations. Outliers are loci whose allele frequencies differ more among populations than those in the rest of the genome, and thus may explain adaptive differences among populations.
The Genetic-Environment Association (GEA): A measure of the correlation between allele frequencies (in populations or individuals) and an environmental axis, usually modeled with allele frequency as the response variable and the environmental variable as a predictor variable.
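To make the two scans in the list above concrete, here is a minimal sketch of both statistics for a single biallelic SNP, computed from population allele frequencies. It assumes equal sample sizes per population, uses the simple variance-based form of FST with no sample-size correction, and uses a plain Pearson correlation for the GEA. The temperatures, allele frequencies, and locus names are hypothetical values made up for illustration.

```python
import math

def fst(freqs):
    """Variance-based FST for one biallelic locus:
    variance in allele frequency among populations / (pbar * (1 - pbar))."""
    k = len(freqs)
    pbar = sum(freqs) / k
    if pbar == 0.0 or pbar == 1.0:
        return 0.0  # locus is fixed everywhere, so there is no differentiation
    var_p = sum((p - pbar) ** 2 for p in freqs) / k
    return var_p / (pbar * (1.0 - pbar))

def pearson_r(xs, ys):
    """Pearson correlation between an environmental variable and allele frequency."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # no variation, so the correlation is undefined; report 0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: mean temperature at four sampling sites and the
# frequency of one allele at two SNPs in the population from each site.
temperature = [8.0, 10.0, 12.0, 14.0]
locus_freqs = {
    "snp_flat": [0.50, 0.52, 0.49, 0.51],
    "snp_cline": [0.10, 0.35, 0.60, 0.90],
}

for name, freqs in locus_freqs.items():
    print(name, "FST =", round(fst(freqs), 3),
          "r =", round(pearson_r(temperature, freqs), 3))
```

Real genome scans compare each locus against a genome-wide or simulated null distribution and usually correct for population structure. This sketch only shows what each statistic measures.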
Let’s say a number of individuals were collected from heterogeneous environments on the landscape. Some SNPs were significant both in an FST outlier analysis and a GEA. Would we consider these SNPs to have two independent sources of evidence?
NO, because the two tests were performed on the same sets of individuals. Similar reasoning applies if the same SNP is significant in two GEAs (i.e., significant correlations in two different environments): this is not independent evidence because the same set of individuals was used for both tests. If outlier loci are enriched for functional genes (perhaps based on annotation with a closely-related species) or show an excess of non-synonymous substitutions, the strength of the evidence is increased, but this still does not constitute independent evidence.
To constitute independent evidence under triangulation, each statistical analysis should comprise an independent set of individuals. Having an independent set of individuals is important because of sampling error: perhaps—by chance—you sampled more homozygotes than heterozygotes, or—by chance—at one location only a single allele was sampled. These “chance” events occur more often at low sample sizes – and when they do occur, they are likely to affect multiple statistical tests. For this reason, a false-positive FST outlier is also likely to be a false positive in a GEA when both analyses are performed on the same dataset. Triangulation can reduce the set of false positives because it is unlikely the same “chance” events would happen in different sets of individuals.
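The sampling-error argument above can be illustrated with a toy simulation. The sketch below is a simplification under stated assumptions, not a recommended analysis. It assumes six populations along a hypothetical environmental gradient, strictly neutral loci with a true allele frequency of 0.5 everywhere, 10 diploid individuals sampled per population, the absolute regression slope of allele frequency on environment as a stand-in GEA statistic, and the top 5% of loci labeled as outliers. Every outlier here is a false positive by construction. The code counts how many FST outliers are also GEA outliers when the GEA uses the same individuals and when it uses an independent sample.

```python
import random

def sample_freq(true_p, n_diploid, rng):
    """Allele frequency observed in a random sample of n_diploid individuals."""
    copies = 2 * n_diploid
    return sum(rng.random() < true_p for _ in range(copies)) / copies

def fst(freqs):
    """Variance-based FST for one biallelic locus (no sample-size correction)."""
    k = len(freqs)
    pbar = sum(freqs) / k
    if pbar == 0.0 or pbar == 1.0:
        return 0.0
    return sum((p - pbar) ** 2 for p in freqs) / k / (pbar * (1.0 - pbar))

def abs_slope(freqs, env):
    """Absolute least-squares slope of allele frequency on environment."""
    n = len(env)
    me, mp = sum(env) / n, sum(freqs) / n
    sxy = sum((e - me) * (p - mp) for e, p in zip(env, freqs))
    sxx = sum((e - me) ** 2 for e in env)
    return abs(sxy / sxx)

def top_loci(scores, frac):
    """Indices of the loci with the highest scores (the 'outliers')."""
    n_top = max(1, int(len(scores) * frac))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_top])

def shared_outliers(n_loci=3000, n_diploid=10, frac=0.05, seed=1):
    """Count FST outliers that are also GEA outliers, with the GEA run on
    the same individuals (sample A) or on independent ones (sample B)."""
    rng = random.Random(seed)
    env = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical gradient, one value per population
    fst_a, gea_a, gea_b = [], [], []
    for _ in range(n_loci):
        sample_a = [sample_freq(0.5, n_diploid, rng) for _ in env]
        sample_b = [sample_freq(0.5, n_diploid, rng) for _ in env]
        fst_a.append(fst(sample_a))
        gea_a.append(abs_slope(sample_a, env))
        gea_b.append(abs_slope(sample_b, env))
    fst_out = top_loci(fst_a, frac)
    same = len(fst_out & top_loci(gea_a, frac))
    indep = len(fst_out & top_loci(gea_b, frac))
    return same, indep

same, indep = shared_outliers()
print("shared false positives, same individuals:", same)
print("shared false positives, independent individuals:", indep)
```

How much error the two tests share depends on the statistics chosen. The slope is used here because, like FST, it grows when sample allele frequencies happen to be spread widely among populations. A different GEA statistic could share more or less error with FST.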
Here are a few examples of additional experiments that one can do to achieve triangulation in non-model species:
The Genome-Wide Association Study (GWAS): A measure of the correlation between the phenotype and the allelic state. Usually some form of a mixed model, with phenotype as a response variable and genotype as the predictor variable (and random factors of population and/or relatedness). Typically phenotypes and genotypes have been measured in a common garden environment.
The Within-Generation Selection Experiment: The frequency of alleles is measured before and after selection. If an allele frequency change can be shown to be greater than that expected from genetic drift (i.e., from the random sampling of individuals from the population), then this is evidence in favor of selection at that locus (e.g., Pespeni et al. 2013, Gompert et al. 2014).
The Common-Garden Validation Experiment: Individuals carrying a candidate allele (or alleles) have higher fitness in a common garden environment (e.g., Yoder et al. 2014). Alternatively, gene expression at a candidate gene (or genes) is consistently different among populations in the common garden (e.g., Chen et al. 2012).
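As an illustration of the logic behind the within-generation selection experiment in the list above, here is a minimal sketch of a randomization test. It assumes every individual was genotyped before selection, and that under the null hypothesis the survivors are a random draw from those individuals. The function names, genotype counts, and survivor frequencies are hypothetical. This is not the method used in the cited studies, and it ignores genotyping error.

```python
import random

def allele_freq(genotypes):
    """Frequency of the focal allele from per-individual counts (0, 1, or 2)."""
    return sum(genotypes) / (2 * len(genotypes))

def drift_p_value(before, freq_after, n_after, n_sims=2000, seed=1):
    """Approximate probability that random survival alone produces an allele
    frequency change at least as large as the one observed."""
    rng = random.Random(seed)
    p_before = allele_freq(before)
    observed = abs(freq_after - p_before)
    hits = 0
    for _ in range(n_sims):
        # Null model: survivors are drawn at random, without replacement.
        survivors = rng.sample(before, n_after)
        if abs(allele_freq(survivors) - p_before) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)

# Hypothetical cohort of 100 individuals before selection (allele frequency 0.5),
# of which 40 survive.
before = [0] * 25 + [1] * 50 + [2] * 25

print("large shift:", drift_p_value(before, freq_after=0.75, n_after=40))
print("small shift:", drift_p_value(before, freq_after=0.55, n_after=40))
```

A small value suggests the shift is larger than random survival would usually produce at that locus. A genome-wide analysis would also need to account for testing many loci.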
The limitation of triangulation is that, even when we have multiple independent surveys or experiments, we don’t always expect them to give the same answer. For example, in humans, different loci on each continent (in Tibet, the Andes, and Ethiopia) have been implicated in adaptation to high-altitude conditions (Alkorta-Aranburu et al. 2012, Bigham et al. 2013). All loci, however, are involved in the same biological pathway (reviewed in Scheinfeldt and Tishkoff 2013).
Take-home message:
Triangulation makes a stronger case for candidate loci. In planning a project (and in reviewing papers), it is important to consider whether the sampling design utilizes multiple independent types of data and experiments.
Alkorta-Aranburu G, Beall CM, Witonsky DB, Gebremedhin A, Pritchard JK, Di Rienzo A. 2012. The genetic architecture of adaptations to high altitude in Ethiopia. PLOS Genetics 8 (12): e1003110. doi:10.1371/journal.pgen.1003110.
Barrett RD, Hoekstra HE. 2011. Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics 12:767-780. doi:10.1038/nrg3015.
Bigham AW, Wilson MJ, Julian CG, Kiyamu M, Vargas E, Leon-Velarde F, Rivera-Chira M, Rodriquez C, Browne VA, Parra E, Brutsaert TD, Moore LG, Shriver MD. 2013. Andean and Tibetan patterns of adaptation to high altitude. American Journal of Human Biology 25 (2): 190–197. doi:10.1002/ajhb.22358.
Chen J, Kallman T, Ma X, Gyllenstrand N, Zaina G, Morgante M, Bousquet J, Eckert A, Wegrzyn J, Neale D, Lagercrantz U, Lascoux M. 2012. Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics 191:865-881. doi:10.1534/genetics.112.140749.
Gompert Z, Comeault AA, Farkas TE, Feder JL, Parchman TL, Buerkle CA, Nosil P. 2014. Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3):369-379. doi:10.1111/ele.12238.
Pespeni MH, Sanford E, Gaylord B, Hill TM, Hosfelt JD, Jaris HK, Lavigne M, Lenz EA, Russell AD, Young MK, Palumbi SR. 2013. Evolutionary change during experimental ocean acidification. Proceedings of the National Academy of Sciences USA 110:6937-6942. doi:10.1073/pnas.1220673110.
Scheinfeldt LB, Tishkoff SA. 2013. Recent human adaptation: genomic approaches, interpretation and insights. Nature Reviews Genetics 14:692-702. doi:10.1038/nrg3553.
Yoder JB, Stanton-Geddes J, Zhou P, Briskine R, Young ND, Tiffin P. 2014. Genomic signature of adaptation to climate in Medicago truncatula. Genetics doi:10.1534/genetics.113.159319.
*I heard this term used for the first time at the American Society of Naturalists Conference in Asilomar, CA.