On “triangulation” in genome scans


A triangulation marker used in surveying.

Guest contributor K.E. Lotterhos is a marine biologist at Wake Forest University who studies evolutionary responses to fishing and climate change. You can find her on Twitter under the handle @dr_k_lo.

A major goal of evolutionary biology is to understand the genetic basis for adaptation to heterogeneous environments.  Rapid advances in technology are allowing a large amount of sequence data to be collected (mostly in the form of single nucleotide polymorphisms: SNPs), presenting us with an unprecedented opportunity to address this question in non-model species on a genome scale.

A major challenge for genome scans is to determine whether patterns of genetic variation are due to the effects of selection versus neutral processes such as genetic drift and demography.

In this post, I will introduce the concept of triangulation* in genome scans: the process of gathering more than one independent source of evidence for inferring loci under selection. (Disclaimer: I'm thinking here about long-lived, non-model organisms, in which recombinant inbred lines, gene knockouts, or complementation tests are not feasible.) Although recent reviews have highlighted the importance of integrating multiple types of data, analyses, and experiments to uncover the loci responsible for adaptation (Barrett and Hoekstra 2011, Scheinfeldt and Tishkoff 2013), relatively few studies have yet achieved this integration.

How can one plan a study such that genome-scan analyses can be considered independent? 

First, let’s consider the two most common types of genome scans for single-nucleotide polymorphisms (SNPs) in non-model organisms:

The FST outlier test: FST is a measure of genetic differentiation among populations. Outliers are loci whose allele frequencies differ among populations more than those at the rest of the genome, and which thus may explain adaptive differences among populations.
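A minimal Python sketch of the idea, assuming allele frequencies have already been estimated in each population. It uses Nei's heterozygosity-based definition, FST = (H_T - H_S) / H_T, weights populations equally, and simply flags the upper tail of the empirical genome-wide distribution; published outlier tests usually compare each locus against a neutral null model instead. The function names and the `top_fraction` cutoff are illustrative assumptions, not from any package.

```python
from statistics import fmean


def fst_biallelic(freqs):
    """FST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of one allele in each population. Populations are
    weighted equally, which assumes similar sample sizes.
    """
    h_s = fmean(2 * p * (1 - p) for p in freqs)  # mean within-population heterozygosity
    p_bar = fmean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                # heterozygosity of the pooled sample
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def fst_outliers(snp_freqs, top_fraction=0.01):
    """Return SNP ids in the upper tail of the genome-wide FST distribution."""
    fst = {snp: fst_biallelic(freqs) for snp, freqs in snp_freqs.items()}
    n_top = max(1, round(top_fraction * len(fst)))
    ranked = sorted(fst, key=fst.get, reverse=True)
    return ranked[:n_top]


# Hypothetical frequencies in three populations.
snps = {
    "snp1": [0.50, 0.50, 0.50],  # no differentiation
    "snp2": [0.05, 0.50, 0.95],  # strong differentiation
    "snp3": [0.40, 0.50, 0.60],  # weak differentiation
}
print(fst_outliers(snps, top_fraction=0.34))  # ['snp2']
```

With real data the tail cutoff matters a great deal: every locus above it is called an "outlier" whether or not selection is involved, which is why the false-positive concerns below apply.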

The Genetic-Environment Association (GEA): A measure of the correlation between allele frequencies (in populations or individuals) and an environmental axis, usually modeled with allele frequency (or genotype) as the response variable and the environmental variable as the predictor variable.

Let’s say a number of individuals were collected from heterogeneous environments on the landscape.  Some SNPs were significant both in an FST outlier analysis and a GEA.  Would we consider these SNPs to have two independent sources of evidence?

NO, because the two tests were performed on the same set of individuals. Similar reasoning applies if the same SNP is significant in two GEAs (i.e., significant correlations with two different environmental variables): this is not independent evidence, because the same set of individuals was used for both tests. If outlier loci are enriched for functional genes (perhaps based on annotation from a closely related species) or show an excess of non-synonymous substitutions, the strength of the evidence increases, but this still does not constitute independent evidence.

To constitute independent evidence under triangulation, each statistical analysis should be based on an independent set of individuals. Independent sets of individuals matter because of sampling error: perhaps, by chance, you sampled more homozygotes than heterozygotes, or, by chance, only a single allele was sampled at one location. Such chance events occur more often at small sample sizes, and when they do occur, they are likely to affect multiple statistical tests. For this reason, a false-positive FST outlier is also likely to be a false positive in a GEA when both analyses are performed on the same dataset. Triangulation can reduce the set of false positives because the same chance events are unlikely to occur in different sets of individuals.
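The shared-sampling-error argument can be checked with a small simulation. In this sketch every locus is neutral and has the same true allele frequency (0.5) in every population, so every "outlier" is a false positive. It counts the loci that land in the top 5% of both an FST scan and a GEA (absolute regression slope of frequency on environment), first when both scans use the same individuals and then when the GEA uses a second, independently sampled set. The parameter values (10 populations, 10 diploids each, 10,000 loci) and all function names are arbitrary assumptions chosen to keep the run short.

```python
import random
from statistics import fmean


def sample_frequencies(n_loci, n_pops, n_diploids, p, rng):
    """Sampled allele frequencies at neutral loci whose true frequency is p everywhere."""
    n_alleles = 2 * n_diploids
    return [
        [sum(rng.random() < p for _ in range(n_alleles)) / n_alleles for _ in range(n_pops)]
        for _ in range(n_loci)
    ]


def fst(freqs):
    """FST = (H_T - H_S) / H_T with populations weighted equally."""
    p_bar = fmean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = fmean(2 * q * (1 - q) for q in freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def gea_abs_slope(freqs, env):
    """|slope| of allele frequency (response) regressed on the environment (predictor)."""
    e_bar, f_bar = fmean(env), fmean(freqs)
    sxy = sum((e - e_bar) * (f - f_bar) for e, f in zip(env, freqs))
    sxx = sum((e - e_bar) ** 2 for e in env)
    return abs(sxy / sxx)


def top_loci(scores, fraction):
    """Indices of the highest-scoring loci, i.e. the putative outliers."""
    k = round(fraction * len(scores))
    return set(sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k])


def shared_false_positives(n_loci=10_000, n_pops=10, n_diploids=10, fraction=0.05, seed=42):
    """Return (overlap using the same individuals, overlap using independent individuals)."""
    rng = random.Random(seed)
    env = [float(i) for i in range(n_pops)]  # one environmental value per population
    sample_a = sample_frequencies(n_loci, n_pops, n_diploids, 0.5, rng)
    sample_b = sample_frequencies(n_loci, n_pops, n_diploids, 0.5, rng)  # new individuals
    fst_a = top_loci([fst(f) for f in sample_a], fraction)
    gea_a = top_loci([gea_abs_slope(f, env) for f in sample_a], fraction)
    gea_b = top_loci([gea_abs_slope(f, env) for f in sample_b], fraction)
    return len(fst_a & gea_a), len(fst_a & gea_b)


same, independent = shared_false_positives()
print(same, independent)
```

With independent samples, the expected overlap is about 500 × 500 / 10,000 = 25 loci by chance alone. With the same individuals it should be noticeably larger, because the population-level sampling noise that happens to line up with the environmental gradient inflates both a locus's slope and its FST.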

A Manhattan plot from a GWAS of flowering time in Medicago truncatula.


Here are a few examples of additional experiments that one can do to achieve triangulation in non-model species:

The Genome-Wide Association Study (GWAS): A measure of the correlation between phenotype and allelic state, usually fitted as some form of mixed model with phenotype as the response variable, genotype as the predictor variable, and random effects for population and/or relatedness. Typically, the individuals are grown, phenotyped, and genotyped in a common garden environment.
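A full GWAS mixed model with relatedness random effects needs specialized software, so this sketch swaps in a simpler technique to show the association idea: ordinary least squares of phenotype on allele count (0, 1, 2) after centering both within each population, which is equivalent to fitting population as a fixed effect. It is an illustration only, not a substitute for a mixed model, and the function names and data are hypothetical.

```python
from collections import defaultdict


def center_within_populations(values, pops):
    """Subtract each population's mean; equivalent to fitting population as a fixed effect."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, pop in zip(values, pops):
        totals[pop] += value
        counts[pop] += 1
    return [value - totals[pop] / counts[pop] for value, pop in zip(values, pops)]


def gwas_effect(genotypes, phenotypes, pops):
    """Per-SNP OLS effect of allele count (0, 1, 2) on phenotype, adjusted for population means.

    Returns None if genotype does not vary within any population (effect not estimable).
    """
    g = center_within_populations(genotypes, pops)
    y = center_within_populations(phenotypes, pops)
    sgg = sum(gi * gi for gi in g)
    if sgg == 0:
        return None
    return sum(gi * yi for gi, yi in zip(g, y)) / sgg


# Hypothetical data: +1.5 phenotype units per allele copy; population B has a +10 baseline
# and also carries more copies of the allele.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [10.0, 10.0, 11.5, 21.5, 23.0, 23.0]
pops = ["A", "A", "A", "B", "B", "B"]
print(gwas_effect(genotypes, phenotypes, pops))  # approximately 1.5
```

Treating all six individuals as one population (passing the same label for everyone) gives a slope of 6.5 on these data, because genotype and baseline phenotype are confounded by population; this is the kind of bias the population and relatedness terms in a GWAS model are meant to remove.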

The Within-Generation Selection Experiment: Allele frequencies are measured before and after selection. If the change in allele frequency at a locus can be shown to be greater than that expected from genetic drift (i.e., from the random sampling of individuals from the population), this is evidence in favor of selection at that locus (e.g., Pespeni et al. 2013, Gompert et al. 2014).
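One simple, assumption-heavy way to ask whether a before/after change exceeds sampling expectation is a two-proportion z-test on allele counts. This sketch treats the two samples as independent random draws of alleles (Hardy-Weinberg proportions, unrelated individuals) and ignores multiple testing across SNPs; the cited studies used their own, more elaborate methods, and the function name is hypothetical.

```python
import math


def allele_frequency_change_test(count_before, n_before, count_after, n_after):
    """Two-proportion z-test for a change in allele frequency across a selection event.

    count_*: copies of the focal allele in each sample; n_*: diploid individuals sampled.
    Assumes both samples are independent random draws of alleles, so under the null
    any change is sampling error. Returns (change in frequency, z, two-sided p-value).
    """
    alleles_before, alleles_after = 2 * n_before, 2 * n_after
    p_before = count_before / alleles_before
    p_after = count_after / alleles_after
    p_pooled = (count_before + count_after) / (alleles_before + alleles_after)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / alleles_before + 1 / alleles_after))
    if se == 0:  # allele fixed or absent in both samples: no change to test
        return p_after - p_before, 0.0, 1.0
    z = (p_after - p_before) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p_after - p_before, z, p_value


# Hypothetical counts: 100 diploids sampled before and 100 after selection.
change, z, p = allele_frequency_change_test(100, 100, 130, 100)
print(round(change, 2), round(z, 2), round(p, 4))
```

Here a shift from 0.50 to 0.65 gives z ≈ 3.03 and a two-sided p ≈ 0.0024; across thousands of SNPs, such per-locus p-values would still need a multiple-testing correction.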

The Common-Garden Validation Experiment: Individuals carrying the candidate allele (or alleles) are shown to have higher fitness in a common garden environment (e.g., Yoder et al. 2014). Alternatively, expression of a candidate gene (or genes) is shown to differ consistently among populations in the common garden (e.g., Chen et al. 2012).
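For the fitness comparison, an exact permutation test is one assumption-light way to ask whether carriers of a candidate allele outperform non-carriers in the common garden. This sketch enumerates every relabelling of individuals, so it is only practical for small samples (random permutations would be used otherwise); the function name and fitness values are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def exact_permutation_test(carriers, noncarriers):
    """Two-sided exact permutation test for a difference in mean fitness.

    Returns (observed difference carriers - non-carriers, p-value).
    """
    pooled = list(carriers) + list(noncarriers)
    n_carriers = len(carriers)
    observed = fmean(carriers) - fmean(noncarriers)
    extreme = total = 0
    # Every way of choosing which individuals are labelled "carriers".
    for chosen in combinations(range(len(pooled)), n_carriers):
        in_group = set(chosen)
        group1 = [pooled[i] for i in chosen]
        group2 = [x for i, x in enumerate(pooled) if i not in in_group]
        total += 1
        # Small tolerance so floating-point ties with the observed split are counted.
        if abs(fmean(group1) - fmean(group2)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


diff, p = exact_permutation_test([10.0, 11.0, 12.0], [1.0, 2.0, 3.0])
print(diff, p)  # 9.0 0.1
```

With only three individuals per group, even perfect separation gives p = 2/20 = 0.1, a reminder that small common gardens have little power on their own.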

The limitation of triangulation is that, even when we have multiple independent surveys or experiments, we don't always expect them to give the same answer. For example, in humans, different loci have been implicated in adaptation to high-altitude conditions in each region studied (Tibet, the Andes, and Ethiopia; Alkorta-Aranburu et al. 2012, Bigham et al. 2013). All of these loci, however, are involved in the same biological pathway (reviewed in Scheinfeldt and Tishkoff 2013).

Take-home message:

Triangulation makes a stronger case for candidate loci. In planning a project (and in reviewing papers), it is important to consider whether the sampling design utilizes multiple independent types of data and experiments.

References:

Alkorta-Aranburu G, Beall CM, Witonsky DB, Gebremedhin A, Pritchard JK, Di Rienzo A. 2012. The genetic architecture of adaptations to high altitude in Ethiopia. PLOS Genetics 8(12):e1003110. doi:10.1371/journal.pgen.1003110.

Barrett RD, Hoekstra HE. 2011. Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics 12:767-780. doi:10.1038/nrg3015.

Bigham AW, Wilson MJ, Julian CG, Kiyamu M, Vargas E, Leon-Velarde F, Rivera-Chira M, Rodriquez C, Browne VA, Parra E, Brutsaert TD, Moore LG, Shriver MD. 2013. Andean and Tibetan patterns of adaptation to high altitude. American Journal of Human Biology 25(2):190-197. doi:10.1002/ajhb.22358.

Chen J, Kallman T, Ma X, Gyllenstrand N, Zaina G, Morgante M, Bousquet J, Eckert A, Wegrzyn J, Neale D, Lagercrantz U, Lascoux M. 2012. Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics 191:865-881. doi:10.1534/genetics.112.140749.

Gompert Z, Comeault AA, Farkas TE, Feder JL, Parchman TL, Buerkle CA, Nosil P. 2014. Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3):369-379. doi:10.1111/ele.12238.

Pespeni MH, Sanford E, Gaylord B, Hill TM, Hosfelt JD, Jaris HK, Lavigne M, Lenz EA, Russell AD, Young MK, Palumbi SR. 2013. Evolutionary change during experimental ocean acidification. Proceedings of the National Academy of Sciences USA 110:6937-6942. doi:10.1073/pnas.1220673110.

Scheinfeldt LB, Tishkoff SA. 2013. Recent human adaptation: genomic approaches, interpretation and insights. Nature Reviews Genetics 14:692-702. doi:10.1038/nrg3604.

Yoder JB, Stanton-Geddes J, Zhou P, Briskine R, Young ND, Tiffin P. 2014. Genomic signature of adaptation to climate in Medicago truncatula. Genetics doi:10.1534/genetics.113.159319.

*I heard this term used for the first time at the American Society of Naturalists Conference in Asilomar, CA.


About Jeremy Yoder

Jeremy Yoder is a postdoctoral associate in the Department of Plant Biology at the University of Minnesota. He also blogs at Denim and Tweed and Nothing in Biology Makes Sense!, and tweets under the handle @jbyoder.
This entry was posted in adaptation, association genetics, genomics, methods, population genetics, quantitative genetics.
  • Noah Reid

    Thanks for the post. I’m only just starting to think about this stuff, but it seems to me that no genome scan of the same set of populations, whether for Fst outliers, GWAS or GEA can be considered independent. So why do you include GWAS in the category of analyses that can achieve triangulation?

    • K E Lotterhos

      If one set of individuals was collected from the landscape, phenotyped, and used for an Fst outlier test, a GWAS, and a GEA, then none of those tests would be considered independent.
      If one set of individuals was collected from the landscape and used for an Fst outlier test and a GEA, and a second set of individuals was collected and grown in a common garden for a GWAS, then I would consider the results from those tests to be independent.
      Note that there is still non-independence within each dataset: non-independence among linked loci, as well as non-independence due to shared evolutionary history among samples. If population structure is not accurately controlled for, it can create many false positives. Assuming that each test accurately controls for population structure, the main remaining source of non-independence should be linkage. Since population structure is probably not accurately controlled for most of the time, triangulation can help us pinpoint the loci that are significant in independent datasets.
