Selection scans, and the genomics of adaptive/maladaptive introgression

Natural selection and the adaptive evolution of hybrid reproductive incompatibilities after divergence are known to be major drivers of speciation. At the phenotypic level, these manifest as fitness differences between introgressing populations. At the genomic level, speciation “genes” or “islands” are often identified by quantifying these incompatibilities as barriers to migration and hybridization (see my previous post for an account of “islands” and secondary contact). Three recent publications discuss these incompatibilities in different contexts: (1) genome-wide scans for selection (Haasl and Payseur 2015), (2) cases where maladaptive gene flow after divergence does not increase selection (Rolshausen et al. 2015), and, more generally, (3) the genomic architecture of incompatibilities under secondary contact (Lindtke and Buerkle 2015).

1) Fifteen years of genome-wide scans for selection: trends, lessons, and unaddressed genetic sources of complication – Haasl and Payseur, 2015, Molecular Ecology

In this detailed meta-analysis and review of more than 100 studies across a host of species, Haasl and Payseur (a) discuss evolutionary processes that often complicate, confound, or complement the identification of targets of selection: mutation rate differences among marker types, recombination rate variation across the genome, reductions in diversity due to linked/background selection and positive selection, and selection on polygenic traits; (b) report observed trends in genome-wide scans for selection (GWSS) across taxa: studies are predominantly human, most use Fst outlier detection, and they are strongly biased towards detecting positive, directional selection; and (c) offer recommendations for best practice in GWSS: incorporating recombination and mutation rates, and using simulations, e.g. under an approximate Bayesian computation (ABC) framework, to test hypotheses about demographic history.

The genome provides an organic record of evolution that is frequently likened to a palimpsest – a writing medium that is recycled, continuously written over, and reoriented so as to partially or wholly obscure older text. By this metaphor, chromosomes are the parchment, and DNA sequence the text.
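Since Fst outlier detection comes up as the dominant approach in these scans, here is a minimal sketch of the idea in R. It assumes biallelic loci, known allele frequencies, and a plain Wright-style Fst computed from heterozygosities; the function name `fst_locus`, the toy frequencies, and the 75% cutoff are all my own illustrative choices, not anything taken from Haasl and Payseur. Real scans typically use sample-size-corrected estimators and compare against neutral expectations rather than an arbitrary quantile.

```r
# Wright-style Fst for one biallelic locus from per-population allele frequencies.
# p: allele frequencies, one per population
# n: weights for each population (assumed equal unless given)
fst_locus <- function(p, n = rep(1, length(p))) {
  w <- n / sum(n)
  p_bar <- sum(w * p)
  h_t <- 2 * p_bar * (1 - p_bar)        # expected heterozygosity, pooled
  h_s <- sum(w * 2 * p * (1 - p))       # mean within-population heterozygosity
  if (h_t == 0) return(NA_real_)        # monomorphic locus: Fst undefined
  (h_t - h_s) / h_t
}

# Hypothetical allele frequencies: rows are loci, columns are two populations
freqs <- rbind(c(0.50, 0.55), c(0.10, 0.90), c(0.30, 0.35), c(0.70, 0.65))
fst <- apply(freqs, 1, fst_locus)

# Flag loci above an arbitrary empirical quantile as candidate outliers
outliers <- which(fst > quantile(fst, 0.75, na.rm = TRUE))
```

In this toy data set only the second locus, where the two populations are nearly fixed for different alleles, stands out. Whether such a locus reflects selection or just demography is exactly the complication the review is about.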

2) When maladaptive gene flow does not increase selection – Rolshausen et al. (2015), Evolution

Morphological differences in river (inlet) and lake (outlet) threespine stickleback. Image courtesy:

Rolshausen et al. report results of a long-term mark-recapture study of inlet (river/stream) and outlet (lake) populations of the threespine stickleback (Gasterosteus aculeatus) in British Columbia, which show high levels of previously observed unidirectional (inlet-to-outlet) gene flow and phenotypic differences in life histories. They monitor the survival of ~4000 individual fish over long periods of time, with temporal replicates, and use logistic regression models to estimate selection coefficients. Key findings of the study include (a) increased winter mortality in outlet (lake) fish, (b) smaller average body lengths in inlet (river) fish than in outlet (lake) fish, and (c) positive directional selection for deeper bodies in inlet fish but no selection for body size in outlet fish, with seasonally varying intensity. This runs contrary to theory, which suggests stronger selection in outlet fish (receiving maladaptive gene flow from the inlet).

Of particular interest is the novel idea that high gene flow can causally reduce selection by broadening the fitness function – a result we demonstrated by means of a general population genetic model.
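To make the "logistic regression to estimate selection" step a little more concrete, here is a bare-bones sketch in R. Everything in it is assumed for illustration: the data are simulated, the trait is a standardized stand-in for something like body depth, and the model ignores recapture probability, season, and replicate structure, all of which a real mark-recapture analysis such as the one in the paper would need to handle.

```r
set.seed(7)
n <- 2000

# Hypothetical standardized trait and simulated survival (1 = survived)
trait <- rnorm(n)
true_slope <- 0.5
survived <- rbinom(n, 1, plogis(-0.5 + true_slope * trait))

# Survival as a function of the trait; the slope measures the direction and
# strength of directional selection on the logit scale
fit <- glm(survived ~ trait, family = binomial)
slope <- unname(coef(fit)["trait"])
```

A positive slope means individuals with larger trait values were more likely to survive, and a slope indistinguishable from zero corresponds to the "no selection" result reported for outlet fish.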

Classic DMI’s, indicating allelic incompatibility of hybrids. Image courtesy:

3) The genetic architecture of hybrid incompatibilities and their effect on barriers to introgression in secondary contact – Lindtke and Buerkle, 2015, Evolution

Dobzhansky-Muller incompatibilities (DMIs) are often broken down under secondary contact in populations with few viable hybrids, yet hybrids still suffer reduced fitness. Lindtke and Buerkle use simulations of whole genomes in contact zones to explore the classic DMI model and an alternative ‘pathway’ model of coadaptation within genomes (wherein incompatibilities arise not from alleles, but from disruptions of pathway interactions). Quantifying Fst between diverged populations, they report that (a) strong selection and low migration produced the classic signatures of DMIs, whereas weaker selection and high migration led to the breakdown of DMIs by enabling interspecific recombination; (b) under the pathway model, Fst was reduced genome-wide and declined linearly with time as introgression increased, stabilizing after a few generations; and (c) demography, particularly migration rate, strongly influenced the genomic outcomes of hybridization.

…This highlights the potential contribution of intra-genomic interactions to speciation with gene flow and suggests the value of a broader set of epistatic models in speciation research.


Haasl, Ryan J., and Bret A. Payseur. “Fifteen years of genome‐wide scans for selection: trends, lessons, and unaddressed genetic sources of complication.” Molecular Ecology (2015). DOI: 10.1111/mec.13339

Rolshausen, Gregor, et al. “When maladaptive gene flow does not increase selection.” Evolution (2015). DOI: 10.1111/evo.12739

Lindtke, Dorothea, and C. Alex Buerkle. “The genetic architecture of hybrid incompatibilities and their effect on barriers to introgression in secondary contact.” Evolution (2015). DOI: 10.1111/evo.12725

Posted in adaptation, Coevolution, evolution, genomics, Molecular Ecology, the journal, mutation, natural history, population genetics, selection, speciation, theory

Should we use Mantel tests in molecular ecology?

No. Stop.

At least that is the message from a new publication in Methods in Ecology and Evolution by Pierre Legendre and colleagues (pay-walled, but I found a pdf here).

Mantel tests should simply not be used to test hypotheses that concern the raw data from which dissimilarity matrices can be computed or to control for spatial structures in tests of relationships between two autocorrelated data sets.

Posted in Uncategorized

PCA of multilocus genotypes in R

An earlier post from Mark Christie showed up on my feed on calculating allele frequencies from genotypic data in R, and I wanted to put together a quick tutorial on making PCA (Principal Components Analysis) plots using genotypes. I used the genotype data published by Tishkoff et al. (2009) for this example, but it should work for any generic genotype format, as long as it’s stored as a table/matrix. As an example, let’s try plotting the first three principal components for the Yoruba, Hadza, Han, and Tamil populations.

PCA (first 3 PC’s shown) of genotypes from 4 populations (Hadza, Yoruba, Han, and Tamil) using genotypic data from Tishkoff et al. (2009)

# p is assumed to be the scatterplot3d object returned by the earlier 3D plotting call
legend(p$xyz.convert(-80, 20, -10),
       col = c("red", "blue", "maroon", "green"),
       bg = "white", pch = c(20, 20, 20, 20), yjust = 0,
       legend = c("Yoruba", "Han", "Hadza", "Tamil"),
       cex = 1.1)

And voila! I bet there are plenty of more complex packages (in R and other tools) that would help make similar plots – for instance, see adegenet, and SNPRelate, but do try my script out and leave your suggestions/comments below!
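The legend call above is only the last step of the plot, so here is a self-contained sketch of the steps that would come before it, using nothing but base R. To be clear about what is assumed: the genotypes are simulated (not the Tishkoff et al. data), the population labels `popA` and `popB` are made up, and I use `prcomp` with a flat 2D plot rather than a 3D one so that the example runs without extra packages.

```r
set.seed(1)

# Simulated stand-in for a genotype table: rows are individuals, columns are
# biallelic loci, entries are counts (0, 1, 2) of one allele
n_per_pop <- 20
n_loci <- 200
pops <- rep(c("popA", "popB"), each = n_per_pop)
freq_a <- runif(n_loci, 0.1, 0.9)
freq_b <- pmin(pmax(freq_a + rnorm(n_loci, 0, 0.2), 0.05), 0.95)
geno <- rbind(
  matrix(rbinom(n_per_pop * n_loci, 2, rep(freq_a, each = n_per_pop)),
         nrow = n_per_pop),
  matrix(rbinom(n_per_pop * n_loci, 2, rep(freq_b, each = n_per_pop)),
         nrow = n_per_pop)
)

# Drop monomorphic loci (zero variance cannot be scaled), then run the PCA
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]
pca <- prcomp(geno, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:3]                               # first three PCs
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot PC1 against PC2, colored by population (only in an interactive session)
if (interactive()) {
  plot(scores[, 1], scores[, 2], pch = 20,
       col = ifelse(pops == "popA", "red", "blue"),
       xlab = "PC1", ylab = "PC2")
  legend("topright", legend = c("popA", "popB"),
         col = c("red", "blue"), pch = 20)
}
```

To adapt this to real data, replace the simulated `geno` matrix with your own table of allele counts and `pops` with a vector of population labels in the same row order.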

Posted in bioinformatics, genomics, howto, population genetics, R, software

When and how to “go for the genes”


A new special issue of Molecular Ecology, entitled “Detecting selection in natural populations: making sense of genome scans and towards alternative solutions”, is coming down the line, and a few articles from that issue are starting to appear as newly-accepted.

Seeing those words made me think back to some papers I downloaded a while back based on an interesting tweet:

Evolution is a fundamentally genetic process, but the reciprocal connections between alleles, phenotypes, and environment are not fundamentally tractable. All three of these papers take different stances on the relationship between identifying the genes that underlie traits of interest and understanding the evolution of those traits. The second link is a strong critique of the pursuit of the genotype-phenotype map and suggests that chasing underlying genes and alleles is distracting evolutionary biologists from the study of phenotypes. The third link is to a 2011 paper that covers the problem of “missing heritability”, an issue elegantly summarized by The Molecular Ecologist previously.

The first link takes you to a new paper appearing in the most recent issue of Evolution and provides yet another conversation starter by suggesting a blueprint for when “going for the genes” may be necessary and when it’s not. Rausher and Delph lay out a total of ten study objectives, broken up into two categories.

A. When explanations of evolutionary processes could be made without identifying the genes responsible for the phenotype of interest:

  1. Explaining evolutionary change and divergence in quantitative traits
  2. Detecting tradeoffs
  3. Explaining evolutionary change and divergence in Mendelian traits

B. When gene identification is justified:

  1. Any study of molecular evolution
  2. Drift vs. selection
  3. Analysis of parallel evolution
  4. Understanding asymmetries in evolutionary transition rates
  5. Evaluating the cause of the trait-trait and trait-fitness correlations
  6. Costs of adaptation
  7. Selection-component analysis in undisturbed natural populations

I admired the reasonable and constructive dialog here, including many “how-to” sections to back up their claims, which are especially helpful when you are someone looking from the outside-in at this “debate”.

Or, if you’re feeling feisty and find your work in category A, here is a potential script for your next interaction with a reviewer or conference talk attendee:

Many evolutionary biologists have been asked at one time or another why they haven’t tried to determine what genes underlie the evolutionary issues they examine. In our view, a legitimate answer is that doing so would not significantly enhance our understanding of those issues.

but wait!

Given the vast range of issues knowledge of the relevant genes can address, however, in providing that answer, there are situations in which one must be able to articulate why discovering the genes is indeed irrelevant.

“Have an opinion that you can support” continues to be timeless advice, whether you are “going for the genes” or not.


Rausher, M. D., & Delph, L. F. (2015). When does understanding phenotypic evolution require identification of the underlying genes?. Evolution, 69(7): 1655-1664.

Rockman, M. V. (2012). The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution, 66(1), 1-17.

Travisano, M., & Shaw, R. G. (2013). Lost in the map. Evolution, 67(2), 305-314.

Posted in adaptation, association genetics, Molecular Ecology, the journal

The Tao of open science for ecology

I think we can all agree that science needs to be transparent, shared, and reproducible. Recently, however, the discussion about “open science” has been conducted mostly in online forums and less so in publications (hopefully Open Access ones!). This is why Hampton et al. decided to publish their idea for the path for open science in ecology for all to see – even those who are less active on social media.

Posted in science publishing, Uncategorized

Current archival practices limit our ability to reuse genetic data

Archiving genetic data is important for a lot of reasons, like ensuring reproducibility and transparency of results. Being able to access previously published data is also important given that the same set of data can often help answer a diversity of relevant questions in the field of evolutionary biology. In the current issue of Molecular Ecology, Pope et al. analyzed 419 data sets from 289 articles published in the journal over the last 5 years, recording the extent to which the data sets could be recreated given the geographic and temporal information provided by the authors. For example, for sequences collected across a geographic range, could Pope et al. determine which sequences were collected in which areas? If only unique sequences were uploaded to Genbank by the original authors, was the information needed to figure out the number of individuals from a given location that had a particular sequence also provided (i.e. sample sizes and haplotype/allele frequencies)? Did the authors report the timeframe in which they collected the samples?

Pope et al. found that since the 2011 implementation of the Joint Data Archiving Policy (JDAP), which requires that data supporting publications be made publicly available, the archiving of genetic data increased from 49% (pre-2011) to 98% (2011-today). To me, uploading genetic data to a curated database like Genbank or the European Nucleotide Archive feels as much a part of the process as writing the paper does. Unfortunately, Pope et al. were unable to recreate 31% of the archived data sets they downloaded based on the information provided in the paper or with the sequence data themselves. Over a third of articles provided geographic information as text only, without including geographic coordinates, and 18% of those described sampling only at the broader regional scale. About 40% of the articles provided no temporal information and 20% reported only a range of years.

While great progress has been made towards the public availability of genetic data, the lack of emphasis on provision of associated information, such as geographic location and time of sampling, may impede our ability to fully reproduce such studies or use their genetic data in new ways.

Pope et al. recommended that in order to make genetic data truly accessible and useful for future analyses, at a minimum, individual genotypes should be recoverable and linked to geographic and temporal information. The authors also suggested including a readme file with the archived data that provides relevant information, like the naming/coding system used to identify sequences generated in the study.

To fully realize the future potential of this data legacy, there should now be a greater push to link spatio-temporal metadata to genetic data and to develop standards and repositories that facilitate data deposition, curation and searchability.
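As a rough illustration of what "individual genotypes linked to geographic and temporal information" could look like in practice, here is a small R sketch that checks an archive table for missing fields before deposition. The column names, sample values, and the helper `incomplete_records` are entirely hypothetical; Pope et al. do not prescribe a particular format, and any real repository will have its own requirements.

```r
# Hypothetical archive table: one row per individual, with the genotype linked
# to where and when it was sampled
samples <- data.frame(
  sample_id = c("ind01", "ind02", "ind03"),
  haplotype = c("H1", "H1", "H2"),
  latitude  = c(-27.47, -27.47, NA),
  longitude = c(153.03, 153.03, 151.21),
  collected = as.Date(c("2012-03-14", NA, "2013-07-02")),
  stringsAsFactors = FALSE
)

required <- c("sample_id", "haplotype", "latitude", "longitude", "collected")

# Return the IDs of records missing any required field
incomplete_records <- function(df, fields) {
  stopifnot(all(fields %in% names(df)))
  missing_any <- rowSums(is.na(df[, fields, drop = FALSE])) > 0
  df$sample_id[missing_any]
}

incomplete <- incomplete_records(samples, required)
```

Here `ind02` lacks a collection date and `ind03` lacks a latitude, so both would be flagged. A table that passes this kind of check, plus a readme describing the naming system, covers the minimum the authors ask for.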


Pope, L. C., Liggins, L., Keyse, J., Carvalho, S. B., & Riginos, C. (2015). Not the time or the place: the missing spatio‐temporal link in publicly available genetic data. Molecular Ecology (24) 3802–3809. DOI: 10.1111/mec.13254

Posted in Uncategorized

Who came first – the Paleo- or Native American?

In yet another infamous Science vs Nature race, two studies published this Tuesday toss more cans of worms at the ongoing debate about the founding of the Americas – with disparate findings. Uh oh.

Representatives of six native American tribes bury the remains of Anzick-I. Image courtesy:

Skoglund et al. Nature (2015) Genetic evidence for two founding populations of the Americas

In further evidence for what’s come to be known as the Paleoamerican model, Skoglund et al. (2015) analyzed the genomic ancestries of 63 individuals from 21 Native American populations with little evidence of European or African ancestry, at 600,000 SNPs. By computing f4 statistics, they reject the null hypothesis that Native Americans descend from a single homogeneous population after divergence from other discernibly distinct populations across the world. Native Americans also cluster with Amazonian, Mesoamerican, Australasian, and other Pacific island populations. Further analyses indicate two possibilities: (a) Amazonians descending from an ancestor of Andamanese and other Australasian populations, or, perhaps more plausibly, (b) ancestral admixture of Amazonians and ancestors of Native Americans, termed population “Y”. While questions remain about how the “Y” population migrated into South America, this study warrants genomic analyses of more ancient remains to fill in the blanks.
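For readers who haven't met f4 statistics before, here is a small R sketch of the quantity itself: for four populations A, B, C, and D, f4(A, B; C, D) is the average over SNPs of (pA - pB) * (pC - pD), and it is expected to be zero if A and B form a clade relative to C and D. The simulated frequencies, the `drift` helper, and the 70/30 mixture below are all assumptions of mine for illustration. Published analyses work from real genotype data and attach standard errors to the estimate (typically from a block jackknife), which this sketch leaves out.

```r
# f4(A, B; C, D): average over SNPs of (pA - pB) * (pC - pD)
f4_stat <- function(pA, pB, pC, pD) {
  mean((pA - pB) * (pC - pD))
}

set.seed(42)
n_snps <- 5000

# Hypothetical frequencies consistent with the tree ((A, B), (C, D)):
# A and B drift independently from one ancestor, C and D from another
anc_ab <- runif(n_snps, 0.2, 0.8)
anc_cd <- runif(n_snps, 0.2, 0.8)
drift <- function(p) pmin(pmax(p + rnorm(length(p), 0, 0.05), 0), 1)
pA <- drift(anc_ab)
pB <- drift(anc_ab)
pC <- drift(anc_cd)
pD <- drift(anc_cd)

f4_tree <- f4_stat(pA, pB, pC, pD)              # expected to be near zero

# Add gene flow from C into B: B becomes a 70/30 mixture of B and C
pB_admixed <- 0.7 * pB + 0.3 * pC
f4_admixed <- f4_stat(pA, pB_admixed, pC, pD)   # expected to move away from zero
```

A significantly non-zero f4 is what lets a study reject the simple tree, which is the logic behind rejecting a single homogeneous founding population.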

Raghavan et al. Science (2015) Genomic evidence for the Pleistocene and recent population history of Native Americans

Raghavan et al. (2015) analyze whole genome sequences of 31 present-day individuals from the Americas, Siberia, and Oceania (with a sampling strategy similar to that of Skoglund et al. (2015)), 23 ancient genomes from the Americas, and SNP genotypes from 79 individuals from the Americas and Siberia. Admixture analyses place all Native Americans in one cluster (at K=4), indicating common ancestry of all Native Americans. At K=15, however, some Native American individuals show shared ancestry with Anzick-1 (from the Clovis site), while others cluster with Siberians, a pattern further supported by admixture graph analyses. Estimates of the time of divergence between Native Americans, Siberians, and Han Chinese indicated a consistent split time of about 23,000 ybp for both Native American groups. Analyses of SNP chip data, however, reveal a story similar to that reported by Skoglund et al. (2015): an ancestral admixture event that resulted in Oceanic ancestry in some Native American populations, although purportedly more recent, occurring after the peopling of the Americas. The ancient genomes also revealed no evidence of admixture from Oceanic populations into ancient American peoples, further indicating no support for the Paleoamerican model.


Skoglund et al. “Genetic evidence for two founding populations of the Americas.” Nature (2015) DOI:

Raghavan et al. “Genomic evidence for the Pleistocene and recent population history of Native Americans.” Science (2015) DOI:

Posted in genomics, next generation sequencing, Paleogenomics, population genetics

Dozens of talks from the Evolution 2015 meetings are on YouTube

If, like me, you didn’t make it to the 2015 Evolution meetings — maybe the logistics of a trip to Brazil were beyond your financial and/or temporal means — you can make up for it with the big cache of videos posted to the conference’s YouTube channel. This is the second year the joint annual meeting of the American Society of Naturalists, the Society of Systematic Biologists, and the Society for the Study of Evolution has taken video of research presentations (with the permission of the presenters), and it’s good to see the practice continuing.

There are many, many talks to peruse, but here’s just one that looks like it’ll be of interest to Molecular Ecologist readers: Diego F. Alvarado-Serrano proposing a new, spatially-oriented version of the site-frequency spectrum, that may help understand historical changes in species’ ranges.

Posted in community, conferences, phylogeography, population genetics

Dispersal and the rainbow trout takeover


I’m going to keep rolling on the dispersal theme from last week and share a new paper by Ryan Kovach and colleagues that demonstrates the balance between dispersal and selection. Specifically, the authors show that this balance dictates the hybridization between a native and invasive trout species.

The authors used data spanning 24 years from two populations of cutthroat trout to detect changes in rainbow trout ancestry and quantify associated phenotypic variation. In this case, the danger for cutthroat trout populations is very real: too much hybridization with rainbow trout can lead to a hybrid soup of genomes in which native genomes disappear (Allendorf and Leary 1988).

Figure 1 from Kovach et al. (2015)

Figure 1 from Kovach et al. (2015) showing the relationship between rainbow trout (RBT) admixture and length (a) or early migration (b)

The identification of genetic introgression from rainbow trout increased dramatically from 1984 to 2003 (from 0% to 87% in one adult population!). And if you are a hybrid salmon, the more rainbow trout genes you can get, the better. As the proportion of rainbow trout alleles goes up, body size goes up and time until migration goes down: two factors strongly associated with fitness.

However, the proportion of rainbow trout alleles entering the cutthroat populations was much greater than the proportion of alleles leaving, indicating selection against hybrids. And the selection coefficients against these hybrids were strong to boot, up to 0.88!

This left Kovach et al. with a simple explanation: dispersal by rainbow/cutthroat hybrids plays a huge role in the increase of hybrids over the past 24 years.

Thus, our study shows that combining data on fitness and dispersal is necessary to fully understand the mechanisms driving invasive hybridization and other eco-evolutionary dynamics [59]; the paucity of such data in wild animal populations makes this a novel step forward in our empirical understanding of how invasive introgression can spread in natural populations.



Allendorf, F. W., & Leary, R. F. (1988). Conservation and distribution of genetic variation in a polytypic species, the cutthroat trout. Conservation Biology, 170-184.

Kovach, R. P., Muhlfeld, C. C., Boyer, M. C., Lowe, W. H., Allendorf, F. W., & Luikart, G. (2015). Dispersal and selection mediate hybridization between a native and invasive species. Proceedings of the Royal Society of London B: Biological Sciences, 282(1799), 20142454.

[59] above Lowe, W. H., & McPeek, M. A. (2014). Is dispersal neutral?. Trends in ecology & evolution, 29(8), 444-450.

Posted in adaptation

What to do with all those pesky mtDNA reads in your NGS experiment

Have you ever noticed how many reads from your high throughput sequencing project map to the tiny fraction of your genome that is the mitochondrial genome (mtDNA)? Pretty much any NGS experiment (e.g., RNA-seq, DNA-seq, capture-based sequencing) leaves you with ultra-deep coverage of mtDNA. But what do you do with those reads? The most common option is to ignore reads mapping to mtDNA. A far less common option is to turn them into a Science paper. But what if you want to do something with those reads and not publish it in Science?

Posted in bioinformatics, genomics, howto, mutation, software, Uncategorized