Sweptaway – Part 1

Brace yourselves for a series of new posts on selection, especially with articles from the special Molecular Ecology issue on “Detecting selection in natural populations: making sense of genome scans and towards alternative solutions” starting to roll out!

Selective sweeps (i.e., reductions in genomic diversity at regions linked to positively selected mutations that have fixed in a population – also see my previous post here) are commonly identified by quantifying polymorphism at linked sites. Commonly used methods to characterize sweeps include tests with Tajima’s D, Fay and Wu’s H, the HKA chi-squared statistic, etc. Likelihood-based frameworks to infer selective sweeps include SweepFinder (Nielsen et al. 2005), SweeD (Pavlidis et al. 2013), and XP-CLR (Chen et al. 2010), among others. However, a reduction in genomic diversity need not be due to a selective sweep – alternative explanations include population bottlenecks, background selection (see my earlier post explaining this), and unusually low mutation rates. In their recent paper, “Detecting recent selective sweeps while controlling for mutation rate and background selection”, Huber et al. (2015) describe an extension to the SweepFinder method of Nielsen et al. (2005).

To control for mutation rate (and possibly selective constraints), Huber et al. (2015) suggest including invariant, fixed sites (with respect to an outgroup) in the analyses. Their simulations suggest that including these fixed differences increases the power of the Composite Likelihood Ratio (CLR) test and decreases the False Positive Rate (FPR) in detecting sweeps. Similarly, controlling for background selection (by including a B-value map) gave greater power to detect sweeps when all sites, i.e., polymorphic and fixed invariant sites, were included in the analyses.

Using the reduction in diversity relative to divergence as a necessary hallmark of a selective sweep in our model also helps to reduce false positives, e.g. in the case of a recent population bottleneck.
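To make the polymorphism-based tests mentioned above a little more concrete, here is a minimal sketch of Tajima’s D in Python. It is not the SweepFinder method of Huber et al. (2015); it only illustrates the classic statistic, which compares two estimates of diversity (mean pairwise differences versus the number of segregating sites). The function name and the '0'/'1' string encoding of haplotypes are assumptions made for this example.

```python
import math
from typing import Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D for aligned haplotypes coded as '0'/'1' strings.

    Hypothetical helper for illustration; returns NaN when no site
    is segregating, because D is undefined in that case.
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term degenerates for very small samples.
        raise ValueError("need at least 4 haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must all have the same length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0  # S: number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for site in range(length):
        k = sum(1 for h in haplotypes if h[site] == "1")
        if 0 < k < n:
            seg_sites += 1
            pi += k * (n - k) / n_pairs
    if seg_sites == 0:
        return math.nan

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    s = seg_sites
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: an excess of rare variants (as after a sweep or expansion)
# versus variants at intermediate frequency.
rare_variants = ["1000", "0100", "0010", "0001", "0000", "0000"]
intermediate = ["1111", "1111", "1111", "0000", "0000", "0000"]
print(tajimas_d(rare_variants), tajimas_d(intermediate))
```

An excess of rare variants pushes D below zero and an excess of intermediate-frequency variants pushes it above zero. As the post notes, a negative value alone cannot separate a sweep from a bottleneck or other demographic effects, which is the motivation for the controls in Huber et al. (2015).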

Nucleotide diversity across the X chromosome in CEU, YRI, and JPT human populations, compared with regions of Neandertal introgression. Image courtesy: Dutheil et al. (2015) doi/10.1371/journal.pgen.1005451.g005

In a neat application of detecting selective sweeps, Dutheil et al. (2015) offer an alternative to the controversial hypothesis of Patterson et al. (2006) for the reduced diversity and divergence observed in human X chromosomes (with respect to chimpanzees). X chromosomes have smaller effective population sizes than autosomes (due to hemizygosity in males), and are thus expected to drift more and to show lower divergence between humans and chimpanzees. By studying patterns of incomplete lineage sorting (ILS) along the X chromosome, simulating gene trees and estimating demographic parameters under a divergence model (CoalHMM), Dutheil et al. (2015) report a bimodal distribution of ILS along the X chromosome, with 8 regions exhibiting particularly low proportions of ILS. Analyses of the effect of background selection on ILS showed that only about 31% of X chromosome windows could be explained solely by background selection, whereas a comparison of diversity along the X chromosome across different human populations and the Neandertal showed that low-ILS regions predominantly evolve by recurrent selective sweeps. They argue that the observed large-scale reductions in diversity in extant human populations are also not plausible under a model of secondary contact between human and chimpanzee ancestors (as suggested by Patterson et al. 2006).

Whatever the underlying mechanism, our observations demonstrate that the evolution of X chromosomes in the human chimpanzee ancestor, and in great apes in general, is driven by strong selective forces. The striking overlap between the low-ILS regions we have identified and the Neandertal introgression deserts identified by Sankararaman et al. further hints that these forces could be driving speciation.

References: Huber, Christian D., et al. “Detecting recent selective sweeps while controlling for mutation rate and background selection.” Molecular Ecology (2015). DOI: http://dx.doi.org/10.1111/mec.13351

Dutheil, Julien Y., et al. “Strong selective sweeps on the X chromosome in the human-chimpanzee ancestor explain its low divergence.” PloS Genetics (2015). DOI: http://dx.doi.org/10.1371/journal.pgen.1005451

Chen, Hua, Nick Patterson, and David Reich. “Population differentiation as a test for selective sweeps.” Genome research 20.3 (2010): 393-402. DOI: http://dx.doi.org/10.1101/gr.100545.109

Pavlidis, Pavlos, et al. “SweeD: likelihood-based detection of selective sweeps in thousands of genomes.” Molecular biology and evolution (2013): mst112. DOI: http://dx.doi.org/10.1093/molbev/mst112

Nielsen, Rasmus, et al. “Genomic scans for selective sweeps using SNP data.” Genome research 15.11 (2005): 1566-1575. DOI: http://dx.doi.org/10.1101/gr.4252305

Sankararaman, Sriram, et al. “The genomic landscape of Neanderthal ancestry in present-day humans.” Nature 507.7492 (2014): 354-357. DOI: http://dx.doi.org/10.1038/nature12961

Patterson, Nick, et al. “Genetic evidence for complex speciation of humans and chimpanzees.” Nature 441.7097 (2006): 1103-1108. DOI: http://dx.doi.org/10.1038/nature04789

Posted in adaptation, evolution, genomics, mutation, population genetics, selection, speciation, theory

Environmental association analyses: a practical guide for a practical guide

"Do you have the kind with latent variables? I think it is a yellow bottle?"

Obtaining extensive SNP data for your organism of choice isn’t such a feat these days, but actually matching that breadth of data with appropriate analyses is still a challenge. Fortunately, there has been an avalanche of new methods to make these connections between genetic variation and environment more clear. Unfortunately, the recent surge in new methodologies sure makes decision making tough. What are the drawbacks of different methods? What works for my data? Why didn’t I think of this before I generated all these SNPs?

If only there was some sort of….practical guide.

Continue reading

Posted in adaptation, association genetics, methods, Molecular Ecology, the journal

And who made your beer?

In the spirit of it being almost Friday, and while we’re on the topic of your favorite beverages – perhaps wine puts you to sleep and you couldn’t care less where it came from, but you prefer the bitterness of lager beers at your tailgate party? While ales are primarily fermented with the same yeast species as wines, Saccharomyces cerevisiae, lagers are brewed extensively with alloploid hybrid strains of Saccharomyces eubayanus and S. cerevisiae, namely S. carlsbergensis (Saaz) and S. pastorianus (Frohberg), the origins of which are still contentious. Baker et al. (2015) set out to solve this mystery with a high-quality de novo assembly of the S. eubayanus genome using the Yeast Genome Assembly Pipeline.

Hypothesized origins of Frohberg and Saaz strains - image courtesy Baker et al. (2015) http://mbe.oxfordjournals.org/content/early/2015/08/20/molbev.msv168/F6.large.jpg

Analysis of maltose (MAL) utilization genes revealed 14 genes across 4 chromosomes, and 2 pseudogenes. Comparison with several other MAL genes across Saccharomyces (e.g., MAL genes in ale counterparts) showed high levels of sequence identity (98%), with phylogenies indicating a common origin of all MAL genes across brewing yeasts. Genome-wide signatures of domestication were quantified using dN/dS computations along each lineage. These showed significantly (P < 1e-21) greater proportions of nonsynonymous substitutions in the Saaz and Frohberg lineages than in non-lager lineages, while the two lineages themselves did not differ from each other, suggesting relaxed purifying selection and/or population bottlenecks. Similarly high proportions of nonsynonymous substitutions were also observed among metabolic genes such as NOT3, ADR1, and ADH2. Tests of neutral divergence (equal levels implying a single origin of Saaz and Frohberg from a single hybrid of S. cerevisiae and S. eubayanus, different levels implying multiple origins) supported the multiple origin theory, with dS (the rate of synonymous substitutions) being over ten times higher in the S. cerevisiae subgenome of lager strains than in the S. eubayanus subgenome.

In this context, these results suggest that the Saaz and Frohberg lineages were created by at least two distinct hybridization events between nearly identical strains of S. eubayanus with relatively more diverse ale strains of S. cerevisiae.

Cos it’s almost Friday!



Baker, EmilyClare, et al. “The genome sequence of Saccharomyces eubayanus and the domestication of lager-brewing yeasts.” Molecular biology and evolution(2015): msv168. DOI: 10.1093/molbev/msv168



Posted in adaptation, evolution, genomics, natural history, next generation sequencing, phylogenetics, population genetics, selection, speciation, yeast

The unforeseen genomic consequences of domestication

When a desired genome is selected for propagation, all mutations, beneficial, neutral, or deleterious, shift in frequency, and this sometimes can have unforeseen consequences.

Natural selection takes the good with the bad. Beneficial and harmful mutations combine to produce a net effect that selection acts upon. You may remember from your most recent genetics course that this is called the Hill-Robertson effect.

One would predict that this balance between beneficial and deleterious mutations is especially apparent in taxa that have undergone domestication. When population sizes shrink and selection is relaxed on characters that were important in the wild but not necessarily when domesticated, what are the consequences for genomes?

In a recent Molecular Biology and Evolution paper, Sebastien Renaut and Loren Rieseberg predict that domesticated lines should have reduced genetic variability and more deleterious mutations, and that these mutations should be found in the areas of the genome with the lowest recombination rates (where it is harder to get rid of them).

Simply put, the combined effects of a reduction in effective population size and fast population growth during domestication can drag along nonadaptive mutations, especially those that arose during the process of domestication itself.

That is simply put!

To follow up on those predictions, the authors gathered transcriptome data from sunflowers (Helianthus annuus): two wild types (one a weed representative from outside the native range) and two types of domesticated lines (Landrace and Elite). Indeed, domesticated lines had lower genetic variability and a greater proportion of nonsynonymous mutations. Using the bioinformatic method PROVEAN, they determined that up to 14% of those nonsynonymous mutations were classified as deleterious. Lastly, they demonstrated that these deleterious mutations are likely in areas of the genome with low recombination rates.

Figure 1 from Renaut and Rieseberg


Call me American, but I had no idea what a cardoon was.

Even more interesting, they repeated these methods with two other domesticated crops (cardoon and globe artichoke) and saw the same pattern.

But it isn’t just being domesticated; how a line is domesticated matters. The elite lines have been through more stages of domestication than the landrace lines, but don’t show the predicted higher loads of deleterious mutations. Renaut and Rieseberg suggest that the modern selection practices used in the elite lines, such as crossing wild alleles back into cultivars, may help relieve the burden of deleterious mutations from domestication.


Renaut, S., & Rieseberg, L. H. (2015). The accumulation of deleterious mutations as a consequence of domestication and improvement in sunflowers and other Compositae crops. Molecular biology and evolution, 32(9): 2273-2283.

Posted in domestication, genomics, plants, selection, transcriptomics

Another uninterpretable epigenetics study

If you looked at your Twitter feed on Sunday you likely saw a lot of buzz about a new study that found that “Holocaust survivors trauma is passed on to children’s genes”. Many people have already taken time to blog about the issues with this study, but I wanted to ensure that the message was passed on to the molecular ecology community because I think that it is relevant to the community. So here is a quick (and by no means comprehensive) list of why you should be skeptical about the study (and, importantly, why the authors should not have been able to draw the conclusions that they drew): Continue reading

Posted in Uncategorized

Where’s your wine from?

Human-mediated selection of yeast cultures has played a huge role in the development of numerous unique strains of Saccharomyces cerevisiae, to which the wide variety of wines produced the world over is often attributed. Previous studies have indicated a single domesticated origin of S. cerevisiae wine strains, termed the “Wine-European” group, but a detailed demographic history of the species has so far been hampered by insufficient sampling of wild cultures (especially from oak niches in the Mediterranean) that often coexist with domesticated strains. Almeida et al. (2015) analyze whole genome sequences of 145 strains of S. cerevisiae to understand (a) population genomic structure, and (b) ancestral demography of wild and domestic strains.

Network analysis showing geographic separation of S. cerevisiae strains. Image courtesy: Figure S1 of Almeida et al. (2015)

Phylogenetic network analyses placed Mediterranean oak strains and commonly used wine strains at one horizontal extremity, with the other extremity occupied by North American, Asian, African, and Caribbean strains, along with six Mediterranean strains. Largely geographical separation of strains was identified at K=10 ancestral subpopulations using STRUCTURE under an admixture model. Haplotype analyses indicated a high degree of shared structure between wine and Mediterranean oak strains, and a complex admixture history among all sampled strains, which was also supported by maximum likelihood phylogenies. Diversity analyses indicated similar levels of polymorphism and diversity in the wine and Mediterranean oak strains, both lower than the genomic diversity of North American or Asian strains. Demographic analyses of a subset of intronic regions using δaδi, comparing models of isolation, isolation with migration, and population growth after isolation, indicated a better fit of the growth model. Estimates of demographic parameters under the growth model showed relatively low migration between strains and a strong bottleneck in the wine strains, with the isolation dated to around 5400-5000 BC, coincident with the first biochemical evidence of wine.

Perhaps you’d like to look at a genome list, Ms. Knope?

As in the case of crop and livestock domestication, linking wild and domesticated microbe genotypes is an essential step for understanding the roots and trajectories of man-driven artificial selection.


Almeida, Pedro, et al. “A Population Genomics Insight into the Mediterranean Origins of Wine Yeast Domestication.” Molecular Ecology (2015). DOI: 10.1111/mec.13341

Posted in domestication, evolution, genomics, horizontal gene transfer, microbiology, Molecular Ecology, the journal, next generation sequencing, phylogenetics, phylogeography, population genetics, STRUCTURE, yeast

Fossils and phylogenetics meet in the evolutionary middle

Image by Matt Mechtley

…if evolutionary biologists are intent on documenting the history of life, we need methods that can at least approximate patterns of evolution in deep time for clades without fossil information.

A scientist who wants to understand the evolutionary history of a group of organisms faces some serious roadblocks. One of the most obvious: the majority of that evolutionary history is gone. Dead. Extinct! So unless you have a nice fossil record (most clades do not), you are left trying to understand millions of years of evolution by looking only at what is extant today. For the most part, this doesn’t provide the most accurate picture.

Jonathan Mitchell provides another cautionary tale about interpreting evolutionary history using only extant taxa in a new paper in Evolution. He uses birds, which have some of the most complete fossil records and phylogenies of all vertebrates, as a study system to test the two main hypotheses of when avian evolutionary radiations happened: once at the base of the tree during the Cretaceous, and again when the Passeriformes first appeared.

Using morphological and phylogenetic data, Mitchell shows (as expected) that the model of diversification best supported by the data greatly depends on which taxa you include. When fossil data isn’t considered, the result tends to underestimate the diversity in the fossil record. However, the ratio of within-to-between clade differences does provide some evidence of early radiations. These combined results cause Mitchell to predict that modern radiations have washed out the signal of older radiations, providing another example of why interpretations of evolutionary history from only extant taxa can be problematic (feathered or otherwise).

The variance and range in morphology observed in the fossil assemblages from the “halfway point” of avian evolution is ~70% of the modern, which is substantially higher than models based solely on extant taxa would predict for ~50Ma. This observation, of crown Aves having achieved such ecological disparity by the Eocene, stands in stark contrast to expectations from modern data alone. None of the models based on extant taxa only consistently predicted this high level of early disparity, and the fossil-informed method was unable to predict both the high level of ancient disparity and the relatively low modern disparity simultaneously. All of these models are known as extreme simplifications, but they are commonly used to at least predict the broad contours of morphological evolution.


Mitchell, J. S. (2015). Extant‐only comparative methods fail to recover the disparity preserved in the bird fossil record. Evolution. DOI: 10.1111/evo.12738


Posted in phylogenetics

What’s the most replicated finding in population genetics?

Cloning Experiments:  Jess Payne

The more the merrier. (Flickr: Dan Foy)

DrugMonkey tells a tale of a specific finding in addiction research — that rats provided with an intravenous drip of cocaine solution will push a lever to self-administer the drug — which has been replicated countless times over the decades. Past the point of usefulness, you’d think. But it turns out that in all this replication, folks have turned up a lot of factors that make the replication, um, not replicate. Everything from the cocaine dose available in each infusion to whether or not the rat-handler wears a clean lab coat.

And this, as he concludes, has taught addiction researchers a lot about the mechanisms underlying a seemingly unassailable “classic” result.

I can’t speak to how many “failure to replicate” studies were discussed at conferences and less formal interactions. But given what I do know about science, I am confident that there was a little bit of everything. Probably some accusations of faking data popped up now and again. Some investigators no doubt were considered generally incompetent and others were revered (sometimes unjustifiably). No doubt. Some failures to replicate were based on ignorance or incompetence…and some were valid findings which altered the way the field looked upon prior results. Ultimately the result was a good one. The rat IVSA model of cocaine use has proved useful to understand the neurobiology of addiction.

Science! Where you learn things even when you screw up. (Maybe especially when you screw up.)

It left me wondering, though, what the equivalent experiment or result would be for my own field, population genetics. My first thought is isolation by distance, the finding that populations distributed across a landscape will show greater genetic differentiation as the geographic distance between them grows, even if there is no meaningful difference in the environments they encounter. Testing for IBD is a terribly basic thing to do with your shiny new population genetic dataset, and it’s no surprise when it turns up — but if you don’t find it, you know something odd is going on.
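For readers who have not run this “terribly basic” check themselves, one common way to test for isolation by distance is a Mantel test: correlate the pairwise geographic and genetic distance matrices, then permute population labels to see how often a correlation that strong arises by chance. The sketch below uses only the Python standard library; the function names and the toy distance matrices are hypothetical and for illustration only, not taken from any particular package.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def mantel_test(geo: Matrix, gen: Matrix,
                permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-sided Mantel test for a positive matrix correlation.

    Returns (observed r, permutation p-value). Rows and columns of the
    genetic matrix are permuted together to keep each population intact.
    """
    n = len(geo)
    pairs: List[Tuple[int, int]] = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]

    def gen_vec(order: Sequence[int]) -> List[float]:
        return [gen[order[i]][order[j]] for i, j in pairs]

    observed = pearson(geo_vec, gen_vec(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # count the observed arrangement itself
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(geo_vec, gen_vec(order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy example: 8 populations along a transect, with genetic distance
# growing linearly with geographic distance (made-up numbers).
positions = list(range(8))
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * abs(a - b) for b in positions] for a in positions]
r, p = mantel_test(geo, gen)
print(r, p)
```

Permuting whole populations, rather than shuffling individual pairwise values, matters because the entries of a distance matrix are not independent of one another.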

Or maybe there’s a better alternative that I haven’t thought of? Submit your nominations in the comments.

Posted in population genetics

Survival of the fittest: a marine snail toughs it out through a salty time

The vermetid gastropod Dendropoma and its associated calcareous alga Neogoniolithon brassica-florida. Photo from the Mediterranean Sea climate and environmental change blog.

For marine organisms, salinity plays an important role in determining how populations and species are distributed across time and space, particularly in the Mediterranean Sea. During the Mesozoic, about 252 to 66 million years ago, the Tethys Ocean, a body of water that would become the Mediterranean Sea, connected the Atlantic and Pacific oceans. According to Lejeusne et al., “the Mediterranean is a peculiar sea, a product of a tormented geological history, where continents collide and water masses come and go, a crossroads of biogeographical influences between cold temperate biota and subtropical species.” Continue reading

Posted in Uncategorized

Models matter when linking genetic diversity to niche model predictions

Ecological niche models and the methods to create them continue to evolve. These techniques provide a tidy way to relate the distributions of taxa to environmental variables from the present, past, or future. Oh, and they are pretty too:

Image from Bram Breure

Those pretty maps of niche models assign some measure of suitability to each pixel, which indicates how close the conditions at that point are to a species’ optimal environmental conditions. One particularly juicy prediction that results from niche modelling is that there may be a positive relationship between the “probability of occurrence” and the abundance of a species. Having this sort of forecasting power would provide some helpful inference for population demography and associated genetic diversity, right?

However, combining genetic data with ecological niche models can be a tricky business. Some of the concerns and caveats are summarized nicely in a review by Alvarado-Serrano and Knowles last year in Molecular Ecology Resources. One empirical example for these considerations appears in the same journal from Diniz-Filho and colleagues. They used 14 general methods for niche modelling with four climate data models and tested how these variations affect inferences of genetic diversity (heterozygosity measured by microsatellites) of Dipteryx alata, a widely distributed tree species in Brazil.

The correlation between He and environmental suitability would then reflect the effects of variable population size in geographical space that, under distinct environmental conditions, leads to a well-known pattern in which larger populations are able to maintain more genetic diversity.

Modelling method was the most influential factor in predicting genetic diversity, with our old friend Maxent having the highest mean correlation across the different climate models (Pearson correlation = 0.438, P = 0.037). However, the overall message is that the choice of modelling methodology can have drastic effects on the correlations between predicted occurrence and genetic diversity.

Figure 2 from Diniz-Filho et al. (2015)


As an alternative to stacking an ensemble of modelling methods, Diniz-Filho and colleagues provide an R script that creates distributions of correlations from random combinations of modelling method and climate data, making it easy to visualize whether variation in modelling method affects the ability to detect a pattern of interest (genetic diversity here, but it could be something else!).
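The general idea of building a distribution of correlations can be sketched in a few lines of Python. To be clear, this is not the authors’ R script: the function names, the dictionary layout keyed by (method, climate model), and the numbers in the toy example are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def correlation_distribution(
    suitability: Dict[Tuple[str, str], Sequence[float]],
    diversity: Sequence[float],
    n_draws: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Correlations between diversity and suitability for randomly
    drawn (modelling method, climate model) combinations.

    suitability maps each combination to one predicted suitability
    value per sampled population, in the same order as diversity.
    """
    rng = random.Random(seed)
    combos = sorted(suitability)
    return [pearson(suitability[rng.choice(combos)], diversity)
            for _ in range(n_draws)]


# Made-up example: four populations, two methods, one climate model.
he = [0.50, 0.60, 0.70, 0.80]  # expected heterozygosity per population
predicted = {
    ("method_A", "climate_1"): [0.1, 0.2, 0.3, 0.4],
    ("method_B", "climate_1"): [0.4, 0.3, 0.2, 0.1],
}
rs = correlation_distribution(predicted, he, n_draws=50)
print(min(rs), max(rs))
```

In this toy case the two methods give opposite answers, so the distribution of correlations is split between the two extremes, which is exactly the kind of method-dependence the post warns about.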



Alvarado‐Serrano, D. F., & Knowles, L. L. (2014). Ecological niche models in phylogeographic studies: applications, advances and precautions. Molecular Ecology Resources, 14(2), 233-248.

Diniz‐Filho, J. A. F., Rodrigues, H., Telles, M. P. D. C., Oliveira, G. D., Terribile, L. C., Soares, T. N., & Nabout, J. C. (2015). Correlation between genetic diversity and environmental suitability: taking uncertainty from ecological niche models into account. Molecular Ecology Resources, 15(5), 1059-1066.


Posted in methods, R