Best laid plans are probably not best laid … As I mentioned before, I had every intention of writing up posts on interesting papers as well as highlighting the hosts gracious enough to house/feed/guide us around this summer. Alas, time was my enemy and I barely kept up with emails while flying and driving around the northern hemisphere.

At long last, my part of this international effort has come to an end and I am happy to be able to return to normalcy, including catching up on all the posts that were sidelined.

Posted in Uncategorized | 1 Comment

Sweptaway – Part 2

Numerous methods have been developed over the last few years for the detection of selective sweeps (hard and soft – see my previous post). This week, we look at three new studies that (a) compare existing methods to detect sweeps (Vatsiou et al. 2015), (b) develop a new method to detect hard-sweeps (Pybus et al. 2015), and (c) develop the theory behind detecting soft-sweeps under a unique mutation sweeping in response to environmental perturbations (Berg and Coop 2015).

Genomic signatures of hard and soft sweeps explained in this infographic by Nandita Garud – courtesy https://nanditagarud.files.wordpress.com/2014/11/figure1_resized.png

Vatsiou et al. 2015 – Comparison of methods

Vatsiou et al. (2015) compare the performance of seven recent methods to detect selective sweeps from genomic data. Broadly, these methods rely on “genome scans” of differentiation, quantify genetic variation along a chromosome within a population, use physical linkage maps around selected SNPs to study lengths of homozygous haplotypes (also called IBD segments), or draw on multilocus differentiation. The authors simulate data under three models of population evolution (island, stepping-stone, and hierarchical island), with hard and soft sweeps commencing at migration-mutation-drift equilibrium frequencies. By comparing estimates of selection and false discovery rates (FDRs) for SNPs across windows, the study reports (1) low power and high FDR for EHHST and XP-EHHST, (2) a strongly detrimental effect of increased migration on the performance of all methods, and (3) a strong effect of initial allele frequency on the power of all methods to detect soft sweeps.
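For readers who want to see what “power” and “FDR” mean in a comparison like this, here is a minimal sketch. It assumes a simulated scan in which we know which SNPs are truly under selection; the function name, the scores, and the threshold are all made up for illustration and are not taken from Vatsiou et al.

```python
def power_and_fdr(scores, is_selected, threshold):
    """Return (power, FDR) when SNPs with score >= threshold are called as sweeps.

    scores      -- per-SNP test statistics from some sweep-detection method
    is_selected -- per-SNP booleans, True if the simulation placed selection there
    """
    calls = [s >= threshold for s in scores]
    true_pos = sum(c and t for c, t in zip(calls, is_selected))
    false_pos = sum(c and not t for c, t in zip(calls, is_selected))
    n_selected = sum(is_selected)
    # Power: fraction of truly selected SNPs that were called.
    power = true_pos / n_selected if n_selected else float("nan")
    # FDR: fraction of calls that are wrong (0 if nothing was called).
    n_calls = true_pos + false_pos
    fdr = false_pos / n_calls if n_calls else 0.0
    return power, fdr


# Hypothetical toy scan: six SNPs, three of them truly selected.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.95]
truth = [False, True, False, False, True, True]
power, fdr = power_and_fdr(scores, truth, threshold=0.75)
print(f"power={power:.2f}, FDR={fdr:.2f}")
```

Raising the threshold usually lowers the FDR at the cost of power, which is one reason combining complementary methods can help.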

“As we have shown, no single method is able to detect both starting and nearly completed selective sweeps. Combining several methods (e.g. XPCLR or hapFLK with iHS or nSL) can greatly increase power to detect a wide range of selection signatures.”

Pybus et al. 2015 – Hierarchical boosting to detect hard sweeps

Pybus et al. (2015) develop a new method for the detection of hard sweeps by training a model on different evolutionary scenarios that vary in the final frequency of the selected allele and in the age of the sweep. Using a new “hierarchical boosting” (HB) algorithm, their method classifies regions of the genome into different evolutionary scenarios (e.g. complete versus incomplete sweeps). Using SNPs from the 1000 Genomes Project data and coalescent simulations of selective scenarios that vary the times of sweeps and the final allele frequencies, Pybus et al. (2015) compare the performance of the HB algorithm against nine popular methods for the detection of sweeps (including several methods used by Vatsiou et al. 2015 above). They report (1) the highest sensitivity of the HB algorithm among all methods considered for detecting complete hard sweeps, and (2) lower sensitivity in detecting incomplete sweeps, using both simulated and real data.

This study offers a unique and powerful way of detecting candidate regions in the genome that have been evolving under positive selection in a more reliable way than many lists produced by single selection tests or even some other existing composite methods. It also distinguishes, in many cases, the final state (complete/incomplete) and the relative age (ancient/recent) of a given selective event.

Berg and Coop 2015 – Analytical formulae for polymorphism under soft-sweeps

Soft sweeps – i.e. adaptive, positive selection on standing allelic variation – can plausibly be characterized by two processes: positive selection on multiple independent mutations at a locus, with associated hitchhiking of neutral variants, or a single unique mutation that segregates neutrally as standing variation until perturbed by environmental change, and is thence swept. Berg and Coop build the theory to study the signatures of soft sweeps under the latter model, particularly its effects on polymorphism after the sweep. By modeling the probability of escaping the sweep by recombination (during the ‘sweep’ phase), and that of coalescence in the ‘standing’ phase, the authors derive analytical expressions for (a) the reduction in diversity, (b) the number of segregating sites, and (c) the frequency spectrum under the soft-sweep model, which they explore via simulations.

Unfortunately, our work largely confirms the intuition and existing results indicating that standing sweeps are likely to be rather difficult to identify, and characterize, on the basis of genetic data from a single population time-point, and when they can be identified, they may be difficult to distinguish from classic hard sweeps.


Vatsiou, Alexandra I., Eric Bazin, and Oscar E. Gaggiotti. “Detection of selective sweeps in structured populations: a comparison of recent methods.” Molecular ecology (2015). DOI: http://dx.doi.org/10.1111/mec.13360

Pybus, Marc, et al. “Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations.” Bioinformatics (2015): btv493. DOI: http://dx.doi.org/10.1093/bioinformatics/btv493

Berg, Jeremy J., and Graham Coop. “A Coalescent Model for a Sweep of a Unique Standing Variant.” Genetics (2015): genetics-115. DOI: http://dx.doi.org/10.1534/genetics.115.178962

Posted in adaptation, bioinformatics, evolution, genomics, methods, population genetics, selection, software, theory | Leave a comment

The Goldilocks zone of missing data

One of the more adorable members of the Iguania. Photo by Rob Denton

Reduced representation sequencing approaches, such as RADseq and UCEs, have provided some fascinating inferences in recent years, but something has always been missing in these analyses: data. As sampled taxa become more divergent, the price paid for more loci is more missing data. The extent to which this is a problem has been debated, and there is no best recommendation for balancing the choices of number of taxa, number of loci, and acceptable percentage of missing data.

More generally, it is not clear how sampling for targeted-sequence capture studies should be designed (given finite resources). Should studies try to obtain large numbers of loci for a more limited set of taxa? Or more taxa and fewer loci? Should taxa or loci with missing data be excluded? What amount of missing data should be allowed? Do the answers to these questions change when applying concatenated versus species tree approaches? These fundamental questions have barely been addressed.

Streicher, Schulte, and Wiens provide a new empirical example to further inform these decisions in an upcoming issue of Systematic Biology (Don’t have access? A version can be found here too). They take a dataset of UCEs from Iguanian lizards and create different variations by adjusting the number of taxa sampled (44, 29, or 16) and the percentage of missing data per locus (20%, 30%, 40%, 50%, 60%). The resulting 15 datasets were used to create phylogenies with both concatenated (RAxML) and species-tree (NJst) methods. The authors then looked for clades with previously-supported monophyly and compared the support between datasets and methodologies to the “true” relationships.
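To make the dataset construction concrete, here is a small sketch of a missing-data filter. It assumes “missing data per locus” means the fraction of sampled taxa with no sequence for that locus; the function name, locus names, and toy sequences are hypothetical and this is not the authors’ pipeline.

```python
def filter_loci(alignments, all_taxa, max_missing):
    """Keep loci whose fraction of missing taxa is <= max_missing.

    alignments  -- dict: locus name -> dict of taxon -> sequence (absent = missing)
    all_taxa    -- list of every taxon in the study
    max_missing -- allowed fraction of taxa without data, e.g. 0.5 for 50%
    """
    kept = {}
    for locus, seqs in alignments.items():
        present = sum(1 for taxon in all_taxa if seqs.get(taxon))
        missing_fraction = 1 - present / len(all_taxa)
        if missing_fraction <= max_missing:
            kept[locus] = seqs
    return kept


# Hypothetical toy data: four taxa, four loci with 0%, 25%, 50% and 75% missing taxa.
taxa = ["A", "B", "C", "D"]
alignments = {
    "uce1": {"A": "ACGT", "B": "ACGT", "C": "ACGA", "D": "ACGT"},
    "uce2": {"A": "GGCT", "B": "GGCT", "C": "GGTT"},
    "uce3": {"A": "TTAC", "B": "TTAC"},
    "uce4": {"A": "CCGA"},
}

# Number of loci retained at each threshold used in the post.
counts = {t: len(filter_loci(alignments, taxa, t)) for t in (0.2, 0.3, 0.4, 0.5, 0.6)}
print(counts)
```

The trade-off in the paper shows up even in this toy case: a looser threshold keeps more loci, but each added locus brings more gaps into the matrix.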

Figure 1 from Streicher et al. (2015) describing the relationship between the number of loci and percent missing taxa among their datasets

For both types of analysis, branch support was maximized when up to 50% of taxa were missing per gene. That isn’t to say that the missing values themselves improved inferences, but that threshold allowed for a greater number of taxa and genes to be used overall.

We show that allowing more missing data can increase the number of taxa and loci that are included, and increase support for estimated relationships (but that including the maximum amount of missing data does not necessarily maximize support).

That’s right, more missing data is helpful for capturing more loci/taxa, but too much missing data is a problem no matter how many loci/taxa get included. The most peculiar thing of all is that the two methods for building the phylogenies perform best under opposite sampling strategies: concatenated analyses are most accurate with maximum taxon sampling and moderate locus sampling, whereas species-tree analyses are most accurate with minimum taxon sampling and extensive locus sampling. Method matters!

The authors suggest other researchers avoid the removal of loci based on the fear of missing data, since the added breadth of genes and possibly taxa is likely more beneficial. However, more caution than ever is required since different sampling strategies cause very different results within each method.

These considerations are most apparent when branch lengths are short. To avoid these problems entirely, find some nice long branch lengths to resolve. But where’s the fun in that?

Thus, we show that some sampling strategies must be yielding incorrect but strongly supported results. While this sensitivity may be largely confined to short branches in this ancient, rapid radiation, it is just such branches that phylogenomic data may be needed to resolve.


Streicher, J. W., Schulte, J. A., & Wiens, J. J. (2015). How Should Genes and Taxa be Sampled for Phylogenomic Analyses with Missing Data? An Empirical Study in Iguanian Lizards. Systematic Biology, doi: 10.1093/sysbio/syv058.

Posted in evolution, methods, next generation sequencing, phylogenetics | Leave a comment

Small mammalian genomics of adaptation

While large mammals have had their day on our blog, two recent studies on small mammals reveal the genetics of size evolution in island mice, and differential introgression of mitochondrial and nuclear genomes in chipmunks – steps towards understanding the process of adaptive evolution to new environments post divergence.

Gray et al. (2015) study the genetics of adaptation through a common-garden, cross-breeding experiment with the largest wild house mice – variants of Mus musculus domesticus on Gough Island (GI), a volcanic island in the Atlantic Ocean. These mice were introduced to the island within the last two centuries, and weigh almost twice as much as their common mice relatives. After creating inbred lines from cross-bred Gough Island mice and mainland common mice, all mice were phenotyped over 16 weeks, and genotyped with SNP arrays for QTL studies. Analyses of growth trajectories revealed significant weight differences between wild and lab-reared GI mice, and between GI and wild mice raised in similar environments. F1 hybrids from GI and wild mice show similar growth up until 3 weeks and fall closer to mid-parent values beyond that, while F2s vary widely in weight at all ages; there is also evidence of sex-related, maternal, and line effects. The authors identify 8 QTL for weight differences and 11 for growth rate differences, with the timing of their greatest effects varying across crosses, and offer hypotheses for the evolution of body size in newly isolated populations of mice.

This research is a first, necessary, and foundational step toward pinpointing the genetic variants responsible for increased size in GI mice. Identifying the causative genes and mutations will allow several intriguing evolutionary questions to be answered. Did selection on Gough Island target standing variants or new mutations? Do the causative loci show signatures of adaptive evolution predicted by strong selection on individual loci or by selection spread across many genes?

Smart adaptation to food availability…

Good et al. (2015), in a study of introgression during “secondary contact” between two species of chipmunks – the yellow-tailed Tamias amoenus and the red-tailed Tamias ruficaudus – attempt to characterize the frequency and importance of hybridization during speciation. Using genome-wide targeted re-sequencing of over 10,000 exonic regions and species super-trees for autosomal and X chromosome loci, the authors report discordance between the two, with up to 20 times lower potential introgression on the X. Further analyses using an ABBA-BABA framework indicated that less than 1% of total nuclear loci have introgressed from either species, a stark contrast to what previous mitochondrial work suggested.
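For anyone new to ABBA-BABA tests, here is a bare-bones sketch of Patterson’s D for four aligned sequences (P1, P2, P3, outgroup). It counts site patterns from single sequences; real analyses typically use allele frequencies and a block jackknife for significance, and nothing here is taken from Good et al.’s actual implementation. The sequences are invented.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned sequences of equal length.

    The outgroup allele is treated as ancestral (A). Only biallelic sites where
    P3 carries the derived allele (B) are informative:
      ABBA: P1 ancestral, P2 derived  -> P2 shares the derived allele with P3
      BABA: P1 derived,  P2 ancestral -> P1 shares the derived allele with P3
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if c == o or len({a, b, c, o}) != 2:
            continue  # P3 is ancestral, or the site is not biallelic
        if a == o and b == c:
            abba += 1
        elif a == c and b == o:
            baba += 1
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return d, abba, baba


# Hypothetical six-site alignment: three ABBA sites, one BABA site, two uninformative.
p1 = "ACTAGA"
p2 = "GTCAGC"
p3 = "GTTATC"
out = "ACCAGA"
d, n_abba, n_baba = d_statistic(p1, p2, p3, out)
print(d, n_abba, n_baba)
```

With no gene flow, incomplete lineage sorting should produce ABBA and BABA sites in roughly equal numbers, so D near zero is the null expectation; an excess of one pattern hints at introgression between P3 and either P1 or P2.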

"The Essence of the Yellow Pine Chipmunk" - Image courtesy Alex Badyaev - www.tenbestphotos.com

With respect to the overall importance of hybridization, we must ultimately understand what the contribution of introgressed alleles has been to adaptive evolution…However, to understand the broader importance of hybridization to adaptation we must ultimately understand what proportion of adaptive variants derive from introgression versus mutation and standing genetic variation within species.


Gray, Melissa M., et al. “Genetics of Rapid and Extreme Size Evolution in Island Mice.” Genetics (2015): genetics-115. DOI: 10.1534/genetics.115.177790

Good, Jeffrey M., et al. “Negligible nuclear introgression despite complete mitochondrial capture between two species of chipmunks.” Evolution (2015). DOI: 10.1111/evo.12712

Posted in adaptation, domestication, evolution, genomics, natural history, pedigree, phylogenetics, population genetics, selection | Leave a comment

Picking the ripest model with PHRAPL

Authors are free to use my awful logo, no charge.

To study patterns of genetic variation is to consider scale. The choices an investigator makes when designing a study can produce such a beautiful breadth of evolutionary patterns: from populations to species, from local to continental, from ancient to contemporary. The fields that combine to describe these different scales are sometimes disparate and sometimes highly integrative. Fields like population genetics have benefitted from extensive theory that has been reproduced in the field and the lab. Fields like phylogenetics have rapidly expanded thanks to the power provided by expanding sequencing/computing technology.

Phylogeography, the discipline in between population genetics and phylogenetics, gets the advantages of both fields along with twice the headaches. A lot of things can happen while populations shrink, expand, differentiate, and come together again. This complexity makes fitting analyses that were designed for different scales a challenge. For example, most phylogenetic methods used in a phylogeographic context will show the evolutionary relationship between taxa. However, these techniques aren’t necessarily designed to consider gene flow between groups, which can alter conclusions. Similar limitations happen in the opposite direction when an analytical method for population genetics is scaled up to answer phylogeographic questions.

If you are interested in understanding the demographic history of a taxon, this all adds up to a lot of parameters to consider: population size changes, migration, divergence, drift, among others. The number of evolutionary scenarios that could have shaped a specific pattern of genetic variation can be staggering. Therefore, phylogeographers have often limited themselves to only the models they would most expect given the information they already possess. As you can imagine, this is often a tenuous situation, especially in cases where exploratory analyses are needed.

Posted in methods, phylogeography | Leave a comment

Testing local adaptation at latitudinal and elevation range edges

Plantago lanceolata. Photo from Wikipedia Commons

A species’ distribution is determined by the relative strength and complex interaction of many factors including (but not limited to) dispersal, life history, and physiological tolerance. Often the center of a species’ range is the warm, fuzzy place to be and fitness there is high compared to the range edges, where conditions can be harsh and inhospitable. From a global change perspective, the fate of a species may depend on these edge populations and their capacity to adapt to new conditions.

In a recent paper published in the Journal of Evolutionary Biology, Halbritter et al. (2015) compared fitness in reciprocal transplants, genetic diversity, and genetic differentiation among populations of two Plantago species (P. lanceolata and P. major) across their ranges. Although many studies have tested local adaptation generally and local adaptation between range centers and edges, one of the cool things about this study is that Halbritter et al. compared two different types of range edges in Europe: latitudinal and elevational. Specifically, the study addressed the following questions:

(1) To what extent are populations locally adapted to conditions at the range center and edges, and does this differ between gradients?

(2) Do patterns of genetic diversity and genetic differentiation vary similarly along elevation and latitudinal gradients?

(3) Do patterns of genetic diversity and genetic differentiation reflect the contrasting breeding systems of the two species?

Here are the results:

Range center vs range edge: In comparing center to edge populations for P. lanceolata Halbritter et al. found no evidence that range-edge plants were better adapted to range-edge conditions than plants from the center of the range. In P. major, plants from the middle of the range outperformed plants from the edges when both were grown at the range center.

Elevation range edge vs latitudinal range edge: For both species lifetime fitness was higher in the elevation-edge populations than the latitudinal-edge populations when plants were grown at high elevation but there were no differences in fitness when elevation-edge and high latitude edge populations were grown at high latitude.

Genetic diversity and population differentiation: P. lanceolata populations were more diverse than P. major populations, consistent with their reproductive strategies: P. lanceolata outcrosses and P. major is an inbreeder. Genetic diversity declined as latitude increased in P. lanceolata, but did not vary with elevation. In P. major, diversity decreased with increasing latitude and increased with increasing elevation. In both species, genetic population differentiation was greater along the latitudinal gradient than the elevational gradient.

The authors sum up their results nicely…

Our study revealed local adaptation in range-edge populations of both species, although this tended to be greater in the inbreeding P. major than in the outbreeding P. lanceolata. We also found stronger adaptation at the high-elevation edge than at the latitudinal edge in both species, which was associated with greater neutral genetic variation and lower population differentiation along the elevation gradient.

As for the take-home messages, Halbritter et al. note that the signal of local adaptation was strongest, but still subtle, when comparing types of range edges, and therefore may be missed in studies comparing only range edges to the center. Patterns of adaptation and genetic structure in Plantago suggest that populations along elevational and latitudinal gradients will likely respond to climate change in different ways. Interestingly, a previous study (Halbritter et al. 2013) found P. lanceolata can already be found at higher elevations than would be predicted based on its temperature tolerance!


Halbritter, A. H., Billeter, R., Edwards, P. J., & Alexander, J. M. (2015) Local adaptation at range edges: comparing elevation and latitudinal gradients. Journal of Evolutionary Biology. DOI: 10.1111/jeb.12701

Halbritter, A. H., Alexander, J. M., Edwards, P. J., & Billeter, R. (2013) How comparable are species distributions along elevational and latitudinal climate gradients? Global Ecology and Biogeography, 22, 1228-1237. DOI: 10.1111/geb.12066

Posted in adaptation, plants, population genetics, selection | Leave a comment

Sweptaway – Part 1

Brace yourselves for a series of new posts on selection, especially with articles from the special Molecular Ecology issue on “Detecting selection in natural populations: making sense of genome scans and towards alternative solutions” starting to roll out!

Selective sweeps (i.e. reductions in genomic diversity at regions linked to positively selected mutations that have fixed in a population – also see my previous post here) are commonly identified by quantifying polymorphism at linked sites. Commonly used methods to characterize sweeps include tests with Tajima’s D, Fay and Wu’s H, the HKA chi-squared statistic, etc. Likelihood-based frameworks to infer selective sweeps include SweepFinder (Nielsen et al. 2005), SweeD (Pavlidis et al. 2013), and XP-CLR (Chen et al. 2010), among others. However, a reduction in genomic diversity need not be due to selective sweeps – alternate explanations include population bottlenecks, background selection (see my earlier post explaining this), and unusually slow mutation rates. Huber et al. (2015), in their recent paper on “Detecting recent selective sweeps while controlling for mutation rate and background selection”, describe an addition to Nielsen et al. (2005)’s SweepFinder.
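As a reminder of what one of these summary statistics actually measures, here is a minimal sketch of Tajima’s D computed from a handful of aligned haplotypes, using the standard constants from Tajima (1989). The function name and the toy haplotypes are made up, and the sketch assumes complete data with no gaps or missing calls.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings (needs at least 4 sequences).

    Returns None when there are no segregating sites, where D is undefined.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes for a non-zero variance term")
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    seg = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if seg == 0:
        return None
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D contrasts pi with Watterson's estimator S / a1.
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


# Hypothetical samples: only singletons (excess of rare variants, as after a sweep) ...
d_rare = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA", "AAAT"])
# ... versus two divergent haplotypes at intermediate frequency.
d_balanced = tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"])
print(d_rare, d_balanced)
```

The first sample gives a negative D and the second a positive one. A recent sweep tends to push D negative near the selected site, but so does population expansion, which is part of why the likelihood-based methods above try to control for other forces.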

To control for mutation rate (and possibly selective constraints), Huber et al. (2015) suggest the inclusion of invariant, fixed sites (with respect to an outgroup) in the analyses. Simulation analyses suggest increases in the power of the Composite Likelihood Ratio (CLR) test, and decrease in False Positive Rates (FPR) in detecting sweeps, with this inclusion of fixed differences. Similarly, controlling for background selection (by inclusion of a B-value map) showed greater power in detecting sweeps while including all sites i.e. polymorphic, and fixed invariant sites in the analyses.

Using the reduction in diversity relative to divergence as a necessary hallmark of a selective sweep in our model also helps to reduce false positives, e.g. in the case of a recent population bottleneck.

Nucleotide diversity across the X chromosome in CEU, YRI, and JPT human populations, compared with regions of Neandertal introgression. Image courtesy: Dutheil et al. (2015) doi/10.1371/journal.pgen.1005451.g005

In a neat application of detecting selective sweeps, Dutheil et al. (2015) also offer an alternate explanation for the observed levels of reduced diversity and divergence in human X chromosomes (with respect to chimpanzees) to the controversial hypothesis of Patterson et al. (2006). X chromosomes have smaller effective population sizes (due to hemizygosity in males) compared to autosomes, and are thus expected to drift more and show lower divergence between humans and chimpanzees. By studying the patterns of incomplete lineage sorting (ILS) along the X chromosome, simulating gene trees and estimating demographic parameters under a divergence model (CoalHMM), Dutheil et al. (2015) report a bimodal distribution for ILS along the X chromosome, with 8 regions identified as exhibiting particularly low proportions of ILS. Analyses of the effect of background selection on ILS showed that only about 31% of X chromosome windows could be explained solely by background selection, whereas comparison of diversity along the X chromosome across different human populations and the Neandertal showed that low-ILS regions predominantly evolve by recurrent selective sweeps. They argue that the observed large-scale reductions in diversity in extant human populations are also not plausible under a model of secondary contact between human and chimpanzee ancestors (as suggested by Patterson et al. 2006).

Whatever the underlying mechanism, our observations demonstrate that the evolution of X chromosomes in the human chimpanzee ancestor, and in great apes in general, is driven by strong selective forces. The striking overlap between the low-ILS regions we have identified and the Neandertal introgression deserts identified by Sankararaman et al. further hints that these forces could be driving speciation.

References: Huber, Christian D., et al. “Detecting recent selective sweeps while controlling for mutation rate and background selection.” Molecular Ecology (2015). DOI: http://dx.doi.org/10.1111/mec.13351

Dutheil, Julien Y., et al. “Strong selective sweeps on the X chromosome in the human-chimpanzee ancestor explain its low divergence.” PloS Genetics (2015). DOI: http://dx.doi.org/10.1371/journal.pgen.1005451

Chen, Hua, Nick Patterson, and David Reich. “Population differentiation as a test for selective sweeps.” Genome research 20.3 (2010): 393-402. DOI: http://dx.doi.org/10.1101/gr.100545.109

Pavlidis, Pavlos, et al. “SweeD: likelihood-based detection of selective sweeps in thousands of genomes.” Molecular biology and evolution (2013): mst112. DOI: http://dx.doi.org/10.1093/molbev/mst112

Nielsen, Rasmus, et al. “Genomic scans for selective sweeps using SNP data.” Genome research 15.11 (2005): 1566-1575. DOI: http://dx.doi.org/10.1101/gr.4252305

Sankararaman, Sriram, et al. “The genomic landscape of Neanderthal ancestry in present-day humans.” Nature 507.7492 (2014): 354-357. DOI: http://dx.doi.org/10.1038/nature12961

Patterson, Nick, et al. “Genetic evidence for complex speciation of humans and chimpanzees.” Nature 441.7097 (2006): 1103-1108. DOI: http://dx.doi.org/10.1038/nature04789

Posted in adaptation, evolution, genomics, mutation, population genetics, selection, speciation, theory | Leave a comment

Environmental association analyses: a practical guide for a practical guide

"Do you have the kind with latent variables? I think it is a yellow bottle?"

Obtaining extensive SNP data for your organism of choice isn’t such a feat these days, but actually matching that breadth of data with appropriate analyses is still a challenge. Fortunately, there has been an avalanche of new methods to make these connections between genetic variation and environment more clear. Unfortunately, the recent surge in new methodologies sure makes decision making tough. What are the drawbacks of different methods? What works for my data? Why didn’t I think of this before I generated all these SNPs?

If only there was some sort of….practical guide.

Posted in adaptation, association genetics, methods, Molecular Ecology, the journal | Leave a comment

And who made your beer?

In the spirit of it being almost Friday, and while we’re on the topic of your favorite beverages: perhaps wine puts you to sleep and you couldn’t care less where it came from, but you prefer the bitterness of lager beers at your tailgate party? While ales are primarily fermented with the same yeast species as wines, Saccharomyces cerevisiae, alloploid hybrid strains of Saccharomyces eubayanus with S. cerevisiae, S. carlsbergensis (Saaz) and S. pastorianus (Frohberg), are used extensively in brewing lagers, and their origins are still contentious. Baker et al. (2015) set out to solve this mystery with a high-quality de novo assembly of the S. eubayanus genome, built with the Yeast Genome Assembly Pipeline.

Hypothesized origins of Frohberg and Saaz strains – image courtesy Baker et al. (2015) http://mbe.oxfordjournals.org/content/early/2015/08/20/molbev.msv168/F6.large.jpg

Analysis of maltose (MAL) utilization genes revealed 14 genes across 4 chromosomes, and 2 pseudogenes. Comparison with several other MAL genes across Saccharomyces (e.g. MAL genes in ale counterparts) showed high levels of sequence identity (98%), with phylogenies indicating a common origin of all MAL genes across brewing yeasts. Genome-wide signatures of domestication were quantified using dN/dS computations along each lineage, showing significantly (P < 1e-21) greater proportions of nonsynonymous substitutions in the Saaz and Frohberg lineages than in non-lager lineages, while the two lineages themselves did not differ from each other, suggesting little/relaxed purifying selection and/or population bottlenecks. Similarly higher proportions of nonsynonymous substitutions were also observed among metabolic genes such as NOT3, ADR1, and ADH2. Testing levels of neutral divergence (equal levels implying single origins of Saaz and Frohberg from a single nucleus hybrid of S. cerevisiae and S. eubayanus, compared to different levels due to multiple origins) indicated support for the multiple origin theory, with dS (rate of synonymous substitutions) being over ten times higher in the S. cerevisiae subgenome of lager strains than in the S. eubayanus subgenome.
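If dN/dS is unfamiliar, the sketch below shows only the final step of a Nei-Gojobori-style estimate: turning counts of differences and sites into Jukes-Cantor-corrected rates and taking their ratio. Counting synonymous and nonsynonymous sites from codons is skipped, the counts are invented, and this is not a description of the pipeline Baker et al. used.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; p is the proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the correction to be defined")
    return -0.75 * log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitution rates along one lineage."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical counts for two lineages with the same synonymous divergence.
lager_like = dn_ds(nonsyn_diffs=45, nonsyn_sites=3000, syn_diffs=60, syn_sites=1000)
ale_like = dn_ds(nonsyn_diffs=20, nonsyn_sites=3000, syn_diffs=60, syn_sites=1000)
print(f"lager-like dN/dS={lager_like:.3f}, ale-like dN/dS={ale_like:.3f}")
```

A higher ratio in one lineage, with both still below 1, is the kind of pattern usually read as relaxed purifying selection rather than positive selection, which matches the interpretation in the post.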

In this context, these results suggest that the Saaz and Frohberg lineages were created by at least two distinct hybridization events between nearly identical strains of S. eubayanus with relatively more diverse ale strains of S. cerevisiae.

Cos it’s almost Friday!



Baker, EmilyClare, et al. “The genome sequence of Saccharomyces eubayanus and the domestication of lager-brewing yeasts.” Molecular biology and evolution (2015): msv168. DOI: 10.1093/molbev/msv168



Posted in adaptation, evolution, genomics, natural history, next generation sequencing, phylogenetics, population genetics, selection, speciation, yeast | 1 Comment

The unforeseen genomic consequences of domestication

Image credit

When a desired genome is selected for propagation, all mutations, beneficial, neutral, or deleterious, shift in frequency, and this sometimes can have unforeseen consequences.

Natural selection takes the good with the bad. Beneficial and harmful mutations combine to provide a net effect that is selected upon. You may remember from your most-recent genetics course that this is called the Hill-Robertson effect.

One would predict that this balance between beneficial and deleterious mutations is especially apparent in taxa that have undergone domestication. When population sizes shrink and selection is relaxed on characters that were important in the wild but not necessarily when domesticated, what are the consequences for genomes?

In a recent Molecular biology and evolution paper, Sebastien Renaut and Loren Rieseberg predict that domesticated lines should have reduced genetic variability, more deleterious mutations, and these mutations should be found in areas of the genomes that have the lowest recombination rate (where it is harder to get rid of them).

Simply put, the combined effects of a reduction in effective population size and fast population growth during domestication can drag along nonadaptive mutations, especially those that arose during the process of domestication itself.

That is simply put!

To follow up on those predictions, the authors gathered transcriptome data from sunflowers (Helianthus annuus): two wild types (one a weed representative from outside the native range) and two types of domesticated lines (Landrace and Elite). Indeed, domesticated lines had lower genetic variability and a greater proportion of nonsynonymous mutations. Using the bioinformatic method PROVEAN, they determined that up to 14% of those nonsynonymous mutations were classified as deleterious. Lastly, they demonstrated that these deleterious mutations are likely in areas of the genome with low recombination rates.

Figure 1 from Renaut and Rieseberg


Call me American, but I had no idea what a cardoon was.

Even more interesting, they repeated these methods with two other domesticated crops (cardoon and globe artichoke) and saw the same pattern.

But it isn’t just being domesticated that matters; how a line is domesticated matters too. The elite lines have been under multiple stages of domestication compared to the landrace lines, but don’t show the predicted higher loads of deleterious mutations. Renaut and Rieseberg suggest that the modern selection practices used in the elite lines, such as crossing wild alleles back into cultivars, may help relieve the burden of deleterious mutations from domestication.


Renaut, S., & Rieseberg, L. H. (2015). The accumulation of deleterious mutations as a consequence of domestication and improvement in sunflowers and other Compositae crops. Molecular biology and evolution, 32(9): 2273-2283.

Posted in domestication, genomics, plants, selection, transcriptomics | Leave a comment