The not so singular process of hybridization

What, if anything, are hybrids?
© G. Horatiu

Zach Gompert and Alex Buerkle ask this question in a special issue in Evolutionary Applications.

Hybrids occur when unrelated individuals mate, but how distant do the taxa need to be to constitute a cross? The varied definitions of hybridization downplay the continuous nature of genetic and phenotypic differentiation. Anywhere you are along this continuum, gene flow could generate similar consequences.

Gompert and Buerkle take an interesting approach in reviewing and synthesizing existing literature on hybridization while also adding new simulations. The issues they address have a relatively long history, some of which is underappreciated, and clarifying these ideas should have practical consequences for managing hybridization and gene flow in plants, and likely other taxonomic groups as well.

They argue that it is the evolutionary and ecological consequences of gene flow that should be considered when we define hybridization. This is not unlike what Pante and colleagues argued a while back in building “species hypotheses” to test with estimates of gene flow. For species, a careful exploration of connectivity can inform species delimitation. For hybridization, we should worry less about taxonomic labels and more about the differences between two groups.

Gompert and Buerkle define hybridization as

cases where outcrossing and gene flow occur between populations that differ, at least quantitatively, at multiple heritable characters or genetic loci that affect fitness.

Using simulations, Gompert and Buerkle explored cases in which studies of hybridization could mislead management decisions. They simulated genetic data under primary divergence or secondary contact, as well as quantitative traits under selection along an environmental gradient or under reduced hybrid fitness.

They found that it will often be difficult to distinguish different histories of selection and gene flow from genetic data, but they did find that recent primary divergence and secondary contact generate different variation. Managers should, therefore, treat recent primary divergence and secondary contact as distinct processes. It is only after greater periods of time that the patterns of variation from these two processes begin to look similar.

The variability in outcomes makes it difficult to offer categorical statements about

the composition, importance and … threats of hybrids.

The makeup of parental taxa and hybrids at site A may be completely uninformative about other locations where these taxonomic groups co-occur.

The challenges facing hybridization studies arise from the complexity and uncertainty of the process of hybridization itself. To overcome these difficulties, it will be necessary to perform detailed studies that include sampling multiple geographic locations and contexts, characterizing the demography of parents and hybrids, and estimating the multiple dimensions of ancestry.


Gompert Z and C. A. Buerkle (2016) What, if anything, are hybrids: enduring truths and challenges associated with population structure and gene flow. Evolutionary Applications doi:10.1111/eva.12380.

Posted in bioinformatics, conservation, domestication, evolution, genomics, natural history, next generation sequencing, plants

Data, data everywhere and another tool to use: Taxonomer, a web-tool for metagenomics data analysis

Because sequencing. With genome and metagenome sequencing now so affordable, we’ve reached a point at which we can profile microbial communities more accurately than ever before. For this reason, it’s essential to develop efficient methods for data analysis. While some researchers are adept at collecting samples, preparing them for sequencing, and analyzing the mountains of resulting data, there can also be an appreciable gap between the wet lab and the bioinformatics side of a project. Tools need to allow for powerful and efficient data analysis: even if you don’t have the strongest background in programming, there shouldn’t be a barrier to understanding all of the cool stuff your data has to offer.

Developing tools for data analysis is no small feat; accounting for bias or other issues in the sequence data itself is just one of the many challenges. That said, some nifty tools already tackle the mountains of sequence data available, such as Anvi’o (Eren et al., 2015), developed by the Meren lab, Kraken (Wood et al., 2014), and CLARK (Ounit et al., 2015), and just today another was published in Genome Biology.

Figure 1. Flygare et al., 2016

The study by Flygare and colleagues presents “Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling”. As the authors highlight, analyses of very large data sets are hampered by long computation times and algorithmic inaccuracies. They pitch this tool as a method to help identify pathogens across broad geographic scales. In particular, they point out that we have an opportunity like never before to link microbial communities to human health and disease.


Figure 2. Flygare et al., 2016.

Advances in RNA sequencing have also helped enable the detection of pathogens and of shifts in host expression in response to those pathogens. These developments have the potential to enhance disease diagnosis and treatment. Moving away from PCR amplification of marker genes and toward microbiome studies removes some of the biases that amplification introduces.

Figure 3. Flygare et al., 2016.

Taxonomer presents itself as a fast, easy-to-use, web-based metagenomic sequence analysis tool for DNA or RNA sequences. It claims to be both the most comprehensive taxonomic profiling tool around and very fast. It could also enhance pathogen detection, and it strives to make high-quality data analysis accessible to people who are not bioinformatics specialists. The newest versions of these up-and-coming web-based tools, and their attempts to analyze large sets of sequence data more accurately and quickly, demonstrate that we are headed in the right direction.

Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, Delmont TO. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ. 2015 Oct 8;3:e1319.

Flygare S, Simmon K, Miller C, Qiao Y, Kennedy B, Di Sera T, Graf EH, Tardif KD et al. Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling. Genome Biology. 2016 May 26; 17:111. DOI: 10.1186/s13059-016-0969-1

Ounit R, Wanamaker S, Close TJ, Lonardi S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics. 2015;16:236.

Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15:R46.

Posted in bioinformatics, community ecology, genomics, metagenomics, methods, microbiology, software

Steelhead in a random forest: identifying the genetic basis of migration

Genome-wide association studies (GWAS) have been quite successful in identifying variants associated with various phenotypes (though I suppose there is some debate surrounding this statement; for an interesting, if dated, discussion look here). While most of this work was originally conducted on model organisms, more recently the methods have been applied to natural populations and have shown promising results.

Conservation genetics, in particular, is a field that stands to benefit from association studies. For example, imagine being able to select for Tasmanian devils that are resistant to the cancer tearing through the population. In their recent paper, Hess et al. use association testing to identify the genetic basis of migration timing in steelhead trout, a salmonid of conservation concern.

Steelhead are anadromous salmonids: individuals are born in fresh water, spend 1-4 years maturing in the ocean, then migrate back to fresh water to spawn. What is interesting about the system is that there are distinct summer and winter runs back to fresh water (see figure below). The individuals that migrate during the summer face easier passage upstream due to higher water flow but risk high mortality while waiting for spring; winter individuals experience the opposite. However, individuals within a drainage are more genetically similar to one another than to individuals from other drainages, indicating that the separate runs interbreed.

Posted in association genetics, bioinformatics, conservation, genomics, next generation sequencing

Poorly updated databases will affect your results

If you’re anything like me, your research is heavily dependent on the many wonderful database resources available online. NCBI, UniProtKB, Ensembl, Swiss-Prot, EMBL-EBI, and many other sites and organizations offer highly useful (and often curated) molecular information. Can you imagine having a few nucleotide sequences and not being able to BLAST them to a public database? Many of these resources are updated continuously, some even daily.

Lina Wadi and her colleagues analyzed the expansion of gene annotations in the public databases of the Gene Ontology Consortium and Reactome, and found that the number of pathways and processes had doubled in the last seven years. This is all well and good, and comes thanks to the massive influx of new data over the last couple of years.


What is highly worrying, however, is that the great majority of publications using gene ontology and pathway enrichment analyses make use of tools that haven’t been updated in many years. According to Wadi et al., 80% of the publications they screened from 2015 used outdated software that only captured 20% of the pathway enrichments available in the current gene annotations.
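To see why the age of the annotations matters, here is a minimal sketch of an over-representation test, assuming a one-sided hypergeometric test of the kind many enrichment tools are built around. It is not the procedure used by Wadi et al. or by any particular tool, the function name is hypothetical, and the gene counts in the usage example are made-up numbers chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe  : total number of genes considered
    annotated : genes in the universe annotated to the pathway
    selected  : size of the gene list being tested
    overlap   : genes in the list that are annotated to the pathway
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the probability of seeing `overlap` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: the same 200-gene list scored against an old
    # and a newer annotation of one pathway.
    old_p = hypergeom_enrichment_p(universe=20000, annotated=40, selected=200, overlap=2)
    new_p = hypergeom_enrichment_p(universe=20000, annotated=80, selected=200, overlap=6)
    print(f"old annotation: p = {old_p:.3g}")
    print(f"new annotation: p = {new_p:.3g}")
```

The point of the toy comparison is only that both the pathway size and the overlap depend on the annotation release, so the same gene list can give quite different p-values against old and new annotations.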

Posted in bioinformatics, genomics, next generation sequencing, software

Catching evolution in the act with the Singleton Density Score

A recent study led by Jonathan K. Pritchard at Stanford University set off a media storm, with catchy headlines in two flagship scientific outlets, Nature and Science News. Besides raising the question of whether popular media should cover preprints that have not been peer reviewed, the study has drawn the attention of the scientific community for its newly described method for detecting recent selection.

The Singleton Density Score (SDS) is a measure based on the idea that changes in allele frequencies induced by recent selection can be observed in a sample’s genealogy as differences in the branch length distribution.

“The key idea underlying SDS is that recent frequency changes generate differences in the distributions of coalescence times on the two allelic backgrounds.”

The new method uses whole-genome data and looks at the variation around SNPs. Assuming that derived alleles increasing in frequency sit on shorter branches (and ancestral alleles decreasing in frequency on longer ones), haplotypes carrying the favored allele are expected to carry fewer mutations. With SDS, Field et al. look at the distance to the nearest singleton upstream and downstream of each SNP.
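The distance measurement itself is easy to picture in code. Below is a minimal sketch, assuming singleton positions for one individual are available as a sorted list of base-pair coordinates and that "upstream" means lower coordinates. The function name is hypothetical, and this covers only the raw distance step: the published SDS does considerably more with these distances, which the sketch does not attempt.

```python
from bisect import bisect_left, bisect_right


def nearest_singleton_distances(snp_pos, singleton_positions):
    """Distances (in bp) from a SNP to the nearest singleton on each side.

    singleton_positions must be sorted in ascending order. A singleton at
    exactly snp_pos is ignored. Returns (upstream, downstream), where a
    value is None if there is no singleton on that side.
    """
    i = bisect_left(singleton_positions, snp_pos)   # first index with position >= snp_pos
    j = bisect_right(singleton_positions, snp_pos)  # first index with position > snp_pos
    upstream = snp_pos - singleton_positions[i - 1] if i > 0 else None
    downstream = (
        singleton_positions[j] - snp_pos if j < len(singleton_positions) else None
    )
    return upstream, downstream


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    singletons = [100, 250, 900, 1500]
    print(nearest_singleton_distances(600, singletons))
```

Under the reasoning above, individuals carrying a recently favored allele should tend to show larger distances to the nearest singleton than individuals carrying the other allele, because their haplotypes have had less time to accumulate new mutations.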

Posted in methods, mutation, population genetics, selection

Genomes on the beach: The International Conference on Polyploidy, Hybridization, and Biodiversity

Croatia ain’t bad

I’ve spent the last week in Rovinj, Croatia at the International Conference on Polyploidy, Hybridization, and Biodiversity. I’ve been thinking (and writing) a great deal about polyploidy recently, and this meeting was certainly the impetus for much, much more of that.

A history of multiple genomes is increasingly recognized as a prevalent cog in the evolution of most taxa. Once considered only marginally important evolutionarily and confined taxonomically, both contemporary and ancient polyploidization events are now detectable and important across a large number of fungi, animals, and plants.

Because polyploidy is such a widespread phenomenon, the diversity of study systems and questions was fantastic at this meeting (more fantastic than the food, wine, and location? Not sure about that).

Here are my big three takeaways:

Posted in community, conferences

Ice-Age Euro-trips

Recent work on human migrations within Europe paints a complex portrait of migratory events, admixture with archaic hominids, and adaptive evolution in response to new geographies and a changing global climate. Analyzing whole genomes of 51 ancient humans (from 45,000-7,000 ybp) across Europe, Fu et al. (2016) sought to address these complexities with quite possibly the largest ancient genome study to date.

Location and age of the 51 ancient modern humans, Image courtesy: Q Fu et al. Nature 1–16 (2016) doi:10.1038/nature17993

Analyzing percentages of Neanderthal ancestry across these individuals, Fu et al. (2016) report a decline consistent with expectations (from 3.2-4.2% to 1.8-2.3%), with a stronger signal around genic regions than elsewhere, due to selection against Neanderthal DNA. Analyses of shared Y chromosome haplogroup variation among samples revealed haplogroup R1b (prominent in Western Europe) in the ~14,000-year-old Villabruna sample from Italy, pointing to a much more ancient wave of migrants from east to west than the largely accepted Bronze Age migrations. This observation was also supported by the presence of the eye color variant allele (HERC2) and of other haplogroups in more ancient Western European individuals.

Fu et al. (2016) also study clustering of individuals (based on shared drift), grouping the 51 individuals into five clusters (Vestonice, Mal’ta, El Miron, Villabruna, and Satsurblia, each named after its oldest individual) that correspond largely to age, while few individuals were admixed among clusters. Reconstructing their history by building an admixture graph, the authors report a complex sequence of historical events that (1) refutes previous evidence of three major lineages in modern European genomes, (2) shows that modern Europeans share ancestry with ancient Europeans dating back to at least 37,000 ybp, and (3) suggests possible long-distance migrations between Europe and the Near East around 14,000 ybp.

An important direction for future work will be to generate similar ancient DNA data from southeastern Europe and the Near East to arrive at a more complete picture of the Upper Paleolithic population history of western Eurasia.


Fu, Q., Posth, C., Hajdinjak, M., Petr, M., Mallick, S., Fernandes, D., Furtwängler, A., Haak, W., Meyer, M., Mittnik, A. and Nickel, B., 2016. The genetic history of Ice Age Europe. Nature. DOI: 10.1038/nature17993

Posted in adaptation, evolution, genomics, natural history, Paleogenomics, population genetics, selection