haploidy, diploidy, polyploidy … not a problem

Investigating pairwise relatedness is fundamental to the characterization of the mating system and inferring genetic structure. If no pedigree exists, then relatedness is estimated from genetic markers (e.g., microsatellite loci) using method-of-moment or maximum-likelihood methods.

However, not all individuals in a population have the same ploidy. In ferns, mosses and some seaweeds, haploid gametophytes alternate with diploid sporophytes. In some insect orders, such as the Hymenoptera, haploid males develop from unfertilized eggs. Thus, individuals may be related, but have differing levels of ploidy. Though many estimators exist for diploid organisms, no estimators exist for organisms with multiple ploidy levels.

Cape honey bees at a feeding station. Photograph by Anthony Vaudo, University of Florida

That was until Huang et al. recently published a software package online in Molecular Ecology Resources. They extend one relatedness coefficient estimator, one maximum-likelihood estimator and three coancestry coefficient estimators so that relatedness can be calculated from co-dominant markers between individuals that differ in ploidy.

The simulations and comparisons presented should help with the selection of the appropriate estimator for a given question or application.

Estimating pairwise relatedness between individuals with different ploidies will significantly advance our understanding of mating systems and the structuring of populations of organisms with complex life cycles.

Huang K, Ritland K, Guo S, Dunn DW, Chen D, Ren Y, Qi X, Zhang P, He G and Li B (2014, accepted) Estimating pairwise relatedness between individuals with differing levels of ploidy.  Molecular Ecology Resources DOI: 10.1111/1755-0998.12351


Sweeping for Sweeps

Reduction in genomic diversity around a site has been attributed to one or both of two mechanisms: (1) genetic hitchhiking, in which sites linked to positively selected mutant alleles are ‘swept’ to fixation, and (2) background selection, in which variation at sites linked to deleterious mutants is purged. Recent selective sweeps are thus characterized by long runs of homozygous sites and elevated linkage disequilibrium. Ancient sweeps, on the other hand, are difficult to characterize. Several methods have been proposed to detect them, often using scaled (with respect to a common ancestor) haplotype diversity, Tajima’s D, the number of segregating sites, etc.; see Enard et al. (2014) for an excellent recap.
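
For readers who have not computed these summary statistics by hand, here is a minimal Python sketch of three of them: the number of segregating sites, nucleotide diversity (mean pairwise differences) and Tajima’s D. The toy haplotypes and the function names are made up for illustration, and the code assumes aligned, equal-length haplotypes with no missing data. It is not the pipeline used in any of the papers discussed here.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(haps):
    """Count alignment columns where more than one allele is present."""
    return sum(1 for column in zip(*haps) if len(set(column)) > 1)


def nucleotide_diversity(haps):
    """Mean number of pairwise differences between haplotypes (pi)."""
    pairs = list(combinations(haps, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def tajimas_d(haps):
    """Tajima's D; returns None if it is undefined for this sample."""
    n = len(haps)
    s = segregating_sites(haps)
    # The variance term is zero for n < 4, and D is undefined without variation.
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimates, scaled by its std. dev.
    return (nucleotide_diversity(haps) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy sample: four haplotypes, four sites.
toy = ["ACGT", "ACGA", "ACTA", "ACTA"]
print("S  =", segregating_sites(toy))
print("pi =", nucleotide_diversity(toy))
print("D  =", tajimas_d(toy))
```

In a window scan, statistics like these would be computed for each window and compared against a neutral expectation; strongly negative D together with low diversity is the kind of pattern usually read as a candidate sweep.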

Two recent studies that analyzed human genomes for ancient and recent recurrent selective sweeps revealed some very interesting results.

Quidditch, anyone?

Racimo et al. (2014) propose a method based on ABC (Approximate Bayesian Computation) to detect ancestral selective sweeps that occurred soon after the split of humans and Neanderthals, and apply it to 26 phased human genomes from the 1000 Genomes Project. Scaled diversity (and other statistics), estimated in 0.02 cM windows around non-synonymous mutations, splice sites, 5’ UTRs and regulatory motif changes, shows (1) no significant differences in signatures of positive selection between synonymous and non-synonymous sites, 5’ UTRs, or regulatory motifs, but (2) significantly reduced differences in diversity at splice sites, and (3) that sites with support for positive selection do not lie in regions introgressed from Neanderthals.

Dutheil et al. (2014) take a different approach: they analyze regions of the genome (here, the X chromosome) that show signatures of Incomplete Lineage Sorting (ILS), i.e. lower divergence, while reconstructing population histories. Regions with low ILS would thus be expected either to be under strong background selection or to have experienced strong selective sweeps. Their analyses of reduced genomic diversity at low-ILS sites in the 1000 Genomes data for the X chromosome reveal (1) a greater reduction in genomic diversity in non-African X chromosomes than in African X chromosomes, and (2) that sites with low ILS and reduced genomic diversity do not lie in regions introgressed from Neanderthals.

Two studies, similar conclusions, leading into more questions about complex speciation in great apes. A classic clash of brooms. Quidditch, anyone?


Dutheil, Julien Y., et al. “Strong selection in the human-chimpanzee ancestor links the X chromosome to speciation.” bioRxiv (2014): 011601. http://dx.doi.org/10.1101/011601

Enard, David, Philipp W. Messer, and Dmitri A. Petrov. “Genome-wide signals of positive selection in human evolution.” Genome research (2014). http://dx.doi.org/10.1101/gr.164822.113

Racimo, Fernando, Martin Kuhlwilm, and Montgomery Slatkin. “A test for ancient selective sweeps and an application to candidate sites in modern humans.” Molecular biology and evolution 31.12 (2014): 3344-3358. http://dx.doi.org/10.1093/molbev/msu255




A molecular how-to for hibernating this winter

As the academic semester ends, I see the tell-tale signs of the upcoming holiday hibernation. The weary eyes of teaching assistants peeking over piles of final exams. Students who may have mentally been on break before finals even started. A little more pep in the faculty step (finally some time for that NSF proposal!).

Upon return to campus after the new year, most are refreshed and excited for a new semester. However, others will return in a slightly, well, degraded state: slowed by the excess of holiday nourishment and mentally lulled by an embarrassingly lengthy Netflix binge.

No matter which group you fall into, take a look at this new paper from Dr. Vadim Fedorov and colleagues that describes how some of our fellow mammals actually hibernate while still keeping themselves in shape.

In humans and most mammals, physical inactivity leads to loss of muscle strength and mass. In contrast, hibernating bears and ground squirrels demonstrate very limited muscle atrophy over the prolonged periods (6–8 months) of physical inactivity of winter hibernation, suggesting that hibernating mammals have evolved natural mechanisms that prevent disuse muscle atrophy.

Two hypotheses for how these mammals carry out this feat have been proposed. Either a) genes that build proteins are upregulated during hibernation or b) genes that are responsible for breaking down muscle tissue are downregulated during hibernation.

By measuring the expression levels of a host of functional genes from black bears and arctic ground squirrels that were either hibernating or not, Fedorov and his colleagues show that the role of genes that increase protein biosynthesis is more pronounced in animals that are hibernating than in those that aren’t.

At the same time, they found no changes in pathways that result in the catabolism of proteins, indicating little influence of genes that prevent the breakdown of muscle tissue.

These findings imply reduction in amino acid catabolism and suggest, besides possible urea recycling, redirection of amino acids from catabolic pathways to the enhancement of protein biosynthesis.

If only this was applicable to humans. Goodbye hustle and bustle. Hello to sweet, sedentary life.


Fedorov V.B., Nathan C. Stewart, Øivind Tøien, Celia Chang, Haifang Wang, Jun Yan, Louise C. Showe, Michael K. Showe & Brian M. Barnes (2014). Comparative functional genomics of adaptation to muscular disuse in hibernating mammals, Molecular Ecology, 23 (22) 5524-5537. DOI: http://dx.doi.org/10.1111/mec.12963


This post is for the birds

Paintings of mourning doves (left) and a flamingo (right) by John Audubon

Note: this post has been corrected to reflect the fact that Flamingoes and Pigeons are not sister species, but members of sister clades.

Darwin’s favorite bird, the pigeon, has a new sister (clade) that includes Flamingoes and Grebes. This somewhat surprising result came from a recent phylogenomic analysis of 48 bird species published last week in Science. This analysis and its 27 companion papers were the culmination of years of work conducted by the Avian Phylogenomics Project, which is led by Erich Jarvis, a Professor of Neurobiology at Duke University, Guojie Zhang of the National Genebank at BGI in China and the University of Copenhagen, and M. Thomas P. Gilbert of Natural History Museum of Denmark.

In this post I take you on a supervised speed date with 12 of the 28 papers.


LSUMNS researchers are at the top of the list for new species discoveries in 2014

2014 was an exciting year for describing new biodiversity for researchers at the Louisiana State University Museum of Natural Science (LSUMNS). Top ten lists are ubiquitous this time of year and two such lists documenting the top new species of 2014 include taxa described by LSU researchers.

A list compiled by Discover Magazine includes a new fish species described by Prosanta Chakrabarty and colleagues and a new rat species described by Jake Esselstyn and colleagues.

The Hoosier Cavefish Amblyopsis hoosieri, the first new cavefish species described from the United States in the last 40 years, is found in subterranean habitats of southern Indiana. Amblyopsis hoosieri is distinct from its congener A. spelaea based on morphological and molecular characters. The Ohio River appears to act as a barrier for these two species with A. hoosieri distributed to the north of the river and A. spelaea to the south.

The Hoosier Cavefish Amblyopsis hoosieri. Photo by M.L. Niemiller.


Totally RAD

Puritz et al. (2014) weigh the pros and cons of the aptly titled “RAD fad” in a comment recently published online in Molecular Ecology. They:

(1) challenge the assertion that the original RAD protocol minimizes the impact of PCR artifacts relative to that of other RAD protocols, (2) present additional biases in RADseq that are at least as important as PCR artifacts in selecting a RAD protocol, and (3) highlight the strengths and weaknesses of four different approaches to RADseq which are a representative sample of all RAD variants.

Artwork courtesy of chrispiascik.com
© Chris Piascik

In Box 1, the authors break down four representative protocols: mbRAD (Miller et al. 2007, Baird et al. 2008), ddRAD (Peterson et al. 2012), ezRAD (Toonen et al. 2013) and 2bRAD (Wang et al. 2012).

Then, the mitigation of PCR artifacts is discussed followed by a summary of the pros and cons of each of the four representative RAD protocols.

The most important considerations when selecting a particular RAD protocol are the facilities and molecular experience of the researcher applying the approach, as well as the biology of the organisms and the hypotheses being tested … at present, there is no reason to broad-brush paint any method as the superior or default protocol.


Miller MR, Dunham JP, Amores A, et al. (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research 17: 240-248. DOI: 10.1101/gr.5681207

Baird NA, Etter PD, Atwood TS, et al. (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE, 3, e3376. DOI: 10.1371/journal.pone.0003376

Peterson BK, Weber JN, Kay EH, et al. (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE, 7, e37135. DOI: 10.1371/journal.pone.0037135

Wang S, Meyer E, McKay JK, et al. (2012) 2b-RAD: a simple and flexible method for genomewide genotyping. Nature Methods, 9, 808–10. DOI: 10.1038/nmeth.2023

Toonen RJ, Puritz JB, Forsman ZH, et al. (2013) ezRAD: a simplified method for genomic genotyping in non-model organisms. PeerJ, 1, e203. DOI: 10.7717/peerj.203

Andrews KR, Luikart G (2014) Recent novel approaches for population genomic data analysis. Molecular Ecology 23: 1661-1667. DOI: 10.1111/mec.12686

Puritz JB, Matz MV, Toonen RJ, Weber JN, Bolnick DI, Bird CE (2014, accepted article) Comment: Demystifying the RAD fad. Molecular Ecology. DOI: 10.1111/mec.12965


Migration Circos plots in R

We’ve all seen them: colorful and, I daresay, pretty darn informative. Circos plots are fun visualizations of large data sets. I’ve seen them used in two contexts in comparative genomics: to represent structural variants in homologous chromosome segments in species alignments, and to represent gene-gene interactions. But there are scores of other interesting applications of these plots to scientific data; a comprehensive (and still growing) list can be seen here.


Circos plot of source-sink migration dynamics between populations “3” and “4” here, with 8 other populations. Width of migration curves indicates amount of migration.

A relatively recent publication by Abel and Sander (2014) in Science on using Circos plots to represent migration prompted me to explore the migest package in R. Scores of studies in molecular ecology and population genetics utilize methods to estimate ancestral or contemporary migration routes between populations, such as MIGRATE-N, IM/IMa2, IMMANC, BayesAss, etc. I am yet to see a migration visualization that comprehensively describes complex migration routes between populations in pop-gen studies. So putting these two together, I thought I’d use migration estimates from one of the above tools, and represent it as a Circos plot. I am going to skip a few steps in creating the data files required to make these plots, but I refer you to some excellent documentation by Guy Abel.
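
One data-preparation step of this kind is reshaping a matrix of pairwise migration estimates into a long table of origin, destination and flow, which is the general shape of input that chord-diagram tools tend to want. Here is a rough Python sketch of that reshaping step only. The population labels, the rates and the `min_flow` threshold are invented for illustration, the row-to-column orientation is an assumption, and this is not the migest workflow itself; check the documentation of your plotting tool for the exact layout it expects.

```python
def matrix_to_flows(labels, rates, min_flow=0.0):
    """Reshape a square migration matrix into (origin, destination, flow) rows.

    rates[i][j] is taken to be migration from labels[i] into labels[j];
    that orientation is an assumption, so check your estimator's convention.
    """
    if len(rates) != len(labels) or any(len(row) != len(labels) for row in rates):
        raise ValueError("rates must be a square matrix matching labels")
    flows = []
    for i, origin in enumerate(labels):
        for j, dest in enumerate(labels):
            # Skip the diagonal and any flow at or below the threshold.
            if i != j and rates[i][j] > min_flow:
                flows.append((origin, dest, rates[i][j]))
    # Largest flows first, so the widest ribbons are easy to spot.
    return sorted(flows, key=lambda row: row[2], reverse=True)


# Hypothetical migration estimates among three populations.
labels = ["pop1", "pop2", "pop3"]
rates = [
    [0.00, 0.02, 0.10],
    [0.01, 0.00, 0.00],
    [0.05, 0.03, 0.00],
]
for origin, dest, flow in matrix_to_flows(labels, rates):
    print(f"{origin}\t{dest}\t{flow}")
```

Written out as a tab-separated file, a table like this can then be read into R or any other plotting environment.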


C.L. Gloger’s favorite owl

European Barn Owl (Tyto alba). Photo by Carlos Delgado.

Biologists love clines. We’ve been mentally masticating on clines for decades.

Clines in body size. Clines in color. Clines in heart size! Clines that go in circles!

Recognizing clinal patterns in phenotypes or genotypes is fun, but discovering the mechanisms behind these clines has proven to be a real challenge. Sure, clines can be produced by forces of natural selection. But these signals can also be produced by neutral processes like isolation by distance or secondary contact between populations.

A recent study by Sylvain Antoniazza and colleagues builds on previous investigations of a widespread color cline in European Barn Owls (Tyto alba). Whereas nailing down the selective forces that may cause variation in plumage color of these owls is difficult, other potential explanations can be removed through the process of elimination.

Antoniazza and colleagues do just that, tackling a third neutral process that may produce the observed color clines in Barn Owls: allele surfing.

In the allele surfing process, neutral alleles may ‘surf’ the wave of range expansion and increase their frequency along the way eventually forming a genetic cline.
Here, we investigate whether a postglacial colonization model is compatible with today’s observed genetic diversity of the European barn owl, and investigate how likely it is for the colour cline to have arisen by allele surfing (as opposed to natural selection) during colonization.
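
To see how drift alone can build a cline, consider a toy simulation of serial founder events: each new deme along the expansion axis is founded by a small sample from the previous one, so the frequency of a neutral allele takes a random walk as the wave front advances. This Python sketch is only an illustration with made-up parameters (number of demes, number of founders, starting frequency). It is not the colonization model or the ABC framework used by Antoniazza and colleagues.

```python
import random


def serial_founder_expansion(n_demes=20, founders=10, start_freq=0.5, seed=None):
    """Frequency of a neutral allele in each deme along a one-dimensional
    expansion in which every deme is founded from the previous one."""
    rng = random.Random(seed)
    freqs = [start_freq]
    for _ in range(n_demes - 1):
        p = freqs[-1]
        # Each of the 2 * founders gene copies is drawn from the previous deme.
        copies = sum(1 for _ in range(2 * founders) if rng.random() < p)
        freqs.append(copies / (2 * founders))
    return freqs


def mean_front_shift(reps=200, **kwargs):
    """Average absolute frequency change between the origin and the front."""
    shifts = []
    for rep in range(reps):
        freqs = serial_founder_expansion(seed=rep, **kwargs)
        shifts.append(abs(freqs[-1] - freqs[0]))
    return sum(shifts) / reps


print(serial_founder_expansion(seed=1))
print("mean shift, 5 founders: ", mean_front_shift(founders=5))
print("mean shift, 50 founders:", mean_front_shift(founders=50))
```

Smaller founding groups produce larger chance shifts in frequency by the time the front is reached, which is why a neutral cline has to be ruled out before invoking selection.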

Using Approximate Bayesian Computation (ABC) simulations with genetic data from owls all over Europe, in combination with analyses of color variation, the authors rule out allele surfing as a major contributor to the color clines. When combined with previous results, it looks like the color cline in Barn Owls is most likely due to natural selection.

One explanation eliminated, now only the forces of natural selection to go.

Antoniazza S., Luca Fumagalli, Jérôme Goudet & Alexandre Roulin (2010). Local Adaptation Maintains Clinal Variation in Melanin-Based Coloration of European Barn Owls (Tyto alba), Evolution, DOI: http://dx.doi.org/10.1111/j.1558-5646.2010.00969.x

Antoniazza S., Samuel Neuenschwander, Reto Burri, Arnaud Gaigher, Alexandre Roulin & Jérôme Goudet (2014). Natural selection in a postglacial range expansion: the case of the colour cline in the European barn owl, Molecular Ecology, 23 (22) 5508-5523. DOI: http://dx.doi.org/10.1111/mec.12957


Increase your broader impacts with Data Nuggets



This week we have a special guest post by Elizabeth Schultheis, a PhD candidate at Michigan State University and the Kellogg Biological Station, to describe her Data Nuggets project. Previous guest posts have discussed other great projects happening in the scientific community, including improving scientific reproducibility and the role of pre-prints making research available more quickly and to a broader audience. Data Nuggets is a great way to invest in the scientific community of the future by making research accessible to K-12 educators and students in new and exciting ways.

Broader impacts can be hard.

We’ve all had that moment while writing an NSF grant proposal where we have to discuss the broader impacts of our research and demonstrate that our work contributes to society. The NSF makes it clear that they highly value projects with significant broader impacts; grants that do not explicitly address them will be returned without review, and some reviewers give them equal weight to the intellectual merit of a project when making funding decisions. Additionally, it is no longer enough to train undergraduates when performing our research, or TA a course where we discuss our research. The NSF is looking for creative answers to their call for projects that both improve our understanding of science and benefit society.

But they don’t have to be.

Data Nuggets were designed to help scientists improve their communication skills and share the story of their research with a broad audience. When creating a Data Nugget you increase your broader impacts by:

  • Improving STEM education at all levels, including K-12 and undergraduate classrooms
  • Increasing your public outreach by disseminating your research findings to a broad audience and putting your data into a format that nonscientists can understand
  • Making science relatable by sharing your journey of exploration and discovery with students, increasing the passion for science and retention in STEM fields
  • Providing a snippet of data from your research, allowing students to analyze and interpret messy, real data as opposed to the polished data in textbooks that is not a realistic outcome of experimentation
  • Showing students that science is not done only by old men in lab coats, but can be done in a variety of settings by anyone with a passion for the natural world


Identifying and correcting errors in draft genomes

Cumulative number of genomes sequenced over the past 3 decades (figure by Greg Zynda http://gregoryzynda.com/)

Over the past decade we have seen an exponential increase in the number of sequenced, assembled, and annotated genomes. These genomes are essential for pretty much any genomics research. If you want to sequence the genome, transcriptome, epigenome, or whatever-ome of your super-special study species and population, you’ll need (or at least want!) a pretty solid (read: well-annotated) reference genome to which to align your sequence data.

Fortunately for you, genomicists have been sequencing pretty much any genome that they can get their hands on. Unfortunately, these genomes are first published in “draft” form and come with a multitude of potential errors. These errors are highlighted in a recent paper by James Denton and colleagues. Here’s the one-sentence summary of their paper:

Low-quality assemblies result in low-quality annotations, and these annotation errors cause both the over- and under-estimation of gene numbers.

The good news is that:

many genome assemblies and annotations have improved over time due to further efforts aimed at both increasing sequence contiguity and adding functional data (e.g. RNA-seq) in order to correct gene models.

… but the bad news is that:

it is often the case that a great deal of research will be based upon the draft assembly before it has reached a finished state, and erroneous conclusions may result.

More specifically, in this paper the authors compared the most up-to-date genomes (from fruit flies to chickens to chimpanzees) to their draft-genome predecessors. What they found was that:

low-quality assemblies can result in huge numbers of both added and missing genes, and that most of the additional genes are due to genome fragmentation (“cleaved”* gene models)… Upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes.

(*“cleaved” gene models are those in which multiple genes are estimated from sequences that actually came from just one gene.)

Their findings make sense. If the assembly is fragmented, gene-prediction algorithms are more likely to assign exons of the same gene that sit on different fragments (and may be far apart) to different genes. These cleaved gene models lead to an overestimation of single-exon genes and a depletion of multi-exon genes.
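
A toy example makes the cleaving effect concrete. In this Python sketch, a gene is a list of exon coordinates and an assembly is a list of contig intervals; a naive predictor that cannot see across contig boundaries reports one gene model per contig that contains at least one whole exon. The coordinates and the `predicted_models` helper are hypothetical and do not stand in for any real annotation pipeline.

```python
def predicted_models(exons, contigs):
    """Group a gene's exons by the contig that fully contains them.

    exons and contigs are lists of (start, end) tuples on the true genome.
    Each non-empty group is what a contig-bound predictor would call a gene.
    """
    models = []
    for c_start, c_end in contigs:
        inside = [(s, e) for s, e in exons if c_start <= s and e <= c_end]
        if inside:
            models.append(inside)
    return models


# One true gene with four exons (made-up coordinates).
gene = [(100, 200), (1500, 1600), (5000, 5200), (9000, 9100)]

finished = [(0, 10000)]                           # one contiguous scaffold
draft = [(0, 1000), (1001, 6000), (6001, 10000)]  # same region in three contigs

for name, assembly in [("finished", finished), ("draft", draft)]:
    models = predicted_models(gene, assembly)
    single_exon = sum(1 for m in models if len(m) == 1)
    print(name, "gene models:", len(models), "single-exon:", single_exon)
```

With these toy coordinates the finished assembly yields one four-exon gene model, while the draft assembly yields three models, two of them single-exon: more genes overall and an excess of single-exon genes.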

But there is hope, and this hope comes in the form of RNA-sequencing. The authors found that paired-end RNA-sequencing improves the annotation of genomes by connecting the cleaved genes.

Overall, draft genomes should be used and interpreted with caution. If you can, improve the annotation by sequencing your organism’s transcriptome.

Denton JF, Lugo-Martinez J, Tucker AE, Schrider DR, Warren WC, et al. (2014) Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies. PLoS Comput Biol 10(12): e1003998. doi: 10.1371/journal.pcbi.1003998
