## From crocodiles to coconuts

Trypanosomatids were first discovered in plant tissues over 100 years ago, but we still know very little about their biology, life cycle, or how they have adapted to life inside plants.

Jaskowska et al. (2015) provide a review of Phytomonas parasites and our current state of knowledge in light of the release of the first genomes in this genus.

They highlight that recent genome-wide analyses identified several aspartyl proteases, which are known to be secreted by plant-pathogenic fungi. However, these proteases were absent from the trypanosomatids that infect animals. Future comparisons between plant and animal trypanosomatids may help elucidate the evolution of cell surfaces in plant-adapted and mammalian-adapted parasites.

With genomic resources it will be possible to explore the huge morphological diversity exhibited in Phytomonas species. Do morphological differences represent different life cycle stages (i.e., different selection pressures in discrete host environments as found in other trypanosomatids) or something adaptive and important to virulence within plants?

Comparative genomics will shed light on the evolution of these organisms that inhabit such a diverse spectrum … from crocodiles to coconuts.

Jaskowska E, Butler C, Preston G, Kelly S (2015) Phytomonas: Trypanosomatids Adapted to Plant Environments. PLoS Pathog 11(1): e1004484. doi:10.1371/journal.ppat.1004484


## Discordance in ancestry inference using human mtDNA and autosomes

Mitochondrial haplotypes have been used extensively over the last few decades to infer population structure in humans. Key findings from these studies include what has come to be known as the “Mitochondrial Eve” hypothesis (see the controversial Cann, Stoneking, and Wilson (1987), and subsequent reviews by Templeton (2002) and Stoneking (1997)), numerous studies identifying mitochondrial haplogroups (for a complete list, see van Oven and Kayser (2009); http://www.phylotree.org/), and the basis for several commercially available ancestry inference kits.

mtDNA-haplogroup membership might not be associated with autosomal ancestry proportions. Figure 5 from Emery et al. (2015)

In a recent study, Emery et al. (2015) compared estimates of global ancestry using autosomes and mtDNA haplogroups, and report discordance between ancestry inferred using mtDNA haplogroups and ‘true’ continental-ancestry proportions. The authors mined the HGDP-CEPH and 1000 Genomes Project data to analyze 28 diagnostic SNPs for mtDNA haplotyping, and >650k SNPs for autosomal continental-ancestry determination using ADMIXTURE (Alexander et al. 2009). They then assessed the association between the two using a multinomial logistic regression model.
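To make the idea of “misclassification” concrete, here is a minimal sketch using hypothetical toy data: each individual’s continent is predicted from a haplogroup-to-continent lookup and compared with the continent carrying the largest share of their autosomal (ADMIXTURE-style) ancestry. The haplogroup names, the mapping, and the proportions are all invented for illustration, and this simple majority rule is a stand-in, not the multinomial logistic regression Emery et al. actually used.

```python
# Hypothetical toy data: (mtDNA haplogroup, autosomal ancestry proportions).
# Haplogroup labels and the haplogroup -> continent mapping are illustrative only.
HAPLOGROUP_TO_CONTINENT = {"hapA": "Africa", "hapB": "Europe"}

individuals = [
    ("hapA", {"Africa": 0.80, "Europe": 0.20}),
    ("hapA", {"Africa": 0.30, "Europe": 0.70}),
    ("hapB", {"Africa": 0.10, "Europe": 0.90}),
    ("hapB", {"Africa": 0.55, "Europe": 0.45}),
]


def majority_continent(proportions):
    """Continent with the largest autosomal ancestry proportion."""
    return max(proportions, key=proportions.get)


def misclassification_rate(samples, hap_to_continent):
    """Fraction of individuals whose haplogroup-implied continent differs
    from their majority autosomal continent."""
    mismatches = sum(
        hap_to_continent[hap] != majority_continent(props)
        for hap, props in samples
    )
    return mismatches / len(samples)


print(misclassification_rate(individuals, HAPLOGROUP_TO_CONTINENT))  # 0.5
```

Two of the four toy individuals carry a haplogroup that “points” to a different continent than most of their autosomal genome, giving a misclassification rate of 0.5.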

Key findings from the study include (1) more than one mtDNA haplogroup in most populations, (2) less information for inferring individual ancestry from mtDNA haplogroups than from autosomes, (3) high rates of misclassifying continental ancestry using mtDNA, and (4) a weak association between ancestry determined from mtDNA haplogroups and from autosomes. As the authors rightly state,

Overall, our results question the validity of making anything but fairly crude inferences of continental ancestry on the basis of most mtDNA lineage tests. The limitations of lineage-based ancestry inference should be acknowledged by researchers and made explicit to consumers of commercial ancestry-testing products.

References:

Emery LS et al. (2015) Estimates of Continental Ancestry Vary Widely among Individuals with the Same mtDNA Haplogroup. The American Journal of Human Genetics. DOI: http://dx.doi.org/10.1016/j.ajhg.2014.12.015

## Estimating the ticks and tocks of molecular clocks


Like many undergraduate students, I learned about the linear, universal molecular clock: the homogeneous rate of nucleotide change over time. When I sat down to actually do analyses of molecular data, I was confounded by the array of options to treat DNA sequences with a molecular clock. Relaxed clock? Strict clock? Local clock? I had no idea what was going on.
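For readers in the same boat, a minimal sketch of the difference: under a strict clock, two lineages diverging at a single rate $r$ accumulate distance $d$ at $2r$ per unit time, so the split time is $T = d / (2r)$. A relaxed clock instead lets each branch have its own rate; one common choice is to draw branch rates independently from a lognormal distribution. The function names, the rate values, and the lognormal parameterization below are assumptions for illustration, not any particular program’s implementation.

```python
import math
import random


def strict_clock_divergence_time(distance, rate):
    """Time since two lineages split under a strict (homogeneous) clock.

    distance: pairwise genetic distance in substitutions per site.
    rate: substitutions per site per unit time along each lineage.
    Both lineages accumulate substitutions after the split, hence 2 * rate.
    """
    return distance / (2.0 * rate)


def lognormal_relaxed_rates(n_branches, mean_rate, sigma, seed=1):
    """Branch-specific rates drawn independently from a lognormal distribution
    whose expected value is mean_rate (an uncorrelated relaxed clock)."""
    rng = random.Random(seed)
    mu = math.log(mean_rate) - sigma ** 2 / 2.0  # E[rate] = exp(mu + sigma^2 / 2)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


# 2% divergence at 1% per lineage per million years -> about 1 million years.
print(strict_clock_divergence_time(0.02, 0.01))
# Four branches whose rates scatter around 0.01 instead of all equalling it.
print(lognormal_relaxed_rates(4, mean_rate=0.01, sigma=0.3))
```

Setting `sigma` to zero collapses the relaxed clock back to a strict one, which is one way to see the strict clock as a special case.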

With the growing amount of sequence data, the performance of molecular-clock analyses will increasingly rely on rigorous model selection and the identification of accurate, informative calibrations.

Ho and Duchêne have recently provided a nice, practical review for choosing methods to apply molecular clocks to sequence data. Even though some molecular-clock parameters are reduced to a single check box in a buried software menu, failing to consider your options can lead to significant errors in estimated evolutionary timescales. Using this review, you should be able to make an informed choice based on your taxa, question, data set, and evolutionary time frame.

Ho S.Y.W. & Duchêne S. (2014). Molecular-clock methods for estimating evolutionary rates and timescales, Molecular Ecology, 23 (24) 5947-5965. DOI: http://dx.doi.org/10.1111/mec.12953

## The microbiome doesn’t always explain everything.

Microbiome research is sexy. Just look at the Google Trends graph. Anyone and everyone is studying the gut, nasal, vaginal, skin, oral, aural, any-other-body-part microbiome. This means that a lot of research is getting published saying what constitutes a “healthy” vs. “unhealthy” microbiome (hint: it’s not binary or that simple).

So don’t blame me for reading this new study with a healthy bit of skepticism. In fact, I was pointed there by a press release that, in my opinion, overstated its conclusions.

## Incorporating phenotype and genotype in model-based species delimitation

Figure by Jeremy Yoder showing gene tree species tree discordance. This phenomenon complicates species delimitation efforts using genetic data.

Species are the fundamental unit of biology, but identifying them is a challenging task that receives a lot of theoretical and empirical attention. In a recent Evolution paper, Solís‐Lemus et al. (2015) introduce a new model-based method that integrates phenotypic and genetic data in the delimitation of species boundaries. The method also accommodates divergence with gene flow and selectively driven divergence.

The goal of our work is to develop a species delimitation method to combine genetic and trait data into a common framework based on an explicit model of evolution. Specifically, we extend the Bayesian program BPP (Bayesian phylogenetics and phylogeography, Yang and Rannala 2010) to combine genetic and quantitative trait data in a single Bayesian framework, which we call iBPP (integrated BPP).

## The paludicolous life: peatmosses and pH

High dispersal should counteract local adaptation by continuously redistributing genetic variability.  In the bryophyte Sphagnum warnstorfii, the North Atlantic may not be as formidable a barrier as expected.  Spores may traverse the Atlantic Ocean to North America from Europe and vice versa.
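A standard back-of-the-envelope way to see why dispersal erodes differentiation (our illustration, not an analysis from Mikulášková et al.) is Wright’s infinite-island model. For an idealized diploid population at migration-drift equilibrium, neutral differentiation among demes is approximately

$$F_{ST} \approx \frac{1}{1 + 4N_e m}$$

where $N_e$ is the effective size of each deme and $m$ the per-generation migration rate (for a haploid system the 4 becomes a 2). With one migrant per generation ($N_e m = 1$), $F_{ST} \approx 0.2$; with ten ($N_e m = 10$), it falls to about $1/41 \approx 0.024$. This describes neutral variation only; whether selection can maintain locally adapted variants against such gene flow also depends on the strength of selection.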

Mikulášková et al. (2015) revisit local adaptation in this high disperser in a new paper in Ecology and Evolution. Its broad tolerance of pH and calcium (two major determinants of species distribution in fens) could be due to genetically differentiated ecotypes. Indeed, pH was an important determinant of genetic structure, but this effect was independent of geography.

Alternatively, broad tolerance could be due to the occurrence of cryptic species, highlighting either the problems associated with species boundaries or the occurrence of introgression with phylogenetically allied species. Species definitions are a bit of a sticky subject, but both species ID and hybridization raise intriguing questions with regard to the latter’s role in shaping the genetic structure of species and the former’s influence on the patterns we describe. In either case, the addition of a free-living phase, differing in ploidy, adds a complicated twist.

Mikulášková E, Hájek M, Veleba A, Johnson MG, Hájek T, Shaw AJ (2015) Local adaptations in bryophytes revisited: the genetic structure of the calcium-tolerant peatmoss Sphagnum warnstorfii along geographic and pH gradients. Ecology and Evolution 5: 229-242. DOI: 10.1002/ece3.1351

In light of this recent study by Knauff and Nejasmic (2014), which makes a lot of presumptive leaps about the utility and effectiveness of $\LaTeX$ in scientific writing, my case for using $\LaTeX$ for every equation, reference, table, figure, and revision will hopefully sit well with MS Word loyalists (I used to be one too).

We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems. Individuals, institutions, and journals should carefully consider the ramifications of this finding when choosing document preparation strategies, or requiring them of authors – Knauff and Nejasmic (2014)

At the same time, this blog post (and scores of other text documents I have written) was typed in Programmer’s Notepad. I could just as well have typed it in Vi/Vim/EMACS/Notepad/WordPad/MS Word or any editor of choice; in fact, I write all my $\TeX$ files (and C/C++) in Programmer’s Notepad/Vi. The fundamental difference between an editor (which MS Word is) and a typesetting language (which $\LaTeX$ is) is often overlooked. While I point you to some very valid arguments laid out by Claus Wilke here, a breakfast conversation with him about the utility of $\LaTeX$ versus MS Word prompted me to come up with a list of cool things you can do with $\LaTeX$ that you will likely have trouble achieving in MS Word when producing publication-quality documents. My objective here is to point out some easy-to-use $\LaTeX$ hacks, and definitely not to belittle MS Word’s utility; all journals will end up typesetting your text according to their own requirements anyway. As Claus rightfully points out, there is also no one correct/perfect tool to use.

John Lees-Miller and I collaboratively editing the $\LaTeX$ version of this post on www.overleaf.com


## The imitation game: simulating the genetics of large populations

The most adorable of simulations. Credit to Liza Gross

Computational simulations of genetic data are such a powerful and flexible tool for carrying out studies in molecular ecology.

Do you want to know how much explanatory power your data provides? Simulate it!

Predict the future response of species to hybridization, climate change, or translocation? Simulate it!

Do you want to know what it is like to run a city, drive a city bus, or be a goat? Ehhh, that’s not really what I’m talking about.

Many of the programs for simulating genetic data construct simulations based on individuals. Simulating individuals makes a lot of sense: it is easy to interpret and flexible for many evolutionary scenarios. However, the biggest limitation of individual-based simulators is that the computing power needed to simulate large numbers of individuals can be unwieldy. And if you are really trying to simulate biological phenomena, large numbers of individuals are likely a requirement.

There are other types of models (analytical models) that focus more intently on a handful of genetic parameters of interest. These approaches obtain more accurate estimates of those parameters by sacrificing complexity that may be more representative of real-world large populations.
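As a textbook illustration of what such an analytical model looks like (our example, not one taken from Andrello and Manel), the expected heterozygosity after $t$ generations of drift in an idealized diploid Wright-Fisher population of effective size $N_e$ is

$$\mathbb{E}[H_t] = H_0\left(1 - \frac{1}{2N_e}\right)^{t}$$

For example, with $N_e = 50$, after 50 generations about 60% of the initial heterozygosity is expected to remain, since $(1 - 1/100)^{50} \approx 0.605$. A single formula like this costs nothing to evaluate for any population size, but it tracks only one quantity and says nothing about, say, the full distribution of genotypes across a structured metapopulation.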

MetaPopGen, a new simulation package from Marco Andrello and Stéphanie Manel, offers a new approach that combines the strengths of these methods to simulate complex evolutionary scenarios in large populations. To do this, they ignore individuals and use genotypes as the basic unit of simulation. This allows the user to simulate huge sets of “individuals” and opens up a whole range of demographic and genetic complexity.
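The core trick of tracking genotype counts instead of individuals can be sketched in a few lines. This is a minimal, hypothetical single-population, single-locus, two-allele Wright-Fisher model written for illustration; it is not MetaPopGen’s code or API (MetaPopGen is an R package with its own life-cycle model, including age classes and dispersal among demes). The function name, parameters, and the order of mating, selection, and sampling are our assumptions. The point is that each generation costs one multinomial draw whether the population holds a thousand or a billion diploids.

```python
import numpy as np


def simulate_genotype_counts(counts, fitness, n_generations, seed=0):
    """Forward-time, single-locus, two-allele simulation that tracks genotype
    counts [n_AA, n_Aa, n_aa] instead of individuals.

    Each generation: random union of gametes (Hardy-Weinberg proportions),
    viability selection with relative fitnesses [w_AA, w_Aa, w_aa], then
    multinomial sampling of N offspring. N = sum(counts) is held constant.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    w = np.asarray(fitness, dtype=float)
    n = int(counts.sum())
    history = [counts.copy()]
    for _ in range(n_generations):
        p = (2 * counts[0] + counts[1]) / (2 * n)  # frequency of allele A
        genotype_freqs = np.array([p * p, 2 * p * (1 - p), (1 - p) ** 2])
        probs = genotype_freqs * w
        probs /= probs.sum()
        # One multinomial draw per generation, however large N is.
        counts = rng.multinomial(n, probs)
        history.append(counts.copy())
    return np.array(history)


# One million diploids; allele A is favoured because aa has half the viability.
traj = simulate_genotype_counts([250_000, 500_000, 250_000], [1.0, 1.0, 0.5], 20)
p_final = (2 * traj[-1, 0] + traj[-1, 1]) / (2 * traj[-1].sum())
print(p_final)
```

The trade-off the authors describe is visible here too: with one locus there are three genotypes to count, but with many loci the number of multilocus genotypes explodes, which is why individual-based simulators remain the better fit for multilocus questions.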

Sound too good to be true? The trade-off inherent in these simulations is a limitation to a single locus, making MetaPopGen inappropriate for multilocus investigations:

The strengths and weaknesses of MetaPopGen with respect to other forward-time simulators suggests which simulator can be used depending on the evolutionary scenario. While individual-based simulators are well adapted to multilocus systems where the number of individuals is not too large, MetaPopGen is adapted to simulate scenarios with large numbers of individuals but only one locus. The optimal forward-time simulator capable of dealing with multilocus populations of very large size probably does not exist, and the correct practice is to choose the most adapted simulator to the situation of interest.

So, if you are interested in simulating the effects of complex demographic scenarios across large metapopulations (as the authors do in the example dataset), MetaPopGen might be just what you are looking for.

Additionally, if you aren’t familiar with genetic simulation software, this paper offers a nice entry point to the field. For example, did you know there is a database comparing different types of simulators? If you are just starting to think about simulating some data, the citations and explanations provided by Andrello and Manel could be helpful to you.

Andrello M. & Manel S. (2015). MetaPopGen: an R package to simulate population genetics in large size metapopulations, Molecular Ecology Resources. DOI: http://dx.doi.org/10.1111/1755-0998.12371

## New to the genome sequencing \$8 menu: Nextera library preps!

Researchers are thrifty. We’re always looking for ways to make our expensive supplies and reagents go the extra mile. This shit has been going on for decades – hell, probably even centuries: I remember when I was a kid and my dad paid me \$0.10 for every box of pipette tips that I re-filled by hand (attn: Child Services – this is way below minimum wage).

Well, hold onto your britches, bargain whole-genome sequencers, because there’s a new preprint that’s just for you!


## Nature versus nurture in the human immune system

Arnold Schwarzenegger and Danny DeVito starred in the 1988 movie Twins. Photo from Wikipedia.

An organism’s phenotype is the result of its genotype and its environment. Teasing apart the relative importance of these factors in determining phenotype is a difficult task. However, monozygotic (i.e. identical) twins offer a natural experiment to test the contributions of genes (‘nature’) and environment (‘nurture’) to phenotype.
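The logic of the twin design can be sketched with the classic Falconer approximation. Monozygotic twins share essentially all their genes, whereas dizygotic twins share on average half of their segregating variants, so the gap between MZ and DZ similarity indicates how much trait variance is genetic. This is a textbook sketch under the standard ACE assumptions, not necessarily the model Brodin et al. fitted, and the correlations in the example are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Classic ACE decomposition from twin correlations (Falconer's formula).

    Assumes r_mz = A + C and r_dz = A/2 + C, where A is additive genetic,
    C shared-environment, and E non-shared-environment variance (standardized).
    """
    a = 2.0 * (r_mz - r_dz)  # heritability estimate
    c = 2.0 * r_dz - r_mz    # shared environment
    e = 1.0 - r_mz           # non-shared environment plus measurement error
    return {"A": a, "C": c, "E": e}


# Hypothetical intra-pair correlations for a single immune parameter.
print(falconer_ace(r_mz=0.6, r_dz=0.4))
```

Note that even the MZ correlation alone is informative: any shortfall of $r_{MZ}$ below 1 is variance that identical genomes cannot explain, which is the intuition behind attributing it to non-heritable factors.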

In their 2015 Cell paper, Brodin et al. measured 204 immunological parameters in 210 healthy twins between 8 and 82 years old. They found that variation in immunity between twins was too great to be explained by variation in their genomes. This suggests that the environment plays a larger role than genotype in determining an individual’s immune system phenotype.

Our results show that these functional units of immunity vary across individuals primarily as a consequence of non-heritable factors, with a generally limited influence of heritable ones.

Brodin et al. also found that young twins were more similar to their siblings than old twins were to theirs, suggesting that immune systems diverge over time as twins are potentially exposed to different environments. This also supports the hypothesis that environment influences immunity more than genotype.

the immune system of healthy individuals is very much shaped by the environment and most likely by the many different microbes that an individual encounters in their lifetime.

Brodin P, Jojic V, Gao T, Bhattacharya S, Lopez Angel CJ, Furman D, Shen-Orr S et al. (2015) Variation in the Human Immune System Is Largely Driven by Non-Heritable Influences. Cell 160: 37-47. DOI: 10.1016/j.cell.2014.12.020