Are you my mother? Exploring the possible microbial ecology of LUCA

Figure modified from Darwin (1837), Haeckel (1866), Woese (1990), Hug et al. (2016), and Weiss et al. (2016)

One persistent question has always been: where did it all begin? What was the origin of the complex life that we have today? What happened billions of years ago that resulted in beautiful giant manta rays, magnificent Sequoias, and even humans that are incredibly adept at posting cat videos on YouTube?

Ever since Darwin sketched his phylogenetic tree in 1837, there has been a search for how life on this planet is connected, and relatively recent advances in sequencing (not to mention how affordable it all is nowadays) have led to revolutionary studies detailing relationships among extant organisms. It seems that we might be one step closer to painting a picture of the habitat of the last universal common ancestor, or LUCA. Originally, it was thought that LUCA represented the ancestor of bacteria, archaea, AND eukaryotes, although more recently it looks like eukaryotes actually arose from the bacteria and archaea.

In an article out this week (which is unfortunately not open access), Madeline C. Weiss and co-authors analyze over 6.1 million protein-coding genes from currently available prokaryotic genomes to gather clues about what the habitat, lifestyle, and microbial ecology of LUCA might have been.

Figure 1. Weiss et al. (2016)

Figure 2. Weiss et al. (2016)

Of 286,514 protein clusters, the study identified 355 protein families that the authors determined were likely in LUCA’s genome. While it’s tricky to account for confounding factors such as horizontal gene transfer, which might have swapped genetic info around over time, the genes included in this theorized version of LUCA indicate that this organism was anaerobic, thermophilic, and dependent on H2, and that it fixed CO2 and N2. It also required cofactors that depended on a variety of molecules, including transition metals, coenzyme A, ferredoxin, and selenium.


Posted in evolution, microbiology

How Molecular Ecologists Work: Tracy Heath on TSA precheck, writing on your desk, and not having an alarm clock

Welcome to the next installment of How Molecular Ecologists Work!

This entry is from Dr. Tracy Heath, assistant professor at Iowa State University. Tracy and her lab develop methods and models for inferring phylogenetic relationships. This work has included using paleontological data to make better estimates of node ages in phylogenetic trees, and Tracy is also part of the RevBayes team.


Posted in career, interview

How Molecular Ecologists Work: J. Chris Pires on mono-tasking, not doing it all yourself, and defining that dream job

Welcome to the next installment of the How Molecular Ecologists Work series.

This entry is from Dr. J. Chris Pires, associate professor within the Division of Biological Sciences at The University of Missouri. His work is broadly described as plant evolutionary biology, from molecular systematics to patterns of gene expression. Chris’s research program has been both wildly productive and impactful (he is one of Thomson Reuters’ “Highly Cited Researchers”), and he has also been recognized for exemplary mentoring of undergraduate students. How does he do it?


Posted in career, interview

How do Missing Data Impact Phylogenetic Inference with UCEs?

Next-generation sequencing (NGS) has put gobs of sequence data in the hands of molecular biologists, and those data are measurably advancing our prospects for a fully resolved Tree of Life. Nearly simultaneously, however, we have realized that every NGS dataset has unique properties (not a surprise), such as the number of loci you can expect to generate, the variability of these loci, and their usefulness at either shallow or deep timescales.


Missing data in a UCE concatenation.


A question that is being posed of all types of NGS datasets is: how do missing data affect phylogenetic inference? A new topic this is not; but it has recently taken on new fervency in genomic-scale studies, where missing data are commonplace. The Molecular Ecologist has blogged a bit about this recently as well (see here and here). Because my group is currently using UCEs to address phylogenomics of several different mammal taxa, I was curious if any consensus was emerging on how missing data in these particular markers impact phylogenetic inference.

To satisfy this curiosity, I conducted a brief – but taxonomically and methodologically representative – survey of recent literature, focusing specifically on papers with datasets characterized by some percent of missing loci, and where these were analyzed alongside more complete datasets. Some common themes emerged. First, in concatenated analyses (both ML and Bayesian), inclusion of more UCE loci at the expense of increasing missing data nearly always increases branch support. Also, in the majority of papers I read, missing data impact topology only minimally, and often not at all. This is consistent with some previous assertions (which, however, were based on single empirical datasets) that a relatively high amount of missing UCE data (20-50%) may not greatly affect historical inferences.

Second, UCE-based species trees built from summary coalescent or quartet approaches appear slightly more sensitive to missing data, both in terms of topology and support values. Still, the topological variation observed is often small. Moreover, when anomalous or highly incongruent trees are recovered, they have usually been built with highly complete (sometimes 100% complete) datasets. This might be expected, because target capture of UCEs yields far fewer loci than some other methods, such as RADseq. Also, UCEs are by their nature often minimally variable. Therefore, a low tolerance for missing data can lead to exclusion of a large proportion of loci (occasionally >90%) and, depending on the system, final datasets with pretty low levels of phylogenetic signal.
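To make that trade-off concrete, here is a minimal Python sketch of filtering loci by completeness. Everything in it is an assumption for illustration: the locus and taxon names, the toy sequences, and the helper functions `completeness` and `filter_loci` are hypothetical and are not taken from any of the papers I read or from any existing UCE pipeline.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: locus name -> {taxon: sequence, or None if the
# locus was not recovered for that taxon}. All names and bases are made up.
Locus = Dict[str, Optional[str]]


def completeness(locus: Locus) -> float:
    """Fraction of taxa for which this locus was recovered."""
    recovered = sum(1 for seq in locus.values() if seq is not None)
    return recovered / len(locus)


def filter_loci(loci: Dict[str, Locus], min_completeness: float) -> List[str]:
    """Names of loci recovered in at least `min_completeness` of the taxa."""
    return [name for name, locus in loci.items()
            if completeness(locus) >= min_completeness]


toy_loci = {
    "uce-1": {"A": "ACGT", "B": "ACGT", "C": "ACGA", "D": "ACGT"},
    "uce-2": {"A": "ACGT", "B": None, "C": "ACTT", "D": "ACGT"},
    "uce-3": {"A": None, "B": None, "C": "AGGT", "D": "ACGT"},
    "uce-4": {"A": "TCGT", "B": "ACGT", "C": None, "D": "ACGT"},
}

# Relaxing the completeness threshold retains more loci.
for threshold in (1.0, 0.75, 0.5):
    kept = filter_loci(toy_loci, threshold)
    print(f"min completeness {threshold:.2f}: "
          f"kept {len(kept)} of {len(toy_loci)} loci")
```

On this made-up matrix, requiring 100% completeness keeps only one of the four loci, while tolerating up to 50% missing taxa keeps all four. Real UCE datasets are far larger, but the same logic is why a strict threshold can throw away most of the loci.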

So how are species tree methods best utilized with incomplete UCE datasets? This is definitely a fine line, because additional evidence from other types of genomic data suggests summary coalescent methods in particular (such as ASTRAL) perform better when missing data are minimized. One solution is to choose the most phylogenetically informative loci, and to tolerate some small level of missing data in those loci. This could have the effect of maximizing returns when data are incomplete. The optimal level of missing data for such an approach is likely less than that permissible under concatenation, but exact numbers are still hard to come by, and these probably differ depending on the system. Given the significant recent advances in summary and quartet methodologies, future work that characterizes the performance of these approaches for UCEs in the presence of different amounts of missing data will be a ripe research area to pursue.
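One way to picture the "most informative loci, with a little missing data" idea is the rough Python sketch below. It assumes that informativeness is measured as the number of parsimony-informative sites (sites where at least two character states each occur in at least two taxa), which is only one of several possible measures. The function names, the toy alignments, and the thresholds are all hypothetical, and sequences within a locus are assumed to be aligned to the same length.

```python
from collections import Counter
from typing import Dict, List, Optional

MISSING = set("-?Nn")  # characters treated as gaps or missing bases


def parsimony_informative_sites(seqs: List[str]) -> int:
    """Count columns where >= 2 states are each seen in >= 2 sequences."""
    count = 0
    for column in zip(*seqs):  # assumes equal-length, aligned sequences
        states = Counter(b.upper() for b in column if b not in MISSING)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def rank_loci(loci: Dict[str, Dict[str, Optional[str]]],
              max_missing_fraction: float,
              top_n: int) -> List[str]:
    """Keep loci within the missing-taxa tolerance, most informative first."""
    scored = []
    for name, locus in loci.items():
        present = [s for s in locus.values() if s is not None]
        missing_fraction = 1 - len(present) / len(locus)
        if missing_fraction <= max_missing_fraction:
            scored.append((parsimony_informative_sites(present), name))
    # Sort by informative sites (descending), then by name for stable ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Made-up alignments; None marks a taxon with no data for that locus.
toy = {
    "locus_a": {"A": "AACG", "B": "AACG", "C": "TACG", "D": "TACG"},
    "locus_b": {"A": "AAGG", "B": "AAGG", "C": "TTGG", "D": None},
    "locus_c": {"A": "ACCA", "B": "ACCA", "C": "GTCA", "D": "GTCA"},
    "locus_d": {"A": None, "B": None, "C": "ACGT", "D": "ACGT"},
}

print(rank_loci(toy, max_missing_fraction=0.25, top_n=2))
```

The appropriate tolerance and number of loci to retain would have to be explored for each system; this sketch only shows the shape of the selection step.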

Posted in next generation sequencing, phylogenetics

My review of Lab Girl for the LA Review of Books


How should I illustrate a review of Lab Girl? Let’s go with a cool plant. This is bunchberry, Cornus canadensis. (Flickr: jbyoder)

NB: Cross-posted from my personal blog.

You have surely, by now, heard all about Hope Jahren’s terrific scientific memoir Lab Girl, including as one of my “bookshelf” recommendations for Chronicle Vitae. My full-length review of Lab Girl is now online at the LA Review of Books, and it is, as you might expect, very positive: Jahren writes beautifully about the process of scientific discovery and the daily miracles of the natural world. For me, as a postdoc still scrabbling for purchase on the lower rungs of the tenure track, though, Lab Girl managed to simultaneously tweak my anxieties and give me hope:

The world is heating up, and it often seems that the intellectual luxuries afforded to scientists of the past – Darwin’s leisurely publication schedule, Haldane’s dalliances with radical politics – are gone. Lab Girl’s rendition of the daily institutional frustrations of research marks it as a different kind of scientific memoir – but also as a product of twenty-first century science. If you navigate among scientists’ blogs or scroll through their Twitter feeds, you’ll quickly find the same fears and vexations and injustices Jahren describes, intertwined with accounts of the work that excites scientists’ passions. … Jahren does not make science look like an easy career choice, but it isn’t her job to do so – and if Lab Girl chronicles the real and substantial barriers to becoming a successful scientist, it also makes that life compelling: she shows the fruit that can still grow from the rocky soil of a research career.

I do hope you’ll read the whole review, and pick up a copy of Lab Girl if you somehow haven’t already.

Posted in book review, career

Molecular Inversion Probes: phylogenomics without the excess?

The onset of the phylogenomic era has revolutionized molecular ecology and systematics, helping resolve relationships throughout the tree of life that have long eluded researchers working with only a handful of loci and morphological data. Phylogenetic studies of nonmodel organisms now routinely generate thousands to hundreds of thousands of loci to throw at a given question — despite the fact that only a fraction of these genes are necessary to fully resolve a tree in most cases. (And despite the fact that this glut of data can lead to major computational problems.)

However, the development of approaches intermediate between multiplex PCR and sequence capture / RAD-based methods has lagged behind the more extreme end of the spectrum. Where, then, does the biologist seeking to generate a reasonably-large-but-not-gratuitous number of loci turn? A new method known as Molecular Inversion Probes (MIPs) may provide an answer.

Figure 2 from Niedzicka et al. 2016: Molecular Inversion Probe structure and implementation.

As described in an article published in Nature Scientific Reports by M. Niedzicka and colleagues earlier this year, MIPs are 112 bp single-stranded oligonucleotides characterized by specific ligation and extension sequences that flank a target sequence of interest and are bridged by a “linker” sequence (Figure 1). During hybridization of MIPs to target DNA, gap-filling and ligation produce molecules containing the targeted sequence joined with adaptors and barcodes ready for downstream use.

MIPs were originally popular in biomedical research and human genome sequencing; Niedzicka et al. tested them on the nonmodel salamanders Lissotriton montandoni and L. vulgaris. The team used transcriptome data to design probes targeting sequence across the genome, focusing on regions that were diagnostic at the species level and identifying exon boundaries through a homology-based approach that relied on the conservation of these regions across vertebrates. Of 248 designed markers, 234 amplified successfully, and 80% of those had median coverage within one order of magnitude. Additionally, 77% of the MIPs were confirmed as single-copy Mendelian markers, and replicate samples were genotyped identically with MIPs 99% of the time.


Posted in genomics, next generation sequencing, phylogenetics, phylogeography, population genetics, transcriptomics

What do dolphins, bivalves and algae have in common?

Collaboration, as it turns out, between three scientists interested in vertebrates, invertebrates and algae!

A few days before we left for Evolution 2016 in Austin, one of my collaborators, Eric Pante, came to Charleston as the final stop in a North American sampling trip.

Eric had been a master’s student at the College of Charleston, where I was a post-doc. We knew the same people, it turned out, in the US and in France, where I did my PhD. Yet we had never crossed paths until I wrote a post for TME about a paper he led on species as hypotheses. Thanks to TME, I had found an eager host for a trip to La Rochelle, France, as part of the Northern Hemisphere Gracilaria sampling effort led by me and colleagues at the College of Charleston.

Eric’s arrival in Charleston led to the realization that I had never finished my travelogue series. I left off with the German and Danish sampling anecdotes way back in December. My last two stops in France were sadly neglected.

While we were searching for Gracilaria in and around La Rochelle in September 2015, I got to talking to Eric’s collaborator Amelia Viricel. In addition to the TME connection with Eric, I started another collaboration with Amelia on invasive ascidians.

Once I was back in Charleston and started the travelogue posts, I wanted to try to highlight the research that was going on in the labs that opened their doors to us. Amelia’s work on highly mobile marine predators, which I learned about while talking to her, was certainly a departure from the things I normally think about or write about!


Posted in bioinformatics, blogging, career, conferences, DNA barcoding, haploid-diploid, natural history, phylogenetics, phylogeography, population genetics