Population genomics in the “melting pot”

(Flickr: Amanda)

North America is full of new arrivals. Europeans colonized the continent starting half a millennium ago, displaced and eradicated native populations, and brought enslaved workers from Africa with them — and further immigrants have followed ever since. This mass movement of people is a huge complication for studies of human population genetics, but it’s also an opportunity to study how that movement is reflected in the diversity of the people who now live in North America. One study of people in the Caribbean, for instance, found the effects of colonization and the slave trade, but also evidence of migration across the region that pre-dated both.

An important tool for studying the complex human history of North America is emerging from a consumer trend you’ve probably heard about in a couple thousand podcast sponsorship messages: personalized genetic analysis. Services like 23andMe and Ancestry.com offer genome-wide genotyping and comparison to geographically specific samples to identify your ancestors’ origins, and both companies ask customers to volunteer their data for research. Data collected by 23andMe allowed a comparison of genetic ancestry to racial and ethnic identity that reveals how slippery the relationship between race and biology really can be. Now, a study of AncestryDNA customers helps link the history of colonization and migration across North America to individual Americans’ family histories.


Posted in genomics, pedigree

Friday Action Item: Remind your Senators to vote against Scott Pruitt

“‘Teepee’ burner incinerates lumber mill’s waste, 05/1972.” — a photo taken by Boyd Norton as part of an early EPA project to document the country, before modern environmental regulations took effect, highlighted by the Discover EPA Twitter feed. (Flickr: US National Archives)

On Fridays while the current administration is in office we’re posting small, concrete things you can do to help make things better. Got a suggestion for an Action Item? E-mail us!

It’s been almost a full month since the inauguration (good heavens) and it’s hard to keep track of the scandals and outrages, more or less as expected. But if you’ve got it in you to make one call to Congress this week (and we hope you do), let us suggest you spend it on reinforcing the resistance to the nominee for head of the Environmental Protection Agency.

Scott Pruitt has built a political career working against the very concept of evidence-based regulation — not just rules to reduce carbon emissions, but regulation of mercury and particulate pollution as well. The Senate is scheduled to vote on his confirmation today, even as Democrats have been requesting a delay while Pruitt’s office complies with a court order to release e-mails that could illuminate his ties to the fossil fuel industry, and EPA employees themselves are, extraordinarily, lobbying the Senate against a prospective boss who opposes the very mission of the agency.

So: Call your Senators. Tell them to vote against Pruitt.

Posted in Action Item

Music to an amniote’s ears, an “accordion” model of genome size evolution

How did we get where we are? Genetically speaking, that is. A few posts ago, we discussed the genotype-phenotype question: how do genomes make plants and animals (and don’t forget the microbes!) look and act the way they do? A broader, related question is how genomes themselves have evolved over time, which is clearly no small task.


Posted in adaptation, bioinformatics, evolution, genomics

Friday Action Item: Time for another round of Donors Choosing

(Flickr: Paul Gorbould)


This week the U.S. Senate approved Donald Trump’s nominee for Secretary of Education, Betsy DeVos, on a one-vote margin and over unprecedented opposition. DeVos is a billionaire heiress with no direct experience of public education, and was unable to answer basic questions about Federal education policy in her confirmation hearing, but she has spent millions lobbying for efforts to transfer public funding to for-profit charter schools and religious education.

So this seems like a good week to return to Donors Choose. The last time I proposed this as an Action Item, I pointed out that the U.S. K-12 education system was horrifically unequal long before the arrival of Betsy DeVos, because most of our school funding comes from local property taxes. As Terry McGlynn noted this week on Twitter, it’s so routine for public school teachers to buy classroom supplies out of their own pockets that it’s a standard deduction item on income tax forms.

Donors Choose lets us help teachers save their none-too-generous paychecks by pitching in money for individual projects or classroom resources. You can easily search for biology-specific requests, or for schools in your own neighborhood, but here are a few good options to get you started:

For bonus #resistance points, maybe add a memo to dedicate your donation to the new Secretary of Education.

Posted in Action Item

Phylogenetic trees in R using ggtree

An R package that I like to use for visualizing phylogenetic trees was recently published. It’s called ggtree, and as you might guess from the name, it is based on the popular ggplot2 package. With ggtree, plotting trees in R has become really simple, and I would encourage even R beginners to give it a try! Once you’ve gotten the hang of it, you can modify and annotate your trees in endless ways to suit your needs.

ggtree supports the two common tree formats, Newick and Nexus. It also reads output from a range of tree-building software, such as BEAST, EPA, HYPHY, PAML, PHYLDOG, pplacer, r8s, RAxML and RevBayes.

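# ape provides read.tree(); ggtree provides the plotting functions
library("ape")
library("Biostrings")
library("ggplot2")
library("ggtree")
# Locate the example Newick file that ships with ggtree
nwk <- system.file("extdata", "sample.nwk", package="ggtree")
# Parse the Newick file into a "phylo" object
tree <- read.tree(nwk)
# Print a short summary of the tree
tree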
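If you haven’t met the Newick format before, it is just nested parentheses with optional branch lengths after each colon. Here is a minimal sketch that parses a made-up four-taxon tree with ape’s read.tree() and its text argument; the taxon names and branch lengths are invented purely for illustration.

```r
library("ape")

# Hypothetical four-taxon tree written as a Newick string;
# names and branch lengths are invented for illustration
newick_text <- "((A:1,B:1):0.5,(C:0.8,D:1.2):0.7);"

# read.tree() can parse a string directly via the `text` argument
toy_tree <- read.tree(text = newick_text)

Ntip(toy_tree)        # number of tips in the tree
toy_tree$tip.label    # tip names, in the order they appear in the string
```

The resulting object is an ordinary "phylo" object, so it can be used in the same way as a tree read from a file.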

After you’ve loaded your tree in R, visualization is really simple. The ggtree function directly plots a tree and supports several layouts, such as rectangular, circular, slanted, cladogram, time-scaled, etc.
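As a rough sketch of what switching layouts looks like, the snippet below builds the same tree in a few styles. It uses a small invented Newick tree so that it runs on its own; the object names (toy_tree, p_rect and so on) are just placeholders, and you would substitute your own tree.

```r
library("ape")
library("ggtree")

# Hypothetical four-taxon tree, invented for illustration
toy_tree <- read.tree(text = "((A:1,B:1):0.5,(C:0.8,D:1.2):0.7);")

# The default layout is rectangular
p_rect <- ggtree(toy_tree)

# Other layouts are chosen with the `layout` argument
p_circ  <- ggtree(toy_tree, layout = "circular")
p_slant <- ggtree(toy_tree, layout = "slanted")

# A cladogram ignores branch lengths
p_clado <- ggtree(toy_tree, branch.length = "none")

# Because these are ggplot2-style objects, layers are added with `+`;
# geom_tiplab() adds the tip names
p_labelled <- p_rect + geom_tiplab()
```

Printing any of these objects (for example, typing p_labelled at the console) draws the tree.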


Posted in bioinformatics, howto, phylogenetics, R

False detection of “true” species under the multi-species coalescent model

The multi-species coalescent model (MSCM) is the biggest name in the game (if the game is genetic species delimitation). But a new paper from Proceedings of the National Academy of Sciences asks: is the MSCM really doing what we think it’s doing?

Species or Population Structure Meme

Some background: The MSCM, usually implemented in the program BPP (Yang & Rannala 2010), models speciation as an instantaneous event under the birth-death process.

But we know that the biological reality is more complex. Within most species there is some amount of gene flow restriction (e.g., due to environmental or geographic barriers), not all of which will eventually lead to speciation. Depending on the extent and duration of isolation and the strength of selection, speciation can be a gradual and stochastic process.

Sukumaran & Knowles (2017) tested the performance of the MSCM using data simulated under the “protracted speciation model,” which includes a few more biologically relevant parameters compared to the simpler birth-death model. Two key components of the protracted speciation model are the species conversion rate (c, the rate at which incipient species develop into true species), and another parameter that accounts for incipient species going extinct or merging back into their parent species.
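To get an intuition for why lineages and species can come apart under this kind of model, here is a deliberately simplified toy simulation in base R. It is not the simulator or the exact model used by Sukumaran & Knowles: the function name, the rate values, and the bookkeeping (every lineage buds off incipient lineages, incipient lineages either convert to true species or are lost, and true species never go extinct) are all assumptions made for illustration.

```r
# Toy Gillespie-style simulation of a protracted-speciation-like process.
# All names and rate values are illustrative assumptions, not values from the paper.
simulate_protracted <- function(rate_split = 1.0,    # budding of new incipient lineages
                                rate_convert = 0.2,  # incipient -> true species (the "c" above)
                                rate_loss = 0.5,     # incipient lineage dies out or merges back
                                t_max = 4) {
  good <- 1L        # lineages that count as true species
  incipient <- 0L   # isolated lineages that are not (yet) true species
  t <- 0
  repeat {
    rates <- c(split   = rate_split * (good + incipient),
               convert = rate_convert * incipient,
               loss    = rate_loss * incipient)
    # Waiting time to the next event is exponential with the total rate
    t <- t + rexp(1, sum(rates))
    if (t > t_max) break
    # Pick which event happens, in proportion to its rate
    event <- sample(names(rates), 1, prob = rates)
    if (event == "split") {
      incipient <- incipient + 1L
    } else if (event == "convert") {
      incipient <- incipient - 1L
      good <- good + 1L
    } else {
      incipient <- incipient - 1L
    }
  }
  c(lineages = good + incipient, species = good)
}

# Repeat the simulation and compare the average number of
# distinct lineages with the average number of true species
set.seed(42)
runs <- t(replicate(200, simulate_protracted()))
colMeans(runs)
```

In this toy, the lineage count can never fall below the species count, and lowering rate_convert widens the gap. A method that delimits every isolated lineage would therefore report more "species" than the simulation actually contains.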

Figure 1 from Sukumaran & Knowles 2017. The multi-species coalescent model may over-estimate the “true” number of species on a phylogeny.

The authors used two different simulation schemes: a “fixed duration” scheme, where the simulations ran for a fixed amount of time and produced varying numbers of species, and a “fixed species number” scheme, where simulations ran until five species were generated.

Perhaps you’ve already guessed what happened: Sukumaran & Knowles found that the MSCM is great at identifying lineages, but it overestimates the number of species. In fact, the MSCM can infer 5 to 13 times the true number of species. It is also worth noting that the errors are all positive; i.e., BPP never underestimated the true number of species, only overestimated it.

Figure 2 from Sukumaran & Knowles 2017. The multi-species coalescent model infers more than the true number of species, under a variety of simulation conditions (A). However, it does correctly estimate the number of lineages (B).

Why does it matter? These methods lead to inflated diversity estimates, with direct consequences for conservation and ecology research. For now, the authors suggest using morphological, ecological, ethological, or other classes of data to correctly attribute MSCM results to either species-level or population-level processes – a call that has been echoed by other researchers in the last 6 months (e.g., Freudenstein et al. 2016).

This study also served as a call for new methods for genetic species delimitation, and the researchers have already tweet-hinted at a new method that may be coming down the pipe soon. I imagine the new method will have some basis in the protracted speciation model? I’m looking forward to reading it.

Cited:

Sukumaran, J., & Knowles, L. L. (2017). Multispecies coalescent delimits structure, not species. Proceedings of the National Academy of Sciences, 201607921. doi: 10.1073/pnas.1607921114

Yang, Z., & Rannala, B. (2010). Bayesian species delimitation using multilocus sequence data. Proceedings of the National Academy of Sciences, 107(20), 9264-9269. doi: 10.1073/pnas.0913022107

Freudenstein, J. V., Broe, M. B., Folk, R. A., & Sinn, B. T. (2016). Biodiversity and the Species Concept—Lineages are not Enough. Systematic Biology. doi: 10.1093/sysbio/syw098

Posted in software, speciation

That’s an H. erato of a different color!

Modified from Figure 1 (Belleghem et al., 2017). Sample of diversity among H. erato.

What drives different coloration among birds, insects, flowers? One of the major goals in evolutionary studies is understanding what is going on in DNA that makes organisms different. A fancy way to say this is studying how an organism’s genotype (the genome) influences the phenotype (observed characteristics).

Modified from Figure 1 (Belleghem et al., 2017). Geographical distribution, phylogeny and color pattern diversity among H. erato individuals

From yeast to Darwin’s finches (and everything in between), there are a variety of models that provide study systems to tease apart the link between genotype and phenotype. In particular, it’s helpful when the model system has undergone a recent adaptive radiation, so that there are a bunch of representatives that look diverse.


Posted in evolution, genomics, phylogeography