IMa2p – Parallel Isolation with Migration Analyses

I figured it was time to write an update to my post from a year ago on Bayesian MCMC for inferring ancestral demography. My postdoctoral advisor, Jody Hey, and I recently released a version of the popular IMa2 program, called “IMa2p”, which extends all the functionality of IMa2 (and more!) so that your divergence genomics analyses run faster than before. Here is a quick blurb from our recent paper, where we describe the algorithms and the speed-ups in computation that IMa2p has to offer.

Speed-ups in computational time with IMa2p on datasets of varying sizes. Image from Fig. 1 of Sethuraman and Hey (2015).

IMa2 (Hey and Nielsen 2007; see also the other programs in the IM suite) is a Bayesian MCMC-based method that estimates ancestral demography (population mutation rates, divergence times, and migration rates) under an ‘Isolation with Migration’ (IM) model (Nielsen and Wakeley 2001). If you’ve used IMa2 (or any other Bayesian MCMC sampler) before, you will also have noticed that increasing the size of the data (the number of genotyped loci, the number of individuals, the length of loci, or the number of populations, and correspondingly the number of parameters) increases computational time super-exponentially (see also Hey 2010). Larger data sets are also more computationally intensive and increasingly difficult to converge (see my earlier post on what this means).
IMa2p is a parallelized (OpenMPI, C++) version of IMa2. It distributes the MCMC step (the ‘M’ mode in IMa2 parlance) across multiple cores, and collates the sampled genealogies across processors when estimating posterior density distributions and performing likelihood ratio tests (the ‘L’ mode).
In our paper, we report (a) speed-ups that become increasingly linear as the number of loci analyzed increases, (b) greater departures from linearity when computational time varies widely among loci (e.g., when using large priors on migration rates), and (c) consistent estimates of posterior density distributions across different numbers of processors/cores.
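"Linear" speed-up here has a simple arithmetic meaning. As a minimal sketch (Python, using made-up wall-clock times rather than the values reported in Sethuraman and Hey 2015), speed-up on p processors is the serial run time divided by the parallel run time, and parallel efficiency is that speed-up divided by p. An efficiency near 1.0 is perfectly linear scaling; efficiency falling below 1.0 as p grows is the "departure from linearity" described above. The function name `speedup_and_efficiency` and all timing values are hypothetical.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for one parallel run.

    speedup    = T(1) / T(p)
    efficiency = speedup / p   (1.0 means perfectly linear scaling)
    """
    if t_serial <= 0 or t_parallel <= 0 or n_procs < 1:
        raise ValueError("times must be positive and n_procs must be >= 1")
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs


# Hypothetical wall-clock times in hours, NOT results from the IMa2p paper
serial_hours = 96.0
parallel_hours = {2: 50.0, 4: 26.0, 8: 14.0, 16: 8.0}

for p, hours in sorted(parallel_hours.items()):
    s, e = speedup_and_efficiency(serial_hours, hours, p)
    print(f"{p:>2} cores: speedup {s:5.2f}x, efficiency {e:.2f}")
```

With these invented numbers, efficiency drops from 0.96 on 2 cores to 0.75 on 16 cores, which is the kind of sub-linear pattern you would look for when benchmarking your own runs.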
You can download IMa2p, along with instructions for installing and running it, from my Git page here.
Good luck, and do write to me (arun@temple.edu) if you have any questions, queries, or to report bugs!
References:
Sethuraman, Arun, and Jody Hey. “IMa2p – parallel MCMC and inference of ancient demography under the Isolation with migration (IM) model.” Molecular Ecology Resources (2015). DOI: http://dx.doi.org/10.1111/1755-0998.12437
Nielsen, Rasmus, and John Wakeley. “Distinguishing migration from isolation: a Markov chain Monte Carlo approach.” Genetics 158.2 (2001): 885–896.
Hey, Jody. “Isolation with migration models for more than two populations.” Molecular Biology and Evolution 27.4 (2010): 905–920. DOI: http://dx.doi.org/10.1093/molbev/msp296

Posted in bioinformatics, genomics, howto, software, theory

Dispersal by land or by sea

Here, we compare and contrast the traits and selective forces influencing the evolution of dispersal in marine and terrestrial systems. From this comparison, a unifying question emerges: when is dispersal for dispersal and when is dispersal a by-product of selection on traits with other functions?

Dispersal sometimes seems like one of the “big things” that gets lost in the present trajectory of molecular ecology. We know a lot about how dispersal varies between species, populations, and individuals, but it sure is a tricky set of parameters to include in most modern popgen analyses.
One (of many) reasons why dispersal and gene flow are so difficult to generalize is that the why and how can vary so greatly between organisms. Is that jellyfish “dispersing” or just floating around? Is that frog dispersing because of the density of conspecifics or some other reason?

“Wait, uhh, am I dispersing here or what?” (Image by Sanjay Acharya)


Burgess et al. recently published a review that tackles some of these issues and points out that there are big differences between marine and terrestrial dispersal (surprise!) that mostly get left out of theory. However, a bigger goal of the review is to ask scientists to think harder about whether dispersal is a direct adaptation or a by-product of some other process, and the authors outline a multivariate model for getting started.
Figure 1 (B) of Burgess et al. (2015)

I’ll leave it to your own degree of adaptationism to decide when a by-product is actually an adaptation. I’m not here to get all spandrel-y. For now, it’s hard to argue against understanding the complexity that underlies dispersal, whether by land or by sea.

A trait-based approach, focused on selection on traits that influence dispersal, will not only improve our understanding of when dispersal is a direct adaptation versus a by-product, but can also advance the integration of theory and data. Theories of dispersal evolution would benefit from considering the evolutionary causes of movement in general as well as additional agents of selection on the multiple traits that influence dispersal specifically.

Burgess, S. C., Baskett, M. L., Grosberg, R. K., Morgan, S. G., & Strathmann, R. R. (2015). When is dispersal for dispersal? Unifying marine and terrestrial perspectives. Biological Reviews. DOI: 10.1111/brv.12198

Posted in population genetics

Raising the NIH pay-line to 20%

I bet that title got your attention.
In the good ol’ days, our funding record made the United States look like the land of milk and honey. As Bruce Alberts and colleagues wrote in PNAS earlier this year:

“The United States has traditionally been viewed as the land of opportunity for young scientists, offering the most talented of them the chance to test their own ideas, raise radically new questions, and forge original paths to the answers. This feature of our system has drawn many of our most able young people to scientific careers, while simultaneously attracting outstanding young people to the United States from around the world.”

Well, those days are no more. Now young investigators are 6 times less likely to win an NIH grant than they were 30 years ago:

Percentage of NIH R01 Principal Investigators aged 36 and younger and aged 66 and older, 1980–2010 (from: http://acd.od.nih.gov/biomedical_research_wgreport.pdf)


So what can be done about it? As a junior researcher, I think about this issue a lot – mostly because I’m selfish and want to know when and where my next academic “meal” is coming from. So it pleases me to see that others (read: those with more clout) are organizing workshops to try and right the ship.

Posted in career, funding, NIH, politics, United States

Genomics: the "four-headed beast" of Big Data

Big Data in the cloud. Photo from internap.com

When I bought my first laptop in 2005, it came with a free 64 MB flash drive*, which I thought was pretty awesome. Given how quickly genomic data generation has grown over the past decade, the storage capacity of that flash drive is laughable today. In their new PLOS Biology paper, Stephens et al. discuss genomics as a Big Data science, compare it with other Big Data domains (specifically astronomy, YouTube, and Twitter), and project where genomics is headed over the next decade in terms of data acquisition, storage, distribution, and analysis.

Posted in Uncategorized

The Butterfly Effect

This might just take the prize for the ‘spiciest’ story in molecular co-evolution of 2015 so far. While a lot of the press coverage makes it sound like caterpillar Thanksgiving, the science behind this study showcases the remarkable power of molecular phylogenetics to reveal the adaptive evolution of traits.

In a recent study, Edger et al. (2015) report a coincident evolutionary arms-race between the Brassicales (angiosperms that include mustard, horseradish, cabbage, broccoli, etc.) and their butterfly herbivores in the family Pieridae. Using whole-transcriptome sequences of Brassicales and nuclear gene phylogenies of Pieridae, Edger et al. (2015) performed phylogenetic analyses of 1155 genes, calibrated with fossil estimates, to date genome duplication and diversification events in glucosinolate pathways. Glucosinolates are thought to have evolved as toxic plant defenses against Pierid caterpillars, and they also give edible Brassicales their characteristic sharp taste.

Fossil-calibrated phylogenetic reconstructions of Brassicales and Pieridae, showing co-evolutionary diversification of glucosinolates in Brassicales and of detoxification in Pieridae. Image courtesy: Fig. 1 of Edger et al. (2015).


Key findings of this study include:

  1. the appearance of glucosinolates after a whole genome duplication event in Brassicales (~78 Mya),
  2. the ability of Brassicales to synthesize glucosinolates from substrates was ancestral,
  3. the escalation of glucosinolate diversity in Brassicales was the result of an arms-race against butterflies, due to the retention and neofunctionalization of genes after single-gene and whole-genome duplications, and
  4. the subsequent evolution of adaptation to the toxicity of glucosinolates in different Pieridae coincides with diversification events in Brassicales, and vice versa.

Consistent with the hypothesis that retention of duplicates after WGD is driven by selective benefits, previous analyses indicates a high metabolic cost of glucosinolate production, a result incompatible with the retention of glucosinolate duplicate being neutral.

Reference:
Edger, Patrick P., et al. “The butterfly plant arms-race escalated by gene and genome duplications.” Proceedings of the National Academy of Sciences (2015): 201503926.

Posted in adaptation, evolution, genomics, natural history, population genetics, selection, speciation, transcriptomics

Can hybridization save a species, genes, or both?

Climate change is real, species are going to move around, and it will definitely cause some problems.
Even if you aren’t a conservation biologist, the above common knowledge has likely permeated into your scientific life at some level. What conservation biologists plan to do about it likely has not.

Posted in adaptation, conservation

Mixed modeling of methylation measures (increase your power by 60%)

Do you want to increase your power to detect differentially methylated CpG sites by 60%*? Yes?! Then do I have the pre-print for you.

Posted in bioinformatics, genomics, methods, next generation sequencing, software

marmap

A couple of years ago, Benoit Simon-Bouhet ended up sharing an office with Eric Pante, then a post-doc fellow in his former lab. The two quickly realized they were in a lab in which few people had the expertise or taste for coding. Thus, on a daily basis, they were both approached by colleagues and students to take a look at their data analysis and graphics. Meanwhile, they had their own not-so-easy tasks of creating publication-quality maps for themselves as well as their colleagues.
They both had bits and pieces of R scripts scattered around their hard drives to (i) import bathymetric data previously downloaded from public databases, (ii) reshape these data into a form suitable for plotting in R, and (iii) plot the bathymetry together with other data, such as sampling sites or other locations of interest. The process was tedious and convoluted (especially the manual downloading of online bathymetric data), and it required a good knowledge of the R scripting language.
In order to ease this process, the two embarked on creating marmap (short for marine maps) …

Posted in community ecology, conservation, evolution, howto, natural history, R, software

Dōmo arigatō

Along with my collaborators Erik Sotka, Courtney Murren, and Allan Strand, and our battery of students, we have embarked on an intense summer field season. Erik and I are leading the effort of sampling populations of the introduced red seaweed Gracilaria vermiculophylla. It is native to the northwest Pacific, but has been introduced to every continental margin in the Northern Hemisphere in the last few decades.
To date, studies on marine invasions focus principally on demographic and ecological processes, and the importance of evolutionary processes has been rarely tested. Moreover, there are surprisingly few studies that compare native and non-native populations in their biology or ecology. Our current project integrates population genetics, field surveys and common-garden laboratory experiments to address the role of rapid evolutionary adaptation in invasion success.

The weed that launched a massive collaborative project tracing its evolution during invasion

For my part, I knew my summer would be filled with a month-long sojourn in Japan and short trips every 10–15 days around North America, bookended by a month-long trip to sample European coastlines, with a return to my old haunts in northwestern France.
Alas, Erik’s first leg in Japan (see some photos here) resulted in a mountain (or maybe seamount, see my next post on the R package marmap!) of live algae for culturing and phenotyping! Life became decidedly hurried!
Our students: Paige Bippus (CofC undergrad, Class of ’16, middle right), Lauren Lees (CofC undergrad, Class of ’17, middle left), Sarah Shainker (CofC undergrad, Class of ’16, bottom middle) and Ben Flanagan (CofC GPMB grad student, Class of ’17)


No pre-emptive posts were penned … just shepherding our fantastic students into the ins and outs of red algal culturing, while keeping up morale with the endless playlists of Songza (80’s Prom being a particular favorite due to the aptly timed “Turning Japanese” while processing Japanese populations of G. vermiculophylla).
I had spoken with Jeremy about posting fieldwork stories and highlighting research of interest to TME readers from the marine labs we’d be visiting throughout the summer. Yet, once again, time did not allow for live posts while in country … so over the next few weeks, I’ll be posting some field anecdotes as well as a description of the different places we had the opportunity and good fortune to visit.
We are celebrating our Independence Day in the US this weekend, so I’ll leave all of you with a few pictures for the long holiday weekend, along with a massive thank you to all of our hosts in Japan, who made this a hugely successful, once-in-a-lifetime trip!
Kelp drying in Muroran, Hokkaido, Japan

The tale of an urchin and an anemone

Erik Sotka, Rob Hadfield (my partner in crime in life and in the field in Japan!) and me at one field site in Akkeshi, Hokkaido, Japan


Benten-jinja Shrine in Akkeshi-ko

Marimo from Akan-ko

Fushimi Inari-taisha in Kyoto

Kimono

Dinner with one of our hosts, Dr. Masahiro Nakaoka, in Kimitsu


Dōmo arigatō!

Posted in adaptation, blogging, community, evolution, haploid-diploid, natural history

Societal constructs, and Genetic diversity

As we grapple with numerous discoveries of variation in genomic diversity in humans, interest has risen in understanding its causes and consequences. Two recent papers describe studies of (a) the effects of marital rules (who gets to marry whom) on genomic diversity (Guillot et al. 2015), and (b) the correlations between homozygosity, which differs between effectively random-mating and inbred human populations, and various health-related quantitative traits (Joshi et al. 2015).

Sums of Runs of Homozygosity (SROH) shown as a function of cohorts of human populations. Figure courtesy: Fig. 1 from Joshi et al. (2015) http://www.nature.com/nature/journal/vaop/ncurrent/fig_tab/nature14618_F1.html

Relaxed Observance of Traditional Marriage Rules Allows Social Connectivity without Loss of Genetic Diversity, Guillot et al. Molecular Biology and Evolution, 2015.
Marital rules, societal constructs governing who marries whom, are prevalent in many human populations. Biologically, one would hypothesize that these rules also influence the genetic diversity of a population, and thus the fitness of offspring. Guillot et al. (2015) use simulations and analyses of SNP diversity in an Indonesian population to quantify the genetic consequences of relaxed versus strict adherence to these rules, particularly the MBD (Mother’s Brother’s Daughter) rule, under which men are required to marry their mother’s brother’s daughter. Key findings of the study include (a) strict MBD marital rules lead to reduced genomic diversity in simulations, and (b) evidence that MBD rules are not strictly followed in the Rindi community of Eastern Indonesia, an island population whose marital rules have been extensively studied.

Certainly, reduced genetic diversity under a strict interpretation of the APA marriage rules suggests that there was little biological incentive for communities to enforce marriage rules strongly, at least for long periods of time.

Directional dominance on stature and cognition in diverse human populations, Joshi PK et al. Nature, 2015.
While the detrimental effects of inbreeding (and of marital rules like those in Guillot et al. above) have been extensively studied for Mendelian traits in humans, most fitness traits are complex and polygenic. Joshi et al. (2015), as part of the ROH (Runs of Homozygosity) consortium, investigate 16 quantitative traits that have fitness consequences in humans and their correlations with homozygosity. Analyses of ROH from SNP arrays in more than 300,000 individuals revealed (a) differences in ROH lengths that reflect demography (with African populations showing the least homozygosity, and isolated populations, including the Amish and Hutterites, the most), (b) an average reduction of 1.2 cm in height and of 137 ml in forced expiratory volume in the offspring of first cousins, (c) a 0.3 standard deviation reduction in general cognitive ability and a 10-month reduction in educational attainment in the offspring of first cousins, and (d) no significant effect on 12 other fitness-related traits (particularly cardio-metabolic ones).
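To make the SROH measure concrete, here is a minimal Python sketch, assuming ROH segments have already been called from SNP-array genotypes and are given as chromosome, start, and end coordinates. SROH is the summed length of all runs at least some minimum length long, and dividing it by the length of the autosomal genome gives F_ROH, a genomic estimate of the inbreeding coefficient. The 1.5 Mb cutoff, the 2.5 Gb genome length, the class and function names, and the example segments are illustrative assumptions, not the settings or data used by Joshi et al. (2015). For scale, the pedigree-expected inbreeding coefficient of offspring of first cousins is 1/16 (0.0625).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RohSegment:
    """One run of homozygosity; coordinates assumed 1-based and inclusive."""
    chrom: str
    start_bp: int
    end_bp: int

    @property
    def length_bp(self) -> int:
        return self.end_bp - self.start_bp + 1


def sroh(segments, min_length_bp=1_500_000):
    """Sum of ROH lengths (bp), keeping only runs >= min_length_bp (assumed cutoff)."""
    return sum(s.length_bp for s in segments if s.length_bp >= min_length_bp)


def froh(segments, autosome_length_bp, min_length_bp=1_500_000):
    """Fraction of the autosomal genome covered by ROH: SROH / genome length."""
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    return sroh(segments, min_length_bp) / autosome_length_bp


# Hypothetical ROH calls for one individual (invented, not real data)
example = [
    RohSegment("1", 1, 2_000_000),          # 2.0 Mb, counted
    RohSegment("2", 5_000_001, 8_000_000),  # 3.0 Mb, counted
    RohSegment("3", 100, 500_099),          # 0.5 Mb, below the cutoff
]

print(sroh(example))                                    # 5000000
print(froh(example, autosome_length_bp=2_500_000_000))  # 0.002
```

Raising or lowering the minimum length changes which runs are counted; long runs tend to reflect recent shared ancestry of the parents, which is why ROH-based measures are used to compare outbred and isolated populations.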

We have demonstrated the existence of directional dominance on four complex traits (stature, lung function, cognitive ability and educational attainment), while showing any effect on another 12 health-related traits is at least almost an order of magnitude smaller, non-linear or non-existent.

References:
Joshi, Peter K., et al. “Directional dominance on stature and cognition in diverse human populations.” Nature (2015). DOI: 10.1038/nature14618
Guillot, Elsa G., et al. “Relaxed observance of traditional marriage rules allows social connectivity without loss of genetic diversity.” Molecular Biology and Evolution (2015). DOI: 10.1093/molbev/msv102

Posted in evolution, genomics, natural history, population genetics, selection, societal structure