IMa2p – Parallel Isolation with Migration Analyses

I figured that it was time to write an update on my post from a year ago on Bayesian MCMC in inferring ancestral demography. Recently, my postdoctoral advisor, Jody Hey, and I released a version of the popular IMa2 program, called “IMa2p”, which extends all the functionalities of IMa2 (and more!) so that your divergence genomics runs finish faster than they could before. Here is a quick blurb from our recent paper, where we describe the algorithms and the computational speed-ups that IMa2p has to offer.

Speed-ups in computational time using IMa2p, using datasets of varying sizes. Image from Fig. 1 of Sethuraman and Hey (2015).


IMa2 (Hey and Nielsen 2007, and other programs in the IM suite) is a Bayesian MCMC-based method that estimates ancestral demography (population mutation rates, divergence times, and migration rates) under an ‘Isolation with Migration’ (IM) model (Nielsen and Wakeley 2001). If you’ve used IMa2 (or any other Bayesian MCMC sampler) before, you will have noticed that increasing the size of the data (the number of genotyped loci, the number of individuals, the size of loci, or the number of populations and, correspondingly, the number of parameters) increases the computational time super-exponentially (also see Hey 2010). Runs on larger data sets are also harder to bring to convergence (see my earlier post on what this means), and more computationally intensive.
IMa2p is a parallelized (OpenMPI-C++) version of IMa2. It distributes the MCMC step (also called the ‘M’ mode in IMa2 parlance) across multiple cores, and collates the sampled genealogies across processors when estimating posterior density distributions and performing likelihood ratio tests (also called the ‘L’ mode).
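
To make the "sample on many workers, then collate" idea concrete, here is a toy sketch in Python. It is not IMa2p's code or algorithm: the target is a standard normal instead of an IM model, the sampler is a plain random-walk Metropolis chain, and threads stand in for MPI processes only so that the example is self-contained. All function names (`log_target`, `run_chain`, `run_parallel`) are made up for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_target(x):
    """Log density (up to a constant) of a toy posterior: a standard normal."""
    return -0.5 * x * x


def run_chain(seed, n_steps=5000, burn_in=1000, step_sd=1.0):
    """Run one random-walk Metropolis chain; return its post-burn-in samples."""
    rng = random.Random(seed)  # each worker gets its own seeded generator
    x = rng.uniform(-3.0, 3.0)
    samples = []
    for step in range(n_steps):
        proposal = x + rng.gauss(0.0, step_sd)
        log_ratio = log_target(proposal) - log_target(x)
        # Metropolis rule: always accept uphill moves, sometimes accept downhill
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        if step >= burn_in:
            samples.append(x)
    return samples


def run_parallel(seeds, **kwargs):
    """Sampling step: one chain per worker. Collation step: pool all samples."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        per_worker = list(pool.map(lambda s: run_chain(s, **kwargs), seeds))
    pooled = [x for chain in per_worker for x in chain]
    return per_worker, pooled


if __name__ == "__main__":
    per_worker, pooled = run_parallel([1, 2, 3, 4])
    print("samples per worker:", [len(c) for c in per_worker])
    print("pooled posterior mean:", sum(pooled) / len(pooled))
```

The point of the sketch is only the shape of the computation: the sampling phase is split across workers, and the estimation phase operates on the samples gathered from all of them.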

In our paper, we report (a) speed-ups in computation that become more nearly linear as the number of loci analyzed increases, (b) greater departure from linearity when computational time varies widely among loci (e.g., while using large priors on migration rates), and (c) consistent estimates of posterior density distributions across varying numbers of processors/cores.
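
For readers who want to check how close to "linear" their own runs are, here is a small sketch of the two usual quantities: speed-up (serial time divided by parallel time) and parallel efficiency (speed-up divided by the number of processors). A perfectly linear speed-up has an efficiency of 1.0. The timings in the usage section are invented placeholders, not numbers from our paper, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: how many times faster the parallel run is than the serial run."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speed-up per processor (1.0 means perfectly linear)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical wall-clock times in hours, keyed by processor count.
    # Replace these with timings from your own runs.
    timings = {1: 100.0, 2: 52.0, 4: 27.0, 8: 16.0}
    t1 = timings[1]
    for n_procs, t in sorted(timings.items()):
        print(n_procs, round(speedup(t1, t), 2), round(efficiency(t1, t, n_procs), 2))
```

Efficiency that drops steadily as processors are added is the "departure from linearity" described in (b).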

You can download IMa2p, along with instructions for installing and running it, from my Git page here.

Good luck, and do write to me (arun@temple.edu) if you have any questions or want to report bugs!

References:
Sethuraman, Arun, and Jody Hey. “IMa2p–parallel MCMC and inference of ancient demography under the Isolation with migration (IM) model.” Molecular Ecology Resources (2015). DOI: http://dx.doi.org/10.1111/1755-0998.12437

Nielsen, Rasmus, and John Wakeley. “Distinguishing migration from isolation: a Markov chain Monte Carlo approach.” Genetics 158.2 (2001): 885-896.

Hey, Jody. “Isolation with migration models for more than two populations.” Molecular Biology and Evolution 27.4 (2010): 905-920. DOI: http://dx.doi.org/10.1093/molbev/msp296


About Arun Sethuraman

I am a computational biologist, and I build statistical models and tools for population genetics. I am particularly interested in studying the dynamics of structured populations, genetic admixture, and ancestral demography.