On false positives in Isolation with Migration analyses

The IM suite of tools (IM, IMa, IMa2, IMa2p, etc.) is widely used by molecular ecologists to estimate ancestral demography under an Isolation with Migration (IM) model. However, these tools make fundamental assumptions about the evolutionary processes underlying the loci being analyzed: loci are assumed to be independent, freely recombining between loci, non-recombining within loci, and putatively neutral. The methods have previously been found to be robust for inference on data sets that largely fit these assumptions. In a recent publication (Hey et al. 2015), we used simulations to examine the conditions under which likelihood ratio tests under an IM model may yield an excess of false positives for the presence of migration, an observation previously reported by Cruickshank and Hahn (2014). In particular, we were interested in data sets we call “SDLD”, or “Small Data Low Divergence”: the number of loci sampled is small (< 5, as reported by Cruickshank and Hahn (2014)) and divergence between populations is very low.

We simulated 20 data sets with two loci, low divergence (t = 0.5), and no migration (m = 0), and analyzed them with a modified version of IMa2 that approximates the joint posterior density of migration rates and divergence times (having integrated out the population size parameters). Our key findings include: (1) high false positive rates for migration; (2) joint posterior density estimates show regions of high posterior density both for models with low m and t and for models with high m and t, indicating model identifiability issues in the SDLD context; and (3) these very different models show similar expected allele frequency spectra (AFS) and differentiation distributions (measured as Weir and Cockerham’s φst).
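To make the notion of a false positive rate concrete, here is a minimal Python sketch of how one might tally rejections of the no-migration null across replicate data sets simulated with m = 0. It assumes that each replicate yields a likelihood ratio statistic, 2 × (lnL of the model with migration minus lnL of the model without), and that this statistic is compared against a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom, a common choice when the null value sits on the boundary of the parameter space. The function names and the example statistics are hypothetical and are not taken from IMa2 or from our study.

```python
import math


def mixture_p_value(llr_stat):
    """P-value of a likelihood ratio statistic under an ASSUMED null
    distribution: 50% point mass at 0, 50% chi-square with 1 df."""
    if llr_stat <= 0.0:
        return 1.0
    # Survival function of chi-square(1) is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(llr_stat / 2.0))


def false_positive_rate(llr_stats, alpha=0.05):
    """Fraction of null-simulated replicates in which m = 0 is rejected."""
    rejections = sum(1 for s in llr_stats if mixture_p_value(s) < alpha)
    return rejections / len(llr_stats)


# Hypothetical statistics from ten replicates simulated without migration.
example_stats = [0.0, 0.4, 3.1, 5.2, 1.0, 0.0, 7.8, 2.9, 0.2, 4.4]
print(false_positive_rate(example_stats))
```

If the test were well calibrated, the fraction returned for truly migration-free replicates should sit near the chosen alpha; values far above it are what we mean by an excess of false positives.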

Two-population AFS under two simulated scenarios: Fig. A shows the case of low m and low t, and Fig. B shows the case of high m and high t. Image courtesy: Hey et al. (2015), DOI: 10.1111/mec.13381
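For readers unfamiliar with the two-population AFS, the following is a small illustrative sketch of how such a spectrum can be tabulated. It assumes haplotypes are coded as strings of '0' (ancestral) and '1' (derived) with one character per segregating site; the function name and the toy data are hypothetical and this is not the code used to produce the figure.

```python
def joint_afs(pop1, pop2):
    """Tabulate a two-population allele frequency spectrum.

    pop1, pop2: lists of equal-length haplotype strings of '0'/'1'.
    Returns a matrix where entry [i][j] counts the sites at which the
    derived allele appears i times in pop1 and j times in pop2.
    """
    n_sites = len(pop1[0])
    afs = [[0] * (len(pop2) + 1) for _ in range(len(pop1) + 1)]
    for site in range(n_sites):
        i = sum(hap[site] == "1" for hap in pop1)
        j = sum(hap[site] == "1" for hap in pop2)
        afs[i][j] += 1
    return afs


# Toy data: two haplotypes per population, four sites.
toy_pop1 = ["0110", "0100"]
toy_pop2 = ["0101", "0001"]
toy_afs = joint_afs(toy_pop1, toy_pop2)
for row in toy_afs:
    print(row)
```

Two demographic histories that produce nearly the same matrix of counts are, from the point of view of this summary, very hard to tell apart.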

Besides cautioning against poor sampling (an issue which I have also previously discussed here), our study points to a high false positive rate for detecting migration when likelihood ratio tests from the IM suite of tools are applied to data that show low divergence (e.g. very low Fst) and comprise a small number of loci.

References:

Hey, J., Chung, Y., & Sethuraman, A. (2015). “On the occurrence of false positives in tests of migration under an isolation-with-migration model.” Molecular Ecology. DOI: 10.1111/mec.13381

Cruickshank, T. E., & Hahn, M. W. (2014). “Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow.” Molecular Ecology 23(13): 3133-3157.

About Arun Sethuraman

I am a computational biologist, and I build statistical models and tools for population genetics. I am particularly interested in studying the dynamics of structured populations, genetic admixture, and ancestral demography.