The imitation game: simulating the genetics of large populations

The most adorable of simulations. Credit to Liza Gross


Computational simulations of genetic data are a powerful and flexible tool for carrying out studies in molecular ecology.
Do you want to know how much explanatory power your data provides? Simulate it!
Predict the future response of species to hybridization, climate change, or translocation? Simulate it!
Do you want to know what it is like to run a city, drive a city bus, or be a goat? Ehhh, that’s not really what I’m talking about.
Many of the programs for simulating genetic data build their simulations from individuals. Simulating individuals makes a lot of sense: the results are easy to interpret and the approach is flexible enough for many evolutionary scenarios. However, the biggest limitation of individual-based simulators is that the computing power needed to simulate large numbers of individuals can be unwieldy. And if you are really trying to simulate biological phenomena, a large number of individuals is likely a requirement.
There are other types of simulation models (analytical models) that focus more intently on a handful of genetic parameters of interest. These simulators estimate those parameters more accurately, but they sacrifice the complexity that may be more representative of real-world large populations.
MetaPopGen, a new simulation package from Marco Andrello and Stéphanie Manel, offers a new approach that combines the strengths of these methods to simulate complex evolutionary scenarios in large populations. To do this, they ignore individuals and use genotypes as the basic unit of simulation. This allows the user to simulate huge sets of “individuals” and opens up a whole range of demographic and genetic complexity.
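To make the "genotypes as the basic unit" idea concrete, here is a minimal Python sketch. It is not MetaPopGen's code or API (MetaPopGen is an R package), and the model is my own simplifying assumption: one biallelic locus, a single random-mating population, non-overlapping generations, and no selection, migration, or mutation. The names `next_generation` and `allele_freq_A` are made up for illustration. The point is only that the state of the population is three genotype counts, no matter how many "individuals" those counts add up to.

```python
import random
from collections import Counter

GENOTYPES = ("AA", "Aa", "aa")


def allele_freq_A(counts):
    """Frequency of allele A computed from genotype counts."""
    n = sum(counts.values())
    return (2 * counts["AA"] + counts["Aa"]) / (2 * n)


def next_generation(counts, n_offspring, rng):
    """Draw offspring genotype counts under random mating (an assumption).

    Offspring genotypes are sampled from the Hardy-Weinberg proportions
    implied by the parental allele frequency, so drift enters through
    the sampling step.
    """
    p = allele_freq_A(counts)
    probs = (p * p, 2 * p * (1 - p), (1 - p) ** 2)
    # Stdlib-only shortcut: this draws one genotype per offspring.
    # A serious genotype-based simulator would draw the three counts
    # directly from a multinomial distribution instead.
    tally = Counter(rng.choices(GENOTYPES, weights=probs, k=n_offspring))
    return {g: tally.get(g, 0) for g in GENOTYPES}


rng = random.Random(42)
counts = {"AA": 250, "Aa": 500, "aa": 250}  # the whole population state
for generation in range(10):
    counts = next_generation(counts, 1000, rng)

print(counts, round(allele_freq_A(counts), 3))
```

The sampling line above still does work proportional to the number of offspring, which is exactly the cost a genotype-based simulator avoids. Swapping it for a direct multinomial draw (for example NumPy's `Generator.multinomial`) keeps the per-generation cost tied to the number of genotypes rather than the number of individuals.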
Sound too good to be true? The trade-off inherent in these simulations is a limitation to a single locus, making MetaPopGen inappropriate for multilocus investigations:

The strengths and weaknesses of MetaPopGen with respect to other forward-time simulators suggests which simulator can be used depending on the evolutionary scenario. While individual-based simulators are well adapted to multilocus systems where the number of individuals is not too large, MetaPopGen is adapted to simulate scenarios with large numbers of individuals but only one locus. The optimal forward-time simulator capable of dealing with multilocus populations of very large size probably does not exist, and the correct practice is to choose the most adapted simulator to the situation of interest.

So, if you are interested in simulating the effects of complex demographic scenarios across large metapopulations (as the authors do in the example dataset), MetaPopGen might be just what you are looking for.
Additionally, if you aren’t familiar with genetic simulation software, this paper offers a nice entry point to the field. For example, did you know there is a database comparing different types of simulators? If you are just starting to think about simulating some data, the citations and explanations provided by Andrello and Manel could be helpful to you.
Andrello M. & Manel S. (2015). MetaPopGen: an R package to simulate population genetics in large size metapopulations, Molecular Ecology Resources, n/a-n/a. DOI: http://dx.doi.org/10.1111/1755-0998.12371
