Show me the power

Describing patterns of genetic structure and mating system variation presents challenges from the outset of sample collection through data analysis (see this post and this post). At the beginning of February, I had the pleasure of collaborating with Sean Hoban at NIMBioS (more about that in a later post). He has developed user-friendly software, such as SPOTG (Hoban et al. 2013c), and has advocated the use of simulations and the quantification of the relationship between power and sampling strategy in molecular ecology studies.



While at NIMBioS, I took the opportunity to ask Sean a few questions.

For your PhD, you worked on the environmental influences of genetic diversity patterns, at local and large scales. How did this lead to your interest in the utility of simulations in molecular ecology?
During my PhD, I read a lot of molecular ecology and conservation genetics papers that struggled to come to strong conclusions about their data, offering vague interpretations of FST, for example. Occasionally in my reading, though, I found that there were stronger ways of drawing conclusions, including simulations, model-based methods, and causal inference. I then wanted to use these approaches to draw better conclusions about the ecological mechanisms behind my data.
Describe your work on developing tools, methods or software for the population genetics community.
While the core of my research interests is in evolutionary ecology, my research program also involves tool development, for several reasons. I feel that new methodologies and computational tools play a major role in rapidly advancing scientific knowledge by enabling new types of research or allowing new questions to be answered. In addition, new methods are needed to handle new data types and quantities. Lastly, developing new software or a new approach that many people use is personally very motivating: one’s efforts are amplified through the work of many other scientists. During my first postdoc, I worked with two renowned theoretical population geneticists (Oscar Gaggiotti and Giorgio Bertorelle) to develop software that helps people choose a sampling strategy before implementing a genetic study. We all want to get the study design correct in order to avoid wasting time and resources, or possibly obtaining vague results (more details on that work in a minute). I was really excited about this because during my PhD, I had to choose the number of microsatellites and the number of individuals to include in my studies, but had little evidence on which to base this decision beyond broad guidelines. I understood the need for a tool that gave quantitative recommendations! Importantly, as we demonstrate with our software, a sampling strategy should be customized to the species, situation and goal of a study. We have since used our tool in a number of other studies (Hoban et al. 2013b, 2014) about sampling (here and here), which highlights a final motivation for focusing on tool development: I use the new methods in my own work!
Is this what motivated your review on the use of simulations in molecular ecology (Hoban 2014)?
I first became interested in simulations for inference, that is, for helping to interpret patterns of molecular genetic data and draw conclusions about the underlying process. I can even tell you what paper first sparked this interest: it was about bullfrog invasions in Europe (Ficetola et al. 2008), and I brought it to Jason McLachlan’s lab meeting, hot off the press. Another one I liked was the use of simulations to quantify the error rates of Bayesian clustering software (Vaha & Primmer 2006), something that is still not frequently done, unfortunately! Anyway, in the time since, I’ve explored and discovered more and more uses for simulations: prediction, planning studies, theoretical exploration, and evaluating methods. Simulations are common for validating method performance (Girod et al. 2011) and for theoretical work (Schiffers et al. 2013), but are not yet widely used by the bulk of molecular ecologists. The goal of my 2014 review was to provide an overview of all these uses, and to encourage more integration of simulations into multiple aspects of a given study (see Figure 1 of Hoban 2014). I hope that, someday, simulations might be a normal and expected part of most studies published in Molecular Ecology. We are already seeing this with the explosion of Approximate Bayesian Computation, and with increasingly common forecasts of a species’ genetic viability.
Describe ConGRESS and your role while a post-doc in France and Italy.
ConGRESS stands for Conservation Genetic Resources for Effective Species Survival. At its core is a consortium of conservation geneticists who were funded from 2010 to 2013 to develop a website, a set of user-friendly tools, and a series of training and consultation workshops around the EU. The goal was to increase the understanding and application of genetic tools in applied conservation action. We worked with managers and policy makers throughout the project to reach this goal (Hoban et al. 2013a). ConGRESS is online, where we have a growing community, a forum, and diverse tools. Our ConGRESS community also formed the nucleus of the newly formed IUCN Conservation Genetics Specialist Group, and continues to work on advancing methodologies and strategies for monitoring genetic change (Hoban et al. 2014). Many of us recently participated in the 1st annual Conservation Genetics meeting in Zurich (next year’s meeting will be held in Goettingen, Germany), a student-friendly, intimate meeting modeled after the Population Genetics Group meetings.
SPOTG, the Sampling planning optimization tool for conservation and population genetics, facilitates the evaluation of statistical power. Describe the software details.
The goal is to estimate the power that an investigator might expect from applying a given sampling strategy (number and type of molecular markers, number of samples) to a given population genetic aim (estimate FST, perform an exact test, perform assignment, detect a bottleneck signature, describe a decline using ancient DNA). The user proposes a set of feasible sampling strategies and some bounds on potential population size and other parameters. The software is a series of java scripted pipelines that control other programs and do summary analysis. It uses SimCoal2 (Laval & Excoffier 2004) to simulate a series of datasets that, for example, underwent a bottleneck. The software then uses arlecore (Excoffier & Lischer 2010) to test how frequently a given sampling strategy could successfully describe that event. In the end, the software generates a graph of the different sampling strategies and their power (Hoban et al. 2013c). This is useful to a person planning a study, but also to someone justifying or evaluating a study (e.g. a grant proposal).
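To make the idea of simulation-based power concrete, here is a minimal Python sketch of the general logic Sean describes: simulate many datasets under a known scenario, apply a test to each, and report the fraction in which the signal is detected. This is not SPOTG and does not call SimCoal2 or Arlequin. The function names, the simple two-population model with shifted allele frequencies, the crude differentiation statistic (a stand-in for FST), and the permutation test are all assumptions chosen for illustration.

```python
import random


def simulate_genotypes(rng, freqs, n_individuals):
    """Diploid genotypes as allele counts (0, 1 or 2) at each biallelic locus."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_individuals)]


def differentiation(pop_a, pop_b):
    """Mean squared difference in sample allele frequency across loci.

    A crude stand-in for FST, used only to keep the sketch short.
    """
    n_loci = len(pop_a[0])
    total = 0.0
    for locus in range(n_loci):
        freq_a = sum(ind[locus] for ind in pop_a) / (2 * len(pop_a))
        freq_b = sum(ind[locus] for ind in pop_b) / (2 * len(pop_b))
        total += (freq_a - freq_b) ** 2
    return total / n_loci


def permutation_p_value(rng, pop_a, pop_b, n_perm):
    """P-value from shuffling individuals between the two populations."""
    observed = differentiation(pop_a, pop_b)
    pooled = pop_a + pop_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = differentiation(pooled[:len(pop_a)], pooled[len(pop_a):])
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_power(n_individuals, n_loci, shift,
                   n_datasets=100, n_perm=99, alpha=0.05, seed=1):
    """Fraction of simulated datasets in which differentiation is detected.

    `shift` is the assumed true allele-frequency difference at every locus.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_datasets):
        freqs_a = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
        freqs_b = [min(max(p + rng.choice((-shift, shift)), 0.0), 1.0)
                   for p in freqs_a]
        pop_a = simulate_genotypes(rng, freqs_a, n_individuals)
        pop_b = simulate_genotypes(rng, freqs_b, n_individuals)
        if permutation_p_value(rng, pop_a, pop_b, n_perm) <= alpha:
            detected += 1
    return detected / n_datasets


if __name__ == "__main__":
    # Compare a few hypothetical sampling strategies (individuals x loci).
    for n_individuals in (5, 20):
        for n_loci in (5, 10):
            power = estimate_power(n_individuals, n_loci, shift=0.15,
                                   n_datasets=30)
            print(f"{n_individuals} individuals, {n_loci} loci: power = {power:.2f}")
```

Tabulating or plotting `estimate_power` over a grid of sample sizes and marker numbers gives the same kind of strategy-versus-power comparison described above, although a real analysis would rely on a proper coalescent simulator and established test statistics rather than this toy model.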
But simulations can also be used to address other questions about a species’ ecology and evolution.
Simulations are indeed useful for understanding aspects of a species’ basic biology and history. Some of my favorite uses of simulations include making inferences about species’ reproductive biology (such as number of mates), determining the appropriate number of individuals to re-introduce depending on mating strategy (Brekke et al. 2011), inferring divergence times (Marino et al. 2013), and more.
Your sampling interests have moved toward other aspects of population genetics and sampling strategies, such as seed banks and agricultural collections.
Right now I’m working on optimizing sampling protocols for seed banks and other ex situ collections (Hoban & Schlarbaum 2014; Hoban et al. 2015). As with the sampling problem in population genetics, it is all too common to use general rules of thumb or broad guidelines for designing a sampling strategy. This is a bit remarkable considering that a species’ basic biology, such as dispersal ability, is known to influence the distribution of genetic diversity on a landscape and thus might influence sampling effectiveness. Many people have discussed this in the past (CPC 1991; Marshall & Brown 1999), but have not yet quantitatively determined how much more to sample for species A than species B. I aim to develop sampling strategies tailored to species’ biology, historical fluctuations, adaptive loci, and the goal of the collection. We recently published our first ‘foray’ into this research topic, which demonstrated the capability of simulations for this task.
You mentioned to me that you remain interested in your PhD topic, the genetics of the tree butternut.
As many of us do, I remain invested in my PhD study species. Butternut (a forest tree also used for nut production) is unfortunately rapidly fading due to an introduced disease. I’m interested in several topics in butternut, including sampling the diverse wild populations for ex situ conservation, determining the phylogeographic history and identifying genetically distinct areas, and evaluating varieties used in cultivation. I recently worked with colleagues at Notre Dame and the Chicago Botanic Garden to use genetic data to show that butternut was isolated in two refuges during the last glacial period; the signature of such isolation persists today (Laricchia et al., submitted).
Anything else you’d like to mention?
Yes, I’d like to put in a plug for NIMBioS! Anyone who is interested in modeling, computation, large datasets, meta-analyses, synthesis, and/or mathematical biology, check us out. We host short visits (proposals accepted anytime), postdocs, and a variety of working groups and workshops (next proposal deadline March 1).
Of course, if anyone is interested in my work, I can be found at:
Sean Hoban-
Sean’s Twitter-
Brekke P, Bennett PM, Santure AW, Ewen JG (2011) High genetic diversity in the remnant island population of hihi and the genetic consequences of re-introduction. Molecular Ecology, 20, 29–45.
CPC (1991) Genetic Sampling Guidelines for Conservation Collections of Rare Plants. In: Genetics and conservation of rare plants (eds Falk D, Holsinger KE), pp. 225–238. Oxford University Press, New York.
Excoffier L, Lischer H (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564–567.
Ficetola GF, Bonin A, Miaud C (2008) Population genetics reveals origin and number of founders in a biological invasion. Molecular Ecology, 17, 773–782.
Girod C, Vitalis R, Leblois R, Freville H (2011) Inferring Population Decline and Expansion From Microsatellite Data: A Simulation-Based Evaluation of the Msvar Method. Genetics, 188, 165–179.
Hoban SM, Arntzen JW, Bertorelle G et al. (2013a) Conservation Genetic Resources for Effective Species Survival (ConGRESS): Bridging the divide between conservation research and practice. Journal for Nature Conservation, 21, 433–437.
Hoban SM, Gaggiotti OE, Bertorelle G (2013b) The number of markers and samples needed for detecting bottlenecks under realistic scenarios, with and without recovery: a simulation-based study. Molecular Ecology, 22, 3444–3450.
Hoban SM, Gaggiotti OE, Bertorelle G (2013c) Sample Planning Optimization Tool for conservation and population Genetics (SPOTG): a software for choosing the appropriate number of markers and samples. Methods in Ecology and Evolution, 4, 299–303.
Hoban S, Arntzen JW, Bruford MW et al. (2014) Comparative evaluation of potential indicators and temporal sampling protocols for monitoring genetic erosion. Evolutionary Applications, 7, 984–998.
Hoban S, Schlarbaum S (2014) Optimal sampling of seeds from plant populations for ex-situ conservation of genetic biodiversity, considering realistic population structure. Biological Conservation, 177, 90–99.
Hoban SM, Strand A, Fraga N, Richards C, Schlarbaum S (2015) Developing quantitative seed sampling protocols using simulations: A Reply to Comments from Guja et al and Guerrant et al. Biological Conservation, in press.
Laval G, Excoffier L (2004) SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics, 20, 2485–2487.
Marino IAM, Benazzo A, Agostini C et al. (2013) Evidence for past and present hybridization in three Antarctic icefish species provides new perspectives on an evolutionary radiation. Molecular Ecology, 22, 5148–5161.
Marshall DR, Brown AHD (1999) Sampling wild legume populations. In: Genetic Resources of Mediterranean Pasture and Forage Legumes (eds Bennett S, Cocks PS), pp. 78–89. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Schiffers K, Bourne EC, Lavergne S, Thuiller W, Travis JMJ (2013) Limited evolutionary rescue of locally adapted populations facing climate change. Philosophical transactions of the Royal Society of London: B, Biological sciences, 368, 20120083.
Vaha J-P, Primmer CR (2006) Efficiency of model-based Bayesian methods for detecting hybrid individuals under different hybridization scenarios and with different numbers of loci. Molecular Ecology, 15, 63–72.
