Numerous methods have been developed over the last few years for the detection of selective sweeps (hard and soft – see my previous post). This week, we look at three new studies that (a) compare existing methods to detect sweeps (Vatsiou et al. 2015), (b) develop a new method to detect hard sweeps (Pybus et al. 2015), and (c) develop the theory for detecting soft sweeps in which a single standing mutation sweeps in response to an environmental change (Berg and Coop 2015).
Vatsiou et al. 2015 – Comparison of methods
Vatsiou et al. (2015) compare the performance of seven recent methods for detecting selective sweeps from genomic data. Broadly, these methods rely on genome scans of differentiation, on patterns of genetic variation along a chromosome within a population, on the lengths of homozygous haplotypes (also called IBD segments) around selected SNPs on physical linkage maps, or on multilocus differentiation. The authors simulate data under three models of population structure (island, stepping-stone, and hierarchical island), with both hard and soft sweeps, and with sweeps starting from migration-mutation-drift equilibrium frequencies. Comparing estimates of selection and false discovery rates (FDRs) for SNPs across windows, the study reports (1) low power and high FDR for EHHST and XP-EHHST, (2) a strongly detrimental effect of increased migration on the performance of all methods, and (3) a strong effect of initial allele frequency on the power of all methods to detect soft sweeps.
“As we have shown, no single method is able to detect both starting and nearly completed selective sweeps. Combining several methods (e.g. XPCLR or hapFLK with iHS or nSL) can greatly increase power to detect a wide range of selection signatures.”
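To make the two headline metrics concrete, here is a minimal sketch of how power and FDR can be computed when the truly selected SNPs are known, as they are in simulations. The function name, the per-SNP definition, and the toy SNP indices are illustrative assumptions, not the paper's actual pipeline (which may score windows rather than individual SNPs).

```python
def power_and_fdr(detected, truly_selected):
    """Return (power, FDR) for a set of SNPs flagged by a sweep scan.

    detected       -- SNP indices the method flagged as selected
    truly_selected -- SNP indices that were selected in the simulation
    Both the name and the per-SNP definition are assumptions for illustration.
    """
    detected = set(detected)
    truly_selected = set(truly_selected)
    true_pos = len(detected & truly_selected)
    false_pos = len(detected - truly_selected)
    # Power: fraction of truly selected SNPs that were recovered.
    power = true_pos / len(truly_selected) if truly_selected else 0.0
    # FDR: fraction of flagged SNPs that are false positives.
    fdr = false_pos / len(detected) if detected else 0.0
    return power, fdr


# Toy example with made-up SNP indices.
truth = {10, 42, 77, 120}
calls = {10, 42, 55, 300, 301}
power, fdr = power_and_fdr(calls, truth)
print(f"power={power:.2f}, FDR={fdr:.2f}")
```

A method can score well on one metric and poorly on the other, which is why the two are reported together.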
Pybus et al. 2015 – Hierarchical boosting to detect hard sweeps
Pybus et al. (2015) develop a new method for detecting hard sweeps by training a model on simulated evolutionary scenarios that vary in the final frequency of the selected allele and in the age of the sweep. Using a new “hierarchical boosting” (HB) algorithm, their method classifies genomic regions into different evolutionary scenarios (e.g. complete versus incomplete sweeps). Using SNPs from the 1000 Genomes Project data and coalescent simulations of selective scenarios that vary the time of the sweep and the final allele frequency, Pybus et al. (2015) compare the performance of the HB algorithm against nine widely used methods for detecting sweeps (including several of the methods evaluated by Vatsiou et al. 2015 above). They report (1) the highest sensitivity of the HB algorithm among all methods considered for detecting complete hard sweeps, and (2) lower sensitivity for detecting incomplete sweeps, in both simulated and real data.
This study offers a unique and powerful way of detecting candidate regions in the genome that have been evolving under positive selection in a more reliable way than many lists produced by single selection tests or even some other existing composite methods. It also distinguishes, in many cases, the final state (complete/incomplete) and the relative age (ancient/recent) of a given selective event.
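The hierarchical idea can be sketched as a small cascade in which each node combines several selection statistics into one score and passes a genomic window down the tree. This is only a rough illustration: the feature names, the tree shape, and especially the weights below are hypothetical placeholders, whereas in a boosting approach the weights would be learned from simulated training data. It is not the authors' implementation.

```python
def linear_score(features, weights, intercept=0.0):
    """Weighted sum of the statistics named in `weights`."""
    return intercept + sum(w * features[name] for name, w in weights.items())


# Hypothetical placeholder weights; a real classifier would learn these
# from simulations rather than use hand-picked numbers.
SWEEP_VS_NEUTRAL = {"tajima_d": -1.0, "ihs": 0.8, "xpehh": 0.6}
COMPLETE_VS_INCOMPLETE = {"tajima_d": -0.7, "ihs": -0.9, "xpehh": 0.5}


def classify_window(features):
    """Two-level cascade: sweep vs. neutral, then complete vs. incomplete."""
    if linear_score(features, SWEEP_VS_NEUTRAL) <= 0:
        return "neutral"
    if linear_score(features, COMPLETE_VS_INCOMPLETE) > 0:
        return "complete sweep"
    return "incomplete sweep"


# Made-up standardized statistics for three genomic windows.
windows = {
    "w1": {"tajima_d": 0.1, "ihs": 0.0, "xpehh": 0.0},
    "w2": {"tajima_d": -2.0, "ihs": 0.2, "xpehh": 2.5},
    "w3": {"tajima_d": -0.5, "ihs": 2.5, "xpehh": 1.0},
}
labels = {name: classify_window(f) for name, f in windows.items()}
print(labels)
```

Splitting the decision into nodes lets each node weight the same statistics differently, which is one plausible reason a hierarchy can separate complete from incomplete sweeps better than a single combined score.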
Berg and Coop 2015 – Analytical formulae for polymorphism under soft-sweeps
Soft sweeps (i.e. adaptive, positive selection on standing allelic variation) can plausibly arise through two processes: positive selection on multiple independent mutations at a locus, with associated hitchhiking of neutral variants, or a single unique mutation that segregates neutrally as standing variation until an environmental change makes it beneficial, after which it sweeps. Berg and Coop build the theory for the signatures of soft sweeps under the latter model, particularly its effects on polymorphism after the sweep. By modeling the probability that a lineage escapes the sweep by recombination (during the ‘sweep’ phase) and the probability of coalescence during the ‘standing’ phase, the authors derive analytical expressions for (a) the reduction in diversity, (b) the number of segregating sites, and (c) the frequency spectrum under this soft-sweep model, and explore them via simulations.
“Unfortunately, our work largely confirms the intuition and existing results indicating that standing sweeps are likely to be rather difficult to identify, and characterize, on the basis of genetic data from a single population time-point, and when they can be identified, they may be difficult to distinguish from classic hard sweeps.”
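The three polymorphism summaries discussed above (diversity, number of segregating sites, and the frequency spectrum) are easy to compute from a sample of haplotypes. The sketch below shows the standard empirical versions of these statistics, not Berg and Coop's analytical expressions; the 0/1 haplotype encoding (0 = ancestral, 1 = derived) and the toy sample are assumptions for illustration.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Number of sites where both alleles are present in the sample."""
    n = len(haplotypes)
    return sum(1 for site in zip(*haplotypes) if 0 < sum(site) < n)


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences between haplotypes (pi)."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i counts sites with i + 1 derived copies."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):
        k = sum(site)
        if 0 < k < n:  # skip monomorphic and fixed sites
            sfs[k - 1] += 1
    return sfs


# Toy sample: 4 haplotypes (rows) by 5 sites (columns), made up.
sample = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
S = segregating_sites(sample)
pi = nucleotide_diversity(sample)
sfs = site_frequency_spectrum(sample)
print(S, round(pi, 3), sfs)
```

Comparing these values in windows near and far from a candidate locus is the usual way a reduction in diversity or a skewed frequency spectrum shows up in practice.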
Vatsiou, Alexandra I., Eric Bazin, and Oscar E. Gaggiotti. “Detection of selective sweeps in structured populations: a comparison of recent methods.” Molecular Ecology (2015). DOI: http://dx.doi.org/10.1111/mec.13360
Pybus, Marc, et al. “Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations.” Bioinformatics (2015): btv493. DOI: http://dx.doi.org/10.1093/bioinformatics/btv493
Berg, Jeremy J., and Graham Coop. “A Coalescent Model for a Sweep of a Unique Standing Variant.” Genetics (2015): genetics-115. DOI: http://dx.doi.org/10.1534/genetics.115.178962