The fickleness of P?

Halsey and colleagues (2015) raise an important issue regarding a certain letter with which we all are familiar:

© flickr


They describe the sample-to-sample variability in the P value as a major cause of lack of repeatability that is not generally considered. They explain

why P is fickle to discourage the ill-informed practice of interpreting analyses based predominantly on this statistic.

In their estimation, the omission of this variability reflects a general lack of awareness.
The statistical power of a test dramatically affects how confidently we can interpret a P value and, as a consequence, the result of the test.
I’ve been thinking more about power, with specific regard to molecular ecology and accurate sampling of organisms with complex life cycles (also see my interview with Sean Hoban and some of his work, also highlighted here).
The authors provide some background on the misunderstandings about P:

If statistical power is limited, regardless of whether the P value returned from a statistical test is low or high, a repeat of the same experiment will likely result in a substantially different P value and thus suggest a very different level of evidence against the null hypothesis.

To demonstrate this, they take samples drawn from two normally-distributed populations of data in which they knew there was differentiation. They take subsamples and find over replicate experiments (though in practice, we would likely only perform one experiment), that the P values vary quite a bit (see Figure 2, Figure 4)!

Only when the statistical power is at least 90% is a repeat experiment likely to return a similar P value, such that interpretation of P for a single experiment is reliable.
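
A quick way to get a feel for this fickleness is to simulate it. The sketch below is not the authors' code: the group size, the true difference and the number of replicates are assumptions I picked only to get a low-powered design, and the variable names are my own.

```r
# Sketch: how much does P vary across replicate experiments at low power?
# All numbers are illustrative assumptions, not values from Halsey et al. (2015).
set.seed(1)

n_per_group  <- 10    # assumed sample size per group
true_diff    <- 0.5   # assumed true difference in means (in SD units)
n_replicates <- 1000  # number of simulated repeat experiments

# Draw fresh samples from two normal populations and return the t-test P value
one_experiment <- function(n, delta) {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = delta, sd = 1)
  t.test(x, y)$p.value
}

p_values <- replicate(n_replicates, one_experiment(n_per_group, true_diff))

# Analytical power of this design, for comparison
design_power <- power.t.test(n = n_per_group, delta = true_diff, sd = 1)$power

# Spread of P across replicates, and the share falling below 0.05
p_spread  <- quantile(p_values, probs = c(0.1, 0.5, 0.9))
share_sig <- mean(p_values < 0.05)

print(round(design_power, 2))
print(round(p_spread, 3))
print(share_sig)
```

Rerunning this with a larger n_per_group, so that the power approaches 90%, should tighten the spread of P values considerably, which is the point of the quote above.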

We usually want to find the direction of an effect, as well as its size and its precision. Halsey et al. (2015) advocate for the increased use of effect sizes and their 95% CIs.
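
In R, the direction, size and precision of an effect all come out of the same call that produces the P value. The following is a minimal sketch on simulated data. The sample sizes, means and the choice of Cohen's d as the standardized effect size are my assumptions, not something prescribed by the paper.

```r
# Sketch: report an effect size with its 95% CI rather than P alone.
# The data are simulated; sample size and true difference are assumptions.
set.seed(2)
control   <- rnorm(20, mean = 10, sd = 2)
treatment <- rnorm(20, mean = 12, sd = 2)

fit <- t.test(treatment, control)   # Welch two-sample t-test

# Direction and size: difference in means (treatment minus control)
mean_diff <- unname(fit$estimate[1] - fit$estimate[2])
# Precision: 95% confidence interval for that difference
ci_95 <- as.numeric(fit$conf.int)

# Standardized effect size (Cohen's d, pooled SD for equal group sizes)
pooled_sd <- sqrt((var(treatment) + var(control)) / 2)
cohens_d  <- mean_diff / pooled_sd

cat(sprintf("difference = %.2f, 95%% CI [%.2f, %.2f], d = %.2f\n",
            mean_diff, ci_95[1], ci_95[2], cohens_d))
```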

Discovering that P is flawed will leave many scientists uneasy. As we have demonstrated, however, unless statistical power is very high (and much higher than in most experiments), the P value should be interpreted tentatively at best. Data analysis and interpretation must incorporate the uncertainty embedded in a P value.

References
Halsey LG, Curran-Everett D, Vowler SL, Drummond GB (2015) The fickle P value generates irreproducible results. Nature Methods 12, 179–185. doi:10.1038/nmeth.3288

d(N)eutralist < d(S)electionist Part 4

Continuing our discussion of the neutralist-selectionist debate, recent findings by Schrider et al. (2015) bring us to the topic of selective sweeps and their genomic signatures in a population. As we have discussed in previous posts, numerous studies (since the proposal of the neutral theory; Kimura 1968) have provided evidence for the fixation of beneficial mutations by positive selection, and for their role in adaptive evolution. While several mechanisms have been proposed for driving positively selected alleles to fixation (see my previous post here for some thoughts on the effects of recombination in adaptive evolution), a very plausible one, with evidence growing by the day, is the selective sweep: the quick rise to fixation of a beneficial allele in a population (due to positive selection), and the subsequent depletion of linked neutral diversity around the allele (due to genetic hitchhiking). Sweeps are classified as hard (initial frequency of the beneficial allele = 1/(2N)), soft (initial frequency > 1/(2N), because the allele was present at or near neutrality in the population until some perturbation, often environmental, set off the sweep), or partial (or incomplete, wherein the beneficial allele has yet to reach fixation in the population). The detection of sweeps has been used extensively in recent years to describe signatures of selection across the genome.

Reduction in heterozygosity at a hitchhiking neutral locus, from a now classic manuscript by Maynard Smith and Haigh (1974). Image courtesy: http://dx.doi.org/10.1017/S0016672308009579


Signatures of selection can be described using several summary statistics, including polymorphism levels, site-specific diversity, haplotype diversity, Tajima’s D, LD-based statistics, etc. Schrider et al. (2015) use simulations to assess how well these summary statistics characterize selective sweeps. In short, all of them rely on (a) the depletion of genomic diversity around a selected site (e.g., see Figure 2 from Maynard Smith and Haigh 1974 above), and (b) haplotype diversity: recent hard sweeps should produce one “fixed” haplotype at high frequency around the selected site, whereas soft or incomplete sweeps should produce multiple haplotypes at intermediate frequencies. But recombination between the selected allele and linked neutral alleles means that a not-so-recent hard sweep can also produce multiple haplotypes at intermediate frequencies. Methods to detect sweeps would thus wrongly classify these regions as soft or partial sweeps, a phenomenon the authors term the “soft shoulder” effect.
To describe this effect, the authors perform coalescent simulations under different sweep scenarios, varying (a) the initial frequency of the sweeping allele, (b) the time of the sweep, and (c) the selection coefficient. Analyses of several summary statistics consistently support the “soft shoulder” effect, with numerous false positives for soft/partial sweeps at sites linked to hard sweeping alleles.
The authors thus recommend interpreting studies that perform genome-wide scans for positively selected sites (and sweeps) with care, and offer several suggestions:

  1. analyzing flanking regions to detect selection (and sweeps), rather than only the region immediately surrounding the selected site,
  2. applying methods that account for polymorphism, allele frequency, haplotype diversity, and LD-based statistics,
  3. accounting for gene conversion rates,
  4. and, importantly, checking for evidence of a nearby hard sweep whenever a soft/partial sweep is found, to rule out the “soft shoulder” effect.
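
Since so much of the hard versus soft distinction rests on haplotype diversity, here is a small sketch of how that statistic behaves in the two cases. It uses Nei's haplotype diversity, H = n/(n-1) * (1 - sum(p_i^2)). The haplotype strings and counts are made up for illustration and are not taken from Schrider et al. (2015).

```r
# Sketch: Nei's haplotype diversity for a window of phased haplotypes.
# The toy haplotypes and their counts are invented for illustration.
haplotype_diversity <- function(haplotypes) {
  n <- length(haplotypes)
  freqs <- as.numeric(table(haplotypes)) / n   # haplotype frequencies p_i
  (n / (n - 1)) * (1 - sum(freqs^2))
}

# One haplotype near fixation, as expected close to a recent hard sweep
hard_like <- c(rep("AATG", 18), rep("AGTG", 2))
# Several haplotypes at intermediate frequencies, as expected for a soft
# sweep (or, per the "soft shoulder" effect, at sites linked to a hard sweep)
soft_like <- c(rep("AATG", 8), rep("AGTG", 7), rep("CATG", 5))

h_hard <- haplotype_diversity(hard_like)
h_soft <- haplotype_diversity(soft_like)
print(c(hard_like = h_hard, soft_like = h_soft))
```

The second window scores much higher, and on its own the number cannot say whether that reflects a genuinely soft sweep or the shoulder of a hard one, hence the advice to look at the neighboring regions as well.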

References:
Schrider, Daniel R., et al. “Soft shoulders ahead: spurious signatures of soft and partial selective sweeps result from linked hard sweeps.” Genetics (2015): genetics-115. http://dx.doi.org/10.1534/genetics.115.174912
Maynard Smith, J., and J. Haigh, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23-35.
Kimura, Motoo. “Evolutionary rate at the molecular level.” Nature 217.5129 (1968): 624-626.

Live fast and reproduce young


Here is one for the “simple, elegant science” folder: a new paper in PNAS by Julia Schroeder and colleagues that demonstrates a fitness disadvantage in offspring from older parents. While there are a multitude of papers out there showing that gametes have reduced quality as an organism ages, this new work is the first to demonstrate this phenomenon in a natural system.
Schroeder et al. show that a parent’s age has no effect on the longevity of its offspring, but the offspring of older parents have lower reproductive success over their lifetime. In addition, these effects are sex-specific: older males negatively affected their sons and older females negatively affected their daughters. To ensure that these effects weren’t primarily caused by the environment, some of the offspring were moved to different parents before hatching.

Our results challenge the currently favored hypothesis in evolutionary biology and behavioral ecology that old age signals high quality in mating partners. Our results imply a substantial cost of reproducing with older, rather than younger, partners. The results inform increasing concern about delayed reproduction in medicine, sociology, and conservation biology.

Schroeder J., Nakagawa S., Rees M., Mannarelli M.E. & Burke T. (2015). Reduced fitness in progeny from old parents in a natural population, Proceedings of the National Academy of Sciences, 201422715. DOI: http://dx.doi.org/10.1073/pnas.1422715112

How (not) to review papers on inclusive fitness

Hamilton’s Rule


There are few evolutionary concepts as polarizing as Hamilton’s rule. Some researchers feel that there is no mathematical grounding for it, while others beg to differ. Yet empirical evidence in support of Hamilton’s rule is scarce (but check out this recent review).
Peter Nonacs and Miriam Richards’ recent call to arms in TREE suggests that this dearth of support is partially due to two things:

1) To some reviewers, Hamilton’s rule is on par with He Who Shall Not Be Named. (Well, more specifically, reviewers never agree on the best way to test it, so nothing gets published). Or, as Nonacs and Richards write:

“Our proposed solution is a simple admonition to reviewers: ‘Reflective, not reflexive critique, please!’”

2) Researchers don’t always assess the costs AND benefits of Hamilton’s rule. Moreover, they don’t publish all of the accompanying data so that reviewers (and readers) can come to their own conclusions.
In sum, we need reviewers to stop acting like their favorite stick-in-the-mud and authors to be transparent in presenting and analyzing their data.
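
For anyone who wants the rule itself in front of them: Hamilton's rule says a social trait is favored when r * b > c, where r is the relatedness between actor and recipient, b the fitness benefit to the recipient and c the fitness cost to the actor. The toy sketch below (function name and all values are hypothetical) shows why a benefit estimate reported without the matching cost estimate cannot settle the question.

```r
# Sketch: Hamilton's rule, r * b > c. All values below are hypothetical.
hamilton_favors <- function(relatedness, benefit, cost) {
  relatedness * benefit - cost > 0
}

# Same relatedness (full siblings) and benefit, two different cost estimates
low_cost  <- hamilton_favors(relatedness = 0.5, benefit = 3, cost = 1)  # 1.5 - 1 > 0
high_cost <- hamilton_favors(relatedness = 0.5, benefit = 3, cost = 2)  # 1.5 - 2 < 0

print(c(low_cost = low_cost, high_cost = high_cost))
```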

Clearly more work is needed to generate a consensus about the correct way to both calculate inclusive fitness and advance our understanding of the diversity in social evolution. We urge reviewers to be constructive, not obstructive, in this process.

REFERENCES
Nonacs P & Richards MH (2015) How (not) to review papers on inclusive fitness. Trends Ecol. Evol. http://dx.doi.org/10.1016/j.tree.2015.02.007

l'oliva di mare: disturbance and genetic diversity

Seagrasses are important ecosystem engineers of coastal regions around the world. Previous work has shown that high genotypic diversity correlates with resistance (e.g., Hughes and Stachowicz 2004) and resilience (e.g., Reusch et al. 2005).
In a recently accepted paper in Molecular Ecology, Jahnke, Olsen and Procaccini (2015) performed a meta-analysis of 56 meadows of Posidonia oceanica in which they tested for correlations of disturbance with genetic diversity.

© www.thelivingmed.org


Anthropogenic disturbances are the main threat to seagrass populations, but P. oceanica is a long-lived species. Past climate change may generate complex phylogeographic patterns that might result in

particular vulnerabilities under rapidly changing environmental stress.

Moreover, the longevity of species like P. oceanica can result in a temporal mismatch. In other words, a meadow may be characterized as healthy while its allelic diversity is slowly deteriorating.
The authors advocate the necessity of placing genetic estimates from a single meadow in the context of a meta-population. The ability to sample at fine-scales and combine these data with connectivity matrices will be the way forward and enable an

understanding [of] the causes behind and evolutionary meaning of genetic diversity metrics for application in conservation management.

References
Hughes AR, Stachowicz JJ (2004) Genetic diversity enhances the resistance of a seagrass ecosystem to disturbance. PNAS, 101, 8998-9002.
Jahnke M, Olsen JL, Procaccini G (2015) A meta-analysis reveals a positive correlation between genetic diversity metrics and environmental status in the long-lived seagrass Posidonia oceanica. Molecular Ecology. doi: 10.1111/mec.13174
Reusch TBH, Ehlers A, Hämmerli A, Worm B (2005) Ecosystem recovery after climatic extremes enhanced by genotypic diversity. PNAS, 102, 2826-2831.

F-statistics Manhattan Plots in R

Characterizing differentiation across individual genomes sampled from different populations can be very informative about the demographic processes that resulted in the differentiation in the first place. Manhattan plots have grown to be very popular representations of genome-wide differentiation statistics in recent literature. And what’s better? They’re surprisingly easy to make in R!

In this post, I describe making these plots from scratch, starting with a VCF (Variant Call Format) file, which contains genotype information (and other metadata) across genomic positions.

Plot of genome-wide Fst using the qqman package in R.

As an example, I downloaded the variant calls for chromosome 22 from Phase 3 of the 1000 Genomes Project (see link), and computed Weir and Cockerham estimates of Fst between two populations, GBR (Great Britain) and YRI (Yoruba), a total of 199 individuals out of 2504, using VCFtools. The .weir.fst file produced by VCFtools contains per-site Fst values (windowed estimates are also available through the --fst-window-size option). To do this, I used the command:

vcftools --vcf chr22.vcf --keep allindivs --out gbryri --weir-fst-pop gbrindivs --weir-fst-pop yriindivs

where allindivs is a file with the individual IDs of all individuals from the GBR and YRI populations, and gbrindivs and yriindivs are files with the individual IDs from GBR and YRI respectively. I pulled out unique IDs for all these individuals from the meta information made available on the same FTP site. Now onto plotting!

While neat Manhattan plots can be created just by using R’s plot() or qplot() functions, I found Stephen Turner’s “qqman” package to be very handy and easy to use. Just as an example, I randomly replaced some of the chromosome 22 values from the output file above with chromosome numbers 1 to 3. Ideally, when you’re analyzing whole genome/transcriptome VCFs, this shouldn’t be necessary. I also subset the data to avoid lines with NA values.

Next, install the “qqman” package, and plot with:

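install.packages("qqman")
library(qqman)

# Read the VCFtools output (columns: CHROM, POS, WEIR_AND_COCKERHAM_FST)
fst <- read.table("fst", header=TRUE)
# Drop sites where Fst could not be estimated (NA/NaN)
fstsubset <- fst[complete.cases(fst),]
# One SNP identifier per remaining row
SNP <- seq_len(nrow(fstsubset))
mydf <- data.frame(SNP,fstsubset)

# logp=FALSE plots Fst itself rather than -log10 of the value
manhattan(mydf,chr="CHROM",bp="POS",p="WEIR_AND_COCKERHAM_FST",
snp="SNP",logp=FALSE,ylab="Weir and Cockerham Fst")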

And voilà! You have your genome-wide Fst Manhattan plot! The “qqman” package also has plenty of options for changing the color, displaying chromosomes, etc., which Stephen Turner explains in his blog here. Good luck!
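
If you would rather not add a dependency, the plot() route mentioned earlier can be sketched roughly as follows. This is only an illustration under my own assumptions: the data frame is simulated to mimic the CHROM, POS and WEIR_AND_COCKERHAM_FST columns used above, and the colors and helper names (toy, offsets, cum_pos, mids) are arbitrary.

```r
# Sketch: a Manhattan-style Fst plot with base graphics only.
# The data are simulated stand-ins, not real Fst estimates.
set.seed(3)
toy <- data.frame(
  CHROM = rep(1:3, each = 200),
  POS   = unlist(lapply(1:3, function(i) sort(sample.int(1e6, 200)))),
  WEIR_AND_COCKERHAM_FST = runif(600, min = 0, max = 0.6)
)

# Offset each chromosome by the summed lengths of the preceding ones,
# so positions run left to right along a single x axis
chrom_len <- tapply(toy$POS, toy$CHROM, max)
offsets   <- c(0, cumsum(as.numeric(chrom_len))[-length(chrom_len)])
names(offsets) <- names(chrom_len)
toy$cum_pos <- toy$POS + unname(offsets[as.character(toy$CHROM)])

# Alternate point colors by chromosome; label the axis at chromosome midpoints
mids <- tapply(toy$cum_pos, toy$CHROM, function(x) mean(range(x)))
plot(toy$cum_pos, toy$WEIR_AND_COCKERHAM_FST,
     col = c("grey30", "steelblue")[toy$CHROM %% 2 + 1],
     pch = 20, xaxt = "n",
     xlab = "Chromosome", ylab = "Weir and Cockerham Fst")
axis(1, at = mids, labels = names(mids))
```

The only real work is the cumulative offset, which is what qqman takes care of for you behind the scenes.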

Killer genetic differentiation

RAD whales
Like most of you out there, I sometimes get bogged down in literature, and the pressure to keep up with new methods can lead to a towering “to-read” folder. I feel forced to read many of these papers no matter how deep the stacks get due to the desire to keep up with new analyses or techniques.
But sometimes I read a paper just because it captures the basic passion for wildlife that made me interested in biology in the first place. That’s the case with this investigation into the population genomics of killer whales (!!!) by Andre Moura and colleagues in Molecular Ecology.

We test the hypothesis that populations representing sympatric ecotypes (e.g. residents and transients) will show patterns of differentiation that reflect selection at functional loci. More broadly, we investigate the hypothesis that in addition to the process of genetic drift, disruptive selection is driving the differentiation of killer whale ecotypes in sympatry.

Moura and colleagues used the largest set of molecular data for killer whales to test multiple demographic hypotheses and document genetic structure of whale populations across the globe:

Taken together, these data suggest that differentiation in sympatry is based in part on ecological processes, but that differentiation is likely being facilitated by the life history of killer whales, founder events and differentiation by drift.

A solid investigation using great data and analyses. But honestly, I was just in it for the whales.
Moura A.E., Chaudhuri R., Hughes M.A., Welch A.J., Reisinger R.R., de Bruyn P.J.N., Dahlheim M.E., Hall N. & Hoelzel A.R. (2014). Population genomics of the killer whale indicates ecotype evolution in sympatry involving both selection and drift, Molecular Ecology, 23 (21) 5179-5192. DOI: http://dx.doi.org/10.1111/mec.12929

Panamanian golden frog skin microbiota predict ability to clear deadly infection

Panamanian golden frog (photo from Wikipedia)


The fungal skin pathogen Batrachochytrium dendrobatidis (Bd) has pushed many amphibian species to the brink of extinction. One such species, the Panamanian golden frog, is likely extinct in the wild and has been maintained in captive breeding colonies since 2006. Successful reintroduction of this species hinges on the ability of the amphibians to fight off Bd infection.

A love letter to sponges

Aplysina fistularis. Photo courtesy of ryanphotographic.com


Like many kids interested in marine biology, growing up I wanted to work on sharks. After college I interned for a year at the Center for Shark Research at the Mote Marine Lab under the guidance of two great mentors, Jim Gelsleichter and Michelle Heupel. After my internship I started my Master’s degree with Mahmood Shivji, whose research focuses broadly on conservation genetics of sharks and billfish, but who had recently received funding from the National Coral Reef Institute. Mahmood asked me how I felt about working on marine invertebrates instead of sharks, and since I’m a go-with-the-flow person (much like a sponge), I said, sure, why not? That was ten years ago and I’ve never looked back.
Sponges (phylum Porifera) have a global distribution and are found in both fresh and saltwater, from polar seas to tropical coral reefs. There are over 8,000 valid species, with an estimated 4,000 left to be described. Sponges are sessile as adults and disperse through a mobile larval phase. Some sponge species are hermaphroditic and some have separate sexes. Many species have internal fertilization and brood their larvae to an advanced developmental stage, while other species broadcast spawn their gametes and fertilization takes place in the water column. Sponges come in a range of shapes, colors, textures, and sizes. In this post, I highlight some of the amazing research conducted on sponges, focusing on topics related to molecular ecology, phylogenetics, and evolution, with some other fun facts thrown in.

Haploid-diploidy, a (brief?) history

Haploid-diploid life cycles are not only good exercise for the brain, but they’re also fantastic study systems to investigate a myriad of questions.

Yet, the majority of molecular studies have focused on the diploid-dominated life cycles of animal and plant taxa. In these organisms, the meiotically-produced haploid gametes immediately fuse to form a new diploid individual. In other words, the haploid stages never become functional, independent organisms.

In contrast, seaweeds, mosses, ferns and some fungi, have life cycles in which there is an alternation between separate, free-living individuals that differ in ploidy levels and reproductive modes. Unlike diploid-dominant plants and animals, the haploid stage in a haploid-diploid life cycle becomes an independent, functional organism with somatic development. Mature haploid adults produce female and/or male gametes that fuse to produce new diploid individuals. The mature diploids undergo meiosis, in which spores are produced and develop into new haploid individuals. Thus, each phase is dependent on the other to complete the sexual life cycle.

What impacts do these life cycles have on genetic structure or on mating systems? Is dioecy really a good proxy for outcrossing in these species (e.g., intergametophytic selfing can still occur, Klekowski 1969)? Why are they maintained, when theoretically, selection should eventually favor either diploidy or haploidy, but not both (Mable & Otto 1998)?

There are many variations on the haploid-diploid theme found in mosses, ferns, fungi and seaweeds. As mentioned in this post in the context of colonizing species, we also have a very preliminary understanding of mating system variation and genetic structure in these organisms.

As a mini-review, I’ve compiled a brief summary of what we know, with a slight emphasis on seaweeds (unsurprisingly).
