# A statement on p-values that approaches significance*

Point-oh-five. It’s a pretty polarizing number. Sitting on either side of it could mean the difference between a [insert your favorite journal here] paper and an unpublished paper. But why do some researchers, reviewers, and journal editors put so much weight on this highly influential number? Particularly when it is so often misinterpreted?

Numerous articles have been written about p-value pitfalls**, and one journal even went so far as to ban p-values from all future articles (something I disagree with and wrote about last year on TME). Clearly, someone needed to come up with an interpretable definition to answer the question: “What exactly is a p-value?” And since we’re talking about statistics, who better to do this than the American Statistical Association? Well, we’re all in luck, because they just released a statement on p-values.

“Over the course of many months, group [ASA] members … began to find points of agreement. That turned out to be relatively easy to do, but it was just as easy to find points of intense disagreement.”

[Dun dun dunnnnnnnn!]

But before you get too excited about some ground-breaking policy on or definition of the p-value, note these important words from the authors: “Let’s be clear. Nothing in the ASA statement is new.”

## What is a p-value?

First, an “informal definition” from the statement that is intended for “researchers, practitioners and science writers who are not primarily statisticians”:

a p-value is the probability under a specified statistical model that a statistical summary of the data…would be equal to or more extreme than its observed value.
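To make that definition a bit more concrete, here is a minimal sketch using a toy example that is mine, not the ASA's: a coin flipped 100 times that lands heads 60 times. The "specified statistical model" is a fair coin, the "statistical summary" is the number of heads, and "equal to or more extreme" is taken to mean any outcome that is no more probable under the model than the one observed (one common two-sided convention, assumed here). The function name `binomial_p_value` is hypothetical.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a coin-flip count under a binomial null model.

    Sums the null probabilities of every outcome that is at most as likely
    as the observed one (an assumed definition of "as or more extreme").
    """
    def pmf(k: int) -> float:
        # Probability of exactly k heads if the null model is true.
        return comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)

    observed = pmf(heads)
    # Small tolerance so that ties are not lost to floating-point rounding.
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))


if __name__ == "__main__":
    print(f"p-value for 60 heads in 100 flips: {binomial_p_value(60, 100):.4f}")
```

Note what the number is conditional on: everything is computed assuming the fair-coin model is true. Change the model (say, `p_null=0.6`) and the same data give a very different p-value.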

## What can I do with or say about my p-value?

The ASA group then describes 6 principles — the “dos and don’ts” of p-values:

1. P-values can indicate how incompatible the data are with a specified statistical model.

This can be boiled down to “what’s your null?” The p-value is an indication of how incompatible your data are with your null hypothesis. Lower p-values == greater incompatibility with your pre-defined null.

2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

See point #1 — that’s what a p-value can tell you, not this.
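A quick back-of-the-envelope calculation shows why the two are not the same thing. The sketch below is illustrative only: the inputs (10% of tested hypotheses being true, 80% power, a 0.05 threshold) are assumptions I made up for the example, and `false_positive_share` is a hypothetical helper, not something from the ASA statement.

```python
def false_positive_share(prior_true: float, power: float, alpha: float) -> float:
    """Fraction of 'significant' results that come from true nulls.

    prior_true: assumed share of tested hypotheses with a real effect
    power:      assumed probability of p < alpha when the effect is real
    alpha:      significance threshold
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    share = false_positive_share(prior_true=0.10, power=0.80, alpha=0.05)
    print(f"Share of significant findings that are false positives: {share:.0%}")
```

Under these made-up inputs, well over 5% of the "significant" findings come from true nulls. The probability that a hypothesis is true depends on things a p-value knows nothing about, such as how plausible the hypothesis was to begin with.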

3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

Just because a study results in a finding with a p-value < 0.05 does not mean that your pet hypothesis is now true (again, see point #1). Scientific findings can rarely, if ever, be described as “yes” or “no” (although manuscript writing would be much easier if this were true). Taking findings with p < 0.05 as gospel can lead to incorrect conclusions and poor policy decisions.

4. Proper inference requires full reporting and transparency.

Don’t “p-hack”.
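Why does it matter? Here is a rough sketch of one flavor of p-hacking: running many tests and reporting only the best one. It assumes every null hypothesis is true, the tests are independent, and p-values under the null are uniform on [0, 1] (which holds for continuous test statistics). The function names are hypothetical.

```python
import random


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_p_hacking(n_tests: int, alpha: float = 0.05,
                       n_studies: int = 10_000, seed: int = 1) -> float:
    """Share of simulated 'studies' whose smallest p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Assumption: under a true null, each p-value is uniform on [0, 1].
        smallest_p = min(rng.random() for _ in range(n_tests))
        if smallest_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(chance_of_false_positive(k), 3), simulate_p_hacking(k))
```

With 20 looks at pure noise, the chance of finding at least one "significant" result is closer to a coin flip than to 5%, which is why reporting every analysis you ran matters.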

5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

P-values are not “effect sizes” and can be really low even if you have a tiny effect. Tiny effects, estimated with large sample sizes and very small error, can result in minuscule p-values. But as biologists, we need to think critically about what these small (but significant) effects might mean for the biology of an organism.***
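As a minimal sketch of how sample size alone can drive this, consider a one-sample z-test with a known standard deviation (a simplifying assumption on my part) and a fixed effect of 0.01 standard deviations. The helper `z_test_p_value` is hypothetical.

```python
from math import erfc, sqrt


def z_test_p_value(effect_in_sd: float, n: int) -> float:
    """Two-sided p-value for a one-sample z-test.

    effect_in_sd: observed mean difference, in units of the (known) standard deviation
    n:            sample size
    """
    z = effect_in_sd * sqrt(n)
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(f"n = {n:>9,}: p = {z_test_p_value(0.01, n):.3g}")
```

The effect never changes; only the sample size does. The p-value goes from nowhere near significant to astronomically small, which is exactly why it cannot stand in for an effect size.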

6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Like many things, p-values taken out of context are uninterpretable and as useless as a screen door on a submarine.

Fortunately, there are a couple of things that you can do:

1. Don’t throw the baby out with the bath water: p-values are useful when you have the right null, run the appropriate test, and properly interpret them (i.e., not in a vacuum).
2. Skin the cat multiple ways: Use other approaches (e.g., confidence/credibility intervals, Bayesian methods, or false discovery rates) to provide support for and/or complement your p-values.
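For the false discovery rate option, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used way to control the FDR (my choice of method; the ASA statement does not prescribe one). The p-values in the example are invented for illustration.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether it is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values from ten hypothetical tests.
    ps = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print("naive p < 0.05:", sum(p < 0.05 for p in ps))
    print("BH at FDR 0.05:", sum(benjamini_hochberg(ps)))
```

With these invented numbers, fewer results survive the correction than clear the naive 0.05 bar, which is the point: the threshold adapts to how many tests were run.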

Again, it’s nothing new.

## Footnotes
*Given the current definitions and my own research (read: thought experiment) on what could be “equal to or more extreme” than this definition…