A Primer on the Great BAMM Controversy

Update, 26 August 2016, 2:30PM. A number of readers brought my attention to a series of blog posts by Moore et al. responding to Rabosky’s rebuttal of their published critique of BAMM. I’ve included links to the posts and summarized their contents below. 

A fundamental problem in evolutionary biology is detecting patterns of variation in rates of lineage diversification and working to understand their causes. One recent statistical method for detecting if and where diversification rates have changed across the branches of a phylogeny is known as BAMM, an acronym for Bayesian Analysis of Macroevolutionary Mixtures. Over its short lifetime, BAMM has proven extraordinarily popular — Google Scholar shows 182 citations for Rabosky’s 2014 paper describing the method alone. BAMM uses a Bayesian statistical framework with an MCMC implementation to identify the number and location of diversification-rate shifts across the branches of a tree, along with the associated diversification-rate parameters (speciation, extinction, and time dependence) on each branch. In doing so, it offers a number of improvements over earlier software: it is based on an explicit model of how diversification rates shift, it features a complex and realistic model of branching, and it quantifies statistical uncertainty (rather than providing only fixed point estimates of parameters).


However, beginning with a heavily attended talk at this year’s Evolution Meetings in Austin, TX, Brian Moore and colleagues have raised a number of important questions about the performance and reliability of BAMM, opening a dialogue with Rabosky and the other BAMM developers. What follows is a slimmed-down (although admittedly still complex!) summary of both sides’ major points, drawing on Moore et al.’s recent PNAS paper and the rebuttals posted in the BAMM documentation.

Moore et al.’s critique

In their 2016 paper critiquing the program, Moore et al. (hereafter MEA) claim that BAMM suffers from three major theoretical problems, and present simulations that, they argue, demonstrate these problems render the program unreliable as a whole, invalidating its hypothesis-testing procedure.

The first of these problems relates to the likelihood function implemented in BAMM, which is an extension of Maddison et al.’s (2007) binary state speciation and extinction (BiSSE) model. The original formulation of BiSSE describes the evolution of a binary trait where the rate of lineage diversification depends on the current state of that trait. Because there is no analytical solution for the probability of an observed phylogeny under this model, BiSSE implements a numerical algorithm that travels from the tips to the root of the tree, accumulating the probabilities of all possible scenarios — speciation, extinction, or character-state change — that could occur in a small interval (Δt). BAMM extends this model in a number of ways, probably the most important being that while BiSSE has only two state-specific branching processes, BAMM allows a “countably infinite” number, which therefore cannot be enumerated by a numerical algorithm. Instead, BAMM samples the locations of diversification-rate shifts from a prior distribution and maps them onto the study tree. And this, MEA contend, is a big problem: BAMM necessarily ignores scenarios in which diversification-rate shifts occur on extinct (unobserved) branches, thereby biasing extinction probabilities (Figure 1). (In the supplemental information to their paper, MEA provide an equation intended as an improvement to the likelihood function in BAMM.)

Figure 1 from Moore et al. 2016, highlighting the unobserved rate shift scenario that potentially biases extinction probabilities in BAMM

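To make the tip-to-root calculation concrete, here is a minimal sketch (my own illustration, not BAMM’s or BiSSE’s actual code) of the constant-rate special case: the probability E(t) that a lineage alive t time units ago leaves no surviving descendants satisfies dE/dt = μ − (λ + μ)E + λE², which a simple Euler scheme can integrate from the tips backward in time.

```python
def extinction_prob(lam, mu, t, dt=1e-4):
    """Euler integration of dE/dt = mu - (lam + mu) * E + lam * E**2,
    the single-state (constant-rate birth-death) special case of the
    BiSSE extinction equation. E(0) = 0: a lineage sampled at the
    present cannot be unobserved."""
    E = 0.0
    for _ in range(int(t / dt)):
        E += dt * (mu - (lam + mu) * E + lam * E * E)
    return E

# With speciation rate 1.0 and extinction rate 0.5 (arbitrary values),
# an older lineage is more likely to have died out completely:
p1 = extinction_prob(1.0, 0.5, 1.0)
p5 = extinction_prob(1.0, 0.5, 5.0)
```

For λ > μ this value approaches the classic birth-death limit μ/λ as t grows. BAMM’s innovation, and the crux of MEA’s critique, is how terms for rate shifts enter (or fail to enter) these equations.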
The second of these problems relates to the above-mentioned prior model for diversification-rate shifts, which describes the number, k, and location, Xi, of events. BAMM uses what is known as a “compound Poisson process (CPP) relaxed molecular clock model,” in which the waiting time between rate shifts is exponentially distributed and the locations of rate shifts are uniformly distributed across the tree. A CPP model can be problematic because it is statistically “nonidentifiable,” or at best “weakly identifiable”: the CPP might explain a given pattern of substitution-rate variation across branches equally well by specifying relatively frequent rate shifts of small magnitude or by specifying less frequent rate shifts of greater magnitude. As a result, there is in theory an infinite number of parameterizations under which the data have equivalent likelihood, and posterior estimates are potentially very sensitive to the user’s choice of prior. In his original paper, Rabosky acknowledged this weakness, but after simulating trees diversifying at a constant rate (i.e., with no diversification-rate shifts) and analyzing those trees in BAMM under a range of priors, he found that the program did not erroneously return strong support for the rate-shift values specified by the priors, and concluded that its estimates were robust despite this theoretical concern. On the contrary, MEA claim this is an erroneous interpretation of the simulation results, because it relies on a summary statistic called the maximum a posteriori (MAP) estimate. Reinterpreting Rabosky et al.’s data, MEA found that because the MAP is the mode of the posterior distribution, it can indicate zero diversification-rate shifts even when the posterior distribution as a whole is heavily influenced by the chosen prior (Figure 4).

Figure 4 from Moore et al. 2016, showing the prior and posterior probability distributions of rate shift events on top for different prior means, with frequency and posterior mode value (MAP) on the bottom.

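To see why this kind of nonidentifiability arises, consider a toy simulation (my own illustration with arbitrary parameter values, not MEA’s code). Under a CPP, the net change in the log rate along a branch is a Poisson-distributed number of shifts, each of random magnitude, so increasing the expected number of shifts while shrinking their size can leave key summaries of the net rate change unchanged:

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's algorithm for a Poisson-distributed draw with mean lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def net_log_rate_change(rng, mean_shifts, shift_sd, n_reps=20000):
    """Simulate the net change in log rate along a branch under a CPP:
    a Poisson(mean_shifts) number of shifts, each adding
    Normal(0, shift_sd) to the log rate."""
    return [sum(rng.gauss(0.0, shift_sd)
                for _ in range(poisson_draw(rng, mean_shifts)))
            for _ in range(n_reps)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(42)
# Many small shifts vs. few large shifts, tuned so the expected variance
# of the net change is identical: 10 * 0.1**2 == 1 * (0.1 * sqrt(10))**2
frequent_small = net_log_rate_change(rng, mean_shifts=10.0, shift_sd=0.1)
rare_large = net_log_rate_change(rng, mean_shifts=1.0,
                                 shift_sd=0.1 * math.sqrt(10))
```

Both parameterizations yield net rate changes with the same variance (though not the same shape), illustrating why the data can struggle to separate shift frequency from shift magnitude, and why the prior can end up doing so much work.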
The third and final problem also relates to the CPP model, and to how it treats substitution-rate shifts versus diversification-rate shifts. Recall that under this model, events occur with uniform probability over the tree length, with the number of events following a Poisson distribution. MEA hold that while the CPP is a valid prior for substitution-rate shifts, which occur across the branches of the study tree, it cannot accurately describe diversification-rate shifts, because these shifts are neither uniformly distributed across the tree nor Poisson-distributed in number. (They back this up with a simulation of a pure-birth process with two speciation rates.) The CPP is therefore “statistically incoherent,” and cannot accurately describe the processes under study (Figure 5).

Figure 5 from Moore et al. 2016, illustrating that a pure birth process with two speciation rates does not follow a Poisson distribution (the upper frame) or a uniform distribution except in one case (the lower frame).

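The shape of MEA’s simulation argument can be sketched with a toy Gillespie-style simulation (again my own illustration with arbitrary parameters, not their actual code): grow pure-birth trees in which lineages shift irreversibly from a slow to a fast speciation rate, tally the realized shift counts, and compare their variance-to-mean ratio with the value of exactly 1 that a Poisson distribution would require.

```python
import random

def simulate_shift_counts(lam_slow=1.0, lam_fast=4.0, eta=0.5,
                          n_tips=50, n_reps=2000, seed=1):
    """Gillespie simulation of a pure-birth process with two speciation
    rates. Each lineage starts in the slow state and shifts irreversibly
    to the fast state at rate eta; daughters inherit the parent's state.
    Returns the number of shift events realized in each replicate tree
    by the time it reaches n_tips lineages."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_reps):
        slow, fast, shifts = 1, 0, 0
        while slow + fast < n_tips:
            birth = slow * lam_slow + fast * lam_fast
            shift = slow * eta
            if rng.random() < birth / (birth + shift):
                # Speciation: choose the parent's state proportionally
                # to each state's total speciation rate.
                if rng.random() < slow * lam_slow / birth:
                    slow += 1
                else:
                    fast += 1
            else:
                # One slow lineage shifts to the fast state.
                slow -= 1
                fast += 1
                shifts += 1
        counts.append(shifts)
    return counts

counts = simulate_shift_counts()
mean = sum(counts) / len(counts)
dispersion = (sum((c - mean) ** 2 for c in counts) / len(counts)) / mean
```

Whether (and how) the dispersion index departs from 1 depends on the parameters; MEA’s point is that nothing in this generating process guarantees the Poisson count and uniform placement properties the CPP prior assumes.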
After describing these theoretical issues, MEA then simulated trees under both constant and variable birth-death processes, finding support for their claim that “…BAMM can reliably estimate the diversification-rate parameters when diversification rates are constant but cannot accurately estimate parameters when rates of speciation and extinction vary.” For users of BAMM, this is a damning prognosis. But is it substantiated?

Rabosky et al.’s preliminary response

While Rabosky et al. (REA) have yet to pen a formal response to Moore et al.’s claims in the published literature, the team has responded to aspects of the 2016 paper in the program’s documentation. Rather than offering a point-by-point rebuttal, REA have so far focused on what they claim are errors in MEA’s theory and in the team’s implementation of the program in their simulations.

First, REA address MEA’s claim that BAMM is overly sensitive to the choice of prior for diversification-rate shifts (the CPP nonidentifiability issue). They report that MEA’s results cannot be replicated using BAMM version 2.5 and later — and that in these more recent versions, even cranking the diversification-rate-shift prior up to 100× the value Moore et al. used in their simulations has a negligible influence on the posterior distribution. Are results from BAMM version 2.4 and earlier invalid, then, and if so, what explains the discrepancy? REA point to an earlier (now fixed) error known as the “event rate bug.” But after plotting analytical prior probabilities against the corresponding probabilities simulated with the algorithm containing the event rate bug, they find that the influence of this bug was probably minor in most cases (Figure A).

Figure A. A plot from REA’s response showing that the bug responsible for some of Moore et al.’s (2016) results has limited influence on probabilities.

Second, REA claim that MEA’s empirical results (i.e., those from the simulations testing the influence of the theoretical problems they raise) cannot be replicated under standard usage of the program — even in versions earlier than 2.5, as addressed above. REA contend that MEA added a hidden, undocumented setting to their BAMM control files, modified its default value, and failed to acknowledge this in their manuscript. In tweaking this value, a developer-only option that affects how extinction probabilities are calculated, MEA reactivated a bug they had previously reported to the developers. REA then ran simulations with and without this setting to demonstrate that MEA’s empirical results and conclusions depend on its modification, and that the default value in version 2.5 and later yields much less biased extinction probabilities (Figure B). (In a separate discussion, REA provide a list of reasons why they believe activating this hidden setting was not scientifically justified.)

Figure B. A reanalysis of some of Moore et al.’s (2016) empirical results with and without the hidden developer setting activated.

Third, REA claim that MEA’s suggested improvement to the purportedly flawed likelihood function (included in their 2016 paper’s supporting information) contains a mathematical error, such that the resulting probabilities violate an axiom of probability theory: they are unbounded, potentially exceeding one and approaching infinity. The team suggests this is because MEA’s likelihood function cannot handle a tree containing rate shifts. This stems from a fundamental difference between the two algorithms: whereas BAMM treats the probability of a lineage’s extinction as conditional on the rate shifts drawn from the prior and mapped onto the tree, MEA’s suggested improvement does not depend on mapped rate shifts.

Additional responses from MEA (added 26 August 2016, 2:30PM MT):

Following the initial publication of this summary, several readers directed me to a series of posts from MEA on the treethinkers.org blog, responding in turn to many points in REA’s rebuttal. These provide additional context for why MEA believe BAMM is extremely sensitive to prior selection, how this sensitivity affects extinction probabilities, and a response to REA’s claim that their finding of unreliable parameter estimates was not replicable, and are worth reading in full.

Specifically, MEA counter REA’s claim that they used an undocumented developer setting in BAMM: while the setting was indeed undocumented, they note, it was easily modified in the BAMM input file, not through the source code as had been suggested. They argue that this modification was necessary to properly evaluate BAMM under a best-case scenario, as the default setting was invalid, making their choice scientifically justified: “…the extinction probability for a lineage arising right before a node certainly does not depend in any sensible way on the product of extinction probabilities of two lineages that arise immediately after that node.” In REA’s reanalysis of their empirical results using the default option, MEA claim, prior sensitivity is concealed but not eliminated. (An addendum comment from Brian Moore discusses this choice further.)

MEA also say that they are at “a loss to explain [why Rabosky couldn’t replicate their results], as we simply followed the summary procedure that he used in his study.” They argue that REA’s failure to replicate their empirical results is an artifact of using a different summary statistic — mean squared error instead of proportional error — which shrinks the apparent error by squaring the errors in speciation and extinction rates, quantities that are necessarily small.
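The arithmetic behind this objection is simple; here is an illustration with made-up rate values on a typical per-lineage-per-million-years scale (the function names and numbers are mine, not either team’s):

```python
def proportional_error(true_value, estimate):
    """Error expressed relative to the true value."""
    return (estimate - true_value) / true_value

def squared_error(true_value, estimate):
    """Raw squared error, as used in a mean-squared-error summary."""
    return (estimate - true_value) ** 2

true_rate, estimated_rate = 0.1, 0.2  # hypothetical speciation rates
prop = proportional_error(true_rate, estimated_rate)  # 1.0: off by 100%
sq = squared_error(true_rate, estimated_rate)         # ~0.01: looks tiny
```

Because diversification rates are typically well below 1, squaring the raw error compresses even a 100% mistake toward zero, which is MEA’s explanation for why REA’s reanalysis appears more forgiving.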

Call for comments

My own work rarely deals with macroevolutionary models, and I’m more of a hack than a statistician. But given the interest both BAMM and its purported shortcomings have generated in the community, it seems useful to summarize the current state of the debate in one spot — and have a forum to discuss it. We’d particularly welcome comments from those of you who use BAMM on a regular basis, and might have poked around to test some of the issues raised above — e.g., input from impartial users of this important program. As always, let’s keep it civil — lots of work has gone into both developing BAMM and exploring its performance.

References:

Rabosky, D. 2014. Automatic Detection of Key Innovations, Rate Shifts, and Diversity-Dependence on Phylogenetic Trees. PLoS ONE. DOI: 10.1371/journal.pone.0089543

Moore, B.R., Höhna, S., May, M.R., Rannala, B., Huelsenbeck, J.P. 2016. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures. PNAS. DOI: 10.1073/pnas.1518659113

Maddison, W.P., Midford, P., Otto, S.P. 2007. Estimating a Binary Character’s Effect on Speciation and Extinction. Systematic Biology. DOI: 10.1080/10635150701607033

About Ethan Linck

I'm a Ph.D. Candidate at the Department of Biology and the Burke Museum of Natural History, University of Washington, Seattle. My research uses museum specimens and genomic data to analyze and archive avian biodiversity and evolution, particularly in western North America and Melanesia.
This entry was posted in evolution, phylogenetics, software, and speciation.
  • Thanks for this post, Ethan. I think it’s a great idea to have a summary about this issue. I’d suggest being cautious in this discussion (the summary is good, this is more a thought for people commenting below): it’s an issue that several of the involved parties clearly care about (and with good reason), and it’s the sort of thing where folks could tend to get upset. I know many of the people involved, and it’s not like if you say “BAMM is fine” or “BAMM is terrible” someone will be angry at you for the rest of your career, but discussions can tend to get heated in the moment, especially on the internet. For example, some folks online have publicly guessed at the motivations of some of the participants, but I’m planning to avoid speculating about that — for me, it’s not something that explains potential issues with the method or possible solutions, and it could tend to cause unnecessary hurt.

    • elinck

Thank you, Brian — I tried to make a similar note of caution, but you put it better than I did. I hope any discussion can focus on the issues with the method and potential solutions, not on motivations or the understandably heated nature of the debate.

  • rvosa

    This reads like a fair summary of the issues. Totally agree with Brian on the civility issue: the underlying theory is hard, and writing good software is difficult – so we’re all best served by simply trying to understand what’s going on here.

  • One bit of vocab that might help clarify things is distinguishing what “BAMM” means:

    BAMM paper: Rabosky 2014 — the method as described and implemented there.
BAMM <2.5: The versions of BAMM that predate the submission of the Moore et al. paper and the associated bug report.
    BAMM 2.5 vanilla: The current version of BAMM, with options set as typical users do.
    BAMM 2.5 random-setting: The current version of BAMM but with the non-default combineExtinctionAtNodes = random setting used by Moore et al.

    In the flurry of "BAMM works!" and "BAMM fails!" discussion it is important, since the software and method have changed, to indicate exactly what was meant. For example, for people deciding what to run for a new analysis, what matters most is whether BAMM 2.5 works (and what setting to use for combineExtinctionAtNodes). For people worried about their results from published analyses, what matters is probably BAMM <2.5. There are potential issues that cut across all versions, but others that are version-specific.

  • Pingback: An Update on the Great BAMM Controversy