Molecular dating is a key tool in deciphering the history of life. In a recent Molecular Biology and Evolution paper, Sudhir Kumar and Blair Hedges review the state of the subject, summarizing the philosophical and methodological history of this often-integrative endeavor. In particular, their binning of this history into four sequential “generations” provides a convenient way of thinking about the evolution of this science, which I paraphrase below.
Generation 1: Assume a strict molecular clock. Utilize all the data.
Generation 2: Don’t assume a strict molecular clock. Utilize only those data that pass tests of clocklike behavior.
Generation 3: Utilize all the data, and allow the molecular clock(s) to vary in rate across phylogeny according to some prior model.
Generation 4: Utilize all the data and estimate relative clock rates but without the need to model rate variation or speciation/extinction.
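To make the Generation 1 assumption concrete: under a strict clock, two lineages accumulate substitutions at the same constant rate r, so a pairwise distance d corresponds to a divergence time t = d / (2r). The sketch below is only an illustration of that arithmetic, not code from Kumar and Hedges; the function names and all numbers are hypothetical.

```python
def calibrate_rate(distance, calibration_time):
    """Per-lineage rate r implied by one calibrated pair, using d = 2 * r * t."""
    if calibration_time <= 0:
        raise ValueError("calibration_time must be positive")
    return distance / (2.0 * calibration_time)


def divergence_time(distance, rate):
    """Divergence time under a strict clock: t = d / (2 * r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical values: a calibration pair that diverged 50 Myr ago and
# differs by 0.10 substitutions per site.
rate = calibrate_rate(distance=0.10, calibration_time=50.0)

# Date a second, uncalibrated pair that differs by 0.04 substitutions per site.
t = divergence_time(distance=0.04, rate=rate)
print(f"{t:.1f} Myr")  # prints "20.0 Myr"
```

Every later generation relaxes the single shared `rate` in some way: by discarding data that violate it, by modeling how it varies, or by estimating relative rates directly.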
Generation 3 remains, at present, the dominant paradigm within which many phylogenetic, phylogeographic, and epidemiological studies operate. The temporal distinction between Generations 3 and 4 asserted by these authors is therefore somewhat tenuous; the two may be better thought of as alternative modern dating philosophies, each with different advantages depending on the dataset in question.
Make no mistake, the Generation 3 framework has some well-known caveats. For example, as in any Bayesian analysis, the choice of priors (in this context, on node ages or speciation rate) in a dating analysis is complicated, controversial, and can have a huge (YUGE?) influence on posterior estimates. And while prior knowledge can now be modeled using a variety of age distributions and with the use of soft bounds, etc., friction persists at the basic interface of prior information (e.g., fossil ages) and prior specification.
These well-known issues have spurred new methods aimed at more holistic incorporation of fossil and extant taxa into time tree generation. But, as Kumar and Hedges point out, Generation 4 methods have a number of advantages that will be useful for dealing with genomic data efficiently. Specifically, they scale well with large numbers of characters or taxa, and they do not require a clock model to be specified.
A particularly interesting result from the Kumar and Hedges paper is that, across the large number of studies they compiled, dates estimated using Generation 2 and Generation 3 methods are highly consistent (the regression slope is nearly 1). This is despite the disparate philosophies and statistical implementations that underlie them. In the future, it will be interesting to see if a similar consistency can be found between Generation 3 and Generation 4 methods.
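For readers who want to run the same kind of comparison on their own estimates, here is a minimal sketch of a least-squares slope for paired node ages. It assumes a regression forced through the origin, which may not be how Kumar and Hedges fit their data, and the ages below are made-up placeholders rather than values from the paper.

```python
def slope_through_origin(x, y):
    """Least-squares slope of y on x with the intercept fixed at zero."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and the same length")
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(a * b for a, b in zip(x, y)) / sxx


# Hypothetical node ages (Myr) for the same four nodes, dated two ways.
gen2_ages = [10.0, 25.0, 60.0, 110.0]
gen3_ages = [11.0, 24.0, 62.0, 108.0]

slope = slope_through_origin(gen2_ages, gen3_ages)
print(f"slope = {slope:.3f}")
```

A slope near 1 means that neither method gives systematically older or younger dates than the other; it says nothing about whether either set of dates is correct.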
Kumar, S., Hedges, S.B., 2016. Advances in time estimation methods for molecular data. Mol. Biol. Evol. 33, msw026. doi:10.1093/molbev/msw026