The Hype Cycle of Ancient DNA

Recently I saw a graph that I've since learnt is called the Hype Cycle, a methodology used to assess new technologies and their marketing. What strikes me about it is how well it fits my own research field: paleogenetics, or ancient DNA research.

The Hype Cycle is a graphical tool developed by Gartner, an information technology research and advisory company based in Connecticut. The Hype Cycle depicts five phases of evolution of a new technology, concentrating on the relationship between hype and real adoption of the technology.

Phase I: Innovation Trigger

A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven. (© Gartner)

The good old times when anything containing the words “DNA” and “ancient/extinct” got to Nature.

It's not difficult to identify the Innovation Trigger in ancient DNA. The origins of the field trace back to the mid-1980s, when reports of DNA from the quagga, an extinct equid, and from Egyptian mummies were published.

While the first study, on a 120-year-old quagga (Higuchi et al. 1984), came from the lab of the prominent evolutionary biologist Allan Wilson, the work on Egyptian mummies (Pääbo 1985) was done by Svante Pääbo, then a PhD student, as a secret side project. Pääbo later joined Wilson's lab, and the collaboration of these two men yielded some outstanding studies, regularly appearing in Nature.

Phase II: Peak of Inflated Expectations

Early publicity produces a number of success stories — often accompanied by scores of failures. Some companies take action; many do not. (© Gartner)

In the following years, the world saw DNA sequences of the Tasmanian wolf (Thomas et al. 1989), the New Zealand moa (Cooper et al. 1992), and the woolly mammoth (Hagelberg et al. 1994) resurrected, but scientists didn't restrict themselves to studying animals. Ancient DNA was also retrieved from plants, e.g. maize (Rollo et al. 1988), and (fanfares) from humans (Hagelberg et al. 1989).

Continue reading

Posted in evolution, natural history, Paleogenomics, phylogenetics, population genetics, theory | Tagged , , | 2 Comments

A solution to the N50 filtering problem

This is the fourth in a series of posts where we explain the N50 (Nx) metric, discuss the problems surrounding it (1, 2), give solutions to those problems, and suggest an alternative N50 metric for transcriptome assemblies.

In the two previous posts we described how the N50 metric can easily be manipulated in two common but different ways. The first problem is related to the filtering of contigs. This N50 filtering problem can be solved easily if the approximate genome length of the organism is known. In that case, we can compute a statistic called NG50 (where G stands for Genome) instead of N50. It is defined similarly to N50, but instead of reaching 50% of the total assembly length, we try to reach 50% of the genome length (see example in Fig. 5).

Note that NG50 may be larger than N50 (if the assembly length is larger than the genome length), may be equal (if both lengths are somewhat similar), may be less (if the genome is larger than the assembly), and may even be undefined (if the total assembly length is less than half of the genome length).

Fig. 5. Example assembly of a 500 kbp genome consisting of seven contigs. NG50 = 50 kbp, N50 = 60 kbp.
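The computation above is straightforward to sketch in a few lines of Python. Both N50 and NG50 walk the contigs from longest to shortest and report the contig length at which the running sum reaches half the target; only the target differs. The contig lengths below are hypothetical, chosen to match the values reported in Fig. 5 (the figure's actual lengths aren't listed in the text):

```python
def nx(lengths, x=50, total=None):
    """Compute the Nx metric: the length of the shortest contig such that
    contigs at least that long together cover x% of `total`.
    If `total` is None, the total assembly length is used (plain Nx);
    pass an estimated genome length to get NGx instead.
    Returns None when x% of `total` cannot be reached (NGx undefined)."""
    if total is None:
        total = sum(lengths)
    target = total * x / 100.0
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None  # assembly shorter than x% of the genome length

# Seven hypothetical contigs (in kbp) for a 500 kbp genome, as in Fig. 5
contigs = [100, 80, 60, 50, 40, 30, 20]
print(nx(contigs))             # N50  = 60 (kbp), relative to the 380 kbp assembly
print(nx(contigs, total=500))  # NG50 = 50 (kbp), relative to the 500 kbp genome
```

Note how the `None` case falls directly out of the loop: if the sorted contigs run out before the running sum reaches half the genome length, NG50 is simply undefined, as described above.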

Continue reading

Posted in genomics | Tagged , , , | Leave a comment

To RADseq or not to RADseq?

In the end, we all want to do the best science we can, on the budget we have.

It’s a cliche to say that we live in a moment of unprecedented possibility for molecular ecology, as high-throughput sequencing methods drive the cost of collecting DNA sequence data ever lower. But at the same time, it’s a tricky moment, because the future — in which population genomic data for any species is within, say, the scope of a standard NSF grant proposal — is still unevenly distributed. For study species with small genomes and established resources like high-quality reference assemblies and deep annotation databases, the future is now. For species with large and complex genomes, or without good “infrastructure” to build on, it can still be challenging to obtain useful population-scale data without spending hundreds of thousands of dollars.

For going on a decade now, the go-to solution for this problem has been reduced-representation sequencing. Led by RADseq, or restriction site-associated DNA sequencing, these methods solve the problem of genomes that are too big to easily sequence by, as it says on the tin, reducing them. Reduced representation offers us an accessible means to identify parts of the genome that are involved in species' adaptation to different environments and, ultimately, in the formation of new species — one of the key questions of evolutionary ecology. So it's no surprise that RADseq and its relatives have been hugely popular. The method was name-checked in the 2010 "Breakthrough of the Year" feature in Science, and the original RADseq papers, published in 2007 and 2008, have almost 2000 citations, as counted by Google Scholar.

So any paper that proposes there may be some problems with RADseq is bound to be controversial. An article published in Molecular Ecology Resources back in December leaned into that controversy right from its title: “Breaking RAD: An evaluation of the utility of restriction site associated DNA sequencing for genome scans of adaptation.” MER has now published the second of two response articles, and a response from the authors of “Breaking RAD” to those responses, so it seems like a good time to break down the reasoning for, and against, RADseq.

Continue reading

Posted in adaptation, association genetics, genomics, methods, next generation sequencing, selection | Tagged , , , , | 7 Comments

You can call her queen bee: the role of epigenetics in honeybee development

Many insects have social lifestyles organized into castes. Within the colony, different individuals specialize, each taking on a unique role. This efficient division of labor is ultimately believed to be why social insect lifestyles are so successful. How it's determined who does what, however, is really pretty cool.

Continue reading

Posted in genomics, haploid-diploid, Molecular Ecology, the journal, next generation sequencing, RNAseq | Tagged , , | Leave a comment

The N50 misassembly problem

This is the third in a series of posts where we explain the N50 (Nx) metric, discuss the problems surrounding it, give solutions to those problems, and suggest an alternative N50 metric for transcriptome assemblies.

In our previous post, we highlighted one problem with N50 and showed a common and easy way to inflate this metric by filtering out shorter contigs. There is, however, a second problem with the N50 metric: it does not consider the correctness of an assembly at all. You can therefore easily increase your N50 by using an assembler that incorrectly joins contigs together. Let's consider a trivial example.

You perform a de novo genome assembly of a 4 Mbp genome (Fig. 4a) and end up with four contigs of 1 Mbp each (Fig. 4b). Let's also assume these contigs are correct with respect to the reference genome. The N50 of your assembly will be 1 Mbp. However, you can easily create a new assembly with a four-fold higher N50 simply by merging the contigs into one (Fig. 4c). Your new assembly will be much worse than the previous one, with incorrect merging points (misassemblies), but it will have a much higher N50.

4a. Hypothetical reference genome

4b. Correct assembly with N50 = 1 Mbp and 0 misassemblies.

4c. Incorrect assembly obtained by merging of contigs. N50 = 4 Mbp, 3 misassemblies.
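The trick is easy to demonstrate numerically. A minimal N50 sketch shows that gluing the four 1 Mbp contigs of Fig. 4b into the single contig of Fig. 4c quadruples the metric, even though three misassemblies were introduced:

```python
def n50(lengths):
    """N50: the length of the shortest contig among the longest contigs
    that together cover at least half the total assembly length."""
    target = sum(lengths) / 2
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length

correct = [1, 1, 1, 1]  # Fig. 4b: four correct 1 Mbp contigs
merged = [4]            # Fig. 4c: same sequence glued together, 3 misassemblies

print(n50(correct))  # 1 -> N50 = 1 Mbp
print(n50(merged))   # 4 -> N50 = 4 Mbp, although the assembly is worse
```

The metric cannot tell the two assemblies apart in quality, because it only ever sees the list of contig lengths, never their correctness against the reference.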

Continue reading

Posted in genomics | Tagged , , | Leave a comment

Right reads, wrong index? Concerns with data from Illumina’s HiSeq 4000

Commanding around a 70% share of a 1.3 billion USD market, Illumina is the major player in next-generation sequencing (NGS) technology. More likely than not, if you're a molecular ecologist working with NGS data, you've run your samples on an Illumina platform. Until recently, this was probably a HiSeq 1500 or 2500, standard equipment for larger university-based and commercial sequencing facilities. Following its introduction in 2015, however, more and more users have switched to the HiSeq 4000, citing its increased data output, efficiency, and lower cost per run, as well as the inevitable obsolescence of earlier entries in the HiSeq series. Which is why a preprint posted Sunday, alleging that this new equipment had a flaw that could result in misidentified sequencing reads, spread like wildfire on biology Twitter earlier this week. (As Gavin Sherlock put it: "I think this is a genuine cluster fuck.")

Continue reading

Posted in genomics, next generation sequencing, RNAseq, technical, transcriptomics | Tagged , , , , | 5 Comments

Mapping genomes and navigating behavior for wildlife conservation

Virginia Aida wrote this post as a final project for Stacy Krueger-Hadfield’s Science Communication course at the University of Alabama at Birmingham. She is currently evaluating a potential pharmacotherapy in traumatic brain injury and anticipates graduating with her MS in summer 2017.  Although she thoroughly enjoys neurobiology, she aspires to pursue a career in conservation medicine. In the fall, she will be attending Auburn University’s College of Veterinary Medicine. 

Zoos and wild animal parks work hard every day playing matchmaker for conservation efforts.  However, there are other implications to consider when we propagate species in captivity.

McDougall et al. (2005) argued that captive breeding may cause undesirable permanent shifts in animal temperaments, such as anti-predator responses (McPhee, 2004). Moreover, some animals develop a co-dependency on humans, rendering certain individuals, or even a species as a whole, incapable of reintroduction.

We know that the environment and genetics influence behavior. If we are already controlling the captive environment and choosing breeding pairs, why haven’t we avoided these undesirable behavior shifts?

Perhaps we should take a molecular approach.

Continue reading

Posted in adaptation, association genetics, bioinformatics, blogging, conservation, domestication, evolution, natural history | Tagged , , , , , , | Leave a comment