Using next generation on everything and more.

“Developing genomics tools for ecological organisms is desirable because we can study a wider range of phenotypic traits over evolutionary timescales and in more populations than was possible previously. Through this we are likely to gain a more realistic and comparative understanding of how selection works on natural levels of genetic variation, where this genetic variation comes from and how it is maintained.” Stapley et al. 2010 (TREE)

Okay, let’s just get this out of the way. We’re way past calling next generation sequencing technologies “Next Gen.” I mean, really, isn’t Next Gen yesterday’s news? With the advent of third generation sequencing technologies that can sequence a single molecule of DNA, our terms are out of date. If you’re wondering how it is possible to sequence a single DNA molecule, check this out. I also recently learned of a new technology called Ion Torrent at a biotech conference here in San Diego. Rather than using light, Ion Torrent detects the change in pH as each base is incorporated into a growing strand. Given the rate of technological advance, instead of “Next Gen” or “Third Generation,” I propose that we make our lives simple and just call everything High-Throughput Sequencing Technology (HTST).

When I started writing this series of posts on high-throughput sequencing technologies, I thought I would evaluate their usefulness for ecological and evolutionary questions. But as I read more and more about the topic, I realized that it probably wasn’t necessary, given that several good reviews already do that (Hudson 2008 MER; Stapley et al. 2010 TREE; Siol et al. 2010 New Phytologist, to name a few). And like the sequencing technologies, they are subtly different from one another, but all worth reading.

So instead of trying to give insightful comments about how to use this technology and the problems that might come with it, I’ll just highlight a few things from papers worth reading as I come across them, and then give my two cents’ worth.

The above quote is taken from a TREE review entitled “Adaptation genomics: the next generation,” in which the authors identify how next generation sequencing tools will allow evolutionary biologists to investigate questions like “How many genes are involved in adaptation? What types of genetic variation is responsible for adaptation – standing or newly acquired mutations?”

In this article, Stapley et al. (2010) suggest that ecologists tend to have a good idea of what traits might be involved in adaptation for their study organism. They also suggest that geneticists know a lot about the genomic architecture of a few classical model organisms but very little about their ecological relevance. This argument is a little bit of a strawman, because it sets up a false opposition between ecology and genetics. In their eyes, the importance of this technology is that it will make it easier to integrate ecological and genomic data, because for ecologically interesting organisms “a range of genomic resources such as whole genome sequences, transcriptome sequences, and genome-wide marker panels can be generated within the scope of a three-year grant.”

When I first read this statement, I thought that the authors had found a practical way to explain the rate at which genomic data can be generated. But then I realized how uncomfortable the phrase “can be generated within the scope of a three-year grant” made me feel. And while I can’t put my finger on the exact reasons, I think it’s because it underscores the stark reality that research has to operate within the confines of short-term constraints. Presumably the authors mean that this will shorten the time it takes for researchers to start answering the interesting questions on any organism.

And yes, it’s a pretty exciting time to be an evolutionary/population geneticist. High-throughput sequencing has been used with great success on several model organisms, like Arabidopsis thaliana and Drosophila melanogaster, and on non-model organisms like Coregonus spp. (lake whitefish), Arabidopsis lyrata, Gasterosteus aculeatus (three-spined stickleback), and Heliconius butterflies.

However, something is missing in the link between generating genomic data for interesting ecological organisms and how high-throughput sequencing technology has already reinvigorated current studies of the genetic basis of adaptation. The tacit implication is that because we can use HTST to create extensive genomic toolkits for non-model organisms, we should be able to gain a stronger understanding of how selection operates on ecologically relevant variation, and thus answer some of the questions that have “puzzled ecological geneticists for decades.”

I don’t disagree that we’ll move science along, but all of the non-model organisms described in the review have had extensive conceptual legwork contributed by many, many scientists over several years. It is because these biological systems are so highly developed conceptually that the power of HTST can be fully realized. For example, in the three-spined stickleback system, it has taken several generations of grad students and postdocs to work out that replicate isolated freshwater stickleback populations were independently derived from their oceanic ancestors, that there is no gene flow between these isolated freshwater populations, that there is significant variation in behavior, life history, and morphology, that diversification happened very rapidly, and that selection has acted in parallel in these different isolated freshwater habitats, evoking similar phenotypic trajectories at local, regional and global scales (the references are too numerous to cite, so I’ve included a select few: Orti et al. 1994 Evolution, McKinnon and Rundle 2002 TREE, Hohenlohe et al. 2010 PLoS Genetics).

This example is also where the power and limitations of HT sequencing are best understood. In the stickleback system, Baird et al. (2008) and, subsequently, Hohenlohe et al. (2010) used Illumina-sequenced RAD tags to gather genome-scale sequence data on natural populations. The data confirmed previous work showing that freshwater populations were independently derived from the oceanic populations. Furthermore, using the RAD-tag data, researchers identified 9 genomic regions (3% of the genome) that were differentiated between the two ecotypes (freshwater and oceanic) and are thus putative candidate regions associated with adaptation to freshwater. Some of these genomic regions co-localized with previously identified loci of major effect (e.g. the Ectodysplasin A (Eda) locus). But using this HT sequence data, researchers also found several additional regions showing parallel differentiation across independent populations. The power of this much data is that there is now a list of novel candidate regions that may be important in adaptation to freshwater.
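
For readers who want to see what "differentiated between the two ecotypes" can look like in practice, here is a minimal sketch in Python. It is not the method used by Hohenlohe et al.; it just computes a simple Wright-style F_ST for each biallelic SNP from the allele frequencies in two equally weighted populations and then averages it in fixed windows along a chromosome. The function names, window size, and the toy frequencies are all made up for illustration.

```python
def fst_per_snp(p1, p2):
    """Simple F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumes two equally weighted populations (an illustrative simplification).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_t == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    return (h_t - h_s) / h_t


def windowed_mean_fst(positions, fst_values, window_size, step):
    """Mean F_ST in windows [start, start + window_size) along one chromosome.

    Returns a list of (start, end, mean_fst); windows without SNPs are skipped.
    """
    if not positions:
        return []
    results = []
    start = 0
    last = max(positions)
    while start <= last:
        end = start + window_size
        in_window = [f for pos, f in zip(positions, fst_values) if start <= pos < end]
        if in_window:
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results


# Toy data (hypothetical): four SNPs, allele frequencies in two ecotypes.
positions = [100, 200, 1100, 1200]
oceanic = [0.5, 0.5, 0.9, 1.0]
freshwater = [0.5, 0.4, 0.1, 0.0]

fst_values = [fst_per_snp(a, b) for a, b in zip(oceanic, freshwater)]
windows = windowed_mean_fst(positions, fst_values, window_size=1000, step=1000)

for start, end, mean_fst in windows:
    print(f"{start}-{end}: mean F_ST = {mean_fst:.3f}")
```

In this toy example the second window stands out because both of its SNPs are nearly fixed for different alleles in the two ecotypes. A real scan would also need sample-size corrections, some form of smoothing, and a significance test, none of which are attempted here.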

Even more interesting is that the data generated from HT sequencing did not show elevated divergence in a region previously identified as underlying a major phenotypic change between the marine and freshwater fish. The pelvic structure, a bony girdle with spines, is present in the marine fish but reduced in the freshwater fish. The region responsible is a cis-acting tissue-specific enhancer of the Pituitary homeobox transcription factor 1 gene (Pitx1), found at the telomeric end of linkage group seven (Chan et al. 2008 Science). So why did high-throughput sequencing data, which provided the researchers with 45,000 SNPs, not detect this locus? Hohenlohe et al. (2010) suggest that multiple alleles were selected in different freshwater populations, leading to a soft sweep pattern. If, as Hohenlohe et al. suggest, the soft sweep explanation is correct, then using only high-throughput sequencing data to detect regions of adaptive significance could potentially lead to a bias against detecting this form of selection.

High-throughput sequencing technologies do allow each lab to generate, cheaply and relatively quickly, a specific type of genomic data that can inform our understanding of how ecology impacts the genomic architecture of an organism. But that does not mean that within the scope of a three-year grant we will generate anything remotely resembling a detailed picture of the genetics of adaptation. This rich picture will be formed after several decades of hair-pulling by grad students, postdocs and their supervisors, all of whom will toil away testing, challenging and advancing our understanding of adaptation.


About Dilara Ally

Dilara Ally works as a Bioinformatics Scientist for one of the hottest biofuel companies in San Diego, CA called SG Biofuels.