Best practices in sample naming

Wherein I try to save me from myself
Let’s imagine a young scientist, bursting at the seams with enthusiasm and schemes to uncover the secrets of the biological world. Everything is new and she learns as she goes! Let’s call her… Kathryn.
Imagine past-Kathryn. She’s busy, she has things to do. She’s setting up a major experiment or planning out a collecting trip spanning thousands of miles. She has a crew of undergrads and precious volunteers to manage. When a sample comes into her hand she names it in an expeditious fashion and moves on.
Now imagine current-Kathryn. She has been BURNED. Who was this capricious imp of chaos who devised this awkward and error-prone sample naming system? How is she supposed to tell one person’s O7_4L from another person’s 01-Al? Can she even trust her own handwriting? At every turn a different data handling system/program/manipulator has choked on some aspect or another – how many different iterations of these names exist, reflecting into infinity, like one mirror facing another?

Infinity mirror effect. (Wikimedia Commons: Elsamuko)


 
Let’s learn from her struggles, shall we?
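The post’s actual recommendations sit behind the fold, but here is a minimal sketch of the kind of scheme current-Kathryn might adopt: fixed-width IDs built from an alphabet that bans the look-alike letters that burned her, plus a validator to run over anything typed in from a field notebook. The PROJECT-SITE-NNN scheme and the function names are my assumptions, purely for illustration.

```python
import re
import string

# Ban the letters O, I and L so that 0 and 1 can only ever be digits:
# no more guessing whether a scrawled character is a letter or a zero.
SAFE_CHARS = set(string.ascii_uppercase + string.digits) - set("OIL")

# Example scheme (an assumption, not the post's): PROJECT-SITE-NNN.
ID_RE = re.compile(r"^[A-Z]{4}-[A-Z]{3}-\d{3}$")

def validate_id(sid: str) -> None:
    """Raise if an ID is malformed or uses a character prone to misreading."""
    bad = sorted(set(sid.replace("-", "")) - SAFE_CHARS)
    if bad:
        raise ValueError(f"{sid!r} uses ambiguous characters: {bad}")
    if not ID_RE.match(sid):
        raise ValueError(f"{sid!r} does not match PROJECT-SITE-NNN")

def make_id(project: str, site: str, n: int) -> str:
    """Compose a fixed-width sample ID and check it before it hits a label."""
    sid = f"{project.upper()}-{site.upper()}-{n:03d}"
    validate_id(sid)
    return sid

print(make_id("EUPH", "BEN", 42))   # -> EUPH-BEN-042
# validate_id("O7_4L")              # raises ValueError: ambiguous characters
```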


DNA extraction for PacBio sequencing

PacBio is emerging as the favoured sequencing approach for assembling high-quality reference genomes. But the big issue with PacBio sequencing is that to get long sequence reads you need to start with high molecular weight DNA. For my first foray into PacBio sequencing back in 2016 I sent a single DNA sample from the parasitic plant Euphrasia that I’d extracted from silica-dried tissue with a standard commercial column-based DNA extraction kit (Qiagen DNeasy Plant Mini Kit). I did all that I could to minimise shearing by using wide-bore tips and by not vortexing the sample. The DNA looked fine when run on an agarose gel, with a single band above 20 kb and no smear that would indicate shearing.
The PacBio sequence data I got back from this DNA sample was disappointing. Most of the sequences were incredibly short, the size distribution showed a peak at less than 2 kb, and few reads were over 20 kb. It seemed that the initial gel picture wasn’t really capturing the integrity of the DNA, and damage such as breaks or nicks was present. This damage causes the polymerase to fall off during PacBio sequencing and results in short or failed reads.

PacBio read length distribution of a column-based plant DNA extract.


For my second attempt, I sent my silica-dried tissue sample to a commercial company that offers a high molecular weight DNA extraction service (there are many companies to choose from). I paid a hefty $1500 for them to extract DNA from a sample using their own proprietary DNA isolation protocol (similar to this). While I’d normally extract DNA myself, in this case I was short of time before some grant money ran out. The DNA they extracted looked excellent when run on a gel, with a smear above 50 kb.
This time the PacBio data I got back was much better. While there were plenty of short fragments of little use, a good proportion of reads were over 10 kb and 20 kb, and the tail of read lengths was really long. There was even a single 140 kb read! While the comparison between the read length distributions of the two libraries isn’t exactly like-for-like (the sequencing centres performed different size selections), I’ve now seen for myself the massive impact of DNA integrity on the quality of long-read sequence data.

PacBio read length distribution of a high molecular weight plant DNA extract.
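If you want to see one of these distributions for yourself before committing to a full analysis, here is a minimal sketch that computes read lengths and the read N50 straight from a FASTQ of PacBio reads. The file name and the plain 4-line-record FASTQ assumption are mine; adapt for BAM subreads as needed.

```python
import gzip
import sys

def read_lengths(path):
    """Yield read lengths from a 4-line-record FASTQ (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:          # the sequence line of each record
                yield len(line.strip())

def n50(lengths):
    """Smallest length L such that reads of length >= L hold half the bases."""
    ls = sorted(lengths, reverse=True)
    half, running = sum(ls) / 2, 0
    for length in ls:
        running += length
        if running >= half:
            return length

lengths = list(read_lengths(sys.argv[1]))   # e.g. python readlens.py subreads.fastq.gz
print(f"reads: {len(lengths)}  N50: {n50(lengths)}  "
      f"over 20 kb: {sum(l >= 20_000 for l in lengths)}")
```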


What have I learnt from this experience?

  1. I think many of us need to reconsider our reliance on basic quality control (QC) checks for DNA samples. My QC checks usually involve measuring total yield using a fluorescent assay such as the Qubit, and the size distribution of DNA run on an agarose gel or a TapeStation/Bioanalyser. I don’t think any of these clearly show DNA breaks or nicking, though it may be indicated by a smear below a band on a gel. Perhaps we’ll have to accept that even what appears to be the ‘perfect’ DNA sample may perform poorly, and that we need to treat our DNA very carefully. Or perhaps we’ll have to adopt additional QC measures to look for DNA breaks or nicks.
  2. While there has been a massive and necessary shift from lab skills to bioinformatic skills, this has reminded me that lab skills are still important. There are a massive number of protocols for extracting high molecular weight DNA. Just about all of them forgo the easy-to-use extraction kits (putting DNA through a regular column is a bad idea if it is intended for long-read sequencing), with many protocols returning to old-fashioned DNA extractions used for BAC sequencing. These protocols are often technically challenging and involve many stages, as well as species-specific optimisation. Perhaps the move to high molecular weight DNA extractions and long-read sequencing will require us to spend more time in the lab.
  3. Recent years have seen greater use of museum specimens and dried specimen collections for genetic analysis. I can’t help but think that many of these collections will prove not to be useful for these new long-read sequencing approaches and whole genome assembly. This may not be absolute—my freshly collected silica dried plant sample worked fine—but in some cases we may need to get back in the field and recollect samples for genomic analyses.

What is your experience with DNA extraction for PacBio sequencing? Let me know @alex_twyford.


Signal Boosting a Comprehensive Review of eDNA and Metabarcoding Studies

Everything is meta these days – metabarcoding, metagenomics, and now meta blog posts that are reviews of reviews. Much like every ecologist at least dabbles in the molecular world, so most of those predisposed to molecular ecology and population genetics are at least dabbling in (or teaching, or reviewing) studies with an environmental DNA (eDNA) component. The number of metabarcoding, metagenomics, and/or eDNA studies has increased dramatically in recent years, and if you find yourself dabbling, or on the precipice of designing experiments, you probably need some all-encompassing reference to ground you. Fortunately, Deiner et al. (2017) put together a very helpful review for those of us drowning in eDNA and metabarcoding literature. They separate themselves from other reviews by focusing on four aspects: a summary of eDNA studies focused on plants and animals, what’s known and unknown about the spatial and temporal scales of eDNA information, guidelines and challenges regarding experimental design, and emerging applications.
What are the advantages of these types of studies?
The authors posit that with the explosion of high throughput sequencing (HTS), the way we survey biodiversity has changed dramatically. Being able to associate a taxonomic identity with a DNA barcode has led to the eDNA metabarcoding revolution. This technique of surveying biodiversity has obvious advantages, including the ability to survey entire communities that were previously excluded due to the size or elusiveness of their organisms, thereby improving diversity measurements and increasing the resolution of taxonomic identifications and subsequent databases. The potential increase in scope of metabarcoding studies also allows for applications on the community and ecosystem scale, like determining whether “observed community changes surpass acceptable thresholds for certain desired ecosystem functions” and guiding resource management at ecosystem scales. Furthermore, the taxonomic scope that can be sampled is positively grandiose, with one study using metabarcoding techniques spanning 5 different genomic regions to survey three domains of life in topsoil (Drummond et al. 2015).
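At its core, metabarcoding is a lookup from sequence to taxon. A toy sketch of that idea, with an invented three-entry reference and invented reads; real pipelines query curated databases and use clustering, denoising, and inexact matching rather than a dict lookup:

```python
from collections import Counter

# A miniature reference database: barcode sequence -> taxon.
# Real work would query BOLD, SILVA, or GenBank, not a dict literal.
reference = {
    "ACGTTGCA": "Salmo trutta",
    "ACGTTGGA": "Esox lucius",
    "TTGACCGT": "Daphnia pulex",
}

reads = ["ACGTTGCA", "TTGACCGT", "ACGTTGCA", "GGGGCCCC"]  # from one water sample

counts = Counter(reference.get(r, "unassigned") for r in reads)
for taxon, n in counts.most_common():
    print(f"{taxon}: {n} reads")
```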
What types of studies have been done?
The authors emphasize the distinction between community DNA and eDNA. Community DNA studies target groups collected in bulk, then separate organisms from debris, pool them together, and extract the DNA in bulk. Sequences from a community DNA extraction can be traced back to the source organism and taxonomically verified, and Sanger sequencing of voucher specimens allows direct verification of species. This is untenable for eDNA sampling, so species verification relies upon curated databases like GenBank, SILVA, and the Barcode of Life Data System (BOLD). With community DNA, the presence of a detected species in that time and that place can be inferred, but with eDNA, the presence of that species’ DNA may not necessarily mean the species was directly present at the time or place of sampling. For example, does the DNA you collect at river sites represent what’s present in the here and now, or what’s present upstream? Nevertheless, there have been myriad elegant studies in freshwater, marine, and terrestrial/aerial regimes. Examples in the review include early detection of invasive populations, the use of terrestrial haematophagous leeches to collect DNA from their endangered/elusive vertebrate host species in geographically remote regions, filtered air samples to collect pollen, and collections of spider webs, pollen from honey, and feces from generalist predators to estimate biodiversity of hard-to-capture taxa.
eDNA work can provide a glimpse into the ecological past as well. Whereas sampling surface water in freshwater systems provides contemporaneous abundance estimates, sediment cores from those same systems tell you about present and past biodiversity. Lake sediment cores have been used to look at ancient biodiversity levels from 6–12.6 thousand years before present. Sediment from ice cores has been used to look at species abundances 2,000 years before present and to track previous extinctions associated with glacial events. Ever wonder how in the world DNA is preserved in sediment for long periods of time? The authors explain that adsorption of DNA onto sediment particles shields it from degradation – especially oxidation and hydrolysis. In fact, marine sediment eDNA concentrations have been shown to be three orders of magnitude higher than seawater eDNA concentrations (Torti et al. 2015). So the next time you are having trouble getting high quality DNA from your extractions, rub some dirt on them.
Is there evidence that sequence read abundance correlates with taxa abundances?
It seems like an obvious question, but this is really the heart of the matter. Sure, eDNA techniques are clearly useful in situations where you are looking for presence/absence of species that are hard to survey with conventional methods, but the Holy Grail is inferring abundances of species from eDNA collections. The authors cite many examples of studies in freshwater and marine aquaria and mesocosms where eDNA was successfully used to measure relative population abundance with species-specific primers and qPCR. However, studies scaling this up to the community level are rare. Table 1 from the review offers many examples of studies comparing richness estimates with traditional sampling or historical data. In every case they cite, the eDNA study produced similar or higher diversity estimates. In one example I find particularly interesting, Thomsen et al. (2016) showed a correlation between relative abundance of individuals and biomass from deep-sea fish trawls when sequence reads are pooled to family level. If these guys can get results from deep-sea trawls, there’s really no excuse for the rest of us, is there? Except there are plenty of excuses (see challenges below). Ultimately, it comes down to the “ecology of the DNA” (Barnes and Turner 2016), i.e. its state, origin, fate, and transport, which can vary greatly between studies.
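As a toy version of the Thomsen et al. (2016) style of comparison, here is a sketch that rank-correlates family-level eDNA read counts against trawl biomass. Every number is invented, and scipy is assumed to be available:

```python
from scipy.stats import spearmanr

# Invented family-level read counts from eDNA and biomass (kg) from trawls.
families = ["Gadidae", "Pleuronectidae", "Rajidae", "Zoarcidae", "Liparidae"]
edna_reads = [51234, 20417, 5532, 1210, 980]
trawl_kg   = [380.0, 120.5, 35.2, 4.1, 6.3]

rho, p = spearmanr(edna_reads, trawl_kg)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```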
What are the challenges?
Figure 3 from Deiner et al. (2017) nicely summarizes the workflow of a typical metabarcoding study. It also illustrates the myriad challenges that can arise in study design, field collections, the laboratory, and data processing. It behooves any scientist preparing a study employing these methods to think carefully about the questions posed in each of these categories. Furthermore, though discussions of challenges in metabarcoding studies tend toward the technical, the authors stress that before those concerns come into play, it is imperative to be clear about the source material (community DNA vs eDNA); otherwise, downstream analysis pipelines and the subsequent interpretation of biodiversity patterns through space and time become complicated. Another important consideration is subsampling during processing steps (see Figure 2 in the review for a demonstration), which will likely result in the loss of rare sequence reads.
 
Figure 3 from Deiner et al. (2017)
The rise of HTS has made multiplexing large numbers of samples feasible. However, it also creates the possibility for errors and biases, such as tag jumping, whereby indexes/adapters (oligos attached to sequences from different samples to identify each sample uniquely) become associated with sequences from another sample. Schnell et al. (2015) found this to occur in roughly 2.5% of cases, where sequences had false tag combinations that led to erroneous assignments of sequences to samples. This phenomenon also seems to occur at higher frequency when using the HiSeq 4000 platform. HTS can also give rise to technical artifacts, like finding significant differences due to samples run on different machines or different days (batch effects), instead of biologically meaningful differences. Splitting sample groups across platforms/runs is one way to minimize such effects. Mismatches between primers and the DNA of certain taxonomic groups (primer bias) are another common challenge; these result in some taxa being preferentially sequenced over others or absent altogether from downstream biodiversity estimates. Therefore, when designing new primers, testing in silico, in vitro, and in situ is imperative. Also, HTS favors the amplification of smaller products, so eliminating excess indexing primers from reactions via purification steps after QC checks is a must.
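A minimal sketch of how you might screen for tag jumping after a run: tally the observed index pairs and count reads whose combination was never assigned to a sample. The sample sheet, index sequences, and counts are all invented:

```python
from collections import Counter

# Index pairs actually assigned to samples in a (hypothetical) sample sheet.
sample_sheet = {
    ("ATCACG", "CGATGT"): "sample_01",
    ("TTAGGC", "TGACCA"): "sample_02",
}

# (i7, i5) pairs observed in the run, e.g. parsed from read headers.
observed = ([("ATCACG", "CGATGT")] * 950 + [("TTAGGC", "TGACCA")] * 950
            + [("ATCACG", "TGACCA")] * 25 + [("TTAGGC", "CGATGT")] * 25)

counts = Counter(observed)
jumped = sum(n for pair, n in counts.items() if pair not in sample_sheet)
print(f"reads with false tag combinations: {jumped} "
      f"({100 * jumped / len(observed):.1f}%)")  # ~2.5%, the ballpark Schnell et al. report
```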
Lack of taxonomic resolution can occur when the discriminatory power of the primers is weak. For example, many animals have been barcoded via the cytochrome c oxidase subunit I (COI) gene using conventional sequencing methods, which yields a gene fragment of ~400–600 bp, depending on the primers you use. If you want to barcode a prokaryote using cloning techniques, you’ll get the full 16S gene at about 1,450 bp. However, metabarcoding techniques using HTS require much shorter fragments, which lowers the marker’s ability to discriminate among taxonomic groups. Also, there are more bioinformatics pipelines and curated databases available for bacteria and microbial eukaryotes than for macro-organisms. Although pipelines developed for microbes can be repurposed to preprocess data from macro-eukaryotes, the lack of comprehensive databases can prove challenging. However, an advantage to targeting megafaunal communities is that there will typically be less diversity than in microbial communities, so less computational time and effort is needed. Also, species boundaries tend to be better defined. Huzzah!
Are there standards of practice?
Kinda. There are standard barcoding markers defined by the Consortium for the Barcode of Life (CBOL). If you want to compare your results to other studies, you need to use these standards. COI is the most common barcoding gene for many animal taxa, though with many exceptions; popular alternatives are 12S, 18S, 16S, and/or cytB. A combination of the rbcL and matK plastid loci, or ITS2, is the standard for most plant taxa, and 16S, spanning variable regions V3 and V4, is the most commonly used for prokaryotes. The Catch-22 is that deviating from the standards is often the key to picking up previously unsurveyed taxa, but it also makes comparisons with curated databases difficult.
Quality control methods in the lab are of the utmost importance and may require higher stringency than researchers using more conventional methods are used to. For example: employ negative controls, not just at the PCR stage but at each stage of lab work, and sequence them. Contamination can often be below detection limits, yet sequenced negative controls can reveal it, flag demultiplexing errors, or feed into statistical modeling to rule out false positive detections. Constructing mock communities from pooled DNA extracts as positive controls run alongside the eDNA samples is good practice for standardization and comparison. Typically, species not expected in the study area are used so that contamination can be detected.
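One simple, conservative way to use those sequenced negative controls, sketched here as an illustration rather than as the review’s prescription: take the worst contamination seen for each taxon across the blanks as a noise floor, and subtract it from every real sample. All counts are invented:

```python
# Per-taxon read counts: all invented data.
blanks = {
    "blank_1": {"Salmo trutta": 3, "Homo sapiens": 40},
    "blank_2": {"Salmo trutta": 1},
}
samples = {
    "pond_A": {"Salmo trutta": 2150, "Esox lucius": 310, "Homo sapiens": 35},
}

# Noise floor: the maximum count of each taxon across all blanks.
floor = {}
for counts in blanks.values():
    for taxon, n in counts.items():
        floor[taxon] = max(floor.get(taxon, 0), n)

for name, counts in samples.items():
    cleaned = {t: n - floor.get(t, 0) for t, n in counts.items()
               if n - floor.get(t, 0) > 0}
    print(name, cleaned)   # Homo sapiens drops out; Salmo trutta survives
```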
Data analyses in these studies require a strong commitment to transparency, which may be daunting given the new methodologies, the amount of data generated, etc. The authors mention several important references that have addressed standards of practice, like MIMARKS (minimum information about a marker gene sequence; Yilmaz et al. 2011) and MIxS (minimum information about any (x) sequence). Goldberg et al. (2016) contains a thorough breakdown of recommended reporting standards and challenges specific to eDNA studies in aquatic environments. In addition, Sandve et al. (2013; not mentioned in the review) propose ten rules for reproducible computational research in general, which should be applied to eDNA studies. These include keeping track of how every result was produced, recording the version of every program used, recording all intermediate results, avoiding manual manipulation of data, version controlling all custom scripts, and storing all raw data used to generate plots. Some of these outputs can be deposited in repositories like Dryad, GitHub, or figshare. The compliance of studies with these standards is variable, however, especially since this type of research can be published in many different types of journals, each with their own standards and requirements. The authors strongly recommend increasing transparency in published articles, though this can fly in the face of Reviewer 2, who wants you to cut the length of your manuscript by a third. Much like this blog post.
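A skeletal example of the first two of those rules (track how every result was produced, and the version of every program used), assuming a pipeline that shells out to command-line tools; the tool list and output file name are placeholders:

```python
import json
import subprocess
import sys
from datetime import datetime, timezone

def provenance(tools):
    """Record the exact command environment alongside a result."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "argv": sys.argv,
        "tool_versions": {},
    }
    for tool in tools:
        try:  # each tool is assumed to support a --version flag
            out = subprocess.run([tool, "--version"],
                                 capture_output=True, text=True)
            record["tool_versions"][tool] = (out.stdout or out.stderr).strip()
        except FileNotFoundError:
            record["tool_versions"][tool] = "not installed"
    return record

# Written next to every output file the pipeline produces.
with open("results.provenance.json", "w") as fh:
    json.dump(provenance(["cutadapt", "vsearch"]), fh, indent=2)
```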
Where are the gaps in knowledge?
The authors give several examples of knowledge gaps in metabarcoding studies (as of the publication date). The following are some that caught my eye:

  • Batch effects have been shown in 16S bacterial diversity studies, but it is unknown whether they are prevalent in animal and plant studies.
  • eDNA studies surveying living aquatic plant communities.
  • Estimating the sources of eDNA in surface water from a lake’s catchment and relating them to the diversity that occurs locally.
  • Macro-organisms known to inhabit groundwater (gastropods, isopods, fishes, etc.).
  • Longitudinal transport of animal and plant DNA in marine environs.
  • Simulation studies on noisy data sets, to see how noise affects conformance to neutral theory parameters and rank abundance curves, and so estimate the expected error distribution around estimates.
  • Coupling distribution or occupancy modeling with eDNA findings to improve species richness estimates. This technique is still rare in eDNA studies.

So take heart if you often find yourself up against that inner voice telling you that all the cool ideas have already been taken. In the contemplative and melancholy words of Gillian Welch, “there’s gotta be a song left to sing / ’cause everybody can’t have thought of everything”.
All in all, this review is a helpful, comprehensive reference: experimental design considerations in the field, the laboratory, and data analysis; common loci used for many taxa; and a summary of eDNA studies in different habitats and what they measured. It’s worth mentioning that Pompanon et al. published a paper in 2012 titled “Who is eating what: diet assessment using next generation sequencing” that addresses many of the concepts and challenges found in Deiner et al. (2017), while also providing more detail and explaining more of the underlying concepts. If you are looking for a good starting point for teaching this field of study, I would definitely include that paper and even start there.
Now I shall leave you, dear reader, with a couple of links to impressive projects and technologies developed at the Monterey Bay Aquarium Research Institute, leaders in the development and implementation of in-situ eDNA studies in the deep sea.
 
Barnes, M. A., & Turner, C. R. (2016). The ecology of environmental DNA and implications for conservation genetics. Conservation Genetics, 17, 1–17.
Deiner, K., Bik, H. M., Mächler, E., et al. (2017). Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Molecular Ecology, 26, 5872–5895. https://doi.org/10.1111/mec.14350
Drummond, A. J., Newcomb, R. D., Buckley, T. R., et al. (2015). Evaluating a multigene environmental DNA approach for biodiversity assessment. GigaScience, 4, 46. https://doi.org/10.1186/s13742-015-0086-1
Goldberg, C. S., Turner, C. R., Deiner, K., Klymus, K. E., Thomsen, P. F., Murphy, M. A., … Cornman, R. S. (2016). Critical considerations for the application of environmental DNA methods to detect aquatic species. Methods in Ecology and Evolution, 7, 1299–1307.
Pompanon, F., Deagle, B. E., Symondson, W. O., Brown, D. S., Jarman, S. N., & Taberlet, P. (2012). Who is eating what: diet assessment using next generation sequencing. Molecular Ecology, 21, 1931–1950. https://doi.org/10.1111/j.1365-294X.2011.05403.x
Sandve, G. K., Nekrutenko, A., Taylor, J., & Hovig, E. (2013). Ten simple rules for reproducible computational research. PLoS Computational Biology, 9(10), e1003285. https://doi.org/10.1371/journal.pcbi.1003285
Schnell, I. B., Sollmann, R., Calvignac-Spencer, S., Siddall, M. E., Douglas, W. Y., Wilting, A., & Gilbert, M. T. P. (2015). iDNA from terrestrial haematophagous leeches as a wildlife surveying and monitoring tool – prospects, pitfalls and avenues to be developed. Frontiers in Zoology, 12, 1.
Thomsen, P. F., Møller, P. R., Sigsgaard, E. E., Knudsen, S. W., Jørgensen, O. A., & Willerslev, E. (2016). Environmental DNA from seawater samples correlate with trawl catches of subarctic, deepwater fishes. PLoS ONE, 11, e0165252.
Torti, A., Lever, M. A., & Jørgensen, B. B. (2015). Origin, dynamics, and implications of extracellular DNA pools in marine sediments. Marine Genomics, 24, 185–196.
Yilmaz, P., Kottmann, R., Field, D., Knight, R., Cole, J. R., Amaral-Zettler, L., … Cochrane, G. (2011). Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nature Biotechnology, 29, 415–420.
 
 


We can make academia more family friendly

This one tickled me for too long. It became a serious itch, and I feel I have to say something. Two weeks ago, Rebecca Calisi Rodríguez and a Working Group of Mothers in Science published an opinion article in PNAS about the challenges of not only being a woman but also being a mother in academia. Specifically, they discuss the problem of attending conferences with young children, and they give straightforward solutions in ‘How to tackle the childcare-conference conundrum’. I would like to point out that this is a fantastic piece and it should be read by everyone. Admitting weaknesses and exposing yourself as a mother can be very risky in a competitive environment. It makes you vulnerable, and nobody wants to be vulnerable when their career is at stake.
The background to the paper is the fact that many mothers with young kids don’t participate in conferences because they struggle to find care for their children. As a solution, the authors offer four concrete suggestions they call CARE (for Childcare, Accommodate families, Resources, Establish social networks).
After I read this article and the accompanying blog post in My Parenting Journal, I was left with an awkward feeling that would not go away. It has been several days now, and the feeling has developed into a verbalized statement. Here is where I see the problem with parenting in academia; my view is written from the postdoc perspective. I wish academia were a place for everyone; however, at my stage it is a future for only a few of us. Caring for a partner and children takes away time. Valuable time that we would otherwise invest in doing more research and other academic engagements.


Found in translation: The evolutionary history of RNA viruses in vertebrates

I have to admit, viruses aren’t normally my thing, but this is pretty darn cool.
In a study out this week, Shi and colleagues identified 214 new viruses that, as the authors so succinctly state, reveal “diverse virus-host associations across the entire evolutionary history of the vertebrates”.

             “In summary, this study reveals diverse virus–host associations across the entire evolutionary history of the vertebrates.”

RNA viruses are important because they can have a huge influence on human health, which means that most research focuses on applied questions rather than on a basic understanding of their origins and how they have changed over time in relation to their hosts. The authors of this study highlight another gaping hole in our understanding of RNA viruses: most research focuses on viruses from birds and mammals, while the rest of vertebrate diversity (including reptiles and fish) hasn’t been well explored.


The secret life of invaders

Invasive lionfish. (Wikimedia Commons:  LASZLO ILYES, Jacopo Werther)


So I have this pet theory. And damn if the evidence doesn’t seem to be piling up.
Am I living in the bubble of my own Google Alerts? Possibly.
I’m an evolutionary ecologist and invasion biologist, and (surprise!) my pet theory is about invasive species (and by that I mean species introduced to novel habitats by humans, which detrimentally affect the habitats they invade). I’m interested in invasive species not just for the dramatic impacts they can have on human and natural systems in their own right, but because, in the game of global change, they are #winning.


Sequencing round-up 2018

The deluge of new sequencing approaches continues apace. It seems that you turn your back for five minutes and there’s a shiny new sequencing platform promising to deliver more for less. What is the current state of play in the sequencing world, and what developments in sequencing technology should we be keeping an eye out for in 2018?
Illumina
Market leader Illumina have consistently delivered new short-read sequencing platforms with increasing output, reducing the costs for most sequencing applications. Their NovaSeq 6000 is the latest addition to the range, and given time it seems likely to replace the existing HiSeq platforms such as the HiSeq X and HiSeq 4000. There are various types of sequencing chemistry available, with the S4 chemistry promising to deliver a staggering 500+ Gb of data per lane, while the current S1 chemistry delivers a more modest 240 Gb per lane. As far as I know, most sequencing centres are still getting to grips with this machine, and it’s currently unclear whether the move to the rapid output of 2-colour SBS chemistry will compromise sequencing quality. Watch this space: once up and running, this will give the cheapest sequencing to date, and will greatly reduce the cost of whole genome sequencing.
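To put those per-lane numbers in context, a back-of-the-envelope sketch; the genome sizes and the 30x coverage target are my assumptions, not Illumina’s specifications:

```python
def genomes_per_lane(lane_gb, genome_gb, coverage=30):
    """How many whole genomes at a given coverage fit in one lane of output?"""
    return lane_gb / (genome_gb * coverage)

for chem, lane_gb in [("S1", 240), ("S4", 500)]:
    human = genomes_per_lane(lane_gb, 3.1)          # ~3.1 Gb human genome
    arabidopsis = genomes_per_lane(lane_gb, 0.135)  # ~135 Mb Arabidopsis genome
    print(f"{chem}: ~{human:.0f} human or ~{arabidopsis:.0f} Arabidopsis "
          f"genomes at 30x per lane")
```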


Oh my ploidy … diploids evolve more slowly than haploids?

It’s been an embarrassingly long time since I last sat at my keyboard in a TME capacity (#NewPI chat doesn’t really count)!
One year ago today, to be exact (writing this on 28 March, for publication on 29 March). Thus, it is fitting that my post will be on ploidy to get me back into the blogosphere!
Ploidy varies A LOT across the natural world!
All sexually reproducing organisms alternate between haploid and diploid stages through the processes of meiosis and fertilization.
Yet, we don’t understand why some taxa are diploid dominant, why others are haploid dominant, and why some mix it up with long-lived haploid and diploid stages.
In yeast, some recent studies have shown haploids may in fact adapt more quickly (Gerstein et al. 2011), but why?


Major new microbial groups expand diversity and alter our understanding of the tree of life

I still believe in revolutions. And sometimes they just happen, almost unnoticed. One such revolution happened on a boring 11th of April 2016, when Laura Hug et al. published their new tree of life in the journal Nature Microbiology. Many textbook trees of life are centered on eukaryotic evolution and underestimate the true global diversity of organisms. This is because it is easier to find large critters than small ones, and also because many environments have been under-sampled in the past. Now we have the tools to capture microorganisms from extreme and elusive places, for example hot vents on the ocean floor, soils in the rainforest, or shrinking ice at the poles. The only things we need to find new microbes are an environmental sample (e.g. water or soil), a DNA extraction kit, a sequencing machine, and a computer. Oh yes, and a few clever scientists like Laura and her colleagues. They added new genomic data from more than 1000 uncultivated and little-known organisms to the current tree of life. The addition of these sequences revealed an astounding diversity and reminds us that we do not yet know much about the true diversity of life on this planet.
Cindy Castelle and Jillian Banfield (who is also an author of the Hug paper) have just published a review in the journal Cell about the implications of this new tree of life for the rest of us. In this blog post, I will summarize their review, with an emphasis on a few topics I find very exciting. In short, Hug et al.’s publication quietly suggested a change in the structure of the tree of life and enriched our understanding of the biology, evolution, and metabolic roles of microorganisms.


Are population genomic scans for locally adapted loci too successful?

Last Friday, Molecular Ecology released an interesting new systematic review online ahead of print. Colin Ahrens and coauthors at a number of Australian research institutions compiled results from 66 papers reporting tests for locally adapted loci based on either FST or genotype-environment associations, and found some interesting trends. The one that raised some eyebrows on Twitter, though, is presented in the paper’s Figure 3:

For papers identifying locally adapted loci from SNP data in wild populations, the proportion of SNPs tested that were local adaptation candidates based on either (a) FST outlier status or (b) significant genotype-environment associations, in comparison to the log-scaled number of individuals sampled in the reported dataset. (Ahrens et al. 2018, Figure 3)


That’s right, there are papers in the dataset that identify almost 1 in 4 SNPs as FST outliers, and up to 8 in 10 SNPs as significantly associated with some environmental gradient. In fact, from the contents of the Dryad repository supporting the paper, it looks to me as though fully 24% of the compiled studies found that at least 10% of tested SNPs were FST outliers, and 15% found that 10% or more tested SNPs had significant environmental associations. That seems like a lot of SNPs coming up locally adapted — my first reaction was to snark that our field has forgotten what the word “outlier” means. To wit: an outlier is a data point that falls well beyond the range of values seen in the rest of the dataset. If ten percent of your SNPs are “outliers”, then it’s kind of odd to call them outliers.
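To make the snark concrete, here is a quick simulation of what “outlier” ought to mean under a neutral null: if you call everything above the 99th percentile of a neutral FST distribution an outlier, you recover roughly 1% of SNPs by construction, nowhere near 10-25%. The distributional choice and parameters are mine, not Ahrens et al.’s:

```python
import numpy as np

rng = np.random.default_rng(42)

# Neutral FST values for 50,000 SNPs, crudely modelled as a beta distribution.
neutral_fst = rng.beta(a=0.5, b=20, size=50_000)

threshold = np.quantile(neutral_fst, 0.99)
outliers = np.sum(neutral_fst > threshold)
print(f"99th-percentile threshold: {threshold:.3f}")
print(f"outliers: {outliers} of {neutral_fst.size} "
      f"({100 * outliers / neutral_fst.size:.1f}%)")   # ~1%, by definition
```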
