SpaceMix, and a brief history of Spatial Genetics

Incorporating spatial data into studies of a species’ population demography has a long history. Approaches include inferring geographical clines from Principal Components Analyses (Menozzi et al. 1978); using location data as “informative priors” during model-based estimation of admixture (Hubisz et al. 2009); superimposing phylogenetic trees (and other distance-based methods) on geographical distributions to make predictions about what has come to be known as the “phylogeography” of a species; measuring the correlation between geographic and genetic distance matrices (sensu Peakall and Smouse 1999); estimating spatial autocorrelation (e.g. Moran’s I, correlograms; see Brian Epperson’s book for an excellent review, also Sokal and Oden 1991) to discover directional clines in the genetic-spatial distribution of a species; pruning variance-covariance matrices of genetic data with graph-theoretical/network algorithms to discover geographical-genetic structure; and detecting differences in the allele frequency spectra of populations to identify founder effects and range expansions (see Peter and Slatkin 2013), just to name a few.
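As a rough illustration of one of these approaches, the correlation between a geographic and a genetic distance matrix can be checked with a Mantel-style permutation test. The sketch below is a minimal pure-Python version under stated assumptions: the function names, the toy sampling positions, and the perfectly proportional genetic distances are all invented for illustration, and this is not the procedure used by SpaceMix or by any of the papers cited above.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(geo, gen, permutations=999, seed=1):
    """Return (r, p) for a one-sided Mantel-style permutation test."""
    rng = random.Random(seed)
    n = len(geo)
    geo_flat = upper_triangle(geo)
    observed = pearson(geo_flat, upper_triangle(gen))
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together,
        # which relabels populations without changing their distances.
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(geo_flat, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data (hypothetical): five populations sampled along a transect,
# with genetic distance exactly proportional to geographic distance.
positions = [0, 1, 3, 6, 10]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.02 * d for d in row] for row in geo]

r, p = mantel(geo, gen)
print(f"Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual cells, is what respects the non-independence of pairwise distances; with real data the genetic matrix would come from something like pairwise FST rather than a rescaled copy of the map.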

Inferred map of human admixture using SpaceMix from Bradburd et al. (2015).


At the core of all these methods are the variance-covariance structure of the genetic data (primarily observed and/or ancestral allele frequency distributions) and the apparent geographical distribution of the species.

Posted in bioinformatics, howto, population genetics, R, software

A current review of modern and ancient eDNA


There is something romantic about environmental DNA. The ability to discover the presence of almost any species just by detecting the microscopic bread crumbs they leave behind? That is really just a deerstalker and pipette away from Sherlock-level science.
But if you are anything like me, aside from knowing that folks get excited about it, you might not know what exactly is possible using eDNA and metagenomics. No matter how familiar you are with the field, I’m betting you can learn something from this new review by Mikkel Pedersen and colleagues.
Here were my top three gee-whiz moments:

  1. The eDNA under examination may be hitching a ride.

Natural transformation is a process through which cells take up extracellular DNA from the surroundings and integrate it into their own genomes [46,47]. Many bacteria are known to be agents for natural transformation, as are some archaea and even a eukaryotic group of micro-invertebrates, the bdelloid rotifers [48–51]. The majority of DNA that microbes take up is quickly degraded and re-metabolized in the cell, but some DNA persists for long enough to recombine with the host genome [52].

  2. I knew contamination was one of the most difficult aspects of eDNA studies… but even the reagents?!

…contaminants can be difficult to distinguish from endogenous DNA. For example, DNA contaminants from various sources are found in reagents [10,21,77–82]. Although most of these are from readily identified domesticated animals or cultivated plants, others such as Salix [83] are not and can be mistaken for genuine environmental diversity.

  3. Ice cores, soil samples, permafrost? Too easy. Let’s go find some whales!

Recently, two studies showed that seawater is also a source of macro-organismal eDNA for detection of whale species [18] and marine fish diversity [17] (figure 2). Importantly, eDNA from fresh and seawater appears to reflect contemporary rather than past diversity, as eDNA decays within a few days or weeks in the water column [16,17,61,196,197].

Pedersen M.W., L. Ermini, C. D. Sarkissian, J. Haile, M. Hellstrom, J. Spens, P. F. Thomsen, K. Bohmann, E. Cappellini, I. B. Schnell, et al. (2014). Ancient and modern environmental DNA, Philosophical Transactions of the Royal Society B: Biological Sciences, 370 (1660) 20130383. DOI: http://dx.doi.org/10.1098/rstb.2013.0383

Posted in DNA barcoding, genomics, metagenomics, Paleogenomics

A population genetic R-evolution

Uphill, both ways, in the snow, without shoes … quite apt when thinking of the dark days, in the not too distant past, in which a separate input file was needed for each popgen analysis in order to use a handful of separate programs (often for idiosyncratic reasons).
Add a complex life cycle into the mix, such as an alternation between haploid and diploid free-living phases, and you can multiply the number of input files by two. On top of that, you’d have to keep track of which programs were compatible only with diploids and which would take both diploids and haploids, but separately.
The reality is that not all organisms fit nicely into a diploid-only or even haploid-only box.
When I began my PhD and my first foray into the population dynamics of haploid-diploid seaweeds, GenAlEx (Peakall and Smouse 2006, 2012) was a revelation in terms of ease of use. There was one input sheet (though still per ploidy) and it could be stored with all output sheets in the same, albeit massive, Excel file, reducing the seemingly endless array of individual input and output files.
As 2015 dawns, the brave new world of population genetic analyses in R may make the multiple popgen input files of yesteryear a relict, not unlike floppy disks or Beta-decks.
For population geneticists, and especially those with a penchant for organisms that don’t conform, R is a limitless palette with a much larger popgen repertoire than before.

Posted in howto, methods, population genetics, R, software, Uncategorized

Whip it. Population structure and cross-species transmission of Whipworms

Whipworm (photo from WikiMedia commons)


This may be my second worm-related post, but it comes from the PLoS journal that is first in my heart: PLoS Neglected Tropical Diseases. And, as the journal name suggests, it is about a neglected tropical disease: the Whipworm (Trichuris sp.).

Posted in Uncategorized

Linking gene expression and phenotype in an emerging model organism

Female Tigriopus californicus with egg sac. Photo by Morgan Kelly


Last week in his post “Transcriptomics in the wild (populations),” TME contributor Noah Snyder-Mackler focused on a recent paper by Alvarez et al. that reviews the last decade of transcriptomic research, including the goal of linking gene expression and phenotype. Researchers today routinely collect transcriptomic data for non-model organisms, but without robust genomic resources (for example, a well-annotated genome) and/or the ability to perform genomic manipulations (for example, knockout organisms), it is often difficult (and sometimes controversial) to assign function to candidate genes.
The tide pool copepod Tigriopus californicus (pictured above) is an up-and-coming model system for a wide range of research areas including physiology, neurobiology, ecology, speciation, hybridization, and local adaptation. The Burton, Edmands, Kelly, and Willett labs (among others) continue to generate genomic and transcriptomic data for Tigriopus, and a new method published recently in Molecular Ecology Resources by Barreto, Schoville and Burton is an important contribution to the Tigriopus genomic toolbox.

Posted in genomics, howto, methods, Uncategorized

Species and sensibility

Ciona intestinalis is a species complex composed of 4 species. © SA Krueger-Hadfield 2012


Pante et al. (2014) performed a literature review of marine population connectivity to illustrate the biased estimates of connectivity that can result from failing to recognize an evolutionarily relevant unit, such as a species.
When exploring the connectivity of a set of populations, it may be necessary to revise and reassess taxonomic status.  This is particularly true in the marine environment, which is vastly under-sampled as compared to terrestrial habitats.
Poor species delimitation doesn’t just affect an individual connectivity study, but can affect meta-analyses and reviews investigating evolutionary and ecological trends.  It can also affect studies of speciation, phylogenetic studies, invasion biology and biodiversity inventories.
The authors review relevant examples of over- and under-estimation of connectivity due to poor species delimitation. They also provide a primer on delimiting species and treating them as scientific hypotheses.
It’s important to note that the results from careful connectivity studies can themselves provide evidence about divergence between lineages. To explore connectivity carefully, however, we need to keep in mind:

(1) the state of knowledge on the biology of the studied organisms, (2) the state of taxonomic treatments of the studied organisms, (3) the spatial and temporal scales of sampling, (4) the characters used to infer connectivity patterns and (5) how to synthetize information in multimarker studies

In other words, we need to take into account life history and ecological traits.  If the above knowledge is limited or nonexistent, the authors propose incorporating this uncertainty into the sampling design.  It could also be possible to include closely related taxa for groups in which the phylogeny is poorly understood, for example, deep-sea organisms.
The authors also stress the importance of incorporating life-history traits, such as clonal reproduction, and their spatio-temporal variability into the design of sampling effort.
Finally, they advocate the use of multiple and diverse markers, while pointing out the importance of moving away from the sole use of mitochondrial genes.
Pante E, Puillandre N, Viricel A, Arnaud-Haond S, Aurelle D, et al. (accepted) Species are hypotheses: avoid connectivity assessments based on pillars of sand. DOI: 10.1111/mec.13048

Posted in adaptation, community ecology, conservation, DNA barcoding, natural history, next generation sequencing, phylogenetics, population genetics, speciation, theory

Recent Ancestry of the USA and the 100k Genome Project

Holiday presents for pop-gen enthusiasts come in the form of data – boatloads of it! The past two weeks saw the announcements of two neat studies that spell monumental steps toward our understanding of the genetics of mixed populations.

With a relatively recent migratory history, much of North America is a mixture of peoples. Much of the ancestry analysis of North America has been anecdotal, and a large-scale study of the genetic make-up of the USA had yet to be conducted. In a recent study, Bryc et al., as a culmination of large-scale genotyping from stocking stuffers by 23andMe, fill in some of these blanks.

Mean European/Native American/Latino ancestry among 23andMe customers across North America. Image courtesy: http://www.cell.com/ajhg/ppt/S0002-9297(14)00476-5.ppt

Important conclusions from the study include a) greater variation in African ancestry among self-identified African-Americans, primarily Iberian ancestry among self-identified Latino-Americans, and localized (by state) variation in European ancestry across the USA, b) sex bias in ancestral composition, indicative of social contributors to genomic admixture, and c) larger correlation between self-identified ancestry and genomic ancestry than detected by previous studies.

The pipeline utilized in the study (termed “Ancestry Composition”) has been detailed in another study by Durand et al. In brief, the steps involved are (1) phasing high-density SNP chip genotype data, (2) identifying IBD (Identical By Descent, here used to represent phased genomic regions, with most SNPs in the region being directly derived from the common ancestor) tracts, and (3) assigning local ancestry to these IBD tracts using an SVM-based classifier.

Perhaps most importantly, however, our results reveal the impact of centuries of admixture in the US, thereby undermining the use of cultural labels that group individuals into discrete non-overlapping bins in biomedical contexts “which cannot be adequately represented by arbitrary ‘race/color’ categories.”

In other news, the NHS just announced plans to sequence 100,000 human genomes to quantify the dynamics of 110 hereditary disorders, including leukemia, breast, bowel, ovarian, and lung cancers. More data! 2015 definitely has a very promising outlook towards the applications of genomics in personalized medicine.

References:

Bryc, Katarzyna, et al. “The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States.” The American Journal of Human Genetics (2014). http://dx.doi.org/10.1016/j.ajhg.2014.11.010

Durand, Eric Y., et al. “Ancestry Composition: A Novel, Efficient Pipeline for Ancestry Deconvolution.” bioRxiv (2014): 010512. http://dx.doi.org/10.1101/010512

Posted in genomics, population genetics

Totally RAD, Part 2


Edit (8/20/15): I used the wrong web address for Kimberly Andrews! Go check out her work here. Sorry Kim!
Restriction site-associated DNA sequencing (RADseq) is quickly becoming the go-to methodology for collecting population genetic data, and the methodological difficulties of a technique that is exploding in popularity are coming along with it.
Last month, Stacy pointed you towards a review of RADseq protocols that detailed some methodological differences, but of course, there is always more detail out there. In the most recent issue of Molecular Ecology, Kimberly Andrews and colleagues provided a reply to the Puritz et al. paper, adding some additional clarity to the nuances that separate the different RADseq protocols.
Specifically, Andrews and colleagues go into more depth considering the consequences of PCR duplicates, the product of amplification biases during PCR.

The impact of PCR duplicates on population genomics analyses has not been quantified in the literature, but high frequencies of duplicates are expected to impact analyses by falsely increasing homozygosity and by making PCR errors appear to be true alleles (false alleles, Pompanon et al. 2005).

The simplest way to deal with this problem, and to avoid other issues of fragment size bias, is to make fragments different sizes from the beginning.

…the most straight-forward method currently developed for identifying RADseq PCR duplicates can only be used for data generated using methods that have a random-shearing step and also generate paired-end sequences (PE-RADseq). For these methods, PCR duplicates can be identified as fragments that are identical in insert length and sequence composition, because random shearing ensures that fragments at a given locus are unlikely to be of equal length unless they are duplicates

Unfortunately for RADseq protocols without a random-shearing step (which is most), there is currently no well-supported way to correct this issue. However, you can bet that there are a number of approaches in the works.
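The duplicate-identification idea quoted above (same insert length, same sequence composition) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method from Andrews et al. or from any RADseq software: the function name, the dictionary keys, and the toy fragments are hypothetical, and exact string matching ignores sequencing errors that a real pipeline would have to tolerate.

```python
def flag_pcr_duplicates(fragments):
    """Split paired-end fragments into unique ones and putative PCR duplicates.

    Each fragment is a dict with hypothetical keys:
      'locus'          identifier of the RAD locus (the restriction cut site)
      'insert_length'  distance from the cut site to the randomly sheared end
      'sequence'       combined forward and reverse read sequence
    """
    seen = set()
    unique, duplicates = [], []
    for frag in fragments:
        # After random shearing, two independent fragments from one locus
        # are unlikely to share both insert length and sequence.
        key = (frag["locus"], frag["insert_length"], frag["sequence"])
        if key in seen:
            duplicates.append(frag)
        else:
            seen.add(key)
            unique.append(frag)
    return unique, duplicates


# Toy data (hypothetical): the second fragment repeats the first exactly.
fragments = [
    {"locus": "L1", "insert_length": 312, "sequence": "ACGTTGCA"},
    {"locus": "L1", "insert_length": 312, "sequence": "ACGTTGCA"},
    {"locus": "L1", "insert_length": 287, "sequence": "ACGTTGCA"},
    {"locus": "L2", "insert_length": 312, "sequence": "ACGTAGCA"},
]

unique, duplicates = flag_pcr_duplicates(fragments)
print(len(unique), "unique fragments;", len(duplicates), "putative duplicates")
```

Without a random-shearing step every fragment at a locus has the same length, so the insert-length part of the key carries no information, which is why this shortcut is limited to PE-RADseq data.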
Lastly, the authors reiterate an important consideration for anyone who is considering RADseq data as an option for answering the scientific question of their choice: think hard about costs and technical complexity. Depending on whether you have the option to pool samples or not, resources devoted to a project can vary widely.
Welcome to the RAD fad. Better get a big cup of coffee, because you’ve got a lot of reading to do.
 
Andrews K.R., Michael R. Miller, Brian Hand, James E. Seeb & Gordon Luikart (2014). Trade-offs and utility of alternative RADseq methods, Molecular Ecology, n/a-n/a. DOI: http://dx.doi.org/10.1111/mec.12964
Pompanon F., Eva Bellemain & Pierre Taberlet (2005). Genotyping errors: causes, consequences and solutions, Nature Reviews Genetics, 6 (11) 847-859. DOI: http://dx.doi.org/10.1038/nrg1707

Posted in genomics, Molecular Ecology views, next generation sequencing, population genetics

Transcriptomics in the wild (populations)

modified book cover from "Wild" by Cheryl Strayed
The genomics revolution has already come. The past decade has seen countless advances in genomic techniques – many of which are now commonly found in any molecular ecologist’s toolbox. For example, instead of measuring gene expression in one or a few genes using RT qPCR, we can now measure genome-wide transcriptional activity using microarrays and RNA-sequencing (‘RNA-seq’). The amount of data being generated using these techniques has been growing exponentially over the past few years. So, Mariano Alvarez and colleagues decided that it was as good a time as any to take stock of the past decade of transcriptomics studies in the wild.

Posted in genomics, next generation sequencing

Hybrid speciation is for the birds (and plants, reptiles, fish, and insects)

The Italian sparrow, Passer italiae, a hybrid species whose parentals are the house sparrow, Passer domesticus, and the Spanish sparrow, Passer hispaniolensis. Photo courtesy of Alessandro Landi


R. A. Fisher once called hybridization “the grossest blunder in sexual preference which we can conceive of an animal making.” While there may be negative fitness consequences for an individual who mates across species boundaries, the evolutionary significance of hybridization in speciation, introgression, and adaptive radiation is a fascinating question gaining research attention, particularly given the relative ease with which we can now collect genomic data.
Hybridization can lead to a reduction in biodiversity through “despeciation.” If we consider species to be distinct, relatively stable, genotypic clusters, it is easy to imagine that ecological or geographical change may facilitate gene flow sufficient to homogenize both species into one cluster if reproductive barriers are weak. Examples of species fusion include Darwin’s finches and cichlid fish.
In some cases, hybridization can lead to establishment of a new, third species, hence increasing biodiversity. Keeping with our definition of species as genotypic clusters, the hybrid species would be a third cluster of genotypes that remains distinct even when in contact with the parental species.

Posted in population genetics, speciation