Q&A: Julian Catchen helps us dig into STACKS – Part II

As promised, below is part II of our interview with Julian Catchen. These questions focus more on the specifics of using Stacks (i.e., user-related questions). Please see the first post if you are interested in a general overview. Even more information, including tutorials and a manual, can be found at the Stacks website. Without further ado…
What kind of data can Stacks accommodate? Is it better at accommodating certain types of data (e.g., Illumina vs. 454)?
Stacks is definitely optimized for Illumina data. The amount of sequence the Illumina platform produces (150-200 million reads per lane) is ideal for multiplexing lots of samples together, while the types of errors produced are manageable using quality scores and Stacks’ likelihood-based SNP calling model. In principle, you can use 454 data, but despite its longer reads, its lack of sequencing depth makes it impractical and it is prohibitively expensive. We haven’t yet had a lot of experience with Ion Torrent data, and I have heard that a few people are using MiSeq Illumina data with Stacks.
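To put those read counts in perspective, here is a back-of-the-envelope sketch of the expected depth per locus when samples share a lane. The 150-200 million reads per lane comes from the answer above; the sample count, locus count, and usable-read fraction are hypothetical values chosen only for illustration, and the calculation assumes reads are spread evenly across samples and loci, which real libraries never quite achieve.

```python
def expected_depth(reads_per_lane, n_samples, n_loci, usable_fraction=1.0):
    """Mean reads per locus per sample, assuming a perfectly even spread of reads."""
    if n_samples <= 0 or n_loci <= 0:
        raise ValueError("n_samples and n_loci must be positive")
    usable_reads = reads_per_lane * usable_fraction
    return usable_reads / (n_samples * n_loci)


# Hypothetical experiment: 96 multiplexed samples, 50,000 loci,
# and 80% of reads surviving quality filtering (all assumed values).
for lane_reads in (150_000_000, 200_000_000):
    depth = expected_depth(lane_reads, n_samples=96, n_loci=50_000,
                           usable_fraction=0.8)
    print(f"{lane_reads:,} reads/lane -> ~{depth:.1f}x per locus per sample")
```

Plugging in a platform with far fewer reads per run shows quickly why depth, rather than read length, is the limiting factor for this kind of multiplexed design.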
Continue reading

Posted in bioinformatics, genomics, howto, interview, methods | Leave a comment

Q&A: Julian Catchen helps us dig into STACKS

Ready to dig into STACKS?


Julian Catchen is a post-doctoral researcher at the University of Oregon, where he uses computational solutions to facilitate the analysis of next-generation sequencing data. Prior to obtaining his PhD, Julian worked for both Intel and IBM, experiences that no doubt prepared him well for his future endeavors. One recent endeavor is the program STACKS. I have previously provided a general overview of the program here, but there is no better resource than Julian to answer a wide array of questions. In the first of this two-part series, Julian answers some general and overview-type questions. Later in the week, I’ll post the second part of the interview, which will focus on user-specific issues.

Can you briefly describe what the program Stacks was designed to do and who could benefit from using it?
Stacks is designed to handle short-read sequence data anchored to the genome at specific locations – reads that will stack together. This type of data is typically produced by digesting genomic DNA with a restriction enzyme, although newer protocols are experimenting with engineered constructs. Digesting DNA with a restriction enzyme is equivalent to sampling the genome every few kilobases and creates a reduced representation of the genome. The nice thing about this process is that it can be repeated in lots of different individuals returning nearly the same samples across everyone. And that’s where Stacks comes in – the software is designed to reconstruct the loci created by the restriction enzyme digestion in hundreds or thousands of individuals from multiple populations. The software will call SNPs within those loci and track the alleles segregating at each locus in all the populations.
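The idea that a restriction enzyme samples nearly the same places in every individual can be illustrated with a toy in silico digest. This is only a sketch under simplifying assumptions, not how Stacks works internally: it scans the forward strand only, treats the read as starting at the recognition site itself, and uses made-up sequences. The function names `cut_sites` and `rad_tags` are hypothetical; GAATTC is the EcoRI recognition sequence.

```python
import re


def cut_sites(genome, recognition_site):
    """Return 0-based start positions of every occurrence of the site."""
    # A lookahead makes each match zero-width, so overlapping sites are found too.
    pattern = re.compile(f"(?={re.escape(recognition_site)})")
    return [m.start() for m in pattern.finditer(genome)]


def rad_tags(genome, recognition_site, read_length):
    """Map each site position to the read-length sequence anchored there."""
    tags = {}
    for pos in cut_sites(genome, recognition_site):
        tag = genome[pos:pos + read_length]
        if len(tag) == read_length:  # skip sites too close to the sequence end
            tags[pos] = tag
    return tags


# Two made-up individuals that differ by a single base inside the first tag.
genome_a = "ACGTGAATTCAGGCTATTTTGAATTCCCGTAAGG"
genome_b = "ACGTGAATTCAGACTATTTTGAATTCCCGTAAGG"

tags_a = rad_tags(genome_a, "GAATTC", read_length=10)
tags_b = rad_tags(genome_b, "GAATTC", read_length=10)

# The same loci are recovered in both individuals; compare them base by base.
for pos in sorted(tags_a.keys() & tags_b.keys()):
    diffs = [i for i, (x, y) in enumerate(zip(tags_a[pos], tags_b[pos])) if x != y]
    print(pos, tags_a[pos], tags_b[pos], "differing offsets:", diffs)
```

Both toy individuals yield tags at the same two positions, and the single-base difference shows up as a mismatch within one shared locus, which is the kind of signal a SNP caller then has to distinguish from sequencing error.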
Continue reading

Posted in bioinformatics, genomics, howto, interview, methods | 2 Comments

What we're reading: Tapeworm genomes, population structure in rivers, and Mendelian pythons

As we head into the weekend, here are a few things we’ve noticed that might be worth your screen time.
In the journals
Tsai, I.J., Zarowiecki, M., Holroyd, N., Garciarrubio, A., Sanchez-Flores, A., Brooks, K.L., et al. 2013. The genomes of four tapeworm species reveal adaptations to parasitism. Nature doi: 10.1038/nature12031.

Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act.

Gaspar, J.M. & Thomas, W.K. 2013. Assessing the consequences of denoising marker-based metagenomic data. PLoS ONE 8: e60458. doi: 10.1371/journal.pone.0060458.

… we processed a real 16S rRNA metagenomic dataset through five denoising pipelines. By reconstituting the sequence reads at each stage of the pipelines, we determined how the reads were being altered.

Fourcade, Y., Chaput-Bardy, A., Secondi, J., Fleurant, C. & Lemaire, C. 2013. Is local selection so widespread in river organisms? Fractal geometry of river networks leads to high bias in outlier detection. Molecular Ecology 2065–2073. doi: 10.1111/mec.12158.

… using simulations we showed that FST outlier tests provided a high rate of false-positives (up to 60%) in fractal environments such as river networks. Surprisingly, the number of sampled demes was correlated with parameters of population genetic structure, such as the variance of FSTs, and hence strongly influenced the rate of outliers.

In the news
The trouble with Big Data is that big != useful.
Round and wrinkly peas got your genetics students snoozing? Try teaching Mendelian inheritance with ball pythons.

Posted in linkfest | Leave a comment

Our first Genomic Resources Note

We recently laid out the guidelines for our new article type, Genomic Resources Notes. Since it’s a little hard to visualise what they should look like, we’ve made the first accepted GR Note available here. We think there are a lot of NGS datasets out there that could be published in this category, so hopefully there will be many more of these in the future.
Since the paper is about sunflowers, here’s a picture of Helianthus anomalus (pic by L Rieseberg):
Helianthus anomalus

Posted in Molecular Ecology, the journal | Leave a comment

Into the Field

A great migration will soon be upon us. I’m not talking about wildebeest, caribou, bar-headed geese, sandhill cranes, or any of the other amazing migratory feats.


Wildebeest and zebra crossing the Mara River, Kenya. Photo by Flickr user fveronesi1


Continue reading

Posted in howto | 6 Comments

What we're reading: admixed cattle, evolution in response to harvesting, and genetics-targeted advertising

As we head into the weekend, here are a few things we’ve noticed that might be worth your screen time.
In the journals
McTavish, E.J., Decker, J.E., Schnabel, R.D., Taylor, J.F. & Hillis, D.M. 2013. New World cattle show ancestry from multiple independent domestication events. Proceedings of the National Academy of Sciences, doi: 10.1073/pnas.1303367110.

In this study, we show that, although European cattle are largely descended from the taurine lineage, gene flow from African cattle (partially of indicine origin) contributed substantial genomic components to both southern European cattle breeds and their New World descendants.

Van Wijk, S.J., Taylor, M.I., Creer, S., Dreyer, C., Rodrigues, F.M., Ramnarine, I.W., et al. 2013. Experimental harvesting of fish populations drives genetically based shifts in body size and maturation. Frontiers in Ecology and the Environment, doi: 10.1890/120229.

Here, we quantify genetic versus environmental change in response to size-selective harvesting for small and large body size in guppies (Poecilia reticulata) across three generations of selection. We document for the first time significant changes at individual genetic loci, some of which have previously been associated with body size.

Zhan, X., Pan, S., Wang, Junyi, Dixon, A., He, J., Muller, M.G., et al. 2013. Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nature Genetics, doi: 10.1038/ng.2588.

Analysis of 8,424 orthologs in both falcons, chicken, zebra finch and turkey identified consistent evidence for genome-wide rapid evolution in these raptors.

In the news
A new startup proposes to target advertising using genetic information. (What could possibly be wrong with that?)
Our co-blogger Tim Vines takes the “post-publication review” at F1000 Research to the metaphorical woodshed.
In a sort of follow-up to our most recent Molecular ecology view, here’s a video about surveying salmon spawning with a “hexacopter” drone.

Posted in linkfest | Leave a comment

Molecular ecology views: It's a bird, it's a plane … it's a UAV

From the Laboratory of Geographic Information Systems (LASIG) – Landscape Genetics Group – at the Ecole Polytechnique Federale de Lausanne (EPFL), Stephane Joost sends along his view of molecular ecology, from high altitude. Joost’s group applies geographic information systems (GIS) in conservation and landscape genetics. He’s sent photos of an unmanned aerial vehicle, or UAV, used to collect Very High Resolution Digital Elevation Models (VHR DEMs), which have a spatial resolution of 50 cm!

On the basis of this VHR DEM we derive environmental variables (solar radiation, wetness indices, etc.). These variables are then used in models to assess their association level with the frequency of AFLP markers (landscape genomics), to identify genomic regions possibly under natural selection.
Here our goal is to discover which genes are underlying local adaptation to differential radiation regimes in the Buckler Mustard, and what is their genetic architecture.

If you have photos of your own molecular ecology in action that you’d like to share, please send them our way!

Posted in Molecular Ecology views | 2 Comments

Molecular ecology views: Track a pika by its hair

Via the MolecularEcologistView tag on Flickr, Philippe Henry sends images of his doctoral dissertation work on American pika (Ochotona princeps) in the central Coast Mountains of British Columbia. To understand the pikas’ population genetic structure, he captured DNA samples using “hair snares” of sticky tape.
If you have photos of your own molecular ecology in action that you’d like to share, please send them our way!

Posted in Molecular Ecology views | 1 Comment

Phylogeny-aware comparisons of microbial communities – EdgePCA and Squash Clustering

I’m jumping on the bandwagon with a blog post about this new PLoS ONE paper (taking the lead from the man in charge in my lab) because the algorithms are just so exciting:
Matsen FA IV, Evans SN. (2013) Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison. PLoS ONE. 8(3):e56859.
Our lab works closely with Erick Matsen and his group at the FHCRC – we’ve implemented their software for phylogenetic placement and community comparisons of short read data (pplacer and guppy) into our in-house pipeline for phylogenetic analysis of environmental metagenomes (PhyloSift). The two algorithms they discuss in this new PLoS ONE paper, EdgePCA and Squash Clustering, are implemented within the guppy software package. I can vouch for the usability of the Matsen group’s software – it is well documented and typically pretty easy to install, so I suggest you try it out if you’re as excited as I am by the work described in the above-mentioned paper.
Now for the good stuff – what do EdgePCA and Squash Clustering do? Conceptually, they represent alternatives to traditional PCoA/MDS analysis and UPGMA clustering, respectively. The UniFrac algorithm (as implemented in QIIME) currently represents the default approach for carrying out these traditional ecological analyses on high-throughput rRNA amplicon datasets. However, although UniFrac uses a phylogenetic tree as input, it is still fundamentally a distance-based metric:
Continue reading

Posted in bioinformatics, genomics, next generation sequencing, software | 1 Comment

What we're reading: isolation with migration, starch-eating dogs, and politicized science funding

As we head into the weekend, here are a few things we’ve noticed that might be worth your screen time.
In the journals
Mailund, T., Halager, A.E., Westergaard, M., Dutheil, J.Y., Munch, K., Andersen, L.N., et al. 2012. A new isolation with migration model along complete genomes infers very different divergence processes among closely related great ape species. PLoS Genetics 8: e1003125. doi: 10.1371/journal.pgen.1003125.

We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousands years with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.

Axelsson, E., Ratnakumar, A., Arendt, M.-L., Maqbool, K., Webster, M.T., Perloski, M., et al. 2013. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature, doi: 10.1038/nature11837.

The results presented here demonstrate a striking case of parallel evolution whereby the benefits of coping with an increasingly starch-rich diet during the agricultural revolution caused similar adaptive responses in dog and human.

Cromie, G.A., Hyma, K.E., Ludlow, C.L., Garmendia-Torres, C., Teresa, L., May, P., et al. n.d. Genomic sequence diversity and population structure of Saccharomyces cerevisiae assessed by RAD-seq. arXiv: 1303.4835.

Here, we apply a multiplexed, reduced genome sequencing strategy (known as RAD-seq) to genotype a large collection of S. cerevisiae strains, isolated from a wide range of geographical locations and environmental niches. The method permits the sequencing of the same 1% of all genomes, producing a multiple sequence alignment of 116,880 bases across 262 strains.

In the news
The U.S. Senate voted this week to forbid the National Science Foundation from funding political science.
Nature has a cool article about long-term scientific experiments.
Concerning the decision to send a paper to PLoS ONE.
A new, more complete version of the Neanderthal genome has just been released.
The SMBE satellite meeting on “Eukaryotic-omics” at UC Davis, which Holly Bik is organizing, is coming up soon—the extended deadline for abstracts is today!

Posted in linkfest | Leave a comment