When your programming may be inadequate to the task: new options for metagenome analysis

Big Data, JD Hancock photos

There’s a lot of data in the form of metagenomes out there, and picking apart those mountains of data to uncover meaningful results is difficult. A reader recently suggested that we discuss CLARK-S, a new program developed to quickly and precisely classify short reads from metagenomic data sets. Since I took part this July in the EarthCube Oceanography Geobiology Environment Omics (ECOGEO) workshop, which trained participants in the use of metagenomic tools, I figured I’d go for it. EarthCube was started by NSF with the goal of improving access to and analysis of all sorts of geoscience-related data, and if you’re interested, you can access the lectures and training material we used at ECOGEO here.
There are multiple programs available to help you sort through the oceans (quite literally) of metagenomic data, and it seems like there’s another option for analysis every day. Basically, they all have an acronym, too (seems like we biologists just love them!). I’ve had a chance to highlight a few recent tools already, here and here.

In an attempt to find a comprehensive list of potential tools for analysis, I stumbled down a rabbit hole of sites, discussion boards, and papers summarizing “current metagenomic analysis options”. Many of the articles were a few years old, although some did present helpful and interesting evaluations and reviews of recent tools (Lindgreen et al., 2016). One review concerning human microbiome analysis came out just yesterday (Cui et al., 2016). What timing!
Eventually, I came across OMICtools, which has an extensive list of taxonomy-dependent metagenomic analysis tools, including short descriptions, links to the relevant websites for download, and related publications. While the original article presenting OMICtools by Henry and colleagues came out in 2014, the consortium of collaborators running the site continually adds new options to their online toolbox, and you can submit new tools to add to the list.

Thompson et al., 2016. Figure 4.

With another analysis method seemingly published every other day, it’s difficult to keep that list current, so I’m going to take a second now to highlight a few recent options. The CLAssifier based on Reduced K-mers (CLARK) was introduced in 2015, but Ounit and Lonardi have already one-upped themselves and published CLARK-S this August. The goal of CLARK-S is to quickly and accurately classify a larger proportion of the reads in a dataset than was previously possible. The authors used publicly available datasets from the New York City subway to test their software and compared their results to the original article assessing the subway data. Ultimately, the authors found that while CLARK-S might take a bit longer than some of its contemporaries, it was able to identify 10% more reads than Kraken (a comparable tool) and 27% more than its predecessor, CLARK. A recent study by Thompson et al. also used CLARK / CLARK-S in an assessment of 45 samples from the Red Sea.
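If the idea of classifying reads with discriminative k-mers sounds abstract, here is a minimal toy sketch in Python. It is not CLARK’s actual implementation, just an illustration of the general idea under simple assumptions: the function names, the two made-up reference sequences, and the choice of k = 4 are all hypothetical. The sketch keeps only the k-mers that occur in exactly one reference, then assigns a read to whichever reference collects the most hits.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_discriminative_index(references, k):
    """Map each k-mer found in exactly one reference to that reference's label."""
    per_target = {label: kmers(seq, k) for label, seq in references.items()}
    # Count in how many references each k-mer appears.
    counts = Counter(km for kms in per_target.values() for km in kms)
    return {
        km: label
        for label, kms in per_target.items()
        for km in kms
        if counts[km] == 1  # drop k-mers shared between references
    }


def classify(read, index, k):
    """Return the label with the most discriminative k-mer hits, or None."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None  # no discriminative k-mer found in the read
    top = hits.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # tie between the two best labels
    return top[0][0]


# Hypothetical toy references (not real genomes).
references = {"taxon_A": "ACGTACGTTAGC", "taxon_B": "TTGACCGTTAGG"}
index = build_discriminative_index(references, k=4)

print(classify("GTACGT", index, k=4))  # read made of taxon_A-only k-mers
print(classify("GACCGT", index, k=4))  # read made of taxon_B-only k-mers
print(classify("CGTTAG", index, k=4))  # read made only of shared k-mers
```

Real classifiers work with much longer k-mers, whole reference databases, and far more careful scoring, but the toy version shows why a read made only of shared k-mers goes unclassified.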

Quince et al., 2016. Figure 1

Another hot-off-the-press tool, this one for the de novo resolution of fine-scale variation, was published online just a few days ago by Quince and colleagues. The authors present new software for the De novo Extraction of Strains from MetAgeNomes (DESMAN), which identifies previously overlooked variants of core genomes obtained from metagenomic datasets. The software then uses these variants to assess the overall fine-scale diversity in those datasets. To test DESMAN, they analyzed both a simulated microbial community and real, previously published data.
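To make “variants” a little more concrete, here is a small Python sketch of what a variant position can look like in read data. This is not DESMAN’s method, which is considerably more involved; it is only a simple frequency threshold, and the function name, the thresholds, and the example counts are all hypothetical. Given per-position base counts from reads mapped to a gene, it flags positions where the second most common base is frequent enough to suggest more than one strain.

```python
def variant_positions(base_counts, min_minor_freq=0.05, min_depth=10):
    """Return indices of positions whose second most common base reaches
    min_minor_freq, skipping positions covered by fewer than min_depth reads.

    base_counts is a list with one dict per position, e.g.
    {"A": 60, "C": 0, "G": 40, "T": 0}.
    """
    variants = []
    for pos, counts in enumerate(base_counts):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        ordered = sorted(counts.values(), reverse=True)
        minor = ordered[1] if len(ordered) > 1 else 0
        if minor / depth >= min_minor_freq:
            variants.append(pos)
    return variants


# Hypothetical counts for four positions of one gene.
example = [
    {"A": 100, "C": 0, "G": 0, "T": 0},  # no variation
    {"A": 60, "C": 0, "G": 40, "T": 0},  # two bases at high frequency
    {"A": 3, "C": 2, "G": 0, "T": 0},    # variation, but very low depth
    {"A": 0, "C": 99, "G": 0, "T": 1},   # likely a sequencing error
]
print(variant_positions(example))
```

A fixed cutoff like this cannot reliably separate true strain variation from sequencing error, which is part of why dedicated tools are needed for the job.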
a desman, courtesy of wikipedia

Interestingly, they were able to identify strain-level variation and use it as evidence for environmental niche assignment. Also interestingly, a “desman” is a type of European mole… I think we know what the logo for this software should include.
Einstein might have been right when he said “Information is not knowledge”, but recent advances in analysis have allowed us to draw more informed conclusions from the abundance of data currently available. One way or another, we’re boldly going where no one has gone…before.

Cui, Hongfei, Yingxue Li, and Xuegong Zhang. An overview of major metagenomic studies on human microbiomes in health and disease. Quantitative Biology (2016): 1-15.
Henry, V.J., A.E. Bandrowski, A.S. Pepin, B.J. Gonzalez, and A. Desfeux. OMICtools: an informative directory for multi-omic data analysis. Database (2014): bau069.
Lindgreen, Stinus, Karen L. Adair, and Paul P. Gardner. An evaluation of the accuracy and speed of metagenome analysis tools. Scientific Reports 6 (2016).
Ounit, Rachid, Steve Wanamaker, Timothy J. Close, and Stefano Lonardi. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, no. 1 (2015): 1.
Ounit, Rachid, and Stefano Lonardi. Higher classification sensitivity of short metagenomic reads with CLARK-S. bioRxiv (2016): 053462.
Thompson, Luke R., Gareth J. Williams, Mohamed F. Haroon, Ahmed Shibl, Peter Larsen, Joshua Shorenstein, Rob Knight, and Ulrich Stingl. Metagenomic covariation along densely sampled environmental gradients in the Red Sea. bioRxiv (2016): 055012.
Wood, D.E., and S.L. Salzberg. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 15 (2014): R46.
