There are a lot of data out there, and if you haven’t already noticed, the ‘omics train has steadily held its course through the fruitful (but challenging) world of metagenomics. Metagenomics offers the chance to unravel complex microbial communities without having to sequence individual cells or isolate and cultivate each and every member of the uncultured majority.
If you think of deciphering metagenomes as trying to complete wildly complex puzzles that have all been jumbled together into one box (as Sieber and colleagues recently suggested), it makes sense that sorting them all out would be hard. You (a) don’t know what the final pictures should look like, (b) aren’t sure how many complete puzzles are in the box, and (c) are pretty dang certain there are pieces missing. Bioinformatic tools have been developed to make it more feasible to pull out pieces of interest (sequences) and complete an entire puzzle (genome). These tools sort contigs from a metagenome into bins, which essentially means grouping together all the contigs that seem to come from the same organism, making it possible to eventually reconstruct entire genomes (completely…or mostly…or partly).
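To make the idea of binning a little more concrete, here is a toy sketch. Real binners typically group contigs using sequence composition (such as tetranucleotide frequencies) together with read coverage; this example assumes a much cruder approach that looks only at GC content and coverage. The `Contig` and `Bin` classes, the `greedy_bin` function, and the tolerance values are all hypothetical and not taken from any published tool.

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    name: str
    sequence: str
    coverage: float  # mean read depth; assumed to be precomputed


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


@dataclass
class Bin:
    members: list = field(default_factory=list)

    def centroid(self) -> tuple:
        """Mean GC fraction and mean coverage of the contigs in this bin."""
        gcs = [gc_fraction(c.sequence) for c in self.members]
        covs = [c.coverage for c in self.members]
        return sum(gcs) / len(gcs), sum(covs) / len(covs)


def greedy_bin(contigs, gc_tol=0.05, max_cov_ratio=1.5):
    """Toy binner (hypothetical): put each contig in the first bin whose
    centroid has similar GC content and coverage, else open a new bin."""
    bins = []
    for contig in contigs:
        gc = gc_fraction(contig.sequence)
        for b in bins:
            b_gc, b_cov = b.centroid()
            low = max(min(contig.coverage, b_cov), 1e-9)  # avoid dividing by zero
            ratio = max(contig.coverage, b_cov) / low
            if abs(gc - b_gc) <= gc_tol and ratio <= max_cov_ratio:
                b.members.append(contig)
                break
        else:  # no existing bin was close enough
            bins.append(Bin([contig]))
    return bins


contigs = [
    Contig("c1", "GGCCGGCCAT", 20.0),
    Contig("c2", "GCGCGCGCAA", 22.0),
    Contig("c3", "ATATATATGC", 5.0),
    Contig("c4", "AATTATATCG", 6.0),
]
print([[c.name for c in b.members] for b in greedy_bin(contigs)])
# [['c1', 'c2'], ['c3', 'c4']]
```

The two GC-rich, high-coverage contigs end up together, as do the two AT-rich, low-coverage ones. A greedy, order-dependent rule like this is easy to fool, which hints at why different binning methods can disagree on the same data.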
Sieber et al. recognized the importance of good binning: without it, you might completely miss sections of genomes or assign sequences to the wrong bin. They tested a variety of binning methods across multiple ecosystems with different levels of complexity and found that no single method could handle all of the ecosystems they tested; the quality of the results varied with both the method and the data set.
The authors took this opportunity to develop a new strategy for genome binning: rather than relying on a single algorithm, they combined the results from several. The tool they developed is called the dereplication, aggregation and scoring tool (DAS Tool), which is “an automated method that integrates a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly”.
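The general idea of combining binners can be sketched in a few lines. This is a simplified illustration loosely in the spirit of the dereplication, aggregation and scoring steps named in the tool’s title, not DAS Tool’s actual code: candidate bins from every tool are pooled, each is scored with single-copy marker genes, the best-scoring bin is kept, its contigs are removed from all other candidates, and the rest are rescored until nothing clears a threshold. The scoring formula, the `dup_penalty` weight, the threshold, and the function names are hypothetical.

```python
from collections import Counter


def score_bin(members, markers_on, marker_set, dup_penalty=0.5):
    """Hypothetical score: fraction of the marker set found at least once,
    minus a penalty for markers found more than once (possible contamination)."""
    counts = Counter(
        m for contig in members for m in markers_on.get(contig, []) if m in marker_set
    )
    unique = len(counts)
    duplicated = sum(1 for n in counts.values() if n > 1)
    return (unique - dup_penalty * duplicated) / len(marker_set)


def select_bins(candidates, markers_on, marker_set, threshold=0.6):
    """Greedy selection: keep the best-scoring candidate, remove its contigs
    from every other candidate, rescore, and stop when nothing clears the threshold."""
    # Aggregation: pool the bins from every tool, remembering where each came from.
    pool = [(tool, set(members)) for tool, bins in candidates.items() for members in bins]
    chosen = []
    while pool:
        scored = [(score_bin(m, markers_on, marker_set), tool, m) for tool, m in pool]
        best_score, best_tool, best = max(scored, key=lambda t: t[0])
        if best_score < threshold:
            break
        chosen.append((best_tool, sorted(best), round(best_score, 3)))
        # Dereplication: drop the chosen bin and strip its contigs from the rest.
        pool = [(tool, m - best) for tool, m in pool if m is not best]
        pool = [(tool, m) for tool, m in pool if m]  # discard emptied bins
    return chosen


# Toy inputs (hypothetical): which marker genes sit on which contig.
marker_set = {"m1", "m2", "m3", "m4"}
markers_on = {
    "a": ["m1", "m2"], "b": ["m3", "m4"], "c": ["m1"],
    "d": ["m1", "m2"], "e": ["m3", "m4"], "f": ["m2"],
}
candidates = {
    "toolA": [{"a", "b"}, {"c", "d", "e", "f"}],
    "toolB": [{"a", "b", "c"}, {"d", "e"}],
}
print(select_bins(candidates, markers_on, marker_set))
# [('toolA', ['a', 'b'], 1.0), ('toolB', ['d', 'e'], 1.0)]
```

In this toy run, each tool contributes its one clean bin, while the bins that carry duplicated markers (toolA’s second bin and toolB’s first) lose out. That is the basic appeal of aggregation: no single tool has to get everything right.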
Integrating multiple algorithms and checking various combinations to see which yields the highest-quality bins makes sense. As the authors note, bin quality is assessed by estimating how complete each bin is and how contaminated it might be by incorrect contigs, both of which are inferred from the presence and copy number of single-copy marker genes.
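A common simplified way to turn single-copy marker genes into these two numbers looks like this: completeness is the fraction of expected markers found at least once, and contamination counts the extra copies, since a marker that should occur once per genome but shows up twice suggests contigs from another organism. This sketch assumes a flat, hypothetical set of ten markers; tools used in practice, such as CheckM, rely on curated, lineage-specific marker sets, so treat it as an illustration rather than the authors’ exact procedure.

```python
from collections import Counter


def completeness_contamination(found_markers, expected_markers):
    """Estimate bin quality from single-copy marker genes.

    found_markers: marker IDs annotated in the bin, one entry per copy found.
    expected_markers: the set of markers expected exactly once per genome.
    Returns (completeness %, contamination %).
    """
    counts = Counter(m for m in found_markers if m in expected_markers)
    n = len(expected_markers)
    completeness = 100 * len(counts) / n
    # Each copy beyond the first hints at contigs from a different genome.
    contamination = 100 * sum(k - 1 for k in counts.values()) / n
    return completeness, contamination


expected = {f"SCG{i}" for i in range(1, 11)}  # hypothetical marker IDs
found = ["SCG1", "SCG2", "SCG3", "SCG4", "SCG5", "SCG6", "SCG7", "SCG8", "SCG2", "SCG5"]
print(completeness_contamination(found, expected))
# (80.0, 20.0)
```

Eight of the ten expected markers are present (80% complete), and two of them appear twice (20% contamination).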
To test how well DAS Tool could get the job done, they first applied it to three data sets: assemblies of simulated microbial communities of increasing complexity (40, 132, and 596 genomes). The authors ran these data through five different binning tools and then combined all the outputs with DAS Tool. Across all three data sets, they found that DAS Tool “reconstructs a higher number of high-quality genomes and resolves strain variation better than any of the individual tools” that they used.
The testing didn’t stop there. They also used DAS Tool to process previously published metagenomic data from a geyser, combining the output from either 3 or 7 different binning methods. Again, when drawing on the results of 7 methods, DAS Tool reconstructed more draft genomes than had previously been obtained. Finally, they applied DAS Tool to multiple shotgun metagenomic data sets that also ranged from low to high complexity.
The conclusion of all of these tests? Sieber and colleagues found that DAS Tool almost always pulled out more genomes from complex samples than any single binning tool could manage alone. They highlight that this method is a great way to combine outputs from a suite of current binning methods to obtain cleaner and more complete results. With all the many metagenomes available, and the many many (many) more most likely on the way, it might turn out that DAS Tool could be sehr gut.
Sieber, C.M., Probst, A.J., Sharrar, A., Thomas, B.C., Hess, M., Tringe, S.G. and Banfield, J.F., 2018. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology, p.1.