Retrieving a million sequences and avoiding primer bias, a new method that might have it all

We have come a long way since the early days, when sequencing was a breakthrough method used to identify uncultured microbes from the environment. It has now been almost three decades, in fact, since the first microbial 16S rRNA gene sequences were reported directly from environmental samples. As Søren M. Karst and colleagues point out in a recent article in Nature Biotechnology, we are now in an era where sequencing has become an integral part of microbial ecology research.

Since it is well conserved, the 16S rRNA gene is a great marker for characterizing microbial diversity. Using sequence data, it’s helpful to be able to group together microbes that might be deemed the same species (what defines a bacterial species is not something I am going to jump into here). 16S sequences are also often used to build phylogenetic trees, which visualize how different microbial groups or lineages are related to one another.
Let’s take a step back. Studies using 16S rRNA sequences depend on databases filled with reference sequences in order to identify the organisms their 16S rRNA sequences came from. Reference sequences are most helpful when they cover the entire length of the gene (around 1,400 to 1,900 base pairs), but it’s not cheap to sequence anything of this length. At the same time, sequencing has shown that we have only identified a fraction of the microbial diversity on this planet. Standard methods used to sequence the 16S from environmental samples rely on primers, which were designed based on what was previously known. It’s hard to know what we don’t know, especially if we only have the tools to fish for what we already expect.
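
To make the primer problem a bit more concrete, here is a minimal Python sketch of how a primer designed from known sequences can miss a lineage nobody has seen before. Everything in it is an assumption for illustration: the primer and both template fragments are made-up toy sequences, and the helper names (`count_mismatches`, `primer_binds`) are hypothetical. Real primer binding also depends on temperature, mismatch position and more, so treat this as a cartoon rather than a model of PCR.

```python
# IUPAC nucleotide codes: each primer symbol stands for a set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer symbol."""
    return sum(base not in IUPAC[sym] for sym, base in zip(primer, site))


def primer_binds(primer, template, max_mismatches=0):
    """Return True if any window of the template matches the primer
    with at most max_mismatches mismatches (a crude stand-in for annealing)."""
    n = len(primer)
    return any(
        count_mismatches(primer, template[i:i + n]) <= max_mismatches
        for i in range(len(template) - n + 1)
    )


# Toy data (hypothetical, not real primers or real 16S fragments).
primer = "GTGYCAGCMGCC"                   # degenerate primer built from known diversity
known_lineage = "TTACGTGCCAGCAGCCGCGGTAA"  # binding site matches the primer
novel_lineage = "TTACGTGCTAGTAGCCGCGGTAA"  # two substitutions inside the binding site

print(primer_binds(primer, known_lineage))  # the known lineage is amplified
print(primer_binds(primer, novel_lineage))  # the novel lineage is silently missed
```

With a strict match the novel lineage simply never shows up in the data, which is the "fishing for what we already expect" problem in miniature.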

Figure 1 from Karst et al.: overall scheme of the newly developed method.

Karst et al. present a new method that combines poly(A)-tailing and reverse transcription of SSU rRNA molecules with synthetic long-read sequencing to get around primer bias while retaining high throughput. More base pairs for your buck, so it seems.
Ultimately, the authors obtained around 1.6 million primer-free 16S gene sequences from multiple environments (including fresh water, soil and sediment, the human gut, and anaerobic digester sludge…yum). One of the striking results from this study is the sheer number of sequences obtained for the Archaea: 61,266 16S rRNA sequences, more than are currently available in the SILVA database.

Figure 2 from Karst et al.: maximum-likelihood phylogenetic tree, showing coverage of the tree of life.

While we are smack dab in the ever-evolving world of ‘omics, and although it is still important to obtain whole-genome sequences from isolates, finding ways to expand our knowledge of global microbial diversity is also essential.

Figure 3 from Karst et al.: a maximum-likelihood phylogenetic tree of OTUs clustered at 97% similarity.
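
For anyone wondering what "clustered at 97% similarity" means in practice, here is a rough Python sketch of greedy centroid-based OTU clustering. It is not the pipeline used in the paper: the function names (`identity`, `greedy_otus`) are hypothetical, the sequences are toy data, and identity is computed position by position on sequences assumed to be already aligned to the same length, where real tools perform a proper alignment first.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid it matches at or
    above the threshold; otherwise the sequence founds a new OTU."""
    centroids = []  # one representative sequence per OTU
    otus = []       # members of each OTU, parallel to centroids
    for seq in seqs:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[idx].append(seq)
                break
        else:
            # No existing centroid was close enough: start a new OTU.
            centroids.append(seq)
            otus.append([seq])
    return otus


# Toy 100-base "genes" (hypothetical data, far shorter than a real 16S gene).
base = "ACGT" * 25
close = "TT" + base[2:]       # 2 differences from base: 98% identical
distant = "TTTTTT" + base[6:]  # 5 differences from base: 95% identical

otus = greedy_otus([base, close, distant])
print(len(otus))  # number of OTUs found
```

Here `close` joins the OTU founded by `base`, while `distant` falls below the cutoff and starts its own. Note that greedy clustering depends on input order, which is one reason real tools usually sort sequences (for example by abundance or length) before clustering.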

In the study by Karst and colleagues, an amazing number of representative 16S rRNA genes were sequenced. The sequences obtained were found to represent diverse groups, including the Candidate Phyla Radiation (CPR) and the Asgard Archaea superphylum. (Yes, that’s a thing.) These Archaea might, in fact, represent ancestors of Eukaryotes, so understanding more about their diversity and evolution has broad implications. Overall, this study is pretty amazing, and an incredible number of new sequences were obtained across the domains of life.
New methods and programs seem to become available weekly (maybe even daily), and it will be interesting to see if this technique becomes a new standard for assessing the diversity of microbial communities as well as for curating more informative databases. Perhaps Karst et al. have unlocked the way so that we can now uncover and understand more of the vastly diverse microbial landscape in which we live.

Karst, S.M., Dueholm, M.S., McIlroy, S.J., Kirkegaard, R.H., Nielsen, P.H. and Albertsen, M., 2018. Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias. Nature Biotechnology.
