What is the weight of a transcriptome? How about a thousand? Every day, sequencing machines purr away, base pair by base pair, producing novel insights into the genomes of our favorite organisms. As technology improves and costs come down, opportunities for “big data” to have an impact on even the most non-model of non-model species arise more often. Soon, even the darkest corners of the eukaryotic branch of life, once thought left only to the most artisanal of biological research, will benefit from proximity to annotated genomic data. Towards the end of 2019, the One Thousand Plant Transcriptomes Initiative (1KP), a consortium of almost 200 researchers from around the world, released its capstone paper in the journal Nature. In that paper, the authors summarize analyses of transcriptome data from 1,124 species across Archaeplastida (green plants, glaucophytes, and red algae), spanning over a billion years of evolutionary time.
For the uninitiated, a transcriptome comprises all RNA or mRNA molecules from a cell or group of cells. When the sample is collected, you retain a snapshot of the genes being expressed in those cells at that moment. Often these data are used for highly controlled expression studies that aim to measure genomic response to a stimulus, a developmental stage, or tissue-specific expression. The high number of genes captured, the relatively low cost compared to long-read genomic sequencing, and the versatility of the data make transcriptomes a great source for phylogenomic inference. Great, but not perfect. For example, a gene that we know is present across all land plants may go unexpressed in one sample by chance, and that gene is then excluded from gene tree inference. The sheer number of loci collected ultimately allows for stringent filtering of these cases while still leaving a sizable phylogenomic dataset, even at the evolutionary depths seen in this study. As plants left the sea to colonize land, certain traits were necessary for survival.
As the 1KP consortium describes, these include “multicellularity and the development of the plant cuticle, protected embryos, stomata, vascular tissue, roots, ovules and seed, and flowers and fruit”. These innovations required the raw material for evolutionary novelty: new gene duplicates that could be mutated for novel functions and interactions. To that end, the consortium's analyses identified a wave of gene family expansions that occurred right before the transition onto land. Although polyploidy (whole-genome duplication) events were inferred throughout the flowering plant and fern clades, this initial gene expansion was not directly associated with polyploidy. These events, coupled with patterns of reticulation, have shaped the complex history of plant evolution. Despite the volume of sequencing data available, many of the tested relationships between taxa remain unresolved, though several polytomy topologies were rejected.
Arguably, the true weight of this dataset is not in the findings presented in these capstone analyses but in the dataset itself and the tools developed to analyze it. One example is SOAPdenovo-Trans, a de novo assembler for transcriptome sequences that was designed specifically for this effort but has since found broad popularity. Many studies have already benefited from early access to these transcriptome sequences, but even if you don’t know where to start with analyzing gene expression or phylogenomic patterns, this dataset can still hold value. Available transcriptomes can be used to improve molecular studies for systems that were previously limited in molecular resources. For example, transcriptome sequences can be used to ease the development of nuclear markers like simple sequence repeats (SSRs). While SSRs are a popular tool, developing them can be costly and time-consuming.
Access to a well-annotated transcriptome sequence that includes your genes of interest, in some cases at broader phylogenetic depths, can make the prospect of designing SSRs much more achievable. This is just the beginning. The 10KP project, announced in 2018, plans to sequence complete genomes from more than 10,000 plants and protists to address fundamental questions in plant diversity. As the barrier to accessing molecular data continues to drop, the prospects for research into life’s broad diversity will only continue to rise.
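To make the SSR point above more concrete, here is a minimal Python sketch of the first step in marker development: scanning a transcript sequence for perfect repeats. It is not a tool from the 1KP paper. The function names, the example sequence, and the repeat thresholds (at least 6 copies of a 2-base motif, 5 of a 3-base motif, 4 of a 4-base motif) are illustrative assumptions, and a real workflow would go on to design and test primers around each hit.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# These are illustrative defaults, not values taken from the 1KP paper.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. ATAT, AAA)."""
    n = len(motif)
    return any(
        n % k == 0 and motif == motif[:k] * (n // k)
        for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in min_repeats.items():
        # One captured motif followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_redundant(motif):
                continue  # already covered by a shorter motif, or a homopolymer
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Made-up transcript fragment containing (AG)x7 and (CAT)x5.
example = "GGC" + "AG" * 7 + "TTCA" + "CAT" * 5 + "GG"
hits = find_ssrs(example)
for start, motif, copies in hits:
    print(f"position {start}: ({motif})x{copies}")
```

Starts are 0-based offsets into the sequence. In practice you would run something like this over every transcript in an assembly (read from a FASTA file) and keep only hits with enough flanking sequence to place primers.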
One Thousand Plant Transcriptomes Initiative. (2019). One thousand plant transcriptomes and the phylogenomics of green plants. Nature, 574(7780), 679. doi:10.1038/s41586-019-1693-2
Edmunds, S. (2019, October 24). Harvesting the final fruits of the plant tree of life. Retrieved from http://gigasciencejournal.com/blog/1kp-capstone-data/
Lopez, L., Wolf, E. M., Pires, J. C., Edger, P. P., & Koch, M. A. (2017). Molecular resources from transcriptomes in the Brassicaceae family. Frontiers in Plant Science, 8, 1488. doi:10.3389/fpls.2017.01488