Many molecular ecologists spend weeks or even months in the field each year collecting samples for molecular analysis. Even before a sample reaches a molecular lab, it carries important information describing the what, where, and when of its collection: the collection metadata. While granting agencies and publishers enforce data accessibility and open access requirements for genetic data, and there are huge public repositories dedicated to archiving these data, these repositories neither require nor provide an easy way to archive the vital metadata that accompanies each DNA sequence (or file of massively parallel sequences) or genotype. Basic information such as latitude and longitude, date of capture, and habitat is often lost or attached in an ad hoc manner, for example in supplementary files held by the publisher or deposited in a repository such as Dryad.
GeOMe, the Genomic Observatories Metadatabase, fills this “metadata gap” and makes archiving sample metadata easy. GeOMe links collection metadata to publicly available genetic data in the NCBI Sequence Read Archive (SRA), providing a user-friendly search portal for contextual information that until now has been missing from widely shared databases. We are not re-inventing the wheel with respect to storage of the genetic data, and these data remain BLAST-able. The GeOMe database thus enables a “genomic observatory” model of data collection, wherein individual researchers and institutions can easily share and reuse data in a global pool.
With GeOMe, researchers can find and access genetic data collected at specific times and places anywhere in the world, enabling them to ask big questions about the structure and sustainability of life on the planet. Future researchers might, for example, investigate how the inhabitants of a specific altitude throughout the world have shifted as our planet’s climate has changed, or assess the stability of microbial communities facing increasingly acidic marine environments. Publicly archiving these metadata is essential to ensure scientific reproducibility and synthesis, as well as to maximize potential re-use of sequence data as new techniques develop. GeOMe is a bottom-up effort with buy-in from over 50 laboratories. The database is growing and adding new capacity while also setting the industry standard for metadata publication.
The accompanying videos give a brief overview of GeOMe’s functionality. A full description of GeOMe can be found in PLOS Biology.
Search and Download data
Deck, J, MR Gaither, R Ewing, CE Bird, N Davies, C Meyer, C Riginos, RJ Toonen, and ED Crandall. 2017. The Genomic Observatories Metadatabase (GeOMe): A new repository for field and sampling event metadata associated with genetic samples. PLOS Biology, 15(8), e2002925. doi: 10.1371/journal.pbio.2002925