Guidelines for Genomic Resources Notes

The arrival of reliable, high-throughput sequencing technologies has led to the generation of many large-scale data sets, and Genomic Resources Notes (GR Notes) are aimed at making these available to the broader community. The publication process will be as follows:

1) authors will submit a brief manuscript describing how the resource was developed and where the data can be accessed;

2) we will email the authors a Dryad upload link*;

3) any data or code from the paper not archived elsewhere should be uploaded to Dryad (this could include reference files, sequence assemblies and data analysis pipelines);

4) the manuscript and the associated data will be assessed by an editor;

5) once accepted, the availability of the GR Notes paper and its data will be announced by the publication of a summary article listing all the notes accepted in that two-month period.

[*we are happy to consider GR Notes that archive their data on other reputable public repositories]

Inclusion in the summary article will provide a way for the datasets to be found through standard literature searches, and the data themselves will be available through public archives like NCBI’s Sequence Read Archive and Dryad. The accepted GR Notes manuscript itself will be included as a supplemental file on the summary article.

This section explains how these papers should be put together. First, data that are eligible for storage on NCBI’s or EMBL’s Sequence Read Archive (or a similar database) should be archived there. Next, because most GR Notes papers will contain similar information that readers need to access quickly and easily, we have decided not to use the traditional journal article format. Instead, GR Notes will look more like a form, with fields related to each aspect of the study. Many of these fields correspond to the data required during submission to the Sequence Read Archive.

Authors can add their own fields where necessary; please enter n/a for any field below that does not apply. We’re also willing to consider a broad range of other genomic resources papers, even if they do not fit into the format below. Please contact the managing editor (managing.editor@molecol.com) if you have any questions regarding the preparation of GR Notes papers.


Title:

Authors:

Affiliations:


Introduction: briefly describe the rationale for collecting the data and the study goals

Data Access: give GenBank accession numbers, SRA project numbers, or Dryad DOIs where data associated with this paper can be found, along with a brief description of what data can be found in each place. For example:

  • NGS sequence data: NCBI SRA: SRX0110215
  • Reference file (.fasta file), sequence assembly (.bam file), putative SNP data (.vcf file): Dryad entry doi:10.5521/dryad.12311
  • Scripts, code (specify language) used in the assembly: Dryad entry doi:10.5521/dryad.12311

Meta Information:

  • Sequencing center – where the sequencing was done
  • Platform and model – the sequencing platform used in the experiment
  • Design description – the rationale, setup and goals of this experiment
  • Analysis type – {DNA, RNA, etc}
  • Run date – date (yyyy-mm-dd) when the run was produced

Library:

Some or all of this information can be presented in a table.

  • Strategy – sequencing strategy used in the experiment
  • Taxon – species (or other taxon) from which the sequenced material was obtained
  • Sex – {male, female, unknown}
  • Tissue – tissue from which the sequenced material was obtained
  • Location – geo-referenced sources of DNA samples where possible
  • Sample handling – identification of potential sources of contamination and how they were remedied
  • Additional sample information – optional free-form text further describing the sample
  • Selection – method of selection or enrichment used in the experiment
  • Layout – configuration of the read layout {paired, fragment, etc.}
  • Library construction protocol – a description of the library construction techniques and reagents used, from DNA extraction to sequence output. Unique or innovative techniques should be thoroughly described, e.g., use of adapters, linkers, bar codes, etc.

If appropriate:

  • Nominal size (paired) – size of the insert for paired reads
  • Targeted loci – set of loci selected for sequencing {16S rRNA, exome} and associated probes
  • Nominal size (mate pair) – size of the insert for mate-pair libraries

Processing:

Pipeline: describe the software used for each step, giving the program name and version, and include any filtering of the data. We strongly encourage authors to submit scripts to Dryad or a similar stable public archive. Scripts should be carefully annotated.

Runs: describe the files that belong to specific experiments. An experiment may contain many runs, depending on how many sequencer runs were involved in data acquisition.

  • Run data file type – the storage format (srf, sff, fastq, etc.) of the sequence data being submitted
  • File name – name of the file transferred to external databases

Results:

  • Total number of reads, number of reads after filtering, mean length, quality, number of aligned/assembled reads, etc. This is often most easily presented as a table.
  • Quality scoring system {phred, phred+33, phred+64, log-odds}
  • Quality scoring ASCII character range {“!” to “J”, “@” to “h”}
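
As an illustration only, the read count, mean read length and quality character range listed above could be gathered from a FASTQ file with a short script such as the sketch below. The function names summarize_fastq and guess_offset are hypothetical, the input is assumed to use plain four-line FASTQ records, and the offset guess is a rough heuristic based on the two character ranges given above; it does not distinguish phred+64 from log-odds scores and is no substitute for checking the sequencing platform's documentation.

```python
import io


def summarize_fastq(handle):
    """Count reads, mean read length and the quality-character range.

    Assumes plain four-line FASTQ records (sequence and quality not wrapped).
    """
    n_reads = 0
    total_length = 0
    min_char = None
    max_char = None
    for i, line in enumerate(handle):
        line = line.rstrip("\r\n")
        if i % 4 == 1:  # second line of each record: the sequence
            n_reads += 1
            total_length += len(line)
        elif i % 4 == 3 and line:  # fourth line of each record: the qualities
            lo, hi = min(line), max(line)
            min_char = lo if min_char is None else min(min_char, lo)
            max_char = hi if max_char is None else max(max_char, hi)
    mean_length = total_length / n_reads if n_reads else 0.0
    return {
        "reads": n_reads,
        "mean_length": mean_length,
        "min_char": min_char,
        "max_char": max_char,
    }


def guess_offset(min_char, max_char):
    """Heuristic guess of the ASCII offset (33 or 64); None if undecidable."""
    if min_char is None:
        return None
    if ord(min_char) < 59:  # below ';', the lowest character a +64 scheme uses
        return 33
    if ord(max_char) > 74:  # above 'J', the top of the "!" to "J" range
        return 64
    return None  # characters fall in the overlap of both ranges


# Tiny in-memory example; a real file would be opened with open(path).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\n!!JJII\n")
stats = summarize_fastq(example)
print(stats, guess_offset(stats["min_char"], stats["max_char"]))
```

Running the same summary before and after filtering gives the two read counts requested in the first bullet.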

If appropriate:

  • Mean / Median coverage per contig
  • Polymorphism rate
  • Annotation or Gene Ontology results.
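
For authors reporting coverage, the following is a minimal sketch of how mean and median coverage per contig might be computed. It assumes a tab-separated per-base depth table with three columns (contig, position, depth), the layout written by tools such as samtools depth; the function name coverage_per_contig is hypothetical. Positions with zero coverage must be present in the input (for example, samtools depth -a), otherwise the values will be overestimated.

```python
import io
import statistics
from collections import defaultdict


def coverage_per_contig(handle):
    """Return {contig: (mean_depth, median_depth)} from a per-base depth table.

    Assumes tab-separated columns: contig name, position, depth.
    """
    depths = defaultdict(list)
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        contig, _position, depth = line.rstrip("\r\n").split("\t")[:3]
        depths[contig].append(int(depth))
    return {
        contig: (sum(values) / len(values), statistics.median(values))
        for contig, values in depths.items()
    }


# Tiny in-memory example; a real table would be opened with open(path).
example = io.StringIO(
    "contig1\t1\t10\ncontig1\t2\t20\ncontig1\t3\t30\n"
    "contig2\t1\t5\ncontig2\t2\t7\n"
)
for contig, (mean_depth, median_depth) in coverage_per_contig(example).items():
    print(contig, mean_depth, median_depth)
```

The sketch holds all depths in memory, which is fine for small assemblies; for large genomes a streaming summary or an existing coverage tool would be more appropriate.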

Acknowledgments


References