Submit SARS-CoV-2 sequences

Add your SARS-CoV-2 sequence data to the growing public archive

GenBank

Submit assembled SARS-CoV-2 sequences to GenBank and make your data available worldwide.

What You Should Expect

Submit assembled SARS-CoV-2 sequence data on the web, where it is automatically assessed for quality and annotated for you with the viral annotation tool VADR.

SARS-CoV-2 submissions are generally processed and released into GenBank within about 2 hours unless a specific hold date was requested.

Prepare the following information for your GenBank submission of SARS-CoV-2 data:

  • General: contact details, authors, publication, data release date
  • Sequencing technology information
  • FASTA-formatted sequence
  • Source metadata table

You do not need to prepare feature annotation; it will be added for you.

Prepare your sequence(s) in FASTA format: each record starts with a definition line, followed by a hard return and the sequence.

Each sequence must be 50 to 30,000 bases long, vector-free, and contain fewer than 50% Ns, excluding terminal Ns, which are trimmed automatically.

The simplest definition line requires the ">" symbol and a sequence_ID. Sequence_IDs should be shorter than 25 characters and unique within a submission.

Example:

>Seq1
CCTTTAT...
>Seq2
GGTAGGT...

Use only ASCII characters for your definition line and IUPAC codes for your sequences. Upload a FASTA file as a plain-text file (prepared with a text editor). The file may have one or more sequences.
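The FASTA rules above can be checked locally before upload. The following is a minimal sketch, not GenBank's own validator: the function names are hypothetical, the accepted alphabet is assumed to be the uppercase IUPAC nucleotide codes without U or gap characters, and both the length and the N-content limits are assumed to apply after terminal Ns are trimmed.

```python
# Assumed alphabet: IUPAC nucleotide codes (no U, no gap characters).
IUPAC_CODES = set("ACGTRYSWKMBDHVN")
MIN_LENGTH, MAX_LENGTH = 50, 30_000
MAX_ID_LENGTH = 24  # "shorter than 25 characters"


def parse_fasta(text):
    """Yield (definition_line, sequence) pairs from FASTA-formatted text."""
    definition, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if definition is not None:
                yield definition, "".join(chunks)
            definition, chunks = line[1:].strip(), []
        elif definition is not None:
            chunks.append(line)
    if definition is not None:
        yield definition, "".join(chunks)


def check_record(definition, sequence, seen_ids):
    """Return a list of problems found in one FASTA record."""
    problems = []
    fields = definition.split()
    seq_id = fields[0] if fields else ""
    if not seq_id:
        problems.append("missing sequence_ID")
    elif len(seq_id) > MAX_ID_LENGTH:
        problems.append(f"sequence_ID longer than {MAX_ID_LENGTH} characters")
    if seq_id and seq_id in seen_ids:
        problems.append("duplicate sequence_ID")
    seen_ids.add(seq_id)
    if not definition.isascii():
        problems.append("non-ASCII character in definition line")

    sequence = sequence.upper()
    if set(sequence) - IUPAC_CODES:
        problems.append("non-IUPAC character in sequence")

    # Assumption: limits are evaluated after terminal Ns are trimmed.
    trimmed = sequence.strip("N")
    if not MIN_LENGTH <= len(trimmed) <= MAX_LENGTH:
        problems.append("length outside 50-30,000 bases")
    elif trimmed.count("N") / len(trimmed) >= 0.5:
        problems.append("50% or more Ns")
    return problems
```

To check a whole file, read it as text, create one empty set for the sequence_IDs seen so far, and call check_record on each pair yielded by parse_fasta. Vector contamination is not covered by this sketch.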

Prepare the source metadata table as a tab-delimited text file that includes these columns: sequence_ID, isolate, collection-date, host, country. Other source metadata may be included in additional columns.

Required source metadata:

  • Country, in the correct format (Example - USA: Maryland, Bethesda).
  • Collection-date in ISO format. The date must be between 2019 and the current date.
  • Host. Use the scientific name or common name, and provide any additional information after a semicolon (Example - Homo sapiens; female, age 25). A submission that lists a prokaryote as host will fail.
  • Isolate. Can be either in ICTV format or the sample ID. If you use the sample ID, we will build the proper ICTV format for you.
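The required columns and the date rule can also be checked before upload. This is a minimal sketch under stated assumptions, not the submission portal's validation: the function name check_metadata is hypothetical, collection dates are assumed to be full YYYY-MM-DD dates, and the country check only confirms that a name precedes any colon and that something follows it. The host and isolate values are checked only for being non-empty.

```python
import csv
import io
from datetime import date

REQUIRED_COLUMNS = ["sequence_ID", "isolate", "collection-date", "host", "country"]


def check_metadata(tsv_text, today=None):
    """Return a list of problems found in a tab-delimited source metadata table."""
    today = today or date.today()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]

    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")

        raw_date = (row["collection-date"] or "").strip()
        if raw_date:
            try:
                # Assumption: only full YYYY-MM-DD dates are handled here.
                collected = date.fromisoformat(raw_date)
            except ValueError:
                problems.append(f"row {row_number}: collection-date is not an ISO date")
            else:
                if collected.year < 2019 or collected > today:
                    problems.append(
                        f"row {row_number}: collection-date outside 2019 to today"
                    )

        # Assumed shape: "Country" or "Country: region, locality".
        country = (row["country"] or "").strip()
        name, colon, rest = country.partition(":")
        if country and (not name.strip() or (colon and not rest.strip())):
            problems.append(f"row {row_number}: country is not in the expected format")
    return problems
```

Passing today explicitly makes the date check reproducible; when it is omitted, the current date is used.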

Optional source metadata:

  • Isolation-source. Physical environment where sample was collected (Example - nasal swab)
  • BioProject Accession. The BioProject must be owned by the same group as the submitting group.
  • BioSample Accession. The BioSample must be owned by the same group as the submitting group. For your convenience and to collect more useful data for public surveillance, use the BioSample package "SARS-CoV-2 clinical or host-associated package".
  • SRA accession. The SRA record must be owned by the same group as the submitting group.

When submitting multiple sequences, you will have an option to prioritize error-free sequences so that they can be released. Sequences with errors will be removed from the current submission, but you can submit them later after further review. You will receive a detailed error report on any sequences with errors. Before you submit, we recommend running VADR to check your data. See github.com/ncbi/vadr/wiki/Coronavirus-annotation

If common problems such as misassembly, frameshifts, and internal stop codons are detected, you will receive error reports that explain them. Descriptions of the errors can be found at: ncbi.nlm.nih.gov/genbank/sequencecheck/virus/

If you believe errors are due to naturally occurring mutations in the virus, please send an email describing the evidence for the mutation to: gb-admin@ncbi.nlm.nih.gov.

More information on SARS-CoV-2 submission to GenBank: submit.ncbi.nlm.nih.gov/genbank/help/#SARS-CoV-2-sequences.

If you would like to set up programmatic submissions, download and read the documentation, review the example files, and then contact gb-admin@ncbi.nlm.nih.gov to discuss requirements for this method.

Submit your sequence data on desktop. The desktop view allows you to easily:

  • Enter your information
  • Enter or upload metadata
  • Upload large source files
  • Review your submission


Benefits

  • Make your sequence data available in the International Nucleotide Sequence Database Collaboration (INSDC) for global use in COVID-19 response
  • Provide rich sample data specific for SARS-CoV-2 to aid in sequence analysis
  • Link assembled sequences with underlying SRA read data to promote scientific discovery
  • Ensure your data contribution is included in NCBI Virus, BLAST, and other resources
  • Follow FAIR data-sharing principles

Other Resources

Find and analyze SARS-CoV-2 sequence data and related data.