Submit SARS-CoV-2 sequences

Add your SARS-CoV-2 sequence data to the growing public archive

Sequence Read Archive (SRA)

Submit unassembled SARS-CoV-2 sequence reads to SRA and make your data available worldwide.

What You Should Expect

Prepare the following information for your submission and be ready to:

  • Provide a project name and description
  • Provide sample metadata that is unique to each sample
  • Provide sequence metadata
  • Upload your files

For a new project, prepare the information that creates a BioProject.

Required information:

  • Title
  • Description

Optional information:

  • Participants
  • Grants: Required if your project was funded by a National Institutes of Health (NIH) grant

Important: The required BioProject and BioSample can be created during submission. You can also link an existing BioProject and BioSample with accession numbers.

For new samples, prepare the details that will serve as BioSample metadata for individual biological specimens (collection date, location, etc.).

  1. Select the "SARS-CoV-2 clinical or host-associated package" as your BioSample type. Each package has a distinct set of required attributes which you can preview here.
  2. Each sample must have a unique set of attributes. Provide all required fields and any optional fields that apply to your samples.
  3. Add custom attributes to fully describe your samples and facilitate searching. You should submit at least one unique data file for each sample you create.
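The uniqueness requirement above can be checked locally before submitting. The following is a minimal sketch, not part of the submission tooling: the function name, the `sample_name` key, and the example attribute names and values are illustrative assumptions, and it assumes that the sample name itself does not count toward uniqueness.

```python
from collections import defaultdict


def find_duplicate_samples(samples, name_key="sample_name"):
    """Group sample names whose attributes (excluding the name) are identical.

    `samples` is a list of dicts with string values. Assumption: the sample
    name alone does not make a sample unique.
    """
    groups = defaultdict(list)
    for sample in samples:
        # Order-independent fingerprint of every attribute except the name.
        fingerprint = tuple(
            sorted((key, value) for key, value in sample.items() if key != name_key)
        )
        groups[fingerprint].append(sample[name_key])
    # Keep only fingerprints shared by more than one sample.
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical metadata rows; attribute names and values are examples only.
samples = [
    {"sample_name": "s1", "collection_date": "2020-03-01", "location": "USA", "isolate": "A"},
    {"sample_name": "s2", "collection_date": "2020-03-01", "location": "USA", "isolate": "A"},
    {"sample_name": "s3", "collection_date": "2020-03-02", "location": "USA", "isolate": "B"},
]

duplicates = find_duplicate_samples(samples)
print(duplicates)  # s1 and s2 share every attribute and are reported together
```

Adding a distinguishing attribute (for example, a different isolate name) to one of the reported samples would resolve the clash in this sketch.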

Prepare the following 'Library' information:

  • Which BioSample should be linked to which file(s)
  • Your library construction protocol
  • Other metadata, such as unique library names, sequencing platform, and file type

Ensure that the file names you choose do not contain any sensitive information. File names appear publicly, exactly as submitted, on the Google and AWS clouds.

The SRA can screen your data for human contaminant sequences. If you would like your sequences screened, set the release date one week in the future and contact the SRA after your submission completes.

Upload data files for your SRA submission. There are several options available:

  • File upload via the web interface using HTTP or the Aspera Connect browser plugin
  • FTP and Aspera on the command line
  • Upload from Amazon S3 storage or Google Cloud Platform bucket

Please note that your uploaded file names must exactly match the file names listed on the 'Metadata' tab, including the file extension.
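A quick local comparison can catch mismatched names before you upload. This is a minimal sketch under stated assumptions: the function name and the example file names are hypothetical, and the two lists would come from your own metadata sheet and upload directory.

```python
def compare_file_names(metadata_names, uploaded_names):
    """Report names that do not match exactly, including the extension."""
    listed = set(metadata_names)
    uploaded = set(uploaded_names)
    return {
        # Listed in the metadata but no identically named file was uploaded.
        "missing_from_upload": sorted(listed - uploaded),
        # Uploaded but not listed in the metadata.
        "not_in_metadata": sorted(uploaded - listed),
    }


# Hypothetical example: the second uploaded file lacks the ".gz" extension.
report = compare_file_names(
    ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"],
    ["sampleA_R1.fastq.gz", "sampleA_R2.fastq"],
)
print(report)
```

The comparison is deliberately case-sensitive and extension-sensitive, since the names are expected to match exactly.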

If you plan to submit regularly, or submit a large volume of data, programmatic submission may be useful.

Start by contacting sra@ncbi.nlm.nih.gov with the subject "Center account creation for XML submissions." Please provide:

  • Suggested center abbreviation (16 char max)
  • Center name (full), center URL & mailing address (including country and postcode)
  • Phone number (main phone for center or lab)
  • Contact person (someone likely to remain at the location for an extended time)
  • Contact email (ideally a service account monitored by several people)
  • Whether you intend to submit via FTP or command line Aspera (ascp)

A test area and a production area will be created for you. Deposit the XML file and related data files into a directory, then follow the instructions the SRA provides by email to signal that the files are ready and trigger the pipeline.

Complete several successful tests before making a production submission. The SRA team will assist you.

Important: The submission.xml file can either link to an existing BioProject and BioSamples or register new ones. The XML file is divided into “Action” blocks, and each “Action” is an instruction to the database referenced inside that block. A simple example of an “Action” block for SRA: https://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/submit/public-docs/sra/samples/sra.submission.run.xml?view=co
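To illustrate the idea of “Action” blocks, the sketch below parses a small XML document and lists which database each block addresses. It is not a valid submission file: apart from “Action”, the element and attribute names used here (`Submission`, `AddData`, `AddFiles`, `target_db`) are assumptions made for illustration, so consult the linked sample for the actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified skeleton. Element and attribute names
# other than "Action" are assumptions, and real files carry far more detail.
SUBMISSION_XML = """
<Submission>
  <Action>
    <AddData target_db="BioSample"/>
  </Action>
  <Action>
    <AddFiles target_db="SRA"/>
  </Action>
</Submission>
"""


def list_action_targets(xml_text):
    """Return (instruction tag, target database) for each Action block."""
    root = ET.fromstring(xml_text.strip())
    targets = []
    for action in root.findall("Action"):
        # Each child of an Action is treated as one instruction to a database.
        for instruction in action:
            targets.append((instruction.tag, instruction.get("target_db")))
    return targets


targets = list_action_targets(SUBMISSION_XML)
print(targets)
```

A listing like this can serve as a quick sanity check that a generated file contains the blocks you intended before you deposit it in the test area.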

Submit your sequence data on desktop. The desktop view allows you to easily:

  • Enter your information
  • Enter or upload metadata
  • Upload large source files
  • Review your submission

Benefits

  • Make your sequence data available in the International Nucleotide Sequence Database Collaboration (INSDC) for global use in COVID-19 response
  • Provide rich sample data specific for SARS-CoV-2 to aid in sequence analysis
  • Link assembled sequences with underlying SRA read data to promote scientific discovery
  • Ensure your data contribution is included in NCBI Virus, BLAST, and other resources
  • Follow FAIR data-sharing principles

Other Resources

Find and analyze SARS-CoV-2 sequence data and related data.