Submission Portal

Submit to the world's largest public repository of biological and scientific information

Genome (WGS or complete)

Use this tool to submit complete or incomplete prokaryotic and eukaryotic genome assemblies. Include plasmid and organellar sequences with the genome submission.

Not for viral, phage, or single locus sequences (for example: 16S rRNA). Submit those to regular GenBank.

NCBI has a Foreign Contamination Screen (FCS) tool suite, available on GitHub, that now runs on new submissions to help improve the quality of genome submissions. See NCBI Insights and our preprint for more details.

What You Should Expect

When you submit, you will need to:

  1. Choose one of: Single, Batch, or Haplotypes. The genomes in a batch or haplotype submission must share some common details.
  2. Provide a BioProject and BioSample. You may have registered these in advance OR you may create these within your genome submission. However, if you are submitting with annotation, please register these in advance.
  3. Fill out metadata on the sequencing and assembly of the genome.
  4. Provide chromosome, plasmid, and organelle assignments, if known, as explained in the single/batch/haplotype documentation.
  5. Indicate what the Ns in the sequences represent. The defaults in the form are the most common.
    Note: 10 or more Ns in a row are always called a gap when genome assembly statistics are calculated.
  6. Upload your file(s).
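The gap rule in step 5 can be checked locally before upload. Below is a minimal Python sketch, assuming the sequence is already loaded as a string. `find_gaps` is a hypothetical helper, not an NCBI tool, and the submission system's own validation is what counts. It reports every run of 10 or more Ns as a 0-based, end-exclusive span; treating lowercase n the same as N is an assumption of this sketch.

```python
import re

# Step 5 note: 10 or more consecutive Ns are always counted as a gap
# when assembly statistics are calculated. MIN_GAP_RUN mirrors that threshold.
MIN_GAP_RUN = 10


def find_gaps(seq: str, min_run: int = MIN_GAP_RUN) -> list[tuple[int, int]]:
    """Return (start, end) spans, 0-based and end-exclusive, of N runs >= min_run."""
    # [Nn] also matches soft-masked lowercase n (an assumption, see lead-in).
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    contig = "ACGT" + "N" * 12 + "ACGT" + "N" * 5 + "TTGA"
    print(find_gaps(contig))  # [(4, 16)]; the run of 5 Ns is not a gap
```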

Each sequence in the genome submission must be at least 200 base pairs. Sequences cannot be randomly concatenated.
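A quick pre-submission check for the 200 bp minimum might look like the following sketch. It assumes plain FASTA input; `read_fasta` and `too_short` are hypothetical helper names, and the parser is deliberately minimal (it skips blank lines and ignores anything before the first header).

```python
MIN_SEQ_LEN = 200  # minimum length per sequence stated above


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # start a new record
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def too_short(lines, min_len=MIN_SEQ_LEN):
    """Return (header, length) for every sequence shorter than min_len."""
    return [(h, len(s)) for h, s in read_fasta(lines) if len(s) < min_len]
```

For example, `too_short([">contig1", "A" * 250, ">contig2", "ACGT" * 10])` returns `[('contig2', 40)]`, and passing an open file handle works the same way because it iterates over lines.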

All genomes within a batch must be:

  • Part of the same BioProject, except for the pseudohaplotypes of a diploid genome assembly
  • Either WGS or non-WGS, not a mix of both types
  • Just a single layer (no AGP files)

All genomes within a batch must also have or contain:

  • Same (initial) release date
  • Same gap/Ns information
  • Either FASTA files or ASN (.sqn) files, not a mix of file types. FASTA files are recommended unless the submission includes annotation or the Genome-Assembly-Data structured comment
  • Single file for each genome, including any plasmid or organelle sequences
  • Separate file for each genome, not all the genomes together
  • Request for PGAP annotation or not (only relevant for prokaryotic genomes)
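Some of these batch rules can be checked from the filenames alone. The sketch below flags duplicate filenames (a sign that genomes are not in separate files) and a mix of FASTA and .sqn files. `check_batch_files` is a hypothetical helper, and the set of FASTA extensions is an assumption of this example, not a list taken from NCBI.

```python
from pathlib import Path

# Assumed extension sets for this sketch; adjust to the names your files use.
FASTA_EXTS = {".fa", ".fasta", ".fna", ".fsa"}
ASN_EXTS = {".sqn"}


def check_batch_files(filenames):
    """Return a list of problems with a proposed batch (empty list = none found)."""
    problems = []
    if len(set(filenames)) != len(filenames):
        problems.append("each genome needs its own file; duplicate filenames found")
    kinds = set()
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in FASTA_EXTS:
            kinds.add("FASTA")
        elif ext in ASN_EXTS:
            kinds.add("ASN")
        else:
            problems.append(f"unrecognized file type: {name}")
    if len(kinds) > 1:
        problems.append("batch mixes FASTA and .sqn files")
    return problems
```

For example, `check_batch_files(["g1.fsa", "g2.sqn"])` returns `["batch mixes FASTA and .sqn files"]`. Rules such as a shared release date or shared gap settings are entered in the form and are not covered here.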

Pseudohaplotypes of a diploid/polyploid assembly have the same restrictions as a batch submission (see above) and must also:

  • Use the BioSample of the individual that was sequenced
  • Each have their own BioProject, which can be created during this submission; these BioProjects will be connected by an Umbrella BioProject
  • Have one of the pseudohaplotypes designated by you as the primary (principal) assembly
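The three pseudohaplotype requirements can be expressed as a simple consistency check. In this sketch, `Pseudohaplotype` and `check_pseudohaplotypes` are hypothetical names, and the BioSample/BioProject values are placeholders rather than real accessions (a BioProject created during submission would not have an accession yet).

```python
from dataclasses import dataclass


@dataclass
class Pseudohaplotype:
    # Hypothetical record type for illustration; not an NCBI data structure.
    name: str
    biosample: str   # placeholder for the sequenced individual's BioSample
    bioproject: str  # placeholder for this pseudohaplotype's own BioProject
    primary: bool = False


def check_pseudohaplotypes(haps):
    """Return a list of rule violations (empty list = none found)."""
    problems = []
    if len({h.biosample for h in haps}) != 1:
        problems.append("all pseudohaplotypes must use the BioSample of the sequenced individual")
    if len({h.bioproject for h in haps}) != len(haps):
        problems.append("each pseudohaplotype needs its own BioProject")
    if sum(h.primary for h in haps) != 1:
        problems.append("exactly one pseudohaplotype must be asserted as primary")
    return problems
```

For example, two records sharing BioSample `"S1"` with BioProjects `"P1"` and `"P2"`, where only the first has `primary=True`, produce an empty list.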

Annotation is optional here. During submission, you can request to have prokaryotic genomes annotated by NCBI’s Prokaryotic Genome Annotation Pipeline (PGAP).

If you decide to submit a genome with annotation, it must contain the locus tag prefix generated for you so that your genes are uniquely identifiable. To receive the locus tag prefix:

  1. Register your BioProject and write down your BioProject accession.
    Note: make sure the 'autogenerate locus tag prefix' option is selected.
  2. Register your samples in BioSample and enter your BioProject accession into the sample metadata.
  3. Proceed with the Genome submission.
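Once the prefix is assigned, it can help to confirm that every locus tag in your annotation uses it and that no tag is repeated. This sketch assumes tags are written as the prefix, an underscore, and a suffix (for example `ABC_0001`, where `ABC` is a made-up prefix); treat that format as an assumption and follow the locus tag instructions you receive. `check_locus_tags` is a hypothetical helper.

```python
def check_locus_tags(tags, prefix):
    """Flag locus tags that lack the assigned prefix or are not unique.

    Assumes tags are written as PREFIX_<suffix>; the underscore separator
    is an assumption of this sketch.
    """
    problems = []
    seen = set()
    for tag in tags:
        if not tag.startswith(prefix + "_"):
            problems.append(f"{tag}: missing prefix {prefix}_")
        if tag in seen:
            problems.append(f"{tag}: duplicate locus tag")
        seen.add(tag)
    return problems
```

For example, `check_locus_tags(["ABC_0001", "ABC_0002", "ABC_0002", "XYZ_0003"], "ABC")` returns `["ABC_0002: duplicate locus tag", "XYZ_0003: missing prefix ABC_"]`.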

If you do not receive the autogenerated locus tag prefix, please email genomes@ncbi.nlm.nih.gov.

BioProject and BioSample should be registered during the Genome submission unless you are submitting with annotation.

BioProject

The BioProject contains the description of the research effort, relevant grant(s), and has links to the public data. A genome must belong to a BioProject, and genomes sequenced as part of the same research effort can belong to a single BioProject. Use the same BioProject for the sequence reads and genome assembly made from those reads; do not create duplicate BioProjects.

BioSample

The BioSample contains the source information of the sample sequenced. Use the same BioSample for the sequence reads and genome assembly made from those reads; do not create duplicate BioSamples.

Please prepare the following information to submit as metadata:

  • Assembly method: name of the assembly algorithm(s)
  • Assembly method version or date
  • Assembly name (Optional)
  • Genome coverage
  • Sequencing technology or technologies
  • Full or Partial Genome in the sample
  • Reference genome if it is not a de novo assembly
  • Update: accession of the genome being updated, when appropriate
  • bacteria_available_from
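One way to gather these values before starting is a small record with a built-in completeness check. The field names below are hypothetical stand-ins for the form fields listed above. The sketch treats the assembly method, its version or date, the coverage, and at least one sequencing technology as required, and requires a reference genome only when the assembly is not de novo; the update accession and `bacteria_available_from` are carried along but not validated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssemblyMetadata:
    # Hypothetical field names mirroring the checklist above.
    assembly_method: str
    assembly_method_version: str              # version or date
    genome_coverage: str                      # e.g. "50x"
    sequencing_technologies: list[str]
    full_genome: bool                         # full vs. partial genome in the sample
    de_novo: bool = True
    reference_genome: Optional[str] = None    # needed only if not de novo
    assembly_name: Optional[str] = None       # optional
    updated_accession: Optional[str] = None   # only when updating a genome
    bacteria_available_from: Optional[str] = None  # not validated here

    def problems(self):
        """Return a list of missing items (empty list = nothing missing)."""
        out = []
        for name in ("assembly_method", "assembly_method_version", "genome_coverage"):
            if not getattr(self, name).strip():
                out.append(f"{name} is empty")
        if not self.sequencing_technologies:
            out.append("at least one sequencing technology is needed")
        if not self.de_novo and not self.reference_genome:
            out.append("reference genome required for a non-de novo assembly")
        return out
```

For example, `AssemblyMetadata("ExampleAssembler", "1.0", "50x", ["Illumina"], True, de_novo=False).problems()` returns `["reference genome required for a non-de novo assembly"]`.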

Next, you will upload your data files. If there is no annotation, you can upload a FASTA file. If there is annotation, you will need to create a .sqn file and submit that instead.

There are two options available:

  1. File upload via the web interface, using HTTP or the Aspera Connect browser plugin
  2. FTP or Aspera on the command line

Please note that in a batch submission your uploaded filenames must exactly match the filenames provided on the 'Genome info' tab, including the file extension.
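Because a single character or extension mismatch is enough to break the match, a local comparison before upload can save a round trip. In this sketch, `compare_filenames` and `uploaded_names` are hypothetical helpers; you supply the names as typed on the 'Genome info' tab, and the comparison is exact and case-sensitive.

```python
from pathlib import Path


def uploaded_names(directory):
    """Names of the regular files in a local upload folder."""
    return [p.name for p in Path(directory).iterdir() if p.is_file()]


def compare_filenames(listed, uploaded):
    """Exact, case-sensitive comparison of listed vs. uploaded filenames."""
    listed_set, uploaded_set = set(listed), set(uploaded)
    return {
        # names on the 'Genome info' tab with no matching file
        "missing_uploads": sorted(listed_set - uploaded_set),
        # files present locally that the tab does not mention
        "unlisted_uploads": sorted(uploaded_set - listed_set),
    }
```

For example, `compare_filenames(["genome1.fsa", "genome2.fsa"], ["genome1.fsa", "Genome2.fasta"])` reports `genome2.fsa` as missing and `Genome2.fasta` as unlisted, because both the case and the extension differ.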


Genomes FAQ

  • If you choose the single genome option, you will be prompted in the forms to provide information on which sequences belong to chromosomes, plasmids or organelles. If you choose the batch option, you should include this information in the FASTA headers. Additional requirements: https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment

  • You will need to confirm that you did not randomly merge the sequences into a single sequence and provide details on the approach you took. The default in the submission form is that 10 Ns in a row represent a gap and that “paired-ends” is the evidence that the sequences on either side of the gap are linked. If this is incorrect, then make the appropriate selections for your genome.
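For batch submissions, the chromosome, plasmid, and organelle assignments go in the FASTA headers. Assuming they are written as bracketed `[key=value]` modifiers after the sequence ID (check the batch_assignment requirements page for the exact keys and values NCBI accepts), a header can be split as in this sketch. `parse_header` is a hypothetical helper, and the keys in the usage example are illustrative only.

```python
import re

# Matches bracketed [key=value] modifiers in a FASTA definition line,
# tolerating spaces around the key, the "=", and the value.
MODIFIER = re.compile(r"\[\s*([^=\]]+?)\s*=\s*([^\]]*?)\s*\]")


def parse_header(defline):
    """Split '>id [k=v] [k2=v2]' into (seq_id, {k: v, ...}); keys are lowercased."""
    text = defline.lstrip(">").strip()
    seq_id = text.split(maxsplit=1)[0] if text else ""
    mods = {k.lower(): v for k, v in MODIFIER.findall(text)}
    return seq_id, mods
```

For example, `parse_header(">contig02 [plasmid-name=pABC1] [topology=circular]")` returns `("contig02", {"plasmid-name": "pABC1", "topology": "circular"})`, which makes it easy to list which sequences carry an assignment and which do not.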

GenBank

GenBank is the world's largest nucleotide archive containing sequences from all branches of life. The archive is a foundation for medical and biological discovery.

  • Submit assembled SARS-CoV-2, Influenza, Norovirus, Dengue virus, rRNA, rRNA-ITS, metazoan COX1, and eukaryotic nuclear mRNA sequences.

  • Submit genomic DNA, organelle, ncRNA, plasmids, other viruses, phages, mRNA, synthetic constructs.

  • Submit assembled eukaryotic and prokaryotic genomes (WGS or Complete).

Sequence Read Archive (SRA)

SRA is the largest publicly available repository of high-throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys.

Other Tools

  • TSA

    Submit computationally assembled, transcribed RNA sequences after submitting unassembled reads to SRA.

  • GEO

    Submit RNA-seq, ChIP-seq, and other types of gene expression and epigenomics datasets.

  • BioProject & BioSample

    Choose a tool above if submitting sequence data.

Medical Genetics & Variation Tools

Submit clinical data, small & large human genomics variants, and genotype & phenotype data.

Other Resources