Submission Portal

Submit to the world's largest public repository of biological and scientific information

Genome (Prokaryotic and Eukaryotic)

Use this tool to submit prokaryotic and eukaryotic genome assemblies. Include plasmid and organellar sequences with the genome submission.

Not for viral, phage, or single locus sequences (for example: 16S rRNA). Submit those to regular GenBank.

New: We recommend that you download and run NCBI’s Foreign Contamination Screen (FCS) tool before submitting your genomes, to reduce the number of post-submission corrections and improve the quality of your genomes. See NCBI Insights and the FCS publication for more details.

What You Should Expect

Overview

These are the submission instructions for prokaryotic and eukaryotic genome submissions.

  • If you are submitting a viral genome, please see your submission options.
  • Do not submit organelle genomes here unless as part of a genome assembly. If you are only submitting organelle genomes, submit them in BankIt.
  • MAGs: To submit prokaryotic or eukaryotic Metagenome-assembled Genomes (MAGs), see the MAG instructions.
Detailed Instructions

General Information

Each sequence in the genome submission must be at least 200 base pairs and less than the technical limit of 2,147,483,000 base pairs (roughly 2^31). Write to genomes@ncbi.nlm.nih.gov if your genome assembly includes sequences longer than that limit.
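
The length limits above can be checked locally before upload. The following is a minimal sketch, not an NCBI tool: it assumes a plain-text FASTA file, and the names `fasta_lengths` and `out_of_range` are hypothetical helpers introduced only for illustration.

```python
import io

MIN_LEN = 200             # minimum sequence length stated above
MAX_LEN = 2_147_483_000   # technical limit stated above

def fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA stream."""
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            length = 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length

def out_of_range(handle):
    """Return records shorter than MIN_LEN or not below MAX_LEN."""
    return [(sid, n) for sid, n in fasta_lengths(handle)
            if n < MIN_LEN or n >= MAX_LEN]

# Illustrative input: contig1 has 240 bases, contig2 has only 4.
example = io.StringIO(">contig1\n" + "ACGT" * 60 + "\n>contig2\nACGT\n")
print(out_of_range(example))  # [('contig2', 4)]
```

For a real file, pass an open file object, for example `out_of_range(open("genome.fsa"))`, where `genome.fsa` is a placeholder filename.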

Sequences cannot be randomly concatenated.

A submission must use either FASTA files or ASN (.sqn) files, not a mix of file types.

  • Choose the FASTA format (.fsa) if you do not have feature annotations
  • Choose the ASN format (.sqn) if you want to include feature annotation or the Genome-Assembly-Data structured comment

If you have assignment information:

Prokaryotic Genomes Annotation Pipeline (PGAP)

You can optionally request annotation by PGAP (only relevant for prokaryotic genomes).

Frequently Asked Questions

See the Genome Frequently Asked Questions page.

Submission Type

There are three types of Genome submissions: Single, Batch, and Pseudohaplotypes/Haplotypes.

Single Submission

Batch Submission

All genomes within a batch must:

  • Have the same BioProject
  • Be either WGS or non-WGS, not a mix of both types
  • Have the same (initial) release date
  • Have the same gap/Ns information
  • Have either FASTA files or ASN (.sqn) files, not a mix of file types
    • Choose FASTA files if you do not have feature annotations
    • Choose ASN files if you want to include feature annotation or the Genome-Assembly-Data structured comment
  • One file for each genome
    • One file for all the sequences of that genome
    • For example, a batch submission of 2 genomes would have 2 files
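
The batch rules above can be sketched as a pre-submission consistency check. This is only an illustration under stated assumptions: `GenomeEntry` and its field names are hypothetical and do not correspond to portal fields, and the accession shown is a placeholder.

```python
from dataclasses import dataclass
from pathlib import PurePath

@dataclass(frozen=True)
class GenomeEntry:
    # Hypothetical record; field names are illustrative only.
    filename: str
    bioproject: str
    is_wgs: bool
    release_date: str
    gap_info: str

def batch_problems(entries):
    """Return a list of rule violations for a planned batch."""
    problems = []

    def check_same(label, values):
        if len(set(values)) > 1:
            problems.append(f"{label} differs within the batch")

    check_same("BioProject", [e.bioproject for e in entries])
    check_same("WGS/non-WGS type", [e.is_wgs for e in entries])
    check_same("release date", [e.release_date for e in entries])
    check_same("gap/Ns information", [e.gap_info for e in entries])
    check_same("file type",
               [PurePath(e.filename).suffix.lower() for e in entries])

    # One file per genome: no two genomes may share a file.
    names = [e.filename for e in entries]
    if len(set(names)) != len(names):
        problems.append("each genome needs its own file")
    return problems

# Placeholder values, not real accessions.
batch = [
    GenomeEntry("genomeA.fsa", "PRJNA000001", True, "2025-01-01", "10 Ns"),
    GenomeEntry("genomeB.sqn", "PRJNA000001", True, "2025-01-01", "10 Ns"),
]
print(batch_problems(batch))  # ['file type differs within the batch']
```
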
Pseudohaplotypes/Haplotypes Submission

The haplotypes of a particular individual genome must:

  • Meet the same restrictions as the Batch submissions listed above, except that each haplotype must have its own BioProject; these are connected by an Umbrella BioProject. The BioProjects can be created automatically during the genome submission. See Umbrella BioProject for information.
  • Use the same BioSample, that of the individual that was sequenced

Each pseudohaplotype/haplotype pair is designated as one of the following types:

  • principal/alternate
  • maternal/paternal
  • haplotype 1/haplotype 2

BioProject and BioSample

All genome submissions must provide BioProject and BioSample information, which can be created before or during a genome submission.

Submitting with Feature Annotation: If you plan to submit feature annotation that uses a locus-tag prefix, you must create a BioProject and BioSample before you start the Genome submission. See the Annotation section for more information.

BioProject

A BioProject contains the description of the research effort, relevant grant(s), and links to the public data. It has a BioProject accession. See BioProject for more information.

A genome must belong to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject.

  • Use the same BioProject for the sequence reads and genome assembly made from those reads; do not create duplicate BioProjects.
  • Exception: Haplotypes of a diploid genome have different BioProjects, which can be created during pseudohaplotype/haplotype submission.
BioSample(s)

A BioSample contains the source information of the sample sequenced. It has a BioSample accession. See BioSample for more information.

  • Use the same BioSample accession for the sequence reads and genome assembly made from those reads; do not create duplicate BioSamples.
  • Exception: Metagenome-assembled Genomes (MAGs) need a BioSample for each individual organism bin. The reads are submitted using a BioSample that represents the mixed sample (e.g., soil metagenome or the host name). See the MAG instructions.

Annotation

Feature annotation is optional for genome submissions.

Submission without Feature Annotation
  • Please upload only the FASTA file on the Files tab.
  • During submission, you can request to have prokaryotic genomes annotated by NCBI’s Prokaryotic Genome Annotation Pipeline (PGAP).
Submission with Feature Annotation
  • Click for information on How to Submit an Annotated File.
  • Click for information on GFF3 or GTF annotation.
  • If you decide to submit a genome with feature annotation, it must contain the locus-tag prefix generated for you so that your genes are uniquely identifiable.
Prior to genome submission, request the locus tag prefix:
  1. Create a BioProject submission. Make sure the 'autogenerate locus tag prefix' checkbox is selected on the Project Type page of your BioProject submission.
  2. Create the BioSample submission and enter the corresponding BioProject accession into the sample metadata.
  3. After you finish your BioSample submission, the automatically assigned locus_tag prefixes will be posted to the BioProject submission portal. If you do not receive the autogenerated locus tag prefix, email genomes@ncbi.nlm.nih.gov.
  4. Create your Genome submission

Genome Assembly Metadata

Please prepare the following information to submit as metadata (Detailed information available here):

  • Assembly method: Name of the assembly algorithm(s)
  • Assembly method version or date
  • Genome coverage
  • Sequencing technology or technologies
  • Full or Partial Genome in the sample
  • Reference genome if it is not a de novo assembly
  • When updating, include the accession of the genome being updated

Files
Batch and Pseudohaplotype/Haplotype Submission
  • Filename: Filenames including extension must exactly match the filenames provided on the Genome info tab.
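
A quick way to catch filename mismatches before upload is to compare the two lists directly. The sketch below is illustrative only: `filename_mismatches` is a hypothetical helper, and treating the comparison as case-sensitive is an assumption based on the "exactly match" wording above.

```python
def filename_mismatches(declared, uploaded):
    """Compare filenames entered on the Genome info tab with uploaded files.

    The comparison is exact, including the extension and letter case.
    """
    declared_set, uploaded_set = set(declared), set(uploaded)
    return {
        "missing": sorted(declared_set - uploaded_set),
        "unexpected": sorted(uploaded_set - declared_set),
    }

# Placeholder filenames: the second upload differs only in letter case.
result = filename_mismatches(
    ["genomeA.fsa", "genomeB.fsa"],
    ["genomeA.fsa", "genomeb.fsa"],
)
print(result)  # {'missing': ['genomeB.fsa'], 'unexpected': ['genomeb.fsa']}
```
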
Uploading options
  • File upload via the web interface using HTTP or Aspera Connect browser plugin
  • FTP and Aspera upload. See How to Preload Files for details

After Submission

After completing your submission, the genomes will undergo several validations plus an initial review by our staff. See What Happens Next for details.

  • Automatic emails will be sent to the email account associated with the submission. These automated emails will include our contact information if you have any questions.

The following is a list of possible Genome submission statuses:

  • Queued: the submission is waiting for initial review.
  • Error: one or more genomes have errors in their files and need to be resubmitted. Use the same file name when resubmitting batch submissions.
  • Processing (no accession number): all the genomes have passed the initial automated validations and are waiting for additional review.
  • Processing (with accession number): genome accessions have been assigned and the genomes will be processed by NCBI staff. Genomes will remain at this status until they are released. We will contact you during processing if the submission has issues that require additional information.
  • Processed: the genome has been publicly released.

The submission portal reflects a static image of the information at time of submission. Any requested changes after submission will not be reflected on the submission portal.


Genomes FAQ

  • If you choose the single genome option, you will be prompted in the forms to provide information on which sequences belong to chromosomes, plasmids or organelles. If you choose the batch option, you should include this information in the FASTA headers. Additional requirements: https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment

  • You will need to confirm that you did not randomly merge the sequences into a single sequence and provide details on the approach you took. The default in the submission form is that 10 Ns in a row represent a gap and that “paired-ends” is the evidence that the sequences on either side of the gap are linked. If this is incorrect, then make the appropriate selections for your genome.
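
To see where gaps would be inferred under the default setting, you can scan a sequence for runs of Ns. This is a minimal sketch, not portal behavior: `n_runs` is a hypothetical helper, and treating runs of 10 or more Ns (rather than exactly 10) as gaps is an assumption that you can change through `min_len`.

```python
import re

def n_runs(sequence, min_len=10):
    """Return (start, length) for each run of at least min_len Ns.

    Start positions are 0-based; matching ignores letter case.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(sequence)]

# A run of 10 Ns is reported; the run of 3 Ns is not.
print(n_runs("ACGT" + "N" * 10 + "ACGT" + "N" * 3 + "AC"))  # [(4, 10)]
```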

GenBank

GenBank is the world's largest nucleotide archive containing sequences from all branches of life. The archive is a foundation for medical and biological discovery.

  • Submit assembled SARS-CoV-2, Influenza, Norovirus, Dengue virus, rRNA, rRNA-ITS, metazoan COX1, Eukaryotic nuclear mRNA sequences.

  • Submit genomic DNA, organelle, ncRNA, plasmids, other viruses, phages, other mRNA, synthetic constructs.

  • Submit assembled prokaryotic and eukaryotic genomes.

Sequence Read Archive (SRA)

SRA is the largest publicly-available repository of high throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys.

Other Tools

  • TSA

    Submit computationally assembled, transcribed RNA sequences after submitting unassembled reads to SRA.

  • GEO

    Submit RNA-seq, ChIP-seq, and other types of gene expression and epigenomics datasets.

  • BioProject & BioSample

    Choose a tool above if submitting sequence data.

Medical Genetics & Variation Tools

Submit clinical data, small & large human genomics variants, and genotype & phenotype data.

Other Resources