Use this tool to submit complete or incomplete prokaryotic and eukaryotic genome assemblies. Include plasmid and organellar sequences with the genome submission.
Not for viral, phage, or single locus sequences (for example: 16S rRNA). Submit those to regular GenBank.
NCBI has a Foreign Contamination Screen (FCS) tool suite available on GitHub. It now also runs on new submissions to help improve the quality of your genome submissions. See NCBI Insights and our preprint for more details.
What You Should Expect
When you submit, you will need to:
Choose one: Single genome, Batch/multiple genomes, or Pseudohaplotypes of a diploid/polyploid assembly.
The genomes in a batch or haplotype submission must share some common details.
Provide a BioProject and BioSample.
You may have registered these in advance OR you may create these within your genome submission.
However, if you are submitting with annotation, please register these in advance.
Fill out metadata on the sequencing and assembly of the genome.
Provide chromosome, plasmid, and organelle assignments, if known, as explained in the single/batch/haplotype documentation (see links above).
Indicate what the Ns in the sequences represent. The defaults in the form are the most common.
Note: 10 or more Ns in a row are always called a gap when genome assembly statistics are calculated.
Upload your file(s).
Each sequence in the genome submission must be at least 200 base pairs. Sequences cannot be randomly concatenated.
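As a rough illustration of the two sequence-level rules above (a minimum length of 200 base pairs, and runs of 10 or more Ns being counted as gaps), the following Python sketch scans FASTA text before upload. The names `read_fasta` and `check_sequences` are hypothetical, and this is not NCBI's validator; it only mirrors the rules as stated on this page.

```python
import re

MIN_LENGTH = 200  # minimum sequence length stated above
GAP_RUN = 10      # runs of this many Ns or more are counted as gaps


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_sequences(text):
    """Report length, a too-short flag, and N-run gap spans for each sequence."""
    gap_pattern = re.compile(r"[Nn]{%d,}" % GAP_RUN)
    reports = []
    for header, seq in read_fasta(text):
        # Gap spans are 0-based, end-exclusive positions within the sequence.
        gaps = [(m.start(), m.end()) for m in gap_pattern.finditer(seq)]
        reports.append({
            "id": (header.split() or [""])[0],
            "length": len(seq),
            "too_short": len(seq) < MIN_LENGTH,
            "gaps": gaps,
        })
    return reports
```

Running `check_sequences` on the contents of each file gives a quick list of sequences that fall below the length limit and shows where gaps would be reported.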
All genomes within a batch must be:
Part of the same BioProject, except for the pseudohaplotypes of a diploid genome assembly
Either WGS or non-WGS, not a mix of both types
Just a single layer (no AGP files)
All genomes within a batch must also have or contain:
Same (initial) release date
Same gap/Ns information
Either FASTA files or ASN (.sqn) files, not a mix of file types. FASTA files are recommended unless the submission includes annotation or the Genome-Assembly-Data structured comment.
Single file for each genome, including any plasmid or organelle sequences
Separate file for each genome, not all the genomes together
Request for PGAP annotation or not (only relevant for prokaryotic genomes)
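The batch rules above can be checked locally before starting a submission. The sketch below is only an illustration: the dictionary keys (`file`, `bioproject`, `wgs`, `release_date`, `gap_info`, `pgap`) and the list of FASTA extensions are assumptions made for this example, not fields or rules defined by the submission portal.

```python
from pathlib import PurePath

FASTA_EXTS = {".fa", ".fasta", ".fna", ".fsa"}  # assumed FASTA extensions
ASN_EXTS = {".sqn"}

# (hypothetical key, label) pairs for details that must match across the batch.
# Note: pseudohaplotypes are the stated exception to the shared-BioProject rule.
SHARED_FIELDS = [
    ("bioproject", "BioProject"),
    ("wgs", "WGS/non-WGS type"),
    ("release_date", "release date"),
    ("gap_info", "gap/Ns information"),
    ("pgap", "PGAP annotation request"),
]


def file_kind(filename):
    """Classify a filename as 'fasta', 'asn', or 'unknown' by its extension."""
    ext = PurePath(filename).suffix.lower()
    if ext in FASTA_EXTS:
        return "fasta"
    if ext in ASN_EXTS:
        return "asn"
    return "unknown"


def check_batch(genomes):
    """Return a list of problems found in a list of per-genome dicts."""
    problems = []
    files = [g["file"] for g in genomes]
    if len(set(files)) != len(files):
        problems.append("each genome needs its own separate file")
    kinds = {file_kind(f) for f in files}
    if "unknown" in kinds:
        problems.append("unrecognized file extension")
    if len(kinds - {"unknown"}) > 1:
        problems.append("mix of FASTA and .sqn files")
    for key, label in SHARED_FIELDS:
        if len({g.get(key) for g in genomes}) > 1:
            problems.append(f"{label} differs within the batch")
    return problems
```

An empty list from `check_batch` means none of these particular rules were violated; it does not guarantee that the submission will pass NCBI's own checks.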
Pseudohaplotypes of a diploid/polyploid assembly have the restrictions of Batch/Multiple (see above) and must also:
Use the BioSample of the individual that was sequenced
Have their own BioProjects which can be created during this submission and will be connected by an Umbrella BioProject
Have one of the pseudohaplotypes asserted by you as the primary/principal assembly
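The three pseudohaplotype rules above can be expressed as a short check. This is a minimal sketch with hypothetical keys (`biosample`, `bioproject`, `primary`); it reads "one of the pseudohaplotypes" as exactly one primary/principal assembly, which is an interpretation of the wording here rather than a documented rule.

```python
def check_pseudohaplotypes(haplotypes):
    """Return a list of problems for a set of pseudohaplotype records."""
    problems = []
    # All pseudohaplotypes come from the same sequenced individual.
    if len({h["biosample"] for h in haplotypes}) != 1:
        problems.append("pseudohaplotypes must use the same BioSample")
    # Each pseudohaplotype has its own BioProject.
    projects = [h["bioproject"] for h in haplotypes]
    if len(set(projects)) != len(projects):
        problems.append("each pseudohaplotype needs its own BioProject")
    # Assumption: exactly one assembly is asserted as primary/principal.
    if sum(1 for h in haplotypes if h["primary"]) != 1:
        problems.append("exactly one pseudohaplotype must be primary/principal")
    return problems
```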
Annotation is optional here. During submission, you can request to have prokaryotic genomes annotated by NCBI’s Prokaryotic Genome Annotation Pipeline (PGAP).
If you decide to submit a genome with annotation, it must contain the locus tag prefix generated for you so that your genes are uniquely identifiable. To receive the locus tag prefix:
Register your BioProject and write down your BioProject accession. Note: make sure that the option 'autogenerate locus tag prefix' is selected.
BioProject and BioSample should be registered during the Genome submission unless you are submitting with annotation.
The BioProject contains the description of the research effort, relevant grant(s), and has links to the public data.
A genome must belong to a BioProject, and genomes sequenced as part of the same research effort can belong to a single BioProject.
Use the same BioProject for the sequence reads and genome assembly made from those reads; do not create duplicate BioProjects.
The BioSample contains the source information of the sample sequenced.
Use the same BioSample for the sequence reads and genome assembly made from those reads; do not create duplicate BioSamples.
Please prepare the following information to submit as metadata:
Assembly method: name of the assembly algorithm(s)
Assembly method version or date
Assembly name (Optional)
Sequencing technology or technologies
Full or Partial Genome in the sample
Reference genome if it is not a de novo assembly
Update: accession of the genome being updated, when appropriate
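One way to gather the items in this checklist before opening the submission form is a small record type. The sketch below is illustrative only: the class name, field names, and the `missing_fields` helper are hypothetical and do not correspond to field names in the submission portal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssemblyMetadata:
    assembly_methods: List[str]            # name(s) of the assembly algorithm(s)
    method_version_or_date: str            # version or date of the assembly method
    sequencing_technologies: List[str]     # one or more sequencing technologies
    full_genome: bool                      # True = full genome, False = partial
    assembly_name: Optional[str] = None    # optional
    de_novo: bool = True                   # False if assembled against a reference
    reference_genome: Optional[str] = None # needed when not a de novo assembly
    updated_accession: Optional[str] = None  # only when updating an existing genome

    def missing_fields(self):
        """Return the names of checklist items that are still empty."""
        missing = []
        if not self.assembly_methods:
            missing.append("assembly method")
        if not self.method_version_or_date.strip():
            missing.append("assembly method version or date")
        if not self.sequencing_technologies:
            missing.append("sequencing technology")
        if not self.de_novo and not self.reference_genome:
            missing.append("reference genome")
        return missing
```

For example, `AssemblyMetadata(["SPAdes"], "3.15.5", ["Illumina"], True).missing_fields()` returns an empty list; the assembler name and version there are example values only.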
Next, you will upload your data files. If there is no annotation, you can upload a FASTA file. If there is annotation, you will need to create a .sqn file and submit that.
Learn more about data files.
There are two options available:
File upload via the web interface using HTTP or the Aspera Connect browser plugin
FTP and Aspera on the command line
Please note that in a batch submission your uploaded filenames must exactly match the filenames provided on the 'Genome info' tab, including file extension.
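Because the match must be exact, a quick comparison of the two filename lists can catch case or extension mismatches before upload. The following is a minimal sketch; the function name `compare_filenames` and its return keys are hypothetical.

```python
def compare_filenames(listed, uploaded):
    """Compare filenames from the 'Genome info' tab with the uploaded filenames.

    The comparison is exact: case and file extension both matter.
    """
    listed_set, uploaded_set = set(listed), set(uploaded)
    return {
        # Listed on the 'Genome info' tab but not uploaded under that exact name.
        "missing_upload": sorted(listed_set - uploaded_set),
        # Uploaded but not listed on the 'Genome info' tab.
        "unlisted_upload": sorted(uploaded_set - listed_set),
    }
```

Both lists in the result should be empty before you proceed; a name such as `G2.fasta` or `g2.fa` would not match a listed `g2.fasta`.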
If you choose the single genome option, you will be prompted in the forms to provide information on which sequences belong to chromosomes, plasmids or organelles. If you choose the batch option, you should include this information in the FASTA headers.
Additional requirements: https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment
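To review the assignments carried in batch FASTA headers, a small parser can help. The sketch below assumes assignments are written as bracketed `[key=value]` modifiers after the sequence identifier, with keys such as `location`, `chromosome`, and `plasmid-name`; treat that syntax and those keys as assumptions and confirm them against the linked requirements page. The function name `parse_header` is hypothetical.

```python
import re

# Matches one bracketed modifier of the assumed form [key=value].
MODIFIER = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def parse_header(header):
    """Split a FASTA header into its sequence ID and a dict of modifiers."""
    line = header.lstrip(">").strip()
    parts = line.split()
    seq_id = parts[0] if parts else ""
    modifiers = {k.strip(): v.strip() for k, v in MODIFIER.findall(line)}
    return seq_id, modifiers
```

For instance, `parse_header(">contig1 [location=chromosome] [chromosome=1]")` yields the ID `contig1` and a dictionary with the two assignments, which can then be checked for missing or inconsistent entries across the file.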
You will need to confirm that you did not randomly merge the sequences into a single sequence and provide details on the approach you took.
The default in the submission form is that 10 Ns in a row represent a gap and that “paired-ends” is the evidence that the sequences on either side of the gap are linked.
If this is incorrect, then make the appropriate selections for your genome.
GenBank is the world's largest nucleotide archive containing sequences from all branches of life. The archive is a foundation for medical and biological discovery.