Use this tool to submit complete or incomplete prokaryotic and eukaryotic genome assemblies. Include plasmid and organellar sequences with the genome submission.
Do not submit viral genomes, organellar genomes, or plasmids by themselves.
Use BankIt, or email your ASN.1 (.sqn) files to NCBI at gb-sub@ncbi.nlm.nih.gov, to submit those sequences to GenBank.
NCBI now has a new Foreign Contamination Screen (FCS) tool suite that you can run before submitting to improve the quality of your genome submission.
This new public FCS has improved sensitivity and performance, but some additional contaminants might still be found post-submission as we continue to expand the capabilities of the public tool.
We will be incorporating this new tool into NCBI’s post-submission screening process later this year.
See NCBI Insights for more details.
What You Should Expect
When you submit, you will need to:
Choose one of: Single, Batch, or "resolved haplotypes of Diploid/Polyploid(s)". The genomes of a batch or "diploid" submission must have some common details.
Provide a BioProject and a BioSample, either ones that have already been registered for an SRA submission or ones that you create during this genome submission.
Fill out metadata on the sequencing and assembly of the genome.
Indicate what the Ns in the sequences represent. The defaults in the form are the most common.
Note: 10 or more Ns in a row are always called a gap when genome assembly statistics are calculated.
Upload your file(s).
Each sequence in the genome submission must be at least 200 base pairs. Sequences cannot be randomly concatenated.
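The 200 base pair minimum can be checked locally before upload. The following is a minimal sketch, not an NCBI tool: the function names `read_fasta` and `too_short` are hypothetical, and the sketch assumes plain, uncompressed FASTA text in which the first whitespace-delimited token of each header is the sequence ID.

```python
from typing import Dict, Iterable, List

MIN_LENGTH = 200  # minimum sequence length stated above, in base pairs


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {sequence_id: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first token after '>' is the sequence ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def too_short(records: Dict[str, str], min_length: int = MIN_LENGTH) -> List[str]:
    """Return the IDs of sequences shorter than min_length."""
    return [seq_id for seq_id, seq in records.items() if len(seq) < min_length]


# Example: contig1 is 240 bp, contig2 is only 40 bp.
example = [">contig1 example", "ACGT" * 60, ">contig2", "ACGT" * 10]
print(too_short(read_fasta(example)))  # ['contig2']
```

To check a real file, pass an open file handle to `read_fasta`, since iterating over a text file yields its lines.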
All genomes within a batch must be:
Part of the same BioProject, except for the pseudohaplotypes of a diploid genome assembly
Either WGS or non-WGS, not a mix of both types
Just a single layer (no AGP files)
All genomes within a batch must also have or contain:
Same (initial) release date
Same gap/Ns information
Either FASTA files or ASN.1 (.sqn) files, not a mix of file types. FASTA files are recommended unless the submission includes annotation or the Genome-Assembly-Data structured comment
Single file for each genome, including any plasmid or organelle sequences
Separate file for each genome, not all the genomes together
Request for PGAP annotation or not (only relevant for prokaryotic genomes)
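Several of the batch rules above can be checked against your own bookkeeping before you start the submission. The sketch below is illustrative only: `GenomeEntry`, `file_kind`, and `batch_problems` are hypothetical names, the extension lists are assumptions, the BioProject accession is a placeholder, and the check covers only the BioProject, WGS type, release date, file type, and one-file-per-genome rules.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import List

# Assumed extension lists; adjust them to the filenames you actually use.
FASTA_EXTENSIONS = {".fasta", ".fa", ".fna", ".fsa"}
SQN_EXTENSIONS = {".sqn"}


@dataclass
class GenomeEntry:
    """Hypothetical record describing one genome in a planned batch."""
    filename: str
    bioproject: str
    release_date: str
    is_wgs: bool


def file_kind(filename: str) -> str:
    """Classify a file as 'fasta', 'sqn', or 'unknown' by its extension."""
    ext = PurePath(filename).suffix.lower()
    if ext in FASTA_EXTENSIONS:
        return "fasta"
    if ext in SQN_EXTENSIONS:
        return "sqn"
    return "unknown"


def batch_problems(entries: List[GenomeEntry]) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len({e.bioproject for e in entries}) > 1:
        problems.append("genomes belong to more than one BioProject")
    if len({e.is_wgs for e in entries}) > 1:
        problems.append("batch mixes WGS and non-WGS genomes")
    if len({e.release_date for e in entries}) > 1:
        problems.append("release dates differ")
    kinds = {file_kind(e.filename) for e in entries}
    if "unknown" in kinds:
        problems.append("unrecognized file extension")
    if {"fasta", "sqn"} <= kinds:
        problems.append("batch mixes FASTA and .sqn files")
    names = [e.filename for e in entries]
    if len(set(names)) != len(names):
        problems.append("a filename is listed for more than one genome")
    return problems


# Placeholder accession and dates, for illustration only.
batch = [
    GenomeEntry("strainA.fasta", "PRJNA000000", "2025-01-01", True),
    GenomeEntry("strainB.sqn", "PRJNA000000", "2025-01-01", True),
]
print(batch_problems(batch))  # ['batch mixes FASTA and .sqn files']
```

The function returns every problem it finds rather than stopping at the first one, so a single run shows everything that needs fixing in the batch.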
Pseudohaplotypes of a diploid/polyploid assembly have the restrictions of Batch/Multiple (see above) and must also:
Use the BioSample of the individual that was sequenced
Have their own BioProjects which can be created during this submission and will be connected by an Umbrella BioProject
Have one of the pseudohaplotypes asserted by you as the primary/principal assembly
Annotation is optional here. During submission, you can request to have prokaryotic genomes annotated by NCBI’s Prokaryotic Genome Annotation Pipeline (PGAP).
If you decide to submit a genome with annotation, it must contain the locus tag prefix generated for you so that your genes are uniquely identifiable. To receive the locus tag prefix:
Register your BioProject here and write down your BioProject accession. Note: make sure that the option "autogenerate locus tag prefix" is selected.
Register your samples in BioSample and enter your BioProject accession into the sample metadata.
Proceed with the Genome submission.
If you do not receive the autogenerated locus tag prefix, please email genomes@ncbi.nlm.nih.gov
BioProject and BioSample should be registered during the Genome submission unless you are submitting with annotation.
BioProject
The BioProject contains the description of the research effort and the relevant grant(s), and has links to the public data.
A genome must belong to a BioProject, and genomes sequenced as part of the same research effort can belong to a single BioProject.
Use the same BioProject for the sequence reads and genome assembly made from those reads; do not create duplicate BioProjects.
BioSample
The BioSample contains the source information of the sample sequenced.
Use the same BioSample for the sequence reads and genome assembly made from those reads; do not create duplicate BioSamples.
Please prepare the following information to submit as metadata:
Assembly method: name of the assembly algorithm(s)
Assembly method version or date
Assembly name (Optional)
Genome coverage
Sequencing technology or technologies
Full or Partial Genome in the sample
Reference genome if it is not a de novo assembly
Update: accession of the genome being updated, when appropriate
bacteria_available_from
Next, you will upload your data files. If there is no annotation, you can upload a FASTA file. If there is annotation, you will need to create a .sqn file and submit that.
Read more here.
There are two options available:
File upload via the web interface using HTTP or Aspera Connect browser plugin
FTP and Aspera on the command line
Please note that in a batch submission your uploaded filenames must exactly match the filenames provided on the 'Genome info' tab, including file extension.
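Because the match must be exact, including the file extension, it can help to compare the two lists of names before uploading. This is a minimal sketch with a hypothetical helper name, `compare_filenames`; it assumes you have the filenames entered on the 'Genome info' tab and the filenames you intend to upload as two lists of strings, and it compares them case-sensitively.

```python
from typing import List, Sequence, Tuple


def compare_filenames(
    declared: Sequence[str], uploaded: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Compare filenames exactly (case-sensitive, extension included).

    Returns (missing, unexpected):
      missing    - declared names with no identically named uploaded file
      unexpected - uploaded names that were never declared
    """
    declared_set, uploaded_set = set(declared), set(uploaded)
    missing = sorted(declared_set - uploaded_set)
    unexpected = sorted(uploaded_set - declared_set)
    return missing, unexpected


# Example: the second file was uploaded with a different extension.
declared = ["strainA.fasta", "strainB.fasta"]
uploaded = ["strainA.fasta", "strainB.fa"]
missing, unexpected = compare_filenames(declared, uploaded)
print(missing)      # ['strainB.fasta']
print(unexpected)   # ['strainB.fa']
```

Both lists are empty when every declared name has an identically named upload and nothing extra was uploaded.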
If you choose the single genome option, you will be prompted in the forms to provide information on which sequences belong to chromosomes, plasmids or organelles. If you choose the batch option, you should include this information in the FASTA headers.
Additional requirements: https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment
You will need to confirm that you did not randomly merge the sequences into a single sequence and provide details on the approach you took.
The default in the submission form is that 10 Ns in a row represent a gap and that “paired-ends” is the evidence that the sequences on either side of the gap are linked.
If this is incorrect, then make the appropriate selections for your genome.
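To see how the 10-N threshold applies to your own sequences, you can list the runs of N in each sequence. The sketch below uses a hypothetical function name, `find_n_runs`, treats `N` and `n` alike (an assumption), and reports 0-based start positions.

```python
import re
from typing import List, Tuple


def find_n_runs(sequence: str, min_run: int = 10) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of at least min_run consecutive Ns.

    Positions are 0-based. Both 'N' and 'n' are counted (an assumption).
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(sequence)]


# Example: one run of 10 Ns (reported) and one run of 3 Ns (below the threshold).
seq = "ACGT" + "N" * 10 + "ACGT" + "NNN" + "AC"
print(find_n_runs(seq))             # [(4, 10)]
print(find_n_runs(seq, min_run=1))  # [(4, 10), (18, 3)]
```

Lowering `min_run` shows the shorter runs as well, which can help when deciding what the Ns in your sequences represent.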
GenBank
GenBank is the world's largest nucleotide archive containing sequences from all branches of life. The archive is a foundation for medical and biological discovery.
Submit assembled eukaryotic and prokaryotic genomes (WGS or Complete).
Sequence Read Archive (SRA)
SRA is the largest publicly available repository of high-throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys.
Submit unassembled, high-throughput sequencing reads