Genome (Prokaryotic and Eukaryotic)

Use this tool to submit prokaryotic and eukaryotic genome assemblies. Include plasmid and organellar sequences with the genome submission.

Do not use this tool for viral, phage, or single-locus sequences (for example, 16S rRNA). Submit those to regular GenBank.

We recommend that you download and run NCBI’s new Foreign Contamination Screen (FCS) tool before submitting your genomes, to reduce the number of post-submission corrections and improve the quality of your genomes. See NCBI Insights and the FCS publication for more details.

What You Should Expect

Overview

These are the submission instructions for prokaryotic and eukaryotic genome submissions.

If you are submitting a viral genome, please see your submission options.
Do not submit organelle genomes here unless they are part of a genome assembly. If you are only submitting organelle genomes, submit them in BankIt.
MAGs: To submit prokaryotic or eukaryotic Metagenome-assembled Genomes (MAGs), see the MAG instructions.

Detailed Instructions

For detailed instructions see the Prokaryotic and Eukaryotic Genomes Submission Guide.

General Information

Each sequence in the genome submission must be at least 200 base pairs and less than the technical limit of 2,147,483,000 base pairs (roughly 2^31). Write to genomes@ncbi.nlm.nih.gov if your genome assembly includes sequences longer than that limit.
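The length rule above can be checked locally before upload. The following is a minimal sketch, not an NCBI tool: the function names `read_fasta` and `out_of_range` are hypothetical, and it assumes plain FASTA input where the sequence ID is the first word after `>`.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LEN = 200              # minimum sequence length stated above
MAX_LEN = 2_147_483_000    # technical limit stated above


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence_id, length) for each record in FASTA-formatted lines."""
    seq_id = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, length
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            length = 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def out_of_range(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return records shorter than MIN_LEN or not below MAX_LEN."""
    return [(sid, n) for sid, n in read_fasta(lines)
            if n < MIN_LEN or n >= MAX_LEN]
```

To check a file, pass an open file handle, for example `out_of_range(open("genome.fsa"))`, where `genome.fsa` is a placeholder name. An empty list means every sequence falls within the stated range.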

Sequences cannot be randomly concatenated.

A submission must use either FASTA files or ASN (.sqn) files, not a mix of file types.

Choose the FASTA format (.fsa) if you do not have feature annotations.
Choose the ASN format (.sqn) if you want to include feature annotation or the Genome-Assembly-Data structured comment.

If you have assignment information:

For single genome submissions, provide information on which sequences belong to chromosomes, plasmids, or organelles.
Additional requirements for Batch and Haplotype submissions

Prokaryotic Genomes Annotation Pipeline (PGAP)

You can optionally request annotation by PGAP (only relevant for prokaryotic genomes).

Frequently Asked Questions

See the Genomes Frequently Asked Questions page.

Submission Type

There are three types of Genome submissions: Single, Batch, and Pseudohaplotypes/Haplotypes.

Single Submission

Click for detailed information on Submitting a Single Genome

Batch Submission

Use this option to submit up to 400 genomes that share some common information.
Click for detailed information on Submitting a Batch of Genomes

All genomes within a batch must:

Have the same BioProject
Be either WGS or non-WGS, not a mix of both types
Have the same (initial) release date
Have the same gap/Ns information
Have either FASTA files or ASN ( .sqn ) files, not a mix of file types
- Choose FASTA files if you do not have feature annotations
- Choose ASN files if you want to include feature annotation or the Genome-Assembly-Data structured comment
One file for each genome
- One file for all the sequences of that genome
- For example, a batch submission of 2 genomes would have 2 files
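Some of the batch rules above (at most 400 genomes, one file per genome, no mix of FASTA and ASN files) can be checked from the file names alone. This is a rough sketch under stated assumptions: `check_batch` is a hypothetical helper, and the only extensions it recognizes are `.fsa` and `.sqn`, the two named on this page.

```python
from pathlib import PurePath
from typing import List

MAX_GENOMES = 400        # batch limit stated above
FASTA_EXTS = {".fsa"}    # assumption: extend if your FASTA files use another extension
ASN_EXTS = {".sqn"}


def check_batch(filenames: List[str]) -> List[str]:
    """Return a list of problems found in a batch's file names (empty if none)."""
    problems = []
    if len(filenames) > MAX_GENOMES:
        problems.append(
            f"{len(filenames)} files exceed the limit of {MAX_GENOMES} genomes per batch"
        )
    if len(set(filenames)) != len(filenames):
        problems.append("duplicate file names: each genome needs its own file")
    kinds = set()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in FASTA_EXTS:
            kinds.add("fasta")
        elif ext in ASN_EXTS:
            kinds.add("asn")
        else:
            problems.append(f"{name}: unrecognized extension")
    if len(kinds) > 1:
        problems.append("mix of FASTA and ASN files in one batch")
    return problems
```

The check does not cover the BioProject, release date, WGS status, or gap information, which depend on the submission metadata rather than on the files.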

Pseudohaplotypes/Haplotypes Submission

Submitting the haplotype assemblies generated from one or more diploid genomes.
Click for detailed information on Submitting Multiple Haplotype Assemblies

The haplotypes of a particular individual genome must:

Meet the same restrictions as Batch submissions listed above, except that each haplotype must have its own BioProject, and these BioProjects are connected by an Umbrella BioProject. The BioProjects can be created automatically during the genome submission. See Umbrella BioProject for information.
Use the same BioSample, that of the individual that was sequenced.

The pseudohaplotypes/haplotype pairs will be asserted as one of the appropriate types:

principal/alternate
maternal/paternal
haplotype 1/haplotype 2

BioProject/BioSample

BioProject and BioSample

All genome submissions must provide BioProject and BioSample information, which can be created before or during a genome submission.

Submitting with Feature Annotation: If you plan to submit feature annotation with a locus-tag prefix, you must create a BioProject and BioSample before you start the Genome submission. See the Annotation section for more information.

BioProject

A BioProject contains the description of the research effort, relevant grant(s), and links to the public data. It has a BioProject accession. See BioProject for more information.

A genome must belong to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject.

Use the same BioProject for the sequence reads and genome assembly made from those reads; do not create duplicate BioProjects.
Exception: Haplotypes of a diploid genome have different BioProjects, which can be created during pseudohaplotype/haplotype submission.

BioSample(s)

A BioSample contains the source information of the sample sequenced. It has a BioSample accession. See BioSample for more information.

Use the same BioSample accession for the sequence reads and genome assembly made from those reads; do not create duplicate BioSamples.
Exception: Metagenome-assembled Genomes (MAGs) need a BioSample for each individual organism bin. The reads are submitted using a BioSample that represents the mixed sample (e.g., soil metagenome or host name). See the MAG instructions.

Annotation

Feature annotation is optional for genome submissions.

Submission without Feature Annotation

Please upload only the FASTA file on the Files tab.
During submission, you can request to have prokaryotic genomes annotated by NCBI’s Prokaryotic Genome Annotation Pipeline (PGAP).

Submission with Feature Annotation

Click for information on How to Submit an Annotated File.
Click for information on GFF3 or GTF annotation.
If you decide to submit a genome with feature annotation, it must contain the locus-tag prefix generated for you so that your genes are uniquely identifiable.

Prior to genome submission, request the locus tag prefix:

Create a BioProject submission. Make sure the 'autogenerate locus tag prefix' checkbox is selected on the Project Type page of your BioProject submission.
Create the BioSample submission and enter the corresponding BioProject accession into the sample metadata.
After you finish your BioSample submission, the automatically assigned locus_tag prefixes will be posted to the BioProject submission portal. If you do not receive the autogenerated locus tag prefix, please email genomes@ncbi.nlm.nih.gov.
Create your Genome submission

Assembly Metadata

Genome Assembly Metadata

Please prepare the following information to submit as metadata (Detailed information available here):

Assembly method: Name of the assembly algorithm(s)
Assembly method version or date
Genome coverage
Sequencing technology or technologies
Whether the sample represents a full or partial genome
Reference genome if it is not a de novo assembly
When updating, include the accession of the genome being updated

Files

If there is no feature annotation, upload FASTA file(s)
If there is feature annotation, upload an ASN format (.sqn file)

Batch and Pseudohaplotype/Haplotype Submission

Filename: Filenames including extension must exactly match the filenames provided on the Genome info tab.
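Because the match must be exact, including the extension, a quick comparison before upload can catch mismatches such as differing case. A minimal sketch, where `compare_filenames` is a hypothetical helper and both lists are supplied by you:

```python
from typing import List, Tuple


def compare_filenames(declared: List[str], uploaded: List[str]) -> Tuple[List[str], List[str]]:
    """Compare declared and uploaded file names exactly (case-sensitive).

    Returns (missing, unexpected): names declared but not uploaded,
    and names uploaded but not declared.
    """
    declared_set, uploaded_set = set(declared), set(uploaded)
    missing = sorted(declared_set - uploaded_set)
    unexpected = sorted(uploaded_set - declared_set)
    return missing, unexpected
```

Both lists come back empty when every name entered on the Genome info tab has an identically named file.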

Uploading options

File upload via the web interface using HTTP or Aspera Connect browser plugin
FTP and Aspera upload. See How to Preload Files for details

After Submission

After you complete your submission, the genomes will undergo several validations plus an initial review by our staff. See What Happens Next for details.

Automatic emails will be sent to the email account associated with the submission. These automated emails will include our contact information if you have any questions.

The following is a list of possible Genome submission statuses:

Queued: the submission is waiting for initial review.
Error: one or more genomes have errors in their files and need to be resubmitted. Use the same file name when resubmitting batch submissions.
Processing (no accession number): all the genomes have passed the initial automated validations and are waiting for additional review.
Processing (with accession number): genome accessions have been assigned and the genomes will be processed by NCBI staff. Genomes will remain at this status until they are released. We will contact you during processing if the submission has issues that require additional information.
Processed: the genome has been publicly released.

The submission portal reflects a static image of the information at time of submission. Any requested changes after submission will not be reflected on the submission portal.

Genomes FAQ

How do I include chromosome, organelle, or plasmid assignments?

If you choose the single genome option, the forms will prompt you to provide information on which sequences belong to chromosomes, plasmids, or organelles. If you choose the batch option, include this information in the FASTA headers. Additional requirements: https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment

What information should I be prepared to provide about gaps in my Genome submission?

You will need to confirm that you did not randomly merge the sequences into a single sequence and provide details on the approach you took. By default, the submission form assumes that 10 Ns in a row represent a gap and that “paired-ends” is the evidence that the sequences on either side of the gap are linked. If this is incorrect, make the appropriate selections for your genome.
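To see whether the default of 10 Ns per gap fits your assembly, it can help to list the runs of Ns in each sequence. The following is a minimal sketch: `find_n_runs` is a hypothetical helper, and the threshold of 10 simply mirrors the form default described above.

```python
import re
from typing import List, Tuple


def find_n_runs(sequence: str, min_run: int = 10) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of at least min_run Ns.

    Positions are 0-based; both upper- and lower-case N are counted.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(sequence)]
```

Running it with a lower `min_run` shows shorter runs as well. If those turn out to be the real gaps in your assembly, the default gap setting in the form needs to be changed.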

GenBank

GenBank is the world's largest nucleotide archive containing sequences from all branches of life. The archive is a foundation for medical and biological discovery.

Submit assembled SARS-CoV-2, Influenza, Norovirus, Dengue virus, rRNA, rRNA-ITS, metazoan COX1, Eukaryotic nuclear mRNA sequences.
Submit genomic DNA, organelle, ncRNA, plasmids, other viruses, phages, other mRNA, synthetic constructs.
Submit assembled prokaryotic and eukaryotic genomes.

Sequence Read Archive (SRA)

SRA is the largest publicly available repository of high-throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys.

Submit unassembled, high-throughput sequencing reads.

SARS-CoV-2 submission instructions

Other Tools

TSA

Submit computationally assembled, transcribed RNA sequences after submitting unassembled reads to SRA.

GEO

Submit RNA-seq, ChIP-seq, and other types of gene expression and epigenomics datasets.

BioProject & BioSample

Choose a tool above if submitting sequence data.

Medical Genetics & Variation Tools

Submit clinical data, small & large human genomics variants, and genotype & phenotype data.
