Register the BioProject and BioSample during the Genome submission, unless you are submitting a genome with annotation.
The BioProject describes the research effort, lists the relevant grant(s), and links to the public data.
A genome must belong to a BioProject, and genomes sequenced as part of the same research effort can belong to a single BioProject.
Use the same BioProject for the sequence reads and genome assembly made from those reads; do not create duplicate BioProjects.
The BioSample contains the source information of the sample sequenced.
Use the same BioSample for the sequence reads and genome assembly made from those reads; do not create duplicate BioSamples.
Please prepare the following information to submit as metadata:
Assembly method: name of the assembly algorithm(s)
Assembly method version or date
Assembly name (Optional)
Sequencing technology or technologies
Full or Partial Genome in the sample
Reference genome if it is not a de novo assembly
Update: accession of the genome being updated, when appropriate
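The checklist above can be collected into a small record before you start filling in the forms. The following is a minimal sketch, not an NCBI schema: the class name, field names, and the missing_fields helper are all hypothetical, and the values in the usage note are examples only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenomeMetadata:
    # Hypothetical field names that mirror the checklist; not an NCBI schema.
    assembly_method: str                      # name of the assembly algorithm(s)
    assembly_method_version_or_date: str      # version string or date
    sequencing_technologies: List[str]        # one or more technologies
    full_genome: bool                         # True = full, False = partial
    assembly_name: Optional[str] = None       # optional
    reference_genome: Optional[str] = None    # only if not a de novo assembly
    updated_accession: Optional[str] = None   # only when updating a genome

    def missing_fields(self, de_novo: bool = True) -> List[str]:
        """Return the names of required checklist items that are still empty."""
        missing = []
        if not self.assembly_method.strip():
            missing.append("assembly_method")
        if not self.assembly_method_version_or_date.strip():
            missing.append("assembly_method_version_or_date")
        if not self.sequencing_technologies:
            missing.append("sequencing_technologies")
        # A reference genome is only expected when the assembly is not de novo.
        if not de_novo and not self.reference_genome:
            missing.append("reference_genome")
        return missing
```

For example, GenomeMetadata("SPAdes", "3.15.5", ["Illumina"], True).missing_fields() returns an empty list, while the same record checked with de_novo=False reports that reference_genome is still needed.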
Next, you will upload your data files. If there is no annotation, you can upload a FASTA file. If there is annotation, you will need to create a .sqn file and submit that.
There are two upload options available:
File upload via the web interface, using HTTP or the Aspera Connect browser plugin
FTP and Aspera on the command line
Please note that in a batch submission your uploaded filenames must exactly match the filenames provided on the 'Genome info' tab, including file extension.
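Because the match must be exact, it can help to compare the two lists of names before uploading. This is a minimal local sketch: the function name is hypothetical, and it assumes you have already copied the filenames from the 'Genome info' tab into a list yourself. It does not contact the submission portal.

```python
def check_batch_filenames(listed, uploaded):
    """Compare names from the 'Genome info' tab with the uploaded file names.

    Returns (missing, unexpected): names that are listed but not uploaded,
    and names that are uploaded but not listed. The comparison is exact and
    case-sensitive, file extension included.
    """
    listed_set, uploaded_set = set(listed), set(uploaded)
    missing = sorted(listed_set - uploaded_set)
    unexpected = sorted(uploaded_set - listed_set)
    return missing, unexpected
```

For example, check_batch_filenames(["strainA.fasta", "strainB.fasta"], ["strainA.fasta", "strainB.fa"]) reports strainB.fasta as missing and strainB.fa as unexpected, because the extensions differ.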
If you choose the single genome option, you will be prompted in the forms to provide information on which sequences belong to chromosomes, plasmids or organelles. If you choose the batch option, you should include this information in the FASTA headers.
Additional requirements: https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment
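For batch submissions, a quick local check of the FASTA headers can catch sequences that carry no assignment at all. The sketch below assumes the assignments are written as bracketed key=value pairs on the definition line. The function name and the modifier names in the usage note (location, chromosome) are illustrative assumptions, so check the page linked above for the names that are actually accepted.

```python
import re

# Matches "[key=value]" pairs in a FASTA definition line.
MODIFIER = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def parse_header(line):
    """Split a FASTA header into (sequence_id, {modifier: value})."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    body = line[1:].strip()
    # The sequence ID is the first whitespace-delimited token after ">".
    seq_id = body.split(None, 1)[0] if body else ""
    modifiers = {k.strip(): v.strip() for k, v in MODIFIER.findall(body)}
    return seq_id, modifiers
```

For example, parse_header(">contig1 [location=chromosome] [chromosome=1]") returns the ID contig1 together with a dictionary holding the two pairs. A header with no brackets returns an empty dictionary, which flags a sequence that has no assignment.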
You will need to confirm that you did not randomly merge the sequences into a single sequence and provide details on the approach you took.
The default in the submission form is that 10 Ns in a row represent a gap and that “paired-ends” is the evidence that the sequences on either side of the gap are linked.
If this is incorrect, then make the appropriate selections for your genome.
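To see whether the default fits your sequences, you can list the runs of Ns they contain. This is a minimal sketch with a hypothetical function name. It treats any run of at least min_run Ns as a gap, which is an assumption about how the threshold is applied, so adjust min_run to match the selection you make in the form.

```python
import re


def find_gaps(sequence, min_run=10):
    """Return (start, end) pairs for each run of at least `min_run` Ns.

    Positions are 0-based and end-exclusive. Lowercase n is counted too.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [m.span() for m in pattern.finditer(sequence)]
```

For example, a sequence of ACGT followed by ten Ns, then ACGT, three Ns, and AC yields a single gap at (4, 14). The run of three Ns is below the threshold and is not reported.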