Published February 17, 2022 | Version v1
Dataset Open

A high-quality genome assembly and annotation of the gray mangrove, Avicennia marina

  • 1. New York University Abu Dhabi
  • 2. University of Barcelona
  • 3. King Abdullah University of Science and Technology
  • 4. Sultan Qaboos University

Description

The gray mangrove [Avicennia marina (Forsk.) Vierh.] is the most widely distributed mangrove species, ranging throughout the Indo-West Pacific. It presents remarkable levels of geographic variation both in phenotypic traits and habitat, often occupying extreme environments at the edges of its distribution. However, subspecific evolutionary relationships and adaptive mechanisms remain understudied, especially across populations of the West Indian Ocean. High-quality genomic resources accounting for such variability are also sparse. Here we report the first chromosome-level assembly of the genome of A. marina. We used a previously released draft assembly and Chicago and Dovetail HiC proximity ligation libraries for scaffolding, producing a 456,526,188 bp genome assembly. The largest 32 scaffolds (22.4 Mb to 10.5 Mb) accounted for 98% of the genome assembly, with the remaining 2% distributed among 3,759 much shorter scaffolds (62.4 Kb to 1 Kb). We annotated 45,032 protein-coding genes using tissue-specific RNA-seq data in combination with de novo gene prediction, of which 34,442 were associated with GO terms. The genome assembly and the annotated gene set yield completeness scores of 96.7% and 95.1%, respectively, when compared with the eudicots BUSCO dataset. Furthermore, an FST survey based on resequencing data successfully identified a set of candidate genes potentially involved in local adaptation, and revealed patterns of adaptive variability correlating with a temperature gradient in Arabian mangrove populations. Our A. marina genomic assembly provides a highly valuable resource for genome evolution analysis, as well as for identifying functional genes involved in adaptive processes and speciation.
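The scaffold statistics quoted above (total assembly length and the share held by the largest scaffolds) can be recomputed from a FASTA file. The following is a minimal sketch, not the authors' pipeline: the function names `fasta_lengths` and `top_n_fraction` are hypothetical, and the input is assumed to be plain, uncompressed FASTA text.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for an iterable of FASTA text lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            current = header[0] if header else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def top_n_fraction(lengths: Dict[str, int], n: int) -> float:
    """Fraction of the total length contained in the n longest sequences."""
    ordered = sorted(lengths.values(), reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0
```

Assuming the assembly is stored in a file such as `assembly.fasta` (a hypothetical name), `fasta_lengths(open("assembly.fasta"))` gives the per-scaffold lengths, and `top_n_fraction(lengths, 32)` can be compared with the 98% figure reported above.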

Notes

Warning:

The genome assembly provided here has also been deposited at DDBJ/ENA/GenBank, along with raw sequence data from the Chicago and HiC libraries, under the accession JABGBM000000000; Bioproject (SRA) accession: PRJNA629068; Biosample accession: SAMN14766548. To use this genome as a reference, and especially to identify functional genes based on the annotation, we recommend the version submitted here. The version available in GenBank underwent further trimming required by the submission process, so its sequence coordinates no longer match the gene coordinates provided in the gff3 file and supplementary resources.
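One quick way to spot a mismatch between an annotation and a given version of the assembly is to check that every GFF3 feature falls within the length of its scaffold. The sketch below assumes standard 9-column, tab-separated GFF3 with 1-based inclusive coordinates; the function name `gff3_out_of_bounds` is hypothetical, and `lengths` is assumed to be a dictionary mapping scaffold IDs to scaffold lengths. This check only catches features that run past a scaffold end or refer to an unknown scaffold; it cannot detect a coordinate shift that stays within bounds.

```python
from typing import Dict, Iterable, List, Tuple


def gff3_out_of_bounds(
    gff_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[str, int, int, str]]:
    """List GFF3 features whose coordinates do not fit the given scaffold lengths."""
    problems = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation records
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # not a feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            problems.append((seqid, start, end, "unknown seqid"))
        elif start < 1 or start > end or end > lengths[seqid]:
            problems.append((seqid, start, end, "out of bounds"))
    return problems
```

An empty result is necessary but not sufficient evidence that the annotation and the assembly belong together.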

For detailed information about the datasets provided here, see our publication: https://academic.oup.com/g3journal/article/11/1/jkaa025/6026961

Funding provided by: Center for Genomics and Systems Biology
Crossref Funder Registry ID:
Award Number: CGSB Sustainability Program

Funding provided by: New York University Abu Dhabi
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100012025
Award Number: 73 71210 CGSB9

Funding provided by: King Abdullah University of Science and Technology
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100004052
Award Number: Baseline funding

Files

Files (492.1 MB)

md5:b75e3c464579d3081615599e6ff27935 (492.1 MB)
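
The MD5 checksum listed above can be used to confirm that the download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function name `md5_of_file` and the file path in the usage note are hypothetical, since the archive's file name is not given here.

```python
import hashlib

# Checksum as listed in this record's file section.
EXPECTED_MD5 = "b75e3c464579d3081615599e6ff27935"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the archive saved locally, `md5_of_file("downloaded_archive") == EXPECTED_MD5` should evaluate to `True` for an intact download, where `downloaded_archive` stands in for the actual file path.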