Dataset 1: Genomic assemblies and annotations of Coffea species and subgenomes
Description
This dataset contains filtered genome assemblies and corresponding GFF3 annotation files for Coffea species. The original assemblies and annotations were obtained from public repositories:
Coffea arabica ET-39 – NCBI: GCF_036785885.1
Coffea arabica Caturra – NCBI: GCA_003713225.1
Coffea arabica Bourbon – NCBI: GCA_030873655.1
Coffea arabica Gesha – Zenodo: https://zenodo.org/records/10059814
Coffea arabica Typica – Figshare: https://figshare.com/articles/dataset/b_A_chromosome-level_genome_assembly_of_b_b_Coffea_arabica_b_b_L_var_Kona_Typica_b/28425329/2
Coffea eugenioides CCC68 – NCBI: GCA_003713205.1
Coffea humblotiana – NCBI: GCA_023065735.1
Coffea canephora – NCBI: GCA_900059795.1
After download, every assembly was filtered to retain only chromosomes and contigs ≥500 kb, for all species and cultivars. The Coffea arabica assemblies (cultivars Gesha, Caturra, Bourbon, ET-39, and Typica) were additionally split into subgenomes. The corresponding GFF3 annotation files were filtered to match the retained chromosomes and contigs, then processed with the AGAT toolkit to produce longest-isoform GFF3 annotation files. Protein and CDS sequences were extracted with AGAT from these longest-isoform GFF3 files and the filtered genomes. No GFF3 annotations were available for Coffea arabica Bourbon or Coffea humblotiana; their genome assemblies were nevertheless filtered in the same way as those of the other species.
- Genome file extension = .fasta | Annotation file extension = .gff3 | Protein file extension = .faa | CDS file extension = .fna
- Scripts used for AGAT processing are available at: https://github.com/daisysotero/Coffea-analyses-2025
- sgC = canephora-derived subgenome | sgE = eugenioides-derived subgenome
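The length filter and the matching GFF3 filter described above can be sketched in plain Python. This is only an illustration, not the authors' pipeline (their scripts are in the linked GitHub repository). It assumes a pure length cutoff applied to every sequence, that the sequence ID is the first whitespace-delimited token of the FASTA header, and that GFF3 rows are matched on column 1. The function names `parse_fasta`, `filter_fasta`, and `filter_gff3` are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

MIN_LENGTH = 500_000  # 500 kb cutoff stated in the description


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(lines: Iterable[str],
                 min_length: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Keep only records whose sequence length is >= min_length."""
    return [(h, s) for h, s in parse_fasta(lines) if len(s) >= min_length]


def filter_gff3(lines: Iterable[str], keep_ids: Set[str]) -> List[str]:
    """Keep feature rows (and ##sequence-region lines) for retained IDs.

    Other directive and comment lines are passed through unchanged;
    blank lines are dropped.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) > 1 and parts[1] in keep_ids:
                kept.append(line)
        elif line.startswith("#"):
            kept.append(line)
        elif line.split("\t", 1)[0] in keep_ids:
            kept.append(line)
    return kept


def retained_ids(records: List[Tuple[str, str]]) -> Set[str]:
    """Sequence IDs (first header token) of the retained FASTA records."""
    return {header.split()[0] for header, _ in records}
```

For real files, the `lines` argument can simply be an open file handle, for example `filter_fasta(open("genome.fasta"))`, where `genome.fasta` is a placeholder name. Whole-genome sequences are held in memory here, which is acceptable for a sketch but may call for an indexed reader on large assemblies.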
Files
(8.1 GB)
| MD5 checksum | Size |
|---|---|
| md5:1f836665f51f5b1641cb05c1ffe921a3 | 580.6 MB |
| md5:6aee95e7cd30c5cb76f20010c29a9cb6 | 550.2 MB |
| md5:0dab7377ea5932aa0f7f92047a25db94 | 11.7 MB |
| md5:bf3f5fd1adc19f87827eb9f9998f6f43 | 524.8 MB |
| md5:ea329be0d5f19fe12fc7594f49d45799 | 31.8 MB |
| md5:434c27f044b41bfc5b484b8c4faa7c8c | 98.3 MB |
| md5:09130ac085e691e6232d5e4e4cd71e30 | 12.1 MB |
| md5:dbeb6f3123dedd71c9ed1581be1f0238 | 499.3 MB |
| md5:a7d822e50578046c204a75adcc01c76c | 33.0 MB |
| md5:08b649196e88face9d73cdbf9a51454a | 96.6 MB |
| md5:2d1cdd70f9b5895c76d86c34c4a0fefb | 13.6 MB |
| md5:3ac449188d25b707b474fdb52f5416d7 | 593.7 MB |
| md5:cc30cf777a84b639d7a90976ed51817d | 37.3 MB |
| md5:2b276b7f12724915aa0632efc34b5e7e | 118.5 MB |
| md5:8e905aa692b8834ff89fee83bf2ee0af | 14.3 MB |
| md5:6ccf1167f597c4dabdaf6e20828b7c00 | 618.8 MB |
| md5:ffde8cf92e804995a777337fcc01f437 | 39.2 MB |
| md5:3259a03923420828ea0f2927b02022eb | 129.2 MB |
| md5:1ec93d6e9edd34f0fac81a03906d8733 | 10.7 MB |
| md5:eb0c9b3ffe9c57c85bb00a07f8cabd1e | 513.5 MB |
| md5:24579b363a74d8ac1af2930fff5b5de3 | 29.8 MB |
| md5:ce9cc12c1201d4e690831b30cb089026 | 36.0 MB |
| md5:6fb91ff9c5c6f492bf590a2f4f0d3dd7 | 10.9 MB |
| md5:8c383db0e44a5e9acafad5778def7ae4 | 499.1 MB |
| md5:3632111fb2fb6f4c14d9a6f7257b3d86 | 30.4 MB |
| md5:eb6e7b0a08b8359d12902c23c6975ad5 | 36.9 MB |
| md5:80941525f128b90ce096a1d00e0c6125 | 14.7 MB |
| md5:aa65ad96fde5196a2e53fec475cc0bf7 | 580.3 MB |
| md5:33461f3d85fd3c942e3e29a8114ee68a | 40.7 MB |
| md5:118f0e02b900d0b53efaad3709523bde | 28.2 MB |
| md5:88a9de28ab6eba24c58545f5094d4d64 | 15.0 MB |
| md5:a57500be69293f4dae66f49c192e59d8 | 575.2 MB |
| md5:25ebe750eddb710f36a8f74eb049e987 | 41.2 MB |
| md5:b272726abe09b448ffca68538a140143 | 28.5 MB |
| md5:37d9e604302f5687811560043d946afe | 11.0 MB |
| md5:a82e801a1ab3d7d3274595516298b4d5 | 402.6 MB |
| md5:a7bfedaebb8c254b930abb8b3fde2069 | 29.6 MB |
| md5:787c99f87e6f6277c5132facc45ca32b | 72.2 MB |
| md5:83795f86603212ab8695be35b4d42a70 | 393.0 MB |
| md5:62d5815d53429b383cf32897faf4c43e | 14.9 MB |
| md5:7752e8e6102ed7674c60b99aa3f426b5 | 574.6 MB |
| md5:819b805940ef5220dd7b9cfa372dacd9 | 40.7 MB |
| md5:4650f53cbcb9056b334a619a1ccb189f | 108.0 MB |
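The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the file is read in chunks on the assumption that the larger archives (several hundred MB) should not be loaded into memory at once.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:1f83...'."""
    # Strip the optional 'md5:' prefix used in the file listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing("downloaded_file", "md5:1f836665f51f5b1641cb05c1ffe921a3")` returns `True` only if the local copy matches the first checksum in the listing; `downloaded_file` is a placeholder, since the listing does not show which file name belongs to which checksum.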
Additional details
Related works
- Is cited by
- Dataset: 10.3390/foods14040614 (DOI)
- Dataset: 10.1093/pcp/pcaa160 (DOI)
- Dataset: 10.1093/g3journal/jkae262 (DOI)
Funding
- Fundação de Amparo à Pesquisa do Estado de São Paulo
- Integrative genomic analyses: from the superpangenome to biosynthetic pathways in Coffea species 25/05520-0
- Fundação de Amparo à Pesquisa do Estado de São Paulo
- Integrative bioinformatics in wild Coffea species: use of genomic and transcriptomic data for pangenome 24/14461-4
Software
- Repository URL
- https://github.com/daisysotero/Coffea-analyses-2025