Published October 7, 2025 | Version v1
Dataset | Open access

Dataset 1: Genomic assemblies and annotations of Coffea species and subgenomes

  • 1. University of São Paulo
  • 2. Universidade de São Paulo

Description

This dataset contains filtered genome assemblies, the corresponding GFF3 annotation files, and the derived protein and CDS sequences for Coffea species and for the subgenomes of Coffea arabica. The original assemblies and annotations were obtained from the following public repositories:

Coffea arabica ET-39 – NCBI: GCF_036785885.1
Coffea arabica Caturra – NCBI: GCA_003713225.1
Coffea arabica Bourbon – NCBI: GCA_030873655.1
Coffea arabica Gesha – Zenodo: https://zenodo.org/records/10059814
Coffea arabica Typica – Figshare: https://figshare.com/articles/dataset/b_A_chromosome-level_genome_assembly_of_b_b_Coffea_arabica_b_b_L_var_Kona_Typica_b/28425329/2
Coffea eugenioides CCC68 – NCBI: GCA_003713205.1
Coffea humblotiana – NCBI: GCA_023065735.1
Coffea canephora – NCBI: GCA_900059795.1

After download, every assembly was filtered to retain only chromosomes and contigs ≥500 kb; the same threshold was applied to all species and cultivars. The Coffea arabica assemblies (cultivars Gesha, Caturra, Bourbon, ET-39, and Typica) were additionally separated into subgenomes. The GFF3 annotation files were filtered to match the retained chromosomes and contigs, then processed with the AGAT toolkit to produce GFF3 files containing only the longest isoform of each gene. Protein and CDS sequences were then extracted with AGAT from these longest-isoform GFF3 files and the filtered genomes. No GFF3 annotations were available for Coffea arabica Bourbon or Coffea humblotiana; their genome assemblies were filtered in the same way as those of the other species.
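The size filter and the matching GFF3 filter can be sketched in Python as below. This is not the authors' pipeline (their AGAT scripts are in the GitHub repository listed in this record); it is a minimal illustration that assumes uncompressed FASTA input, that a sequence ID is the first whitespace-delimited token of its header, and that the GFF3 file has no embedded ##FASTA section. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

MIN_LENGTH = 500_000  # 500 kb, the threshold stated in this record

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA lines (a file handle or a list)."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_length: int = MIN_LENGTH) -> List[Record]:
    """Keep chromosomes/contigs whose length is >= min_length bp."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def seq_ids(records: Iterable[Record]) -> Set[str]:
    """Assumption: the sequence ID is the first whitespace-delimited header token."""
    return {h.split()[0] for h, _ in records if h.split()}


def filter_gff3(lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Keep comments/directives and features whose seqid (column 1) was retained."""
    out = []
    for line in lines:
        if line.startswith("##sequence-region"):
            fields = line.split()
            # Drop region directives that describe discarded sequences.
            if len(fields) > 1 and fields[1] in keep:
                out.append(line)
        elif line.startswith("#") or not line.strip():
            out.append(line)
        elif line.split("\t", 1)[0] in keep:
            out.append(line)
    return out
```

For a real assembly, pass open file handles, e.g. `kept = filter_by_length(read_fasta(open(path)))`, then `filter_gff3(open(gff_path), seq_ids(kept))`. Holding whole chromosomes in memory keeps the sketch short; a streaming writer would scale better.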

  • Genome file extension = .fasta | Annotation file extension = .gff3 | Protein file extension = .faa | CDS file extension = .fna 
  • Scripts used for AGAT processing are available at: https://github.com/daisysotero/Coffea-analyses-2025
  • sgC = canephora-derived subgenome | sgE = eugenioides-derived subgenome
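The longest-isoform step performed with AGAT can be illustrated with a short Python sketch. It is neither AGAT nor the authors' script: it assumes the criterion is total CDS length per transcript (AGAT's own selection rules may differ), that transcripts are `mRNA` or `transcript` features with `ID` and `Parent` attributes, and that CDS features point to their transcript via `Parent`. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column ('ID=x;Parent=y') into a dict (no URL-decoding)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def longest_isoforms(lines: Iterable[str]) -> Dict[str, str]:
    """Map each gene ID to the transcript ID with the greatest summed CDS length."""
    transcript_gene: Dict[str, str] = {}
    cds_length: Dict[str, int] = defaultdict(int)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skips malformed lines and any trailing sequence data
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype in ("mRNA", "transcript") and "ID" in attrs and "Parent" in attrs:
            transcript_gene[attrs["ID"]] = attrs["Parent"].split(",")[0]
        elif ftype == "CDS" and "Parent" in attrs:
            # A CDS segment may list several parent transcripts.
            for parent in attrs["Parent"].split(","):
                cds_length[parent] += end - start + 1
    best: Dict[str, str] = {}
    for tx, gene in transcript_gene.items():
        # Ties keep the transcript seen first in the file.
        if gene not in best or cds_length[tx] > cds_length[best[gene]]:
            best[gene] = tx
    return best
```

The returned gene-to-transcript map would then be used to drop every other transcript (and its exons and CDS) from the GFF3 before extracting proteins and CDS.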

Files (8.1 GB)

Checksum (MD5) | Size
md5:1f836665f51f5b1641cb05c1ffe921a3 | 580.6 MB
md5:6aee95e7cd30c5cb76f20010c29a9cb6 | 550.2 MB
md5:0dab7377ea5932aa0f7f92047a25db94 | 11.7 MB
md5:bf3f5fd1adc19f87827eb9f9998f6f43 | 524.8 MB
md5:ea329be0d5f19fe12fc7594f49d45799 | 31.8 MB
md5:434c27f044b41bfc5b484b8c4faa7c8c | 98.3 MB
md5:09130ac085e691e6232d5e4e4cd71e30 | 12.1 MB
md5:dbeb6f3123dedd71c9ed1581be1f0238 | 499.3 MB
md5:a7d822e50578046c204a75adcc01c76c | 33.0 MB
md5:08b649196e88face9d73cdbf9a51454a | 96.6 MB
md5:2d1cdd70f9b5895c76d86c34c4a0fefb | 13.6 MB
md5:3ac449188d25b707b474fdb52f5416d7 | 593.7 MB
md5:cc30cf777a84b639d7a90976ed51817d | 37.3 MB
md5:2b276b7f12724915aa0632efc34b5e7e | 118.5 MB
md5:8e905aa692b8834ff89fee83bf2ee0af | 14.3 MB
md5:6ccf1167f597c4dabdaf6e20828b7c00 | 618.8 MB
md5:ffde8cf92e804995a777337fcc01f437 | 39.2 MB
md5:3259a03923420828ea0f2927b02022eb | 129.2 MB
md5:1ec93d6e9edd34f0fac81a03906d8733 | 10.7 MB
md5:eb0c9b3ffe9c57c85bb00a07f8cabd1e | 513.5 MB
md5:24579b363a74d8ac1af2930fff5b5de3 | 29.8 MB
md5:ce9cc12c1201d4e690831b30cb089026 | 36.0 MB
md5:6fb91ff9c5c6f492bf590a2f4f0d3dd7 | 10.9 MB
md5:8c383db0e44a5e9acafad5778def7ae4 | 499.1 MB
md5:3632111fb2fb6f4c14d9a6f7257b3d86 | 30.4 MB
md5:eb6e7b0a08b8359d12902c23c6975ad5 | 36.9 MB
md5:80941525f128b90ce096a1d00e0c6125 | 14.7 MB
md5:aa65ad96fde5196a2e53fec475cc0bf7 | 580.3 MB
md5:33461f3d85fd3c942e3e29a8114ee68a | 40.7 MB
md5:118f0e02b900d0b53efaad3709523bde | 28.2 MB
md5:88a9de28ab6eba24c58545f5094d4d64 | 15.0 MB
md5:a57500be69293f4dae66f49c192e59d8 | 575.2 MB
md5:25ebe750eddb710f36a8f74eb049e987 | 41.2 MB
md5:b272726abe09b448ffca68538a140143 | 28.5 MB
md5:37d9e604302f5687811560043d946afe | 11.0 MB
md5:a82e801a1ab3d7d3274595516298b4d5 | 402.6 MB
md5:a7bfedaebb8c254b930abb8b3fde2069 | 29.6 MB
md5:787c99f87e6f6277c5132facc45ca32b | 72.2 MB
md5:83795f86603212ab8695be35b4d42a70 | 393.0 MB
md5:62d5815d53429b383cf32897faf4c43e | 14.9 MB
md5:7752e8e6102ed7674c60b99aa3f426b5 | 574.6 MB
md5:819b805940ef5220dd7b9cfa372dacd9 | 40.7 MB
md5:4650f53cbcb9056b334a619a1ccb189f | 108.0 MB
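Each downloaded file can be checked against the MD5 checksum listed beside it on the record page. A minimal sketch using only Python's standard `hashlib` module; the function names are hypothetical, and the file is read in chunks so multi-hundred-megabyte FASTA files are not loaded into memory at once.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, listed: str) -> bool:
    """Compare a file with a checksum written as on this record ('md5:<hex>')."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected
```

For example, `matches_record("downloaded_file.fasta", "md5:1f836665f51f5b1641cb05c1ffe921a3")`, where the local file name is a placeholder, returns `True` only if the download is intact.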

Additional details

Related works

Is cited by
Dataset: 10.3390/foods14040614 (DOI)
Dataset: 10.1093/pcp/pcaa160 (DOI)
Dataset: 10.1093/g3journal/jkae262 (DOI)

Funding

Fundação de Amparo à Pesquisa do Estado de São Paulo
Integrative genomic analyses: from the superpangenome to biosynthetic pathways in Coffea species (25/05520-0)
Fundação de Amparo à Pesquisa do Estado de São Paulo
Integrative bioinformatics in wild Coffea species: use of genomic and transcriptomic data for pangenome (24/14461-4)