Published December 5, 2025 | Version 1.0.0
Dataset Open

GloSED - Global standardised soil eukaryome dataset

Description

Global Standardised Soil Eukaryome Dataset (GloSED)

Dataset description

GloSED is a metabarcoding-based dataset covering the full spectrum of soil eukaryotes. All samples were collected and analysed using standardised protocols.

Key characteristics

- Sampling sites: 4,147 globally distributed locations across 121 countries  
- Taxonomic scope: Complete soil eukaryome including fungi, protists, animals, and plants  
- Operational taxonomic units: 988,824 curated OTUs  
- Sequencing technology: PacBio long-read sequencing of full-length ITS  

Data collection and processing

- Standardized sampling design: 50x50 m plots  
- Soil cores: 40 cores per plot (5 cm diameter x 5 cm depth), pooled by volume  
- DNA extraction: PowerMax Soil DNA Isolation kit (Qiagen) with FavorPrep cleanup  
- Primers: universal eukaryotic primers ITS9mun/ITS4ngsUni  
- Processing: NextITS v.1.0.0 workflow (DOI: 10.5281/zenodo.15074882)  
- Taxonomic annotation: EUKARYOME v.1.9.4 database (DOI: 10.1093/database/baae043)  
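The sampling figures above imply a fixed amount of soil per plot. The sketch below works through that arithmetic. It assumes each core is an ideal cylinder with the stated diameter and depth, which the record does not state explicitly. The variable names are illustrative only.

```python
import math

# Figures taken from the sampling description above
CORES_PER_PLOT = 40
CORE_DIAMETER_CM = 5.0
CORE_DEPTH_CM = 5.0
PLOT_SIDE_M = 50.0

# Assumption: each core is an ideal cylinder
core_area_cm2 = math.pi * (CORE_DIAMETER_CM / 2) ** 2
core_volume_cm3 = core_area_cm2 * CORE_DEPTH_CM

# Nominal soil volume pooled per plot
pooled_volume_cm3 = CORES_PER_PLOT * core_volume_cm3

# Fraction of the plot surface actually cored (1 m^2 = 1e4 cm^2)
sampled_fraction = (CORES_PER_PLOT * core_area_cm2 / 1e4) / (PLOT_SIDE_M ** 2)

print(f"Volume per core: {core_volume_cm3:.1f} cm^3")
print(f"Pooled volume per plot: {pooled_volume_cm3 / 1000:.2f} L")
print(f"Cored fraction of plot area: {sampled_fraction:.2e}")
```

Under these assumptions each plot contributes a little under 4 L of soil before any subsampling for DNA extraction.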

 
Data files and formats

Core data

- `GloSED__OTU_sequences.fasta.gz`: Quality-filtered representative sequences for all OTUs, FASTA format  
- `GloSED__OTU_table.tsv.zip`: Sample-by-OTU abundance matrix (TSV format)  
- `GloSED__Taxonomy.tsv.zip`: Complete taxonomic annotations with UNITE-based species hypotheses (TSV format)  

- `GloSED__OTU_table.parquet`: Columnar format of abundance data for efficient querying (Parquet format)  
- `GloSED__Taxonomy.parquet`: Columnar format of taxonomic data (Parquet format)  

- `GloSED__phyloseq.RData`: phyloseq object for R-based analyses  
- `GloSED__BIOM.biom`: BIOM v.2.1 format compatible with QIIME2  
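The record does not document the column layout of the tables, so it may help to inspect the header before loading a full file. The following is a minimal sketch using only the Python standard library. It assumes each `.tsv.zip` archive holds a single UTF-8 TSV file whose first row is a header. The helper name `peek_zipped_tsv` is hypothetical.

```python
import csv
import io
import itertools
import zipfile


def peek_zipped_tsv(zip_path, n_rows=5):
    """Return (header, first n_rows data rows) from a zipped TSV.

    Assumes the archive's first member is the TSV and that the
    first line of that file is a header row.
    """
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t")
            header = next(reader)
            rows = list(itertools.islice(reader, n_rows))
    return header, rows


# Example call, assuming the file has been downloaded to the working directory:
# header, rows = peek_zipped_tsv("GloSED__Taxonomy.tsv.zip")
# print(header)
```

Because the reader streams from the archive, only the requested rows are decompressed into memory. The Parquet files can presumably be opened with `pandas.read_parquet` where pandas and a Parquet engine such as pyarrow are installed.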

Metadata files

- `GloSED__Sample_metadata.xlsx`: Sample metadata  
- `DRI.json`: Data Reuse Information tag with ORCID identifiers  
- `DRI.csv`: Tabular format mapping accession IDs 

- `Contributors.xlsx`: List of contributors

Data reuse information

This dataset includes Data Reuse Information (DRI) tags to support equitable data sharing (Hug et al., 2025). The DRI identifies creators who prefer to be contacted before reuse:

DRI: `{0000-0002-1635-1249, 0000-0003-2786-2690}`

Please contact these individuals prior to reuse of the data.
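A reuse script could extract the ORCID identifiers from the tag and sanity-check them before looking up contact details. The sketch below assumes the tag is a brace-wrapped, comma-separated list exactly as displayed above, not a format taken from a formal specification. It validates each identifier with the ISO 7064 MOD 11-2 check digit that ORCID uses. The function names are hypothetical.

```python
import re

# An ORCID iD is 16 characters in four hyphenated groups; the last may be X
ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def parse_dri(tag):
    """Return every ORCID-shaped identifier found in a DRI tag string."""
    return ORCID_RE.findall(tag)


def orcid_checksum_ok(orcid):
    """Check the final character against the ISO 7064 MOD 11-2 check digit."""
    chars = orcid.replace("-", "")
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    expected = (12 - total % 11) % 11
    return chars[15] == ("X" if expected == 10 else str(expected))


tag = "{0000-0002-1635-1249, 0000-0003-2786-2690}"
for orcid in parse_dri(tag):
    status = "valid" if orcid_checksum_ok(orcid) else "INVALID"
    print(f"https://orcid.org/{orcid} ({status} check digit)")
```

A passing check digit only shows that the identifier is well formed. It does not confirm that the ORCID record exists or belongs to a dataset creator.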

Related resources

- Raw sequence data:
  - European Nucleotide Archive (ENA) project: PRJEB103811
  - Sample accessions: ERS27941879 - ERS27946063
  - Sequence accessions: ERR15957609 - ERR15964175
- Bioinformatics pipeline: NextITS
- Reference databases: EUKARYOME, UNITE

Files

Files (1.9 GB)

Checksum | Size
md5:b1386b6b73df4a6c80e6e9bf715a5edd | 13.3 kB
md5:206987b42366f762bde2a5ae693c74a8 | 558.3 kB
md5:2756e2f149f260cd3071bedb28594ee5 | 269 Bytes
md5:88bbfa018e890ee78749a4b1920a9c12 | 907.3 MB
md5:a013f52bc7c16390b25e48b536f659ef | 189.5 MB
md5:9b0f54ae618c449dfa54fa0bdd0d52cf | 63.0 MB
md5:c445a7ee47e7c87c87602bf848509586 | 46.1 MB
md5:355b02ee9557f1e73ce557f159d31c81 | 265.9 MB
md5:aa70ae9ca5df89e50fdec01cbefbd6e2 | 1.0 MB
md5:66d8f1d918d7c64b47b2843008ba04d6 | 169.2 MB
md5:5b167a8eb8b079d68b68267aa5b26606 | 220.6 MB
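The MD5 checksums listed for the files can be used to confirm that a download completed intact. The following is a minimal sketch using Python's `hashlib`. The helper names are hypothetical, and matching each checksum to its file is left to the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:abc...'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Example call, assuming a downloaded file and its listed checksum:
# ok = verify_md5("DRI.csv", "md5:<checksum listed for that file>")
```

Reading in chunks keeps memory use flat even for the largest files in the record.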

Additional details

Funding

- Estonian Research Council: PRG632
- Estonian Research Council: MOBERC116
- Estonian Research Council: PRG1789
- Ministry of Education and Research: Agroecology and new crops in future climates TK200
- European Research Council: 101200758