Published May 7, 2024 | Version v2
Dataset Open

Data from: Global Spore Sampling Project: A global, standardized dataset of airborne fungal DNA

Creators

Description

Novel methods for sampling and characterizing biodiversity hold great promise for re-evaluating patterns of life across the planet. The sampling of airborne spores with a cyclone sampler, and the sequencing of their DNA, have been suggested as an efficient and well-calibrated tool for surveying fungal diversity across various environments. Here we present data originating from the Global Spore Sampling Project, comprising 2,768 samples collected over two years at 47 outdoor locations across the world. Each sample represents fungal DNA extracted from 24 m³ of air. We applied a conservative bioinformatics pipeline that filtered out sequences that did not show strong evidence of representing a fungal species. The pipeline yielded 27,954 species-level operational taxonomic units (OTUs). Each OTU is accompanied by a probabilistic taxonomic classification, validated through comparison with expert evaluations. To examine the potential of the data for ecological analyses, we partitioned the variation in species distributions into spatial and seasonal components, showing a strong effect of annual mean temperature on community composition.

The database is organized into five datasets in CSV format (columns separated by commas): (1) metadata providing the location, date, and time for each sample, along with sequencing depth and other essential information (metadata.csv); (2) species-level OTU tables per sample describing the number of sequences assigned to each species (otu.table.csv); (3) taxonomic classification of each species-level OTU (taxonomy.csv); (4) closest matching sequences and their taxonomy for ASVs in putatively fungal pseudophyla, which are included in (2) and (3) (fungi_pseudophyla.csv); and (5) closest matching sequences and their taxonomy for ASVs in putatively non-fungal pseudophyla, which are not included in the other datasets (nonfungi_pseudophyla.csv). The first four datasets can be linked to each other using the unique sample codes and the unique identifiers for species-level OTUs. The first three data files are also provided in allData.RData, which can be read into R with load("allData.RData").
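As a rough illustration of how the tables can be linked through sample codes and OTU identifiers, the following Python sketch joins three tiny in-memory stand-ins for metadata.csv, otu.table.csv, and taxonomy.csv. The column names (sample, site, date, otu, genus), the toy values, and the assumption that the OTU table has one row per sample and one column per OTU are all hypothetical; check the real file headers before adapting it.

```python
import csv
import io

# Toy stand-ins for metadata.csv, otu.table.csv and taxonomy.csv.
# Every column name and value below is hypothetical, not taken from the real files.
METADATA = """sample,site,date
S1,SiteA,2019-05-01
S2,SiteB,2019-05-08
"""
OTU_TABLE = """sample,OTU_1,OTU_2
S1,10,0
S2,3,7
"""
TAXONOMY = """otu,genus
OTU_1,GenusA
OTU_2,GenusB
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_tables(metadata_text, otu_text, taxonomy_text,
                sample_col="sample", otu_col="otu"):
    """Return one record per (sample, OTU) pair with a non-zero read count."""
    # Index metadata by sample code and taxonomy by OTU identifier.
    meta = {row[sample_col]: row for row in read_rows(metadata_text)}
    taxa = {row[otu_col]: row for row in read_rows(taxonomy_text)}
    records = []
    for row in read_rows(otu_text):
        sample = row[sample_col]
        for otu, count in row.items():
            if otu == sample_col or int(count) == 0:
                continue  # skip the key column and OTUs absent from the sample
            # Merge sample metadata, OTU taxonomy and the read count.
            records.append({**meta[sample], **taxa[otu], "reads": int(count)})
    return records


records = link_tables(METADATA, OTU_TABLE, TAXONOMY)
for record in records:
    print(record)
```

For the real files, the same function could be fed the contents of the downloaded CSVs once the key column names have been adjusted to match their headers.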

Methods

The database was generated, and the results of the manuscript were produced, with the following pipeline:

  • The files GSSP-data_v1.0.tar.gz and protaxFungi.tgz contain the bioinformatics pipeline that produces the files provided in the file bioinformatics_pipeline_outputs.zip.
  • The script S1 reads the output of the bioinformatics pipeline as well as the metadata (sample_data.RData and site_data.RData) and exports the data tables described in the manuscript as an RData file (allData.RData) and as CSV files (metadata.csv, taxonomy.csv, otu.table.csv).
  • The script S2 downloads GBIF data for comparison.
  • The script S3 makes species-specific maps that visually compare GBIF and GSSP data.
  • The script S4 performs the analytical comparison between GBIF and GSSP data.
  • The script S5 shows the results of the comparison from script S4.
  • The script S6 implements the HMSC analysis that quantifies the main sources of variation in the data.

Files

bioinformatics_pipeline_outputs.zip

Files (642.6 MB)

MD5 checksum                            Size
md5:77296e7198f1e071a9c9dcdcb93bb58d    3.3 MB
md5:530e0aaaec4049088ca73e373672c2a2    3.1 MB
md5:5a3539e477442e99de971532367a8458    150.8 kB
md5:3cc3c052b7959ad9b0ccefdbd2a1d4ea    121.5 kB
md5:b5dd5776fccb9efd0fb7f6bcc58791da    414.0 kB
md5:0e24a98969297c4a37eaf4819a07bc6f    1.3 MB
md5:57ecce5617b6b7a946496dc0ed384c8c    155.3 MB
md5:a8ade8a42f3c88ea331637ac79429248    466.6 MB
md5:36d9447ca6251fd25ffcfce6f00c3ead    2.0 kB
md5:702b5bfd11c15bf75a88e29eaaadbd20    1.8 kB
md5:181bd5e2c5b558bfcb69051a926d9145    1.6 kB
md5:710f675c40c32048fb78c20e4c5a5717    1.3 kB
md5:a6f8ff7d05c96112c912fbf938b68d22    879 Bytes
md5:e3ac0ca2cca54aa86de850e3b447ba7b    2.0 kB
md5:aacd4611bdc3b7a35bbb5e82a75e7e5f    23.3 kB
md5:72e71829d2c23d5544eb2553ce768c97    1.8 kB
md5:1a588e2dbb89f9cf6aa1a756139ccd54    12.4 MB
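
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following Python sketch uses only the standard library; the helper names md5_of_file and matches_listing are illustrative, and the file path passed in is whatever local path the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a local file with a checksum written as 'md5:<hex digest>'."""
    # Drop the optional 'md5:' prefix and normalize case before comparing.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Calling matches_listing with the path of a downloaded file and the corresponding checksum string from the listing returns True when the two agree. Reading in chunks keeps memory use low for the larger archives, which run to several hundred megabytes.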