Published April 3, 2025 | Version v2
Dataset Open

Code and Data Schimmelradar manuscript 1.1

  • 1. Wageningen University & Research

Contributors

Data manager:

Researcher:

  • 1. Wageningen University & Research

Description

Read me – Schimmelradar manuscript

The code in this repository was written to analyse the data and generate figures for the manuscript “Land use drives spatial structure of drug resistance in a fungal pathogen”.

This repository consists of two original .csv raw data files, 2 .tif files that are minimally reformatted after being downloaded from LGN.nl and www.pdok.nl/introductie/-/article/basisregistratie-gewaspercelen-brp-, and 9 scripts using the R language. The remaining files include intermediate .tif and .csv files to skip more computationally heavy steps of the analysis and facilitate the reproduction of the analysis.

Data files:§1

Schimmelradar_360_submission.csv:  The raw phenotypic resistance spatial data from the air sample

-        Sample: an arbitrary sample code given to each of the participants

-        Area: A random number assigned to each of the 100 areas the Netherlands was split up into to facilitate an even spread of samples across the country during participant selection.

-        Logistics status: Variable used to indicate whether the sample was returned in good order, not otherwise used in the analysis.

-        Arrived back on: The date by which the sample arrived back at Wageningen University

-        Quality seals: quality of the seals upon sample return, only samples of a quality designated as good seals were processed. (also see Supplement file – section A).

-        Start sampling: The date on which the trap was deployed and the stickers exposed to the air, recorded by the participant

-        End sampling: The date on which the trap was taken down and the stickers were re-covered and no longer exposed to the air, recorded by the participant

-        3 back in area?: Binary indicating whether at least three samples have been returned in the respective area (see Area)

-        Batch: The date on which processing of the sample was started. To be more specific, the date at which Flamingo medium was poured over the seals of the sample and incubation was started.

-        Lab processing: Binary indication completion of lab processing

-        Tot ITR: A. fumigatus CFU count in the permissive layer of the itraconazole-treated plate

-        RES ITR: CFU count of colonies that had breached the surface of the itraconazole-treated layer after incubation and were visually (with the unaided eye) sporulating.

-        RF ITR: The itraconazole (~4 mg/L) resistance fraction = RES ITR/Tot ITR

-        Muccor ITR: Indication of the presence of Mucorales spp. growth on the itraconazole treatment plate

-        Tot VOR: A. fumigatus CFU count in the permissive layer of the voriconazole-treated plate

-        RES VOR: CFU count of colonies that had breached the surface of the voriconazole-treated layer after incubation and were visually (with the unaided eye) sporulating.

-        RF VOR: The voriconazole (~2 mg/L) resistance fraction = RES VOR/Tot VOR

-        Muccor VOR: Indication of the presence of Mucorales spp. growth on the voriconazole treatment plate

-        Tot CON: CFU count on the untreated growth control plate Note: note on the sample based on either information given by the participant or observations in the lab. The exclude label was given if the sample had either too little (<25) or too many (>300) CFUs on one or more of the plates (also see Supplement file – section A).

-        Lat: Exact latitude of the address where the sample was taken. Not used in the published version of the code and hidden for privacy reasons. 

-        Long: Exact longitude of the address where the sample was taken. Not used in the published version of the code and hidden for privacy reasons. 

-        Round_Lat: Rounded latitude of the address where the sample was taken. Rounded down to two decimals (the equivalent of a 1 km2 area), so they could not be linked to a specific address. Used in the published version of the code. 

-        Round_Long: Rounded longitude of the address where the sample was taken. Rounded down to two decimals (the equivalent of a 1 km2 area), so they could not be linked to a specific address. Used in the published version of the code. 

 

Analysis_genotypic_schimmelradar_TR_types.csv: The genotype data inferred from gel electrophoresis for all resistant isolates

 

-        TR type: Indicates the length of the tandem repeats in bp, as judged from a gel. 34 bp, 46 bp, or multiples of 46.

-        Plate: 96-well plate on which the isolate was cultured

-        96-well: well in which the isolate was cultured

-        Azole: Azole on which the isolate was grown and resistant to. Itraconazole (ITRA) or Voriconazole (VORI).

-        Sample: The air sample the isolate was taken from, corresponds to “Sample” in “Schimmelradar_360_submission.csv”.

-        Strata: The number that equates to “Area” in “Schimmelradar_360_submission.csv”.

-        WT: A binary that indicates whether an isolate had a wildtype cyp51a promotor.

-        TR34: A binary that indicates whether an isolate had a TR34 cyp51a promotor.

-        TR46: A binary that indicates whether an isolate had a TR46 cyp51a promotor.

-        TR46_3: A binary that indicates whether an isolate had a TR46*3 cyp51a promotor.

-        TR46_4: A binary that indicates whether an isolate had a TR46*4 cyp51a promotor.

 

 

 

Script 1 - generation_100_equisized_areas_NL

NOTE: Running this code is not necessary for the other analyses, it was used primarily for sample selection. The area distribution was used during the analysis in script 2B, yet each sample is already linked to an area in “Schimmelradar_360_submission.csv". This script was written to generate a spatial polygons data frame of 100 equisized areas of the Netherlands. The registrations for the citizen science project Schimmelradar were binned into these areas to facilitate a relatively even distribution of samples throughout the country which can be seen in Figure S1. The spatial polygons data frame can be opened and displayed in open-source software such as QGIS. The package “spcosa” used to generate the areas has RJava as a dependency, so having Java installed is required to run this script. The R script uses a shapefile of the Netherlands from the tmap package to generate the areas within the Netherlands. Generating a similar distribution for other countries will require different shape files!

Script 2 - Spatial_data_integration_fungalradar

This script produces 4 data files that describe land use in the Netherlands: The three focal.RData files with land use and resistant/colony counts, as well as the “Predictor_raster_NL.tif” land use file.

In this script, both the phenotypic and genotypic resistance spatial data from the air samples taken during the Fungal radar citizen science project are integrated with the land use and weather data used to model them. It is not recommended to run this code because the data extraction is fairly computationally demanding and it does not itself contain key statistical analyses. Rather it is used to generate the objects used for modelling and spatial predictions that are also included in this repository.

 

The phenotypic resistance is summarised in Table 1, which is generated in this script. Subsequently, the spatial data from the LNG22 and BRP datasets are integrated into the data. These dataset can be loaded from the "LGN2022.tif" and "Gewas22rast.tiff" raster files, respectively. Link to webpages where these files can be downloaded can found in the code.

Once the raster files are loaded, the code generates heatmaps and calculates the proportions of all the land use classes in both a 5 and 10-km radius around every sample and across the country to make spatial predictions. Only the 10 km radius data are used in the later analysis, but the 5 km radius was generated to test if that radius would be more appropriate, during an earlier stage of the analyses, and was left in for completeness. For documentation of the LGN22 data set, we refer to https://lgn.nl/documentatie and for BRP to https://nationaalgeoregister.nl/geonetwork/srv/dut/catalog.search#/metadata/44e6d4d3-8fc5-47d6-8712-33dd6d244eef, both of these online resources are in Dutch but can be readily translated. A list of the variables that were included from these datasets during model selection can be found in Table S3.  Alongside land-use data, the code extracts weather data from datafiles that can be downloaded from https://cds.climate.copernicus.eu/datasets/sis-agrometeorological-indicators?tab=download for the Netherlands during the sampling window, dates and dimensions are listed within the code.  The Weather_schimmelradar folder contains four subfolders for each weather variable that was considered during modelling: temperature, wind speed, precipitation and humidity. Each of these subfolders contains 44 .nc files that each cover the daily mean of the respective weather variable across the Netherlands for each of the 44 days of the sampling window the citizen scientists were given.

All spatial objects weather + land use are eventually merged into one predictor raster "Predictor_raster_NL.tif". The land use fractions and weather data are subsequently integrated with the air sample data into a single spatial data frame along with the resistance data and saved into an R object "Schimmelradar360spat_focal.RData". The script concludes by merging the cyp51a haplotype data with this object as well, to create two different objects: "Schimmelradar360spat_focal_TR_VORI.RData" for the haplotype data of the voriconazole resistant isolates and "Schimmelradar360spat_focal_TR_ITRA.RData" including the haplotype data of itraconazole resistant isolates. These two datasets are modeled separately in scripts 5,9 and 6,8, respectively.  This final section of the script also generates summary table S2, which summarises the frequency of the cyp51a haplotypes per azole treatment.

If the relevant objects are loaded it is not necessary to run this script prior to running the subsequent scripts.

 

Script 2B – pairwise_correlations

This script requires only “Schimmelradar_360_submission.csv” and contains a supplementary analysis to explore the relationships between the different aspects of phenotypic resistance and the cyp51a haplotypes investigated during this study, per sampling area. It generates Figure S3.  

Script 3 –  Spatial_model_CFU_count_data_s

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal.RData". With these objects, it generates Figures 1A, 1G and S3. It also generates and cross-validates the median CFU counts land-use + weather model (Tables S4 and S5).

Script 4 - Spatial_models_pheno_res_s

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal.RData".  Optionally,  the intermediate files "qualified_models_ITRRES_naive_df.csv" and "qualified_models_VORRES_naive_df.csv" may also be loaded to skip the computationally heavy model selection step (glmmTMB is slower than the glm.nb function used in script 3). These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates  Figures 1B, 1C, 1H, 1I. It also generates and cross-validates the phenotypic itraconazole resistance land use model (Tables S6 and S7) and the phenotypic voriconazole resistance land use model (Tables S8 and S9). Additionally, this script also generates the neutral map with all sampling locations used in Figure 1F, and Figure 2, which is based on the minimal AIC phenotypic voriconazole resistance land use model (Table S9).

Script 5 - Spatial_model_vori_res_TR_s

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_VORI.RData".  Optionally,  the intermediate files "qualified_models_TR34VORRES_naive_df.csv"and "qualified_models_TR46VORRES_naive_df.csv" may also be loaded to skip the computationally heavy model selection step. These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates  Figures 1D, 1E, 1J, and 1K. It also generates and cross-validates the Voriconazole-resistant TR34 land-use model (Tables S12 and S13) and the Voriconazole-resistant TR46 land-use model (Tables S10 and S11). If all objects generated in scripts 4 and 5 are still loaded this script also merges Figures 1A-E and Figures 1G-K together, each in the side-by-side subplots as they are displayed in Figure 1.

Script 6 - Spatial_model_itra_res_TR_s

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_ITRA.RData". Optionally,  the intermediate files "qualified_models_TR34ITRRES_naive_df.csv"and "qualified_models_TR46ITRRES_naive_df.csv" may also be loaded to skip the computationally heavy model selection step. These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates Figure S6 which is analogous to Figures 1D, 1E, 1J, 1K but then for the itraconazole-resistant isolates. It also generates and cross-validates the Itraconazole-resistant TR34 land-use model (Tables S14 and S15), and the Itraconazole-resistant TR46 land-use model (Tables S16 and S17).

Script 7 - Spatial_high_fungicide_model_pheno_res_s

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal.RData". Optionally,  the intermediate files. "qualified_models_ITRRES_fungicide_df.csv" and "qualified_models_VORRES_fungicide_df.csv" may also be loaded to skip the computationally very heavy model selection step. These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates all the subplots of Figure S7. It also generates and cross-validates the high-fungicide phenotypic itraconazole resistance land use model (Tables S19 and S20) and the phenotypic voriconazole resistance land use model (Tables S21 and S22).

 

Script 8 - Spatial_high_fungicide_model_itra_res_TR46_s

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_ITRA.RData". Optionally,  the intermediate file "qualified_models_TR46ITRRES_fungicide_df.csv" may also be loaded to skip the computationally very heavy model selection step. With these objects, it generates Figure S8. It also generates and cross-validates the Itraconazole-resistant TR46 high-fungicide land-use model (Tables S23 and S24).

Script 9 - Spatial_high_fungicide_model_vori_res_TR46

This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_VORI.RData".  Optionally,  the intermediate file "qualified_models_TR46ITRRES_fungicide_df.csv" may also be loaded to skip the computationally very heavy model selection step. With these objects, it generates Figure S9. It also generates and cross-validates the Voriconazole-resistant TR46 high-fungicide land-use model (Tables S25 and S26).

Script 10 - Spatial_high_fungicide_model_vori_res_TR46_s

This script requires "Gewas22rast.tiff" and  "Schimmelradar360spat_focal.RData" along with “apple_fract.tif", "consumption_potato_fract.tif", "gladiolus_fract.tif", "hyacinth_fract.tif", "lily_fract.tif", "daffodil_fract.tif", "pear_fract.tif", "set_onion_fract.tif", "seed_potato_fract.tif", "beetroot_fract.tif", "tulip_fract.tif", "seed_onion_fract.tif", "starch_potato_fract.tif", which can be generated in script 2. Optionally, to skip the requirement of all these .tif files "relative_fungicide_intensity.tif" can be loaded. With these objects, the script generates Figure S11. It also generates and cross-validates the fungicide-use models for phenotypic Voriconazole resistance and phenotypic itraconazole resistance, the outputs of which are referenced in the main text.

 

Files

Analysis_genotyping_schimmelradar_TR_types.csv

Files (1.0 GB)

Name Size Download all
md5:13fd165433b4986af4ac473a53b55894
53.4 kB Preview Download
md5:09d8712b073879f6ee614c87887e0ff2
558.5 MB Preview Download
md5:b8fecab729fe509a81a4d61890e16798
350.3 MB Preview Download
md5:3a23ef3daeba32a37ad0a76be120a430
120.5 MB Preview Download
md5:065ddaae4e6c889a964045e1a33b4869
3.0 kB Preview Download
md5:3cc74c90d60f89d4875ad22bdf19466d
2.7 kB Preview Download
md5:944c04321df20b3840f7d779f970bd24
931 Bytes Preview Download
md5:b92cbeb9bb554a30d3a4aff2761a1e43
6.8 kB Preview Download
md5:e3a863be3a1d47e55e25f971aecdd026
11.5 kB Preview Download
md5:b4039622384b52fcb896434139563375
1.2 kB Preview Download
md5:c5fcbfaa06ea03abebb69568a01d24aa
19.0 kB Preview Download
md5:57b36ff449980ed981b5975d07dfebb8
1.7 kB Preview Download
md5:d8d771feaeed521aa1e259833c9f5b7e
8.9 kB Preview Download
md5:77dc500635e34f48367f828ee9d62410
1.3 kB Preview Download
md5:6aff9c812f134ed86cd4acae06219156
29.3 kB Download
md5:2c2c46b7fd36a5aef3cf76cf4452a097
3.3 MB Preview Download
md5:2770c3b4bbbb23c7e7e1856f6219c468
172.9 kB Download
md5:d40a67f6f9aa0a3971f4712ec70d5a3d
178.6 kB Download
md5:fcacd12fda2b29a8c827bf31739a7833
177.8 kB Download
md5:aec2d752c1a5ff5d20df234273cb587e
60.1 kB Preview Download
md5:ef91f83c490976e15a3b012fd35d5f60
36.2 kB Download
md5:eea8b529d68e67d28a54c07c24830170
3.0 kB Download
md5:65fec696c2e213c755bc96eab6d4537e
64.5 kB Download
md5:5811395bd7f346b5a6ab145ac17d5f28
5.4 kB Download
md5:96e7ace15ae41e5f73b3afd338bf3b21
17.2 kB Download
md5:e1897cfffdcc07029f4dfc919cbb4fed
39.7 kB Download
md5:07398890712bd353b4cf171ea3d9df67
40.3 kB Download
md5:cfc85f1c1dadd75fcb92984de3d3af83
39.4 kB Download
md5:9f326260772a051e68ced7463a12e92d
46.5 kB Download
md5:f2057a52ca1f8a8a8213795903f679c6
38.1 kB Download
md5:fcd4347ab1464b3965f817792db86bbb
22.2 kB Download
md5:862f0779253c0231697619455d5f46e4
22.3 kB Download
md5:9305b2fd7c2f5c9b26f9466702e8b09a
73.2 kB Preview Download

Additional details

Dates

Submitted
2025-04-03
In relation to intial manuscript submission

Software

Programming language
R
Development Status
Active