Code and Data Schimmelradar manuscript 1.1
- 1. Wageningen University & Research
Description
Read me – Schimmelradar manuscript
The code in this repository was written to analyse the data and generate figures for the manuscript “Land use drives spatial structure of drug resistance in a fungal pathogen”.
This repository consists of two original .csv raw data files, 2 .tif files that are minimally reformatted after being downloaded from LGN.nl and www.pdok.nl/introductie/-/article/basisregistratie-gewaspercelen-brp-, and 9 scripts using the R language. The remaining files include intermediate .tif and .csv files to skip more computationally heavy steps of the analysis and facilitate the reproduction of the analysis.
Data files:§1
Schimmelradar_360_submission.csv: The raw phenotypic resistance spatial data from the air sample
- Sample: an arbitrary sample code given to each of the participants
- Area: A random number assigned to each of the 100 areas the Netherlands was split up into to facilitate an even spread of samples across the country during participant selection.
- Logistics status: Variable used to indicate whether the sample was returned in good order, not otherwise used in the analysis.
- Arrived back on: The date by which the sample arrived back at Wageningen University
- Quality seals: quality of the seals upon sample return, only samples of a quality designated as good seals were processed. (also see Supplement file – section A).
- Start sampling: The date on which the trap was deployed and the stickers exposed to the air, recorded by the participant
- End sampling: The date on which the trap was taken down and the stickers were re-covered and no longer exposed to the air, recorded by the participant
- 3 back in area?: Binary indicating whether at least three samples have been returned in the respective area (see Area)
- Batch: The date on which processing of the sample was started. To be more specific, the date at which Flamingo medium was poured over the seals of the sample and incubation was started.
- Lab processing: Binary indication completion of lab processing
- Tot ITR: A. fumigatus CFU count in the permissive layer of the itraconazole-treated plate
- RES ITR: CFU count of colonies that had breached the surface of the itraconazole-treated layer after incubation and were visually (with the unaided eye) sporulating.
- RF ITR: The itraconazole (~4 mg/L) resistance fraction = RES ITR/Tot ITR
- Muccor ITR: Indication of the presence of Mucorales spp. growth on the itraconazole treatment plate
- Tot VOR: A. fumigatus CFU count in the permissive layer of the voriconazole-treated plate
- RES VOR: CFU count of colonies that had breached the surface of the voriconazole-treated layer after incubation and were visually (with the unaided eye) sporulating.
- RF VOR: The voriconazole (~2 mg/L) resistance fraction = RES VOR/Tot VOR
- Muccor VOR: Indication of the presence of Mucorales spp. growth on the voriconazole treatment plate
- Tot CON: CFU count on the untreated growth control plate Note: note on the sample based on either information given by the participant or observations in the lab. The exclude label was given if the sample had either too little (<25) or too many (>300) CFUs on one or more of the plates (also see Supplement file – section A).
- Lat: Exact latitude of the address where the sample was taken. Not used in the published version of the code and hidden for privacy reasons.
- Long: Exact longitude of the address where the sample was taken. Not used in the published version of the code and hidden for privacy reasons.
- Round_Lat: Rounded latitude of the address where the sample was taken. Rounded down to two decimals (the equivalent of a 1 km2 area), so they could not be linked to a specific address. Used in the published version of the code.
- Round_Long: Rounded longitude of the address where the sample was taken. Rounded down to two decimals (the equivalent of a 1 km2 area), so they could not be linked to a specific address. Used in the published version of the code.
Analysis_genotypic_schimmelradar_TR_types.csv: The genotype data inferred from gel electrophoresis for all resistant isolates
- TR type: Indicates the length of the tandem repeats in bp, as judged from a gel. 34 bp, 46 bp, or multiples of 46.
- Plate: 96-well plate on which the isolate was cultured
- 96-well: well in which the isolate was cultured
- Azole: Azole on which the isolate was grown and resistant to. Itraconazole (ITRA) or Voriconazole (VORI).
- Sample: The air sample the isolate was taken from, corresponds to “Sample” in “Schimmelradar_360_submission.csv”.
- Strata: The number that equates to “Area” in “Schimmelradar_360_submission.csv”.
- WT: A binary that indicates whether an isolate had a wildtype cyp51a promotor.
- TR34: A binary that indicates whether an isolate had a TR34 cyp51a promotor.
- TR46: A binary that indicates whether an isolate had a TR46 cyp51a promotor.
- TR46_3: A binary that indicates whether an isolate had a TR46*3 cyp51a promotor.
- TR46_4: A binary that indicates whether an isolate had a TR46*4 cyp51a promotor.
Script 1 - generation_100_equisized_areas_NL
NOTE: Running this code is not necessary for the other analyses, it was used primarily for sample selection. The area distribution was used during the analysis in script 2B, yet each sample is already linked to an area in “Schimmelradar_360_submission.csv". This script was written to generate a spatial polygons data frame of 100 equisized areas of the Netherlands. The registrations for the citizen science project Schimmelradar were binned into these areas to facilitate a relatively even distribution of samples throughout the country which can be seen in Figure S1. The spatial polygons data frame can be opened and displayed in open-source software such as QGIS. The package “spcosa” used to generate the areas has RJava as a dependency, so having Java installed is required to run this script. The R script uses a shapefile of the Netherlands from the tmap package to generate the areas within the Netherlands. Generating a similar distribution for other countries will require different shape files!
Script 2 - Spatial_data_integration_fungalradar
This script produces 4 data files that describe land use in the Netherlands: The three focal.RData files with land use and resistant/colony counts, as well as the “Predictor_raster_NL.tif” land use file.
In this script, both the phenotypic and genotypic resistance spatial data from the air samples taken during the Fungal radar citizen science project are integrated with the land use and weather data used to model them. It is not recommended to run this code because the data extraction is fairly computationally demanding and it does not itself contain key statistical analyses. Rather it is used to generate the objects used for modelling and spatial predictions that are also included in this repository.
The phenotypic resistance is summarised in Table 1, which is generated in this script. Subsequently, the spatial data from the LNG22 and BRP datasets are integrated into the data. These dataset can be loaded from the "LGN2022.tif" and "Gewas22rast.tiff" raster files, respectively. Link to webpages where these files can be downloaded can found in the code.
Once the raster files are loaded, the code generates heatmaps and calculates the proportions of all the land use classes in both a 5 and 10-km radius around every sample and across the country to make spatial predictions. Only the 10 km radius data are used in the later analysis, but the 5 km radius was generated to test if that radius would be more appropriate, during an earlier stage of the analyses, and was left in for completeness. For documentation of the LGN22 data set, we refer to https://lgn.nl/documentatie and for BRP to https://nationaalgeoregister.nl/geonetwork/srv/dut/catalog.search#/metadata/44e6d4d3-8fc5-47d6-8712-33dd6d244eef, both of these online resources are in Dutch but can be readily translated. A list of the variables that were included from these datasets during model selection can be found in Table S3. Alongside land-use data, the code extracts weather data from datafiles that can be downloaded from https://cds.climate.copernicus.eu/datasets/sis-agrometeorological-indicators?tab=download for the Netherlands during the sampling window, dates and dimensions are listed within the code. The Weather_schimmelradar folder contains four subfolders for each weather variable that was considered during modelling: temperature, wind speed, precipitation and humidity. Each of these subfolders contains 44 .nc files that each cover the daily mean of the respective weather variable across the Netherlands for each of the 44 days of the sampling window the citizen scientists were given.
All spatial objects weather + land use are eventually merged into one predictor raster "Predictor_raster_NL.tif". The land use fractions and weather data are subsequently integrated with the air sample data into a single spatial data frame along with the resistance data and saved into an R object "Schimmelradar360spat_focal.RData". The script concludes by merging the cyp51a haplotype data with this object as well, to create two different objects: "Schimmelradar360spat_focal_TR_VORI.RData" for the haplotype data of the voriconazole resistant isolates and "Schimmelradar360spat_focal_TR_ITRA.RData" including the haplotype data of itraconazole resistant isolates. These two datasets are modeled separately in scripts 5,9 and 6,8, respectively. This final section of the script also generates summary table S2, which summarises the frequency of the cyp51a haplotypes per azole treatment.
If the relevant objects are loaded it is not necessary to run this script prior to running the subsequent scripts.
Script 2B – pairwise_correlations
This script requires only “Schimmelradar_360_submission.csv” and contains a supplementary analysis to explore the relationships between the different aspects of phenotypic resistance and the cyp51a haplotypes investigated during this study, per sampling area. It generates Figure S3.
Script 3 – Spatial_model_CFU_count_data_s
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal.RData". With these objects, it generates Figures 1A, 1G and S3. It also generates and cross-validates the median CFU counts land-use + weather model (Tables S4 and S5).
Script 4 - Spatial_models_pheno_res_s
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal.RData". Optionally, the intermediate files "qualified_models_ITRRES_naive_df.csv" and "qualified_models_VORRES_naive_df.csv" may also be loaded to skip the computationally heavy model selection step (glmmTMB is slower than the glm.nb function used in script 3). These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates Figures 1B, 1C, 1H, 1I. It also generates and cross-validates the phenotypic itraconazole resistance land use model (Tables S6 and S7) and the phenotypic voriconazole resistance land use model (Tables S8 and S9). Additionally, this script also generates the neutral map with all sampling locations used in Figure 1F, and Figure 2, which is based on the minimal AIC phenotypic voriconazole resistance land use model (Table S9).
Script 5 - Spatial_model_vori_res_TR_s
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_VORI.RData". Optionally, the intermediate files "qualified_models_TR34VORRES_naive_df.csv"and "qualified_models_TR46VORRES_naive_df.csv" may also be loaded to skip the computationally heavy model selection step. These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates Figures 1D, 1E, 1J, and 1K. It also generates and cross-validates the Voriconazole-resistant TR34 land-use model (Tables S12 and S13) and the Voriconazole-resistant TR46 land-use model (Tables S10 and S11). If all objects generated in scripts 4 and 5 are still loaded this script also merges Figures 1A-E and Figures 1G-K together, each in the side-by-side subplots as they are displayed in Figure 1.
Script 6 - Spatial_model_itra_res_TR_s
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_ITRA.RData". Optionally, the intermediate files "qualified_models_TR34ITRRES_naive_df.csv"and "qualified_models_TR46ITRRES_naive_df.csv" may also be loaded to skip the computationally heavy model selection step. These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates Figure S6 which is analogous to Figures 1D, 1E, 1J, 1K but then for the itraconazole-resistant isolates. It also generates and cross-validates the Itraconazole-resistant TR34 land-use model (Tables S14 and S15), and the Itraconazole-resistant TR46 land-use model (Tables S16 and S17).
Script 7 - Spatial_high_fungicide_model_pheno_res_s
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal.RData". Optionally, the intermediate files. "qualified_models_ITRRES_fungicide_df.csv" and "qualified_models_VORRES_fungicide_df.csv" may also be loaded to skip the computationally very heavy model selection step. These files contain the formulas of the best-performing models (AIC <= minimal AIC + 2) and their respective AIC values. With these objects, it generates all the subplots of Figure S7. It also generates and cross-validates the high-fungicide phenotypic itraconazole resistance land use model (Tables S19 and S20) and the phenotypic voriconazole resistance land use model (Tables S21 and S22).
Script 8 - Spatial_high_fungicide_model_itra_res_TR46_s
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_ITRA.RData". Optionally, the intermediate file "qualified_models_TR46ITRRES_fungicide_df.csv" may also be loaded to skip the computationally very heavy model selection step. With these objects, it generates Figure S8. It also generates and cross-validates the Itraconazole-resistant TR46 high-fungicide land-use model (Tables S23 and S24).
Script 9 - Spatial_high_fungicide_model_vori_res_TR46
This script requires "Predictor_raster_NL.tif" and "Schimmelradar360spat_focal_TR_VORI.RData". Optionally, the intermediate file "qualified_models_TR46ITRRES_fungicide_df.csv" may also be loaded to skip the computationally very heavy model selection step. With these objects, it generates Figure S9. It also generates and cross-validates the Voriconazole-resistant TR46 high-fungicide land-use model (Tables S25 and S26).
Script 10 - Spatial_high_fungicide_model_vori_res_TR46_s
This script requires "Gewas22rast.tiff" and "Schimmelradar360spat_focal.RData" along with “apple_fract.tif", "consumption_potato_fract.tif", "gladiolus_fract.tif", "hyacinth_fract.tif", "lily_fract.tif", "daffodil_fract.tif", "pear_fract.tif", "set_onion_fract.tif", "seed_potato_fract.tif", "beetroot_fract.tif", "tulip_fract.tif", "seed_onion_fract.tif", "starch_potato_fract.tif", which can be generated in script 2. Optionally, to skip the requirement of all these .tif files "relative_fungicide_intensity.tif" can be loaded. With these objects, the script generates Figure S11. It also generates and cross-validates the fungicide-use models for phenotypic Voriconazole resistance and phenotypic itraconazole resistance, the outputs of which are referenced in the main text.
Files
Analysis_genotyping_schimmelradar_TR_types.csv
Files
(1.0 GB)
Name | Size | Download all |
---|---|---|
md5:13fd165433b4986af4ac473a53b55894
|
53.4 kB | Preview Download |
md5:09d8712b073879f6ee614c87887e0ff2
|
558.5 MB | Preview Download |
md5:b8fecab729fe509a81a4d61890e16798
|
350.3 MB | Preview Download |
md5:3a23ef3daeba32a37ad0a76be120a430
|
120.5 MB | Preview Download |
md5:065ddaae4e6c889a964045e1a33b4869
|
3.0 kB | Preview Download |
md5:3cc74c90d60f89d4875ad22bdf19466d
|
2.7 kB | Preview Download |
md5:944c04321df20b3840f7d779f970bd24
|
931 Bytes | Preview Download |
md5:b92cbeb9bb554a30d3a4aff2761a1e43
|
6.8 kB | Preview Download |
md5:e3a863be3a1d47e55e25f971aecdd026
|
11.5 kB | Preview Download |
md5:b4039622384b52fcb896434139563375
|
1.2 kB | Preview Download |
md5:c5fcbfaa06ea03abebb69568a01d24aa
|
19.0 kB | Preview Download |
md5:57b36ff449980ed981b5975d07dfebb8
|
1.7 kB | Preview Download |
md5:d8d771feaeed521aa1e259833c9f5b7e
|
8.9 kB | Preview Download |
md5:77dc500635e34f48367f828ee9d62410
|
1.3 kB | Preview Download |
md5:6aff9c812f134ed86cd4acae06219156
|
29.3 kB | Download |
md5:2c2c46b7fd36a5aef3cf76cf4452a097
|
3.3 MB | Preview Download |
md5:2770c3b4bbbb23c7e7e1856f6219c468
|
172.9 kB | Download |
md5:d40a67f6f9aa0a3971f4712ec70d5a3d
|
178.6 kB | Download |
md5:fcacd12fda2b29a8c827bf31739a7833
|
177.8 kB | Download |
md5:aec2d752c1a5ff5d20df234273cb587e
|
60.1 kB | Preview Download |
md5:ef91f83c490976e15a3b012fd35d5f60
|
36.2 kB | Download |
md5:eea8b529d68e67d28a54c07c24830170
|
3.0 kB | Download |
md5:65fec696c2e213c755bc96eab6d4537e
|
64.5 kB | Download |
md5:5811395bd7f346b5a6ab145ac17d5f28
|
5.4 kB | Download |
md5:96e7ace15ae41e5f73b3afd338bf3b21
|
17.2 kB | Download |
md5:e1897cfffdcc07029f4dfc919cbb4fed
|
39.7 kB | Download |
md5:07398890712bd353b4cf171ea3d9df67
|
40.3 kB | Download |
md5:cfc85f1c1dadd75fcb92984de3d3af83
|
39.4 kB | Download |
md5:9f326260772a051e68ced7463a12e92d
|
46.5 kB | Download |
md5:f2057a52ca1f8a8a8213795903f679c6
|
38.1 kB | Download |
md5:fcd4347ab1464b3965f817792db86bbb
|
22.2 kB | Download |
md5:862f0779253c0231697619455d5f46e4
|
22.3 kB | Download |
md5:9305b2fd7c2f5c9b26f9466702e8b09a
|
73.2 kB | Preview Download |
Additional details
Dates
- Submitted
-
2025-04-03In relation to intial manuscript submission
Software
- Programming language
- R
- Development Status
- Active