Published September 29, 2025 | Version v0.1.5
Dataset Open

Supplementary Data for the Predictive Model for Atmospheric Substances and Trace Pollutants in the Environment Using Machine Learning (PASTEL)

  • 1. Saint Louis University

Description

Supporting datasets and ensemble members/submembers for the PASTEL model.

Brief description of individual entries:

  • v0_1_5_Awakens.csv — A merged dataset combining multiple airborne campaigns with supplementary 24-hour backward trajectory information. Represents the input samples used to train PASTEL.

  • Koppen_npy_files.zip — NumPy arrays containing merged land (Beck et al. 2023) and ocean (Walterscheid 2011) Köppen climate classifications at 0.5° × 0.5° global resolution. Also includes a pickled Matplotlib colormap (.pkl) following Beck et al. (2023), along with alternative Köppen representations.

  • worldcities.zip — Simplemaps basic dataset (see attribution and license within).

  • df_preprocessed.csv — A preprocessed version of v0_1_5_Awakens.csv containing additional derived features and statistics. Can be used to bypass preprocessing steps in the main PASTEL notebook.

  • AllTrajectories.zip — All 24-hour backward HYSPLIT trajectories generated for each sample in v0_1_5_Awakens.csv and df_preprocessed.csv, with varying meteorological inputs (see associated publication for details).

  • ne_10m_land.zip — Natural Earth shapefile containing land polygons at 1:10m (1:10,000,000) scale.

  • ERA5_32yr_monthly_avg.nc — NetCDF file containing 32-year monthly averages of ERA5 data (ozone, specific humidity, relative humidity, temperature) over the study period.

  • ensemble.zip — Ensemble members and submembers contributing to PASTEL predictions, along with derived statistics and plots (≈27 GB uncompressed).
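Because the CSV files are large, it can help to look at the column names and a few rows before loading a whole file. The following is a minimal sketch using only the Python standard library. The helper name `peek_csv` is hypothetical, and UTF-8 encoding with a single header row is an assumption that should be checked against the actual files.

```python
import csv


def peek_csv(path, n_rows=5):
    """Return (column_names, first n_rows data rows) without reading the whole file.

    Assumes the file is UTF-8 encoded and that its first row is a header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # zip() stops once range(n_rows) is exhausted, so the rest of the file is never read.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Example call (path is illustrative):
# columns, sample = peek_csv("df_preprocessed.csv")
```

Once the columns are known, the full file can be loaded with whichever tool the analysis uses.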

License

  • Code (not included here, see linked repository): GNU General Public License v3.0 (GPLv3).

  • Data: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

  • Third-party data (Simplemaps, Natural Earth) are redistributed under their respective licenses (see included attributions).

Citation

If you use this dataset, please cite:

Geiser, Victor (2025). Supplementary data for the Predictive model for Atmospheric Substances and Trace pollutants in the Environment using machine Learning (PASTEL). Zenodo. https://doi.org/10.5281/zenodo.17204569

How to Use

These datasets are intended for use with the PASTEL model, but may also be of independent value for climate classification, atmospheric transport analysis, or ensemble modeling.
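For the climate classification use case, a 0.5° × 0.5° global grid implies 360 latitude rows by 720 longitude columns. The sketch below converts a latitude/longitude pair into grid indices. It assumes row 0 is the northernmost band and column 0 starts at 180°W; the actual orientation of the arrays in Koppen_npy_files.zip is not stated here and should be verified before use. The function name `latlon_to_index` is hypothetical.

```python
RESOLUTION_DEG = 0.5
N_ROWS = int(180 / RESOLUTION_DEG)  # 360 latitude bands
N_COLS = int(360 / RESOLUTION_DEG)  # 720 longitude bands


def latlon_to_index(lat, lon):
    """Map (lat, lon) in degrees to (row, col) on a 0.5-degree global grid.

    Assumption: row 0 covers 90N to 89.5N and column 0 covers 180W to 179.5W.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    # Wrap longitude into [-180, 180) so values such as 190 are accepted.
    lon = ((lon + 180.0) % 360.0) - 180.0
    # Clamp so that lat == -90 falls in the last row rather than out of bounds.
    row = min(int((90.0 - lat) / RESOLUTION_DEG), N_ROWS - 1)
    col = min(int((lon + 180.0) / RESOLUTION_DEG), N_COLS - 1)
    return row, col


# Illustrative lookup, assuming `koppen` is a 360 x 720 array loaded from the archive:
# row, col = latlon_to_index(38.63, -90.20)
# climate_class = koppen[row, col]
```

If the stored arrays turn out to be south-to-north or to start at 0° longitude, only the two index expressions need to change.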

Size Warning

ensemble.zip is roughly 27 GB uncompressed because it includes statistics and plotting information for every member and submember.
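Given the size, it may be worth confirming that enough disk space is available before extracting. The sketch below reads the uncompressed sizes recorded in a zip archive and compares the total with the free space on the target drive. The helper names and the 10% safety margin are assumptions chosen for illustration.

```python
import shutil
import zipfile


def uncompressed_size(zip_path):
    """Sum the uncompressed sizes recorded in the archive's central directory."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def has_room_to_extract(zip_path, target_dir=".", safety_factor=1.1):
    """Return True if target_dir has enough free space for the extracted files.

    safety_factor adds headroom (10% by default) on top of the raw total.
    """
    needed = uncompressed_size(zip_path) * safety_factor
    free = shutil.disk_usage(target_dir).free
    return free >= needed


# Example call (paths are illustrative):
# if has_room_to_extract("ensemble.zip", target_dir="."):
#     zipfile.ZipFile("ensemble.zip").extractall("ensemble")
```

This only reads the archive's index, so the check is fast even for a multi-gigabyte file.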

Contact

For questions regarding this dataset or the associated publication, please contact victor.w.geiser[at]gmail.com.

Files

Files (12.8 GB)

  • 64.5 MB (md5:fcccf160c5f2e2254bf39b341494ab9f)
  • 463.7 MB (md5:5038ba88cf985bdc47266a8936661116)
  • 10.3 GB (md5:0a62228e04730472b9d32a53362dc053)
  • 1.7 GB (md5:802c892ebfb9ac54ce163c19dcd5d2e6)
  • 1.6 MB (md5:42bd9e86200d1e75a667a0439490f453)
  • 3.3 MB (md5:ae4392ba06f12ec492b64a3ed3ff33c4)
  • 226.8 MB (md5:47db61a70a0fa73235c508f642aa5c0d)
  • 1.8 MB (md5:0e85c9875e21b731db502bc13fb15c2a)
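
The MD5 values listed for each file can be used to confirm that a download completed without corruption, which is especially useful for the multi-gigabyte archives. The following is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and the file has to be paired with its own listed checksum by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:0123...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example call (substitute the downloaded file and its listed checksum):
# ok = matches_listed_md5("downloaded_file.zip", "md5:<value listed for that file>")
```

A mismatch usually means the download was truncated and should be repeated.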

Additional details

Related works

Is published in: Publication, DOI 10.1175/AIES-D-24-0051.1

Dates

Available: 2025-09-29 (date published on Zenodo)

Software

Repository URL: https://github.com/vwgeiser/PASTEL
Programming language: Python
Development status: Concept