Published January 30, 2024 | Version v1
Dataset Open

Pre-processed daily ERA5 and MODIS AOD data (2003 - 2022) ready for use in AI/ML forecasting

Description

Long-term, pre-processed atmospheric datasets for use in machine learning/AI-based forecasting. They were originally intended for predicting aerosol optical depth (AOD), but can be adapted for predicting other atmospheric particles.

Pre-processed data and code

Machine-learning-ready NumPy* dataset constructed by pre-processing selected atmospheric variables at 5 pressure levels from the ERA5 reanalysis (resulting in 35 features) and AOD data from MODIS on board the Aqua and Terra satellites. This long-term daily dataset spans 20 years, from 1 Jan 2003 to 31 Dec 2022, and is homogeneously structured into 1°x1° grid cells. Missing days and missing AOD values from MODIS were imputed using the Lattice Kriging method (the Python code used for imputation is included as the Jupyter Notebook 'Combine_impute_AOD.ipynb'); the raw (unimputed) MODIS data are also available. All datasets were created for the purpose of training a Convolutional Neural Network model designed to forecast Saharan dust (DustNet). They can also be used to train other ML models, or to forecast other variables.
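As a quick sanity check on the time axis described above, the following minimal sketch counts the calendar days between 1 Jan 2003 and 31 Dec 2022 and maps a date to its position along that axis. It assumes the arrays hold one entry per calendar day in chronological order, which this record does not state explicitly; the names `daily_axis` and `day_index` are hypothetical helpers, not part of the dataset's code.

```python
from datetime import date, timedelta

# Period covered by the dataset, as stated in the description.
START = date(2003, 1, 1)
END = date(2022, 12, 31)


def daily_axis(start, end):
    """Return every calendar day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def day_index(day, start=START, end=END):
    """Return the zero-based position of `day` on the daily axis.

    Assumption: the first array entry corresponds to `start`.
    """
    if not start <= day <= end:
        raise ValueError(f"{day} is outside {start}..{end}")
    return (day - start).days


days = daily_axis(START, END)
print(len(days))                     # 7305 (20 years, 5 of them leap years)
print(day_index(date(2003, 1, 2)))   # 1
```

If the time dimension of a loaded array differs from this count, the file may cover a different period or use a different layout than assumed here.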

This dataset was used to train the DustNet model to predict AOD 24 hours ahead. Please see doi: 10.5281/zenodo.10722953 for further details on predicting AOD and for the DustNet model code.
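One common way to frame a 24-hour-ahead task on daily data is to pair the features of day t with the target of day t+1. The sketch below illustrates that pairing on small synthetic arrays; it is not the DustNet training code. The array layout (time, latitude, longitude, features) and the function name `make_next_day_pairs` are assumptions made for illustration, so check the actual array shapes in READ_ME.pdf before adapting it.

```python
import numpy as np


def make_next_day_pairs(features, target):
    """Pair features on day t with the target on day t+1.

    Both arrays are assumed to share a daily time axis as their first
    dimension. The last day has no next-day target, so it is dropped.
    """
    if features.shape[0] != target.shape[0]:
        raise ValueError("features and target must have the same number of days")
    return features[:-1], target[1:]


# Synthetic stand-ins with hypothetical shapes:
# 10 days, a 4 x 6 grid, and 35 features per grid cell.
rng = np.random.default_rng(0)
features = rng.random((10, 4, 6, 35))
target = rng.random((10, 4, 6))

X, y = make_next_day_pairs(features, target)
print(X.shape, y.shape)  # (9, 4, 6, 35) (9, 4, 6)
```

When splitting such pairs into training and test sets, splitting by time rather than at random avoids leaking adjacent days between the two sets.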

 

*Datasets are NumPy arrays (NumPy v1.23) created with Python v3.8.18.

Files

READ_ME.pdf

Files (2.6 GB)

Name Size
md5:bd00934256d3e34d8142d4aaf341a040 85.5 kB
md5:12c3f099dbcbbfda63f14a8efe25fe17 2.6 GB
md5:d56d0d45a1820376f8d4e8d00e2be208 241.8 kB
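The md5 values listed above can be used to confirm that a download completed without corruption, which is worth doing for the 2.6 GB file. The following minimal sketch uses Python's standard `hashlib` and reads the file in chunks so that it does not need to fit in memory. The function names are hypothetical, and the path in the usage comment is a placeholder because the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (placeholder path; substitute the file you downloaded):
# ok = matches_checksum("path/to/downloaded_file",
#                       "md5:12c3f099dbcbbfda63f14a8efe25fe17")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.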

Additional details

Related works

Is continued by
Dataset: 10.5281/zenodo.10722953 (DOI)
Computational notebook: 10.5281/zenodo.10722953 (DOI)

Funding

CDT in Environmental Intelligence EP/S022074/1
Engineering and Physical Sciences Research Council

Dates

Collected
2023-01-07/2023-03-31