-------------------
GENERAL INFORMATION
-------------------

1. Title and description of the dataset:
Replication data for "Cross-location Air Quality Forecasting in Smart Cities: A Deep Learning Approach"
This dataset contains historical air quality measurements from two major urban areas: Madrid, Spain and Cali, Colombia, obtained from api.aqi.in. The data were collected from 15 fixed monitoring stations in Madrid and 18 in Cali, offering a comprehensive view of air pollution levels in each city over time.

2. Author's contact information:
Francisco-Jose Alvarado-Alcon, franciscojose.alvarado@upct.es, ORCID 0000-0002-3416-9547, Department of Information and Communication Technologies, Universidad Politécnica de Cartagena
Rafael Asorey-Cacheda, rafael.asorey@upct.es, ORCID 0000-0003-0722-4181, Department of Information and Communication Technologies, Universidad Politécnica de Cartagena
Antonio-Javier Garcia-Sanchez, antoniojavier.garcia@upct.es, ORCID 0000-0001-5095-3035, Department of Information and Communication Technologies, Universidad Politécnica de Cartagena
Joan Garcia-Haro, joang.haro@upct.es, ORCID 0000-0003-0741-7530, Department of Information and Communication Technologies, Universidad Politécnica de Cartagena
Laura Garcia, laura.garcia@upct.es, ORCID 0000-0003-2902-5757, Department of Information and Communication Technologies, Universidad Politécnica de Cartagena

3. Date of data collection:
2024-06-13/2024-12-31

4. Geographic location of data collection:
Madrid, Spain (Europe) and Cali, Colombia (South America)

5. Information about funding sources that supported the collection of the data:
This work was supported by the grant PID2023-148214OB-C21 funded by MICIU/AEI/10.13039/501100011033 and by FEDER/EU. This work was also supported in part by the grant TED2021-129336B-I00 funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. This work was also supported by the grant PCI2024-153485 funded by MICIU/AEI/10.13039/501100011033 and by the European Union. This research was also funded by the PRIMA Programme under Grant Agreement No. 2431 (FUSION: Comprehensive and sustainable solution to minimize food loss and waste and promote food security in the Mediterranean region). This work was also funded by Fundación Séneca (22236/PDC/23). This work was also a result of the ThinkInAzul and AgroAlNext programmes, funded by Ministerio de Ciencia, Innovación y Universidades (MICIU) with funding from European Union NextGenerationEU/PRTR-C17.I1 and by Fundación Séneca with funding from Comunidad Autónoma Región de Murcia (CARM). The work of Francisco-Jose Alvarado-Alcon was supported by Spain’s Ministry of Universities under Grant FPU22/00316.

6. Recommended citation for this dataset:
Alvarado-Alcon, F.-J., Asorey-Cacheda, R., Garcia-Sanchez, A.-J., Garcia-Haro, J., & García, L. (2025). Replication data for "Cross-location Air Quality Forecasting in Smart Cities: A Deep Learning Approach" (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17074061

7. Language of the dataset:
Not applicable


----------------------------------
SHARING/ACCESS/CONTEXT INFORMATION
----------------------------------

1. Usage Licenses/restrictions placed on the data:
The original data are made available through public monitoring systems operated by the municipal governments of Madrid and Cali. The data were accessed via api.aqi.in, which aggregates publicly available air quality information. This dataset is shared for academic and research purposes only and, to the best of our knowledge, the underlying data are in the public domain.

4. Links to other publicly accessible locations of the data:
https://www.cali.gov.co/dagma/publicaciones/38365/sistema-de-vigilancia-de-calidad-del-aire-de-cali-svcac
https://airedemadrid.madrid.es/portal/site/calidadaire

6. Was data derived from another source? If so, please add link where such work is located:
https://api.aqi.in


--------------------
DATA & FILE OVERVIEW
--------------------

1. File List:
CaliData.csv
CaliDataDecember.csv
MadridData.csv
MadridDataDecember.csv
Readme.txt

2. File format:
All data is in CSV format


--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
Data was retrieved via the endpoint: https://api.aqi.in/api/v1/getMonitorsByCity, with the corresponding city specified in the request header. Records were collected at 5-minute intervals and stored in a MongoDB database in JSON format whenever new data became available.
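
The "store whenever new data became available" step can be illustrated with a minimal sketch. It is not the collection script used by the authors. It assumes that each record carries "topic" and "timestamp" keys (the names used in the published CSV files; the raw JSON field names may differ), and the function name store_new_records is hypothetical. Fetching from the endpoint and inserting into MongoDB are left to the caller.

```python
POLL_INTERVAL_SECONDS = 5 * 60  # 5-minute collection interval stated above


def store_new_records(records, seen, store,
                      key=lambda r: (r["topic"], r["timestamp"])):
    """Pass each previously unseen record to `store`; return how many were stored.

    `records` is one batch of decoded JSON objects, `seen` is a set of keys
    already stored, and `store` is any callable that persists one record.
    The default key (topic, timestamp) is an assumption, not the authors' rule.
    """
    stored = 0
    for record in records:
        k = key(record)
        if k in seen:
            continue  # same station and timestamp as before: nothing new
        seen.add(k)
        store(record)
        stored += 1
    return stored
```

In a collector along these lines, a loop would fetch one batch from the endpoint, call store_new_records with a database insert as `store`, and then sleep for POLL_INTERVAL_SECONDS before polling again.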

2. Methods for processing the data:
Non-essential fields (e.g., units, location name) were removed. Sensor topics were simplified by retaining only the final four digits of the topic tag. In Cali, 39 sensors that appeared only once were excluded to ensure data reliability.
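
The processing steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' code. The raw field names in DROPPED_FIELDS and the topic strings in the usage note are hypothetical, and the single-occurrence filter is shown as a generic rule although the description above reports it only for Cali.

```python
from collections import Counter

# Hypothetical names for the non-essential raw fields that were removed.
DROPPED_FIELDS = {"units", "location_name"}


def simplify_topic(topic):
    """Keep only the final four characters of the topic tag."""
    return str(topic)[-4:]


def clean_records(records):
    """Drop non-essential fields, shorten topics, and skip sensors seen only once."""
    counts = Counter(r["topic"] for r in records)
    cleaned = []
    for r in records:
        if counts[r["topic"]] <= 1:
            continue  # sensor reported a single record: excluded
        row = {k: v for k, v in r.items() if k not in DROPPED_FIELDS}
        row["topic"] = simplify_topic(row["topic"])
        cleaned.append(row)
    return cleaned
```

For example, given two records with the made-up topic "station/001111" and one with "station/002222", clean_records returns the two "1111" records without their "units" field and drops the "2222" record.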

8. Author contact information:
Francisco-Jose Alvarado-Alcon, franciscojose.alvarado@upct.es, Department of Information and Communication Technologies, Universidad Politécnica de Cartagena


-------------------------
DATA-SPECIFIC INFORMATION
-------------------------

1. Number of variables:
13

2. Number of cases/rows:
CaliData.csv - 182631
CaliDataDecember.csv - 26409
MadridData.csv - 30627
MadridDataDecember.csv - 19820

3. Variable List:
timestamp,topic,object_o3,object_no2,object_pm2_5,object_pm10,object_co,object_so2,object_temp,object_lat,object_lon,object_dew,object_hum

5. Specialized formats or other abbreviations used:
topic - Integer that uniquely identifies the monitoring station that produced the measurement (the final four digits of the original topic tag)
object_o3 - Ozone (O3)
object_no2 - Nitrogen dioxide (NO2)
object_pm2_5 - Fine particulate matter (PM2.5)
object_pm10 - Particulate matter (PM10)
object_co - Carbon monoxide (CO)
object_so2 - Sulfur dioxide (SO2)
object_temp - Temperature
object_lat - Latitude
object_lon - Longitude
object_dew - Dew point
object_hum - Relative humidity
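
A minimal sketch of reading one of the CSV files and grouping rows by station is given below. It uses only the Python standard library and assumes that each file begins with a header row matching the variable list, that measurement columns hold plain numbers, and that missing values appear as empty cells. These assumptions have not been checked against the files, and the function name read_measurements is hypothetical. Timestamps are kept as strings because their format is not documented here.

```python
import csv
from collections import defaultdict

# Measurement columns from the variable list (everything except timestamp, topic).
MEASUREMENT_COLUMNS = [
    "object_o3", "object_no2", "object_pm2_5", "object_pm10", "object_co",
    "object_so2", "object_temp", "object_lat", "object_lon", "object_dew",
    "object_hum",
]


def read_measurements(handle):
    """Return {topic: [row, ...]} with measurement values converted to float.

    Empty or absent cells become None (an assumption about how gaps are encoded).
    """
    by_station = defaultdict(list)
    for row in csv.DictReader(handle):
        parsed = {"timestamp": row["timestamp"]}
        for name in MEASUREMENT_COLUMNS:
            value = row.get(name)
            parsed[name] = float(value) if value not in ("", None) else None
        by_station[row["topic"]].append(parsed)
    return dict(by_station)
```

A file would be opened with open("CaliData.csv", newline="") and the resulting handle passed to read_measurements; the returned dictionary maps each station identifier to its list of rows in file order.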