Deep Learning with Satellite Images Enables High-Resolution Income Estimation: a Case Study of Buenos Aires
Description
This repository contains the datasets required to replicate the results in Abbate et al. (forthcoming). The datasets also include per capita income estimates at a 50x50 meter resolution for the years 2013, 2018, and 2022, derived from satellite images of the Metropolitan Area of Buenos Aires (Argentina) and from 2010 census and survey data. The model, based on the EfficientNetV2 architecture, achieved high accuracy in predicting household incomes (R² = 0.878), surpassing existing methods in both spatial resolution and performance.
The Replication Package folder contains the data needed to replicate the main results of the paper. It includes:
- Small Area Estimation (SAE) Replication:
  - Argentina Household Survey Data (EPH): Processed microdata for 2010, 2013, 2018, and 2022 (ARG_*_EPHC-S2_*.dta).
  - Argentina Census Microdata: Raw 2010 census microdata (censo2010_fullraw_p.dta).
  - Census Tract Map: Shapefile of 2010 census tracts (radios_eph_with_link.shp).
  - SAE Output: The final small_area_estimates.parquet file containing census tract-level population and estimated income, which serves as labels for the CNN model.
- CNN-based Income Prediction Replication (Paper Results):
  - CNN Model Income Predictions: Gridded 50x50m income estimates for Buenos Aires for 2013, 2018, and 2022 (income_estimates_*.shp).
  - Normalization Scalars: A CSV file (scalars_ln_pred_inc_mean_trimTrue.csv) to convert the model's log-scale outputs into real income values (2010 PPP-adjusted Argentinian pesos).
  - World Settlement Footprint (WSF): Satellite-based data (WSF2015_v2_-60_-36.tif) used to mask predictions in uninhabited areas.
Key prediction datasets are published in shapefile format, while input data for SAE and other auxiliary files are in formats like .dta, .parquet, .csv, and .tif.
Results can be replicated by connecting these datasets with the scripts available at the GitHub repo linked below.
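As an illustration of how the normalization scalars might be applied, the sketch below converts a normalized log-scale prediction back into an income value. It assumes, without confirmation from this record, that the scalars are the mean and standard deviation of log income and that the model output is a z-score of log income. The function name, parameter names, and numeric values are hypothetical; the scripts in the GitHub repo are the authoritative reference for the actual conversion.

```python
import math


def to_income(normalized_log_pred: float, log_mean: float, log_std: float) -> float:
    """Undo an ASSUMED z-score normalization of log income, then exponentiate.

    log_mean and log_std stand in for values read from
    scalars_ln_pred_inc_mean_trimTrue.csv; the real file layout may differ.
    """
    log_income = normalized_log_pred * log_std + log_mean
    return math.exp(log_income)


# Hypothetical scalar values, for illustration only.
example_income = to_income(0.0, log_mean=8.0, log_std=0.5)
print(example_income)
```

Under these assumptions, a normalized prediction of 0 maps to the exponential of the mean log income, and larger normalized values map to higher incomes.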
Researchers who wish to replicate the full analysis pipeline starting from the original source imagery must acquire the data commercially. The proprietary Pléiades and Pléiades Neo satellite imagery is owned by Airbus and can be purchased through their data portal: https://space-solutions.airbus.com/imagery/. To facilitate this process, we provide the unique product identifiers for each scene used in this study. These identifiers can be used to query the Airbus archive and purchase the exact scenes.
- Pléiades: for 2013 imagery the IDs are DS_PHR1A_201302051411520_FR1_PX_W059S35_0807_03124, DS_PHR1A_201302071357305_FR1_PX_W059S35_0410_06105 and DS_PHR1A_201302071357509_FR1_PX_W059S35_0609_05426, and for 2018, DS_PHR1A_201803251356358_FR1_PX_W059S35_0909_03875, DS_PHR1A_201808021356574_FR1_PX_W059S35_0509_06938 and DS_PHR1A_201808021357186_FR1_PX_W059S35_0706_06104.
- Pléiades Neo: for 2022 imagery the IDs are 000047717_1_22_STD_A, 000047717_1_24_STD_A, 000047717_1_25_STD_A, 000047717_1_26_STD_A, 000058605_1_3_STD_A, 000058605_1_4_STD_A, 000058605_1_7_STD_A, and 000058608_1_2_STD_A.
Important Usage Note: Because the prediction for each individual 50x50m cell contains some random variation, we recommend averaging the estimates over each area of interest (e.g., municipalities, neighborhoods, sections, or census tracts) rather than using them at the individual cell level. As detailed throughout the paper, the aggregated results predict household incomes with precision, even in areas as small as census tracts.
This repository also provides the model’s trained parameters, which can be used to generate predictions from other satellite images.
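The recommended aggregation can be sketched as follows. This is a minimal example that assumes each grid cell has already been assigned to an area of interest (for instance through a spatial join in GIS software) and that a simple unweighted mean is appropriate. The area identifiers and income values are made up for illustration, and `average_by_area` is a hypothetical helper, not a function from the replication scripts.

```python
from collections import defaultdict
from statistics import mean


def average_by_area(cells):
    """Average cell-level income predictions within each area.

    cells: iterable of (area_id, predicted_income) pairs.
    Returns a dict mapping area_id to the unweighted mean income.
    """
    grouped = defaultdict(list)
    for area_id, income in cells:
        grouped[area_id].append(income)
    return {area_id: mean(values) for area_id, values in grouped.items()}


# Made-up cell predictions tagged with hypothetical tract identifiers.
cells = [
    ("tract_A", 1000.0),
    ("tract_A", 1200.0),
    ("tract_B", 800.0),
    ("tract_B", 900.0),
    ("tract_B", 1000.0),
]
area_means = average_by_area(cells)
print(area_means)
```

Averaging over many cells lets the cell-level random variation partly cancel out, which is the motivation given in the usage note.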
The data can be explored interactively at https://ingresoamba.netlify.app
Files
EfficientNetV2S Trained Model Weights.zip
Total size: 1.1 GB

| MD5 checksum | Size |
|---|---|
| 701e9aef595f7d9d536c991fba11a83e | 227.4 MB |
| 4afa33b1a2394a9fa433682cc68375e7 | 852.6 MB |
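The MD5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below uses only Python's standard `hashlib` module; the helper names are hypothetical, and the local file path in the usage note is a placeholder for wherever the archive was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download("downloaded_file.zip", "701e9aef595f7d9d536c991fba11a83e")` would return `True` only if the local copy (the path here is a placeholder) matches the corresponding published file.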
Additional details
Additional titles
- Translated title (Spanish)
- Mapeando el ingreso del Área Metropolitana de Buenos Aires en alta resolución: Un enfoque basado en Redes Neuronales aplicadas a imágenes
Software
- Repository URL
- https://github.com/Queeno11/Deep-learning-with-satellite-images-income-estimation-in-Buenos-Aires
- Programming language
- Python, Stata
- Development Status
- Active