There is a newer version of the record available.

Published October 13, 2025 | Version v4
Dataset Open

Deep learning four decades of human migration: datasets

  • 1. University of Cambridge
  • 2. University of Hong Kong
  • 3. International Institute for Applied Systems Analysis

Description

This Zenodo repository contains all migration flow estimates associated with the paper "Deep learning four decades of human migration." Evaluation code, training data, trained neural networks, and smaller flow datasets are available in the main GitHub repository, which also provides detailed instructions on data sourcing. Due to file size limits, the larger datasets are archived here.

Data is available in both NetCDF (.nc) and CSV (.csv) formats. The NetCDF format is more compact and pre-indexed, making it suitable for large files. In Python, datasets can be opened as xarray.Dataset objects, enabling coordinate-based data selection.
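As a minimal sketch of the coordinate-based selection mentioned above, the snippet below looks up one origin-destination estimate by label. It assumes the dimension names match the conventions in this description ("Year", "Origin ISO", "Destination ISO") and that the variables are named mean and std; the helper names select_flow and make_toy_flows are hypothetical, and the toy dataset only stands in for a real file, which could be opened with xr.open_dataset("flows.nc").

```python
import numpy as np
import xarray as xr


def select_flow(ds, year, origin, destination):
    """Return (mean, std) for one origin-destination pair in one year."""
    # The dimension names contain spaces, so the indexers go in a dict
    # rather than keyword arguments.
    point = ds.sel({"Year": year, "Origin ISO": origin, "Destination ISO": destination})
    # Use ds["mean"] / ds["std"]: attribute access (ds.mean) would return
    # the Dataset method of the same name instead of the variable.
    return float(point["mean"]), float(point["std"])


def make_toy_flows():
    """Build a small synthetic dataset with the assumed layout (not real data)."""
    years = [1990, 1991]
    iso = ["DEU", "FRA", "POL"]
    dims = ("Year", "Origin ISO", "Destination ISO")
    values = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
    return xr.Dataset(
        {"mean": (dims, values), "std": (dims, 0.1 * values)},
        coords={"Year": years, "Origin ISO": iso, "Destination ISO": iso},
    )


toy = make_toy_flows()
mean_est, std_est = select_flow(toy, 1991, "POL", "DEU")
print(mean_est, std_est)
```

The same call should work on a dataset loaded from disk, provided the coordinate labels in the file match the ones passed in.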

Each dataset uses the following coordinate conventions:

  • Year: 1990–2023
  • Birth ISO: Country of birth (UN ISO3)
  • Origin ISO: Country of origin (UN ISO3)
  • Destination ISO: Destination country (UN ISO3)
  • Country ISO: Used for net migration data (UN ISO3)

The following data files are provided:

  • T.nc: Full table of flows disaggregated by country of birth. Dimensions: Year, Birth ISO, Origin ISO, Destination ISO
  • flows.nc: Total origin-destination flows (equivalent to T summed over Birth ISO). Dimensions: Year, Origin ISO, Destination ISO
  • net_migration.nc: Net migration data by country. Dimensions: Year, Country ISO
  • stocks.nc: Migrant stock estimates for each country pair. Dimensions: Year, Origin ISO (here the country of birth, i.e. Birth ISO), Destination ISO
  • test_flows.nc: Flow estimates on a randomly selected set of test edges, used for model validation
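The relationship between T.nc and flows.nc described above (flows is T summed over Birth ISO) can be sketched as follows. The dimension names are taken from this description; the helper names and the toy array are hypothetical stand-ins for the real data. Only the mean estimates are summed here: standard deviations do not add linearly, so the std variable of flows.nc cannot be recovered this way.

```python
import numpy as np
import xarray as xr


def total_flows_from_T(T_mean):
    """Collapse a birth-disaggregated table of means to origin-destination totals."""
    return T_mean.sum(dim="Birth ISO")


def make_toy_T():
    """Build a small synthetic table of mean flows (not real data)."""
    dims = ("Year", "Birth ISO", "Origin ISO", "Destination ISO")
    iso = ["DEU", "FRA", "POL"]
    values = np.arange(1 * 2 * 3 * 3, dtype=float).reshape(1, 2, 3, 3)
    return xr.DataArray(
        values,
        dims=dims,
        coords={
            "Year": [2000],
            "Birth ISO": ["DEU", "POL"],
            "Origin ISO": iso,
            "Destination ISO": iso,
        },
    )


flows_mean = total_flows_from_T(make_toy_T())
print(flows_mean.dims)  # the Birth ISO dimension has been summed out
```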

Additionally, two CSV files are provided for convenience:

  • mig_unilateral.csv: Unilateral migration estimates per country, comprising:
    • imm: Total immigration flows
    • emi: Total emigration flows
    • net: Net migration
    • imm_pop: Total immigrant population (non-native-born)
    • emi_pop: Total emigrant population (living abroad)
  • mig_bilateral.csv: Bilateral flow data, comprising:
    • mig_prev: Total origin-destination flows
    • mig_brth: Total birth-destination flows, where Origin ISO reflects place of birth
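To illustrate how the unilateral quantities relate to bilateral flows, here is a rough sketch that aggregates origin-destination flows into per-country imm, emi, and net totals, taking net as immigration minus emigration. The column names year, origin, and destination are hypothetical (the actual CSV headers are not listed here), and the published figures may be derived differently, so treat this as an illustration of the definitions rather than the procedure used to produce the files.

```python
import csv
import io
from collections import defaultdict


def unilateral_from_bilateral(rows):
    """Sum bilateral flows into per-(year, country) imm, emi and net totals."""
    totals = defaultdict(lambda: {"imm": 0.0, "emi": 0.0})
    for row in rows:
        year = int(row["year"])
        flow = float(row["mig_prev"])
        totals[(year, row["destination"])]["imm"] += flow  # inflow to destination
        totals[(year, row["origin"])]["emi"] += flow       # outflow from origin
    # Net migration is taken here as immigration minus emigration.
    return {key: {**t, "net": t["imm"] - t["emi"]} for key, t in totals.items()}


# Synthetic rows for illustration only; the header names are assumptions.
toy_csv = """year,origin,destination,mig_prev
2000,DEU,FRA,10
2000,FRA,DEU,4
2000,POL,DEU,6
"""
result = unilateral_from_bilateral(csv.DictReader(io.StringIO(toy_csv)))
for key in sorted(result):
    print(key, result[key])
```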

Each dataset includes a mean variable (mean estimate) and a std variable (standard deviation of the estimate).
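One way to use the mean and std variables together is to form an approximate uncertainty interval around each estimate. The sketch below assumes the estimate is roughly normally distributed, which this description does not state, and the function name approx_interval is hypothetical. Flows and stocks cannot be negative, so the lower bound is clipped at zero by default; pass nonnegative=False for net migration, which can take either sign.

```python
def approx_interval(mean, std, z=1.96, nonnegative=True):
    """Return (lower, upper) = mean -/+ z * std; z=1.96 gives about 95% under normality."""
    lower = mean - z * std
    upper = mean + z * std
    if nonnegative:
        # Flows and stocks are counts of people, so clip the lower bound at zero.
        lower = max(lower, 0.0)
    return lower, upper


print(approx_interval(100.0, 10.0))
print(approx_interval(-5.0, 10.0, nonnegative=False))
```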

An ISO3 conversion table is also provided.

Files

Iso_code_lookup.csv

Files (3.5 GB)

Checksum                              Size
md5:d7bdda6b9f225619bd5bffecf3ef1feb  14.5 MB
md5:f191e0a1503fcb7e50d8c7dbb881527b  6.3 kB
md5:b009ff8877c256a7d1de767446864b68  145.6 MB
md5:e750f0105e0ce547518bd919842f7a79  869.9 kB
md5:f8808689ec70a25e7085486b0ee3e967  64.0 kB
md5:4b79d0d3d1a3c849639da4ceb975b66c  14.9 MB
md5:598bdcbfeffef84bc35e33929869c9e3  3.4 GB
md5:f43596c26cd59617644f3aeb5695d7f6  14.5 MB