# European Region-to-Region Origin Destination Matrices

## Description of the data and file structure

The data in this repository were collected by the ESPON IRiE (Interregional Relations in Europe) project, which aimed to analyze and map various types of flows between European regions at the NUTS2 level. The project gathered data from multiple sources to build origin-destination matrices for different types of interregional flows from 2010 to 2018. The work involved:

1. Data collection from various national and European statistical offices, as well as specialized databases.
2. Harmonization of data across different countries and regions to ensure comparability.
3. Creation of origin-destination matrices for each type of flow (Migration, Tourism, FDI, Remittances, Freight transport, Passenger transport, Erasmus exchanges, and Horizon 2020 collaborations).
4. Validation and quality control of the collected data.
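
As a rough illustration of step 3, the sketch below turns a long table of flow records into a square origin-destination matrix with pandas. The column names (`origin`, `destination`, `flow`) and all values are hypothetical and are not taken from the project files; only the NUTS2 codes are real identifiers.

```python
import pandas as pd

# Hypothetical harmonized flow records: one row per origin-destination pair.
# Column names and values are illustrative assumptions, not project data.
records = pd.DataFrame(
    {
        "origin": ["ES30", "ES30", "FR10", "ITC4"],
        "destination": ["FR10", "ITC4", "ES30", "FR10"],
        "flow": [120.0, 45.0, 80.0, 60.0],
    }
)

# Use the same ordered region list for rows and columns so the matrix is square.
regions = sorted(set(records["origin"]) | set(records["destination"]))

# Rows are origins, columns are destinations; missing pairs become 0.
od_matrix = records.pivot_table(
    index="origin",
    columns="destination",
    values="flow",
    aggfunc="sum",
    fill_value=0.0,
).reindex(index=regions, columns=regions, fill_value=0.0)

print(od_matrix)
```

Reindexing on a shared region list keeps regions that appear only as an origin or only as a destination, which matters when matrices for different flow types need to be compared cell by cell.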

### Files and variables

#### File: Code\_and\_Data.zip

**Description:** 

1. Code.ipynb: A Jupyter notebook containing the Python code for data analysis.
2. dict_nodes.json: A JSON file with a dictionary mapping region names to their numerical representations, used for analyses like multilayer PageRank in MuxViz.
3. Flows folder: Contains 8 Excel files, each representing origin-destination matrices for different types of interregional flows.
4. infomap_multilayer.txt: An input file for Infomap, containing node names and the multilayer edgelist.
5. Multilayer_edgelist_r_int folder: Contains 9 multilayer edgelists (one for each year from 2010 to 2018) used in MuxViz for multilayer PageRank analysis.
6. MuxViz_Multilayer_Pagerank.r: An R script for performing multilayer PageRank analysis.
7. NUTS_RG_20M_2016_3035.shp: A folder containing the shapefile components used to build the GeoDataFrame for spatial analysis and mapping.
8. pagerank_multilayer_results: A folder with 9 files (one per year) containing the results of the multilayer PageRank analysis from MuxViz.
9. Pop_GDP folder: Contains two CSV files with population and GDP data for European regions, used in the main Python code for normalizing flow data.
10. Sample_CREMA folder: Contains 7 subfolders, one for each flow type with data available in 2010 (excluding Horizon 2020, which starts in 2015). Each subfolder contains 50 samples generated by the CReMa (Conditional Reconstruction Method) null model used in the Python script.
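
To show how a name-to-number dictionary such as dict_nodes.json can relate to a multilayer edgelist, here is a minimal sketch using only the Python standard library. The dictionary contents, the layer data, and the row layout (`node layer node layer weight`) are assumptions made for illustration; the actual files in the archive may be organized differently.

```python
import json

# Hypothetical stand-in for the contents of dict_nodes.json
# (region name -> integer node id). Real ids may differ.
node_ids = json.loads('{"ES30": 1, "FR10": 2, "ITC4": 3}')

# Hypothetical flows per layer: {layer name: {(origin, destination): weight}}.
layers = {
    "Migration": {("ES30", "FR10"): 120.0, ("FR10", "ES30"): 80.0},
    "Tourism": {("ITC4", "FR10"): 60.0, ("ES30", "ITC4"): 0.0},
}


def to_multilayer_edges(layers, node_ids):
    """Return (src_id, layer_id, dst_id, layer_id, weight) rows for intralayer flows."""
    rows = []
    # Layers are numbered from 1 in the order they appear in the dict.
    for layer_id, flows in enumerate(layers.values(), start=1):
        for (origin, destination), weight in flows.items():
            if weight > 0:  # skip empty cells of the matrix
                rows.append(
                    (node_ids[origin], layer_id, node_ids[destination], layer_id, weight)
                )
    return rows


edges = to_multilayer_edges(layers, node_ids)

# One space-separated line per edge (assumed layout).
lines = [" ".join(str(value) for value in row) for row in edges]
print("\n".join(lines))
```

Mapping names to integers once and reusing the same dictionary for every year keeps node ids stable across the yearly edgelists, so results for different years can be matched back to region names.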

## Code/software

The zip folder contains the following code:

1. Code.ipynb: A Jupyter notebook containing the Python code for data analysis.
2. MuxViz_Multilayer_Pagerank.r: An R script for performing multilayer PageRank analysis.

Software and packages used:

1. Python 3.11.1 with the third-party packages pandas, networkx, matplotlib (including mpl_toolkits.axes_grid1), numpy, geopandas, scipy, IPython, NEMtropy, and infomap, plus the standard-library modules re, json, os, collections, and colorsys
2. R 4.4.0 with packages muxViz, igraph, RColorBrewer, ggraph, Matrix

## Access information

Other publicly accessible locations of the data:

* https://database.espon.eu/

Data was derived from the following sources:

* https://database.espon.eu/
* https://gis-portal.espon.eu/arcgis/apps/sites/#/irie-hub?

