Published November 16, 2023 | Version 1.0.1
Dataset Open

Open-source traffic and CO2 emission dataset for commercial aviation

  • 1. ISAE-SUPAERO
  • 2. Delft University of Technology
  • 3. Toulouse Business School

Description

This record is a global open-source passenger air traffic dataset, primarily intended for the research community.
It gives the seating capacity available on each origin-destination route for the year 2019, together with the associated aircraft and airline when this information is available.

Context on the original work is given in the related article (https://journals.open.tudelft.nl/joas/article/download/7201/5683) and on the associated GitHub page (https://github.com/AeroMAPS/AeroSCOPE/).
A simple data exploration interface will be available at www.aeromaps.eu/aeroscope.
The dataset was created by aggregating various available open-source databases with limited geographical coverage. It was then completed using a route database created by parsing Wikipedia and Wikidata, on which the traffic volume was estimated using a machine learning algorithm (XGBoost) trained on traffic and socio-economic data.
 


1- DISCLAIMER


The dataset was gathered to allow highly aggregated analyses of air traffic, at the continental or country level. At the route level, the accuracy is limited, as mentioned in the associated article, and improper usage could lead to erroneous analyses.

Although all sources used are openly accessible, the Eurocontrol database is freely available only to academic researchers. It is used in this dataset in a very aggregated way and under several levels of abstraction. As a result, it is not distributed in its original format, as specified in the contract of use.

As a general rule, we decline any responsibility for any use that is contrary to the terms and conditions of the various sources that are used. In case of commercial use of the database, please contact us in advance.


2- DESCRIPTION

Each data entry represents an (Origin-Destination-Operator-Aircraft type) tuple.

Please refer to the supporting article for more details (see above).

The dataset contains the following columns:

  • "First column" : index
  • airline_iata : IATA code of the operator in nominal cases. An ICAO -> IATA code conversion was performed for some sources, and the ICAO code was kept if no match was found.
  • acft_icao : ICAO code of the aircraft type
  • acft_class : Aircraft class identifier, own classification.
    • WB: Wide Body
    • NB: Narrow Body
    • RJ: Regional Jet
    • PJ: Private Jet
    • TP: Turbo Propeller
    • PP: Piston Propeller
    • HE: Helicopter
    • OTHER
  • seymour_proxy: Aircraft code for Seymour Surrogate (https://doi.org/10.1016/j.trd.2020.102528), own classification used to derive a proxy aircraft when the nominal aircraft type is unavailable in the aircraft performance model.
  • source: Original data source for the record, before compilation and enrichment.
    • ANAC: Brazilian Civil Aviation Authorities
    • AUS Stats: Australian Civil Aviation Authorities
    • BTS: US Bureau of Transportation Statistics T100
    • Estimation: Own model, estimation on Wikipedia-parsed route database
    • Eurocontrol: Aggregation and enrichment of R&D database
    • OpenSky
    • World Bank
  • seats: Number of seats available for the data entry, AFTER airport residual scaling
  • n_flights: Number of flights of the data entry, when available
  • iata_departure, iata_arrival : IATA code of the origin and destination airports. Some BTS in-house identifiers may remain, but they are marginal.
  • departure_lon, departure_lat, arrival_lon, arrival_lat : Origin and destination coordinates, may be NaN if the IATA identifier is erroneous
  • departure_country, arrival_country: Origin and destination country ISO2 code. WARNING: at import, disable the default parsing of the string NA as NaN, otherwise NA (Namibia) is read as a missing value
  • departure_continent, arrival_continent: Origin and destination continent code. WARNING: at import, disable the default parsing of the string NA as NaN, otherwise NA (North America) is read as a missing value
  • seats_no_est_scaling: Number of seats available for the data entry, BEFORE airport residual scaling
  • distance_km: Flight distance (km)
  • ask: Available Seat Kilometres
  • rpk: Revenue Passenger Kilometres (simple calculation from ASK using IATA average load factor)
  • fuel_burn_seymour: Fuel burn per flight (kg) when seymour proxy available
  • fuel_burn: Total fuel burn of the data entry (kg)
  • co2: Total CO2 emissions of the data entry (kg)
  • domestic: Domestic/international boolean (Domestic=1, International=0)
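
The two import warnings above matter in particular with pandas, whose read_csv function treats the string "NA" as a missing value by default. The sketch below is one possible way to load the file so that Namibia and North America are kept as codes. It assumes pandas is installed. The helper name load_aeroscope and the three sample rows are hypothetical and only mimic a few of the columns listed above; the values are invented for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical sample mimicking a few columns of the dataset.
# All values (and the continent codes other than NA) are made up.
SAMPLE_CSV = """,iata_departure,iata_arrival,departure_country,arrival_country,departure_continent,arrival_continent,seats,co2
0,WDH,JNB,NA,ZA,AF,AF,1000,50000.0
1,JFK,LAX,US,US,NA,NA,2000,400000.0
2,CDG,JFK,FR,US,EU,NA,1500,
"""


def load_aeroscope(source):
    """Read the CSV while keeping the string 'NA' as a valid code.

    keep_default_na=False switches off pandas' default list of NaN
    markers (which contains 'NA'); na_values=[''] still lets empty
    fields be read as NaN.
    """
    return pd.read_csv(
        source,
        index_col=0,            # the unnamed first column is the index
        keep_default_na=False,
        na_values=[""],
    )


df = load_aeroscope(io.StringIO(SAMPLE_CSV))

# Country-level aggregation, the kind of use the disclaimer recommends.
seats_by_country = df.groupby("departure_country")["seats"].sum()
print(seats_by_country)
```

With the real file, the same helper can be called as load_aeroscope("AeroSCOPE_global_aviation_traffic_dataset_16_11.csv"). Keeping the empty string in na_values is a design choice: it preserves genuinely missing fields (for example unavailable coordinates) as NaN while leaving the NA codes untouched.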

 

3- CITATION

Please cite the supporting paper instead of the dataset itself.

Salgas, A., Sun, J., Delbecq, S., Planès, T., & Lafforgue, G. (2023). Compilation of an open-source traffic and CO2 emissions dataset for commercial aviation. Journal of Open Aviation Science. https://doi.org/10.59490/joas.2023.7201

Files

AeroSCOPE_global_aviation_traffic_dataset_16_11.csv

Files (66.8 MB)

Additional details

Related works

Is described by
Conference paper: 10.59490/joas.2023.7201 (DOI)

Dates

Created
2023-10-30