
Published November 1, 2022 | Version V1.0
Dataset Open

Large Landing Trajectory Data Set for Go-Around Analysis

Description

A large data set of go-arounds, also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.

If you use this data for a scientific publication, please consider citing our paper.

The data set contains landings from 176 (mostly) large airports in 44 different countries. Each landing is labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

go_arounds_minimal.csv.gz

Compressed CSV containing the minimal data set. It contains one row per landing, with a minimal amount of information about the landing and whether it was a GA. The data is structured in the following way:

 
Column name | Type | Description
time | date time | UTC time of landing or first GA attempt
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign | string | Aircraft identifier in air-ground communications
airport | string | ICAO airport code where the aircraft is landing
runway | string | Runway designator on which the aircraft landed
has_ga | string | "True" if at least one GA was performed, otherwise "False"
n_approaches | integer | Number of approaches identified for this flight
n_rwy_approached | integer | Number of unique runways approached by this flight

The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These flights usually have a large number of approaches, so an easy way to exclude them is to drop all rows with n_approaches > 2.
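As a minimal sketch of this filter, assuming the file is loaded with pandas, the snippet below keeps only flights with at most two approaches. The helper name drop_training_flights and its max_approaches parameter are illustrative and not part of the data set, and a small made-up frame stands in for the real file.

```python
import pandas as pd


def drop_training_flights(df: pd.DataFrame, max_approaches: int = 2) -> pd.DataFrame:
    """Keep landings with at most `max_approaches` approaches (hypothetical helper)."""
    return df[df["n_approaches"] <= max_approaches].copy()


# Toy stand-in for pd.read_csv("go_arounds_minimal.csv.gz"); the values are made up.
toy = pd.DataFrame(
    {
        "callsign": ["AAA1", "BBB2", "TRN3"],
        "has_ga": [False, True, True],
        "n_approaches": [1, 2, 7],
        "n_rwy_approached": [1, 1, 2],
    }
)

filtered = drop_training_flights(toy)
print(filtered["callsign"].tolist())
```

The threshold of two follows the suggestion above; a stricter or looser value may suit other analyses, and n_rwy_approached could be used as an additional criterion.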

go_arounds_augmented.csv.gz

Compressed CSV containing the augmented data set. It contains one row per landing, with additional information about the landing and whether it was a GA. The data is structured in the following way:

Column name | Type | Description
time | date time | UTC time of landing or first GA attempt
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign | string | Aircraft identifier in air-ground communications
airport | string | ICAO airport code where the aircraft is landing
runway | string | Runway designator on which the aircraft landed
has_ga | string | "True" if at least one GA was performed, otherwise "False"
n_approaches | integer | Number of approaches identified for this flight
n_rwy_approached | integer | Number of unique runways approached by this flight
registration | string | Aircraft registration
typecode | string | Aircraft ICAO typecode
icaoaircrafttype | string | ICAO aircraft type
wtc | string | ICAO wake turbulence category
glide_slope_angle | float | Angle of the ILS glide slope in degrees
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false
rwy_length | float | Length of the runway in kilometres
airport_country | string | ISO Alpha-3 country code of the airport
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
operator_country | string | ISO Alpha-3 country code of the operator
operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
wind_speed_knts | integer | METAR, surface wind speed in knots
wind_dir_deg | integer | METAR, surface wind direction in degrees
wind_gust_knts | integer | METAR, surface wind gust speed in knots
visibility_m | float | METAR, visibility in m
temperature_deg | integer | METAR, temperature in degrees Celsius
press_sea_level_p | float | METAR, sea level pressure in hPa
press_p | float | METAR, QNH in hPa
weather_intensity | list | METAR, list of present weather codes: qualifier - intensity
weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation
weather_desc | list | METAR, list of present weather codes: qualifier - descriptor
weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration
weather_other | list | METAR, list of present weather codes: weather phenomena - other

This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft data base, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

go_arounds_agg.csv.gz

Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

Column name | Type | Description
airport | string | ICAO airport code where the aircraft is landing
runway | string | Runway designator on which the aircraft landed
n_landings | integer | Total number of landings observed on this runway in 2019
ga_rate | float | Go-around rate, per 1000 landings
glide_slope_angle | float | Angle of the ILS glide slope in degrees
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false
rwy_length | float | Length of the runway in kilometres
airport_country | string | ISO Alpha-3 country code of the airport
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

This aggregated data set is used in the paper for the generalized linear regression model.
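To illustrate how the n_landings and ga_rate columns relate to the landing-level files, the sketch below counts landings and GAs per airport-runway pair and expresses the rate per 1000 landings. It is not necessarily how the published aggregate was produced: the helper aggregate_ga_rate, the intermediate n_ga column, and the toy rows are assumptions made for illustration only.

```python
import pandas as pd


def aggregate_ga_rate(landings: pd.DataFrame, by=("airport", "runway")) -> pd.DataFrame:
    """Count landings and GAs per group; GA rate is per 1000 landings (hypothetical helper)."""
    df = landings.copy()
    # has_ga may arrive as bool or as the strings "True"/"False"; normalise to bool.
    df["has_ga"] = df["has_ga"].astype(str).str.lower().eq("true")
    out = (
        df.groupby(list(by))["has_ga"]
        .agg(n_landings="size", n_ga="sum")
        .reset_index()
    )
    out["ga_rate"] = 1000 * out["n_ga"] / out["n_landings"]
    return out


# Toy stand-in for the landing-level data; the values are made up.
toy = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "LSZH"],
        "runway": ["27", "27", "09", "14"],
        "has_ga": ["False", "True", "False", "False"],
    }
)

agg = aggregate_ga_rate(toy)
print(agg)
```

Passing a different `by` argument, for example `by=("wtc",)` on the augmented file, would give the same rate per wake turbulence category.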

Downloading the trajectories

Users of this data set with access to OpenSky Network's Impala shell can download the trajectories from the historical data base with a few lines of Python code. For example, suppose you want to get all go-arounds on the 4th of January 2019 at London City Airport (EGLC). The Traffic library provides easy access to the database:

import datetime
from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

# load the minimal data set
df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

# select London City Airport, go-arounds, and 2019-01-04
airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

# iterate over flights and pull the data from OpenSky Network
flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )
# The flights can be converted into a Traffic object
traffic = Traffic.from_flights(flights)

Additional files

Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:

  • validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.
  • validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
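The batching described for validation_sample.zip can be sketched as follows. This is an illustration of the sampling idea only, not the authors' code: the function sample_batches, its parameters, and the fixed seed are assumptions.

```python
import random


def sample_batches(items, n_batches=8, batch_size=500, seed=0):
    """Draw up to n_batches * batch_size items without replacement and split them into batches."""
    rng = random.Random(seed)
    pool = list(items)
    # Take as many items as are available if there are fewer than the full sample size.
    k = min(len(pool), n_batches * batch_size)
    chosen = rng.sample(pool, k)
    return [chosen[i:i + batch_size] for i in range(0, k, batch_size)]


# Hypothetical trajectory identifiers for one runway.
non_ga_ids = range(4500)
ga_ids = range(37)

non_ga_batches = sample_batches(non_ga_ids, n_batches=8, batch_size=500)
ga_batches = sample_batches(ga_ids, n_batches=8, batch_size=10)
print(len(non_ga_batches), len(ga_batches))
```

With fewer items than a full sample, the last batch is simply shorter, which matches the "as many as available" behaviour described above.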

Notes

This research was funded by the Swiss Federal Office of Civil Aviation grant number SFLV 2018-037.

Files

Files (1.7 GB)

Checksum | Size
md5:905b04cc295008c66a64d63fcc5231b7 | 15.1 kB
md5:dbcc6b2a60973b1fef9444a72d6085c0 | 172.7 MB
md5:f74dec6f3175bed294f5b5e4ca551403 | 92.8 MB
md5:6d86f379ecf58675d03522f4891c88cf | 1.4 GB
md5:af317a742264a426a2933f7abd0c8a47 | 82.6 kB
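The md5 checksums listed for the files can be used to check a download for corruption. A minimal sketch using Python's standard hashlib module is given below; the helper names md5_of_file and verify_md5 are illustrative, and the expected checksum must be taken from the entry belonging to the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's md5 with a listed checksum; an optional 'md5:' prefix is ignored."""
    expected_hex = expected.lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Example call (path and checksum are placeholders to be filled in by the user):
# verify_md5("go_arounds_minimal.csv.gz", "md5:<checksum listed for this file>")
```

Reading in chunks keeps memory use low even for the largest file in the record.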

Additional details

Related works

Is published in
Conference paper: 10.3390/engproc2022028002 (DOI)