Mode Choice Modelling with Machine Learning: A Sequential Tour-based Approach for Addressing Imbalanced Datasets
Description
The continuous progress of machine learning has introduced numerous powerful classifiers that are being examined as prominent alternatives for predicting travellers' mode choices. However, most classifiers fail to capture the lower market share that characterizes the minority modes of transport. Although imbalanced choice datasets are common, the problem has become more apparent with the emergence of new modes and mobility services, which further fragment the mode choice composition. It is often magnified by biased sampling and measurement errors during the data collection process. Imbalanced classification in machine learning is the subject of continuous multidisciplinary research; however, its extensions to mode choice modelling remain relatively unexplored. This paper provides empirical evidence of the effect that dataset imbalance might have on prediction measures and proposes a sequential tour-based framework for addressing skewed travel diary data. The framework is applied to a dataset from the city of Thessaloniki, Greece, with a total of 5646 trips, using extreme gradient boosting (XGBoost). A set of performance metrics is used to evaluate the developed model, and the output predictions are interpreted with partial dependence plots and state-of-the-art SHAP (SHapley Additive exPlanations) values based on cooperative game theory. The results indicate that incorporating sequential effects can significantly improve the model’s overall performance, especially with regard to recognition rates for the minority mode, without inducing bias within the trained classifier.
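To illustrate why dataset imbalance can distort prediction measures, the following minimal sketch compares overall accuracy with per-mode recognition rates (recall) for a classifier that always predicts the majority mode. The mode labels, the 80/15/5 split and the helper names `per_class_recall` and `accuracy` are assumptions made for illustration only; they are not taken from the Thessaloniki dataset or from the paper's evaluation code.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Share of trips whose mode was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical mode shares (assumption, not the paper's data).
y_true = ["car"] * 80 + ["bus"] * 15 + ["bike"] * 5
# A degenerate classifier that always predicts the majority mode.
y_pred = ["car"] * len(y_true)

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
# Balanced accuracy: the unweighted mean of the per-class recalls.
balanced_acc = sum(recalls.values()) / len(recalls)

print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {balanced_acc:.2f}")
for mode, r in recalls.items():
    print(f"recall[{mode}] = {r:.2f}")
```

In this toy case the overall accuracy looks respectable while the minority modes are never recognised, which is why per-class measures are usually reported alongside accuracy on skewed choice data.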
Files
Name | Size | Checksum |
---|---|---|
2021.01_TRB_Mode-Choice-Modelling-with-Machine-Learning_UCL.pdf | 783.1 kB | md5:3b30a6f18144ec04a983aea2edea170f |