A machine learning approach to predicting the outcome of football matches
Description
In this dissertation we present a study of predicting the outcomes of football (soccer) matches using a large dataset. We use machine learning methods, namely k-nearest neighbours, random forests, support vector machines and logistic regression, to build models that predict match outcomes. These methods are applied to a dataset of matches from Europe's top five leagues (England, France, Germany, Italy and Spain) covering the 2017-18 to 2021-22 seasons. The dataset includes match-level statistics such as the result, the total number of shots and expected goals (xG). The performance of our models is then compared against bookmakers' predictions and against leading models from the literature. The study found that features related to a team's historic strength and recent form were the most important variables for predicting match outcomes. A support vector machine trained on the original training set achieved the highest accuracy, 0.5221, whereas a support vector machine trained on data with balanced classes achieved the lowest accuracy of all the models but the highest F1 score, 0.46. The same pattern held across all models: each model trained on class-balanced data achieved a higher F1 score but a lower accuracy than the same model trained on the original, class-imbalanced data.
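The abstract reports that a team's recent form was among the most important features but does not define how form was computed. As a minimal sketch, assuming form is a team's mean points (3 for a win, 1 for a draw, 0 for a loss) over its previous `window` matches, a leakage-free version processes matches in date order and updates each team's history only after recording the features for the current match. The dictionary keys (`home`, `away`, `result`), the `H`/`D`/`A` result codes, the team names and the default `window=5` are hypothetical choices for illustration.

```python
from collections import defaultdict, deque

# Hypothetical result codes: "H" home win, "D" draw, "A" away win.
# Points use the standard 3-1-0 league scoring.
HOME_POINTS = {"H": 3, "D": 1, "A": 0}
AWAY_POINTS = {"H": 0, "D": 1, "A": 3}


def add_form_features(matches, window=5):
    """Return copies of `matches` with each side's mean points over its last `window` games.

    `matches` must already be sorted by kick-off date. Only earlier matches
    contribute, so a match's own result never leaks into its features.
    """
    history = defaultdict(lambda: deque(maxlen=window))  # team -> recent points
    rows = []
    for match in matches:
        home_hist = history[match["home"]]
        away_hist = history[match["away"]]
        rows.append({
            **match,
            # None when the team has no earlier match in the data.
            "home_form": sum(home_hist) / len(home_hist) if home_hist else None,
            "away_form": sum(away_hist) / len(away_hist) if away_hist else None,
        })
        # Record this result only after its features have been built.
        home_hist.append(HOME_POINTS[match["result"]])
        away_hist.append(AWAY_POINTS[match["result"]])
    return rows


matches = [
    {"home": "Team A", "away": "Team B", "result": "H"},
    {"home": "Team C", "away": "Team A", "result": "D"},
    {"home": "Team B", "away": "Team C", "result": "A"},
    {"home": "Team A", "away": "Team C", "result": "H"},
]
for row in add_form_features(matches):
    print(row["home"], row["home_form"], row["away"], row["away_form"])
# Team A None Team B None
# Team C None Team A 3.0
# Team B 0.0 Team C 1.0
# Team A 2.0 Team C 2.0
```

Because `deque(maxlen=window)` discards the oldest entry automatically, the rolling window needs no manual trimming; the key design point is appending the result after the features are read, which keeps the feature strictly pre-match.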
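The abstract compares the models against bookmakers' predictions without saying how those predictions were derived. One common approach, assumed here rather than taken from the dissertation, converts decimal odds to implied probabilities (1/odds), divides them by their sum to remove the bookmaker's margin (the overround), and predicts the most probable outcome. The function names and example odds are hypothetical.

```python
def implied_probabilities(odds_home, odds_draw, odds_away):
    """Convert decimal odds to outcome probabilities normalised to sum to 1."""
    raw = {"H": 1 / odds_home, "D": 1 / odds_draw, "A": 1 / odds_away}
    overround = sum(raw.values())  # exceeds 1 because of the bookmaker's margin
    return {outcome: p / overround for outcome, p in raw.items()}


def bookmaker_prediction(odds_home, odds_draw, odds_away):
    """Predict the outcome the odds rate as most likely (the shortest price)."""
    probs = implied_probabilities(odds_home, odds_draw, odds_away)
    return max(probs, key=probs.get)


probs = implied_probabilities(2.10, 3.40, 3.60)
print({k: round(v, 3) for k, v in probs.items()})
print(bookmaker_prediction(2.10, 3.40, 3.60))  # H
```

Proportional normalisation is only one way to remove the margin; whichever method is used, the resulting predictions can be scored with the same accuracy and F1 metrics as the machine learning models.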
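The abstract reports that balancing the classes raised each model's F1 score while lowering its accuracy. A toy example shows how this can happen with three outcomes (home win, draw, away win): a model that never predicts the minority draw class can score well on accuracy yet poorly on macro-averaged F1, which weights every class equally. The abstract does not state which F1 averaging was used, so macro averaging is an assumption here, and the labels and predictions below are invented for illustration, not taken from the study.

```python
LABELS = ("H", "D", "A")  # home win, draw, away win


def accuracy(y_true, y_pred):
    """Fraction of matches whose outcome was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (0 for a class never predicted correctly)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented outcomes for ten matches: home wins are most common, draws least common.
y_true = ["H", "H", "H", "H", "H", "D", "D", "A", "A", "A"]
# Mimics a model fitted to imbalanced data: it never predicts a draw.
pred_imbalanced = ["H", "H", "H", "H", "H", "H", "H", "A", "A", "H"]
# Mimics a model fitted to balanced data: it predicts draws, sometimes wrongly.
pred_balanced = ["H", "H", "H", "D", "D", "D", "H", "A", "D", "A"]

for name, pred in [("imbalanced", pred_imbalanced), ("balanced", pred_balanced)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f} macro_F1={macro_f1(y_true, pred):.3f}")
# imbalanced: accuracy=0.700 macro_F1=0.523
# balanced: accuracy=0.600 macro_F1=0.600
```

The imbalanced-style predictions gain accuracy by defaulting to the majority outcome, but the draw class contributes an F1 of 0, which drags the macro average down; predicting some draws costs a correct home-win call or two yet lifts the draw F1 above 0.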
Files
| Name | Size | Checksum |
|---|---|---|
| MachineLearningApproachToSoccerPrediction_02_00.pdf | 477.5 kB | md5:fc2530baa98dbc2da352cb655c944011 |
Additional details
Dates
- Created: 2023-08-24