Published December 30, 2019 | Version v1
Journal article Open

Data Mining Techniques for Analysing Employment Data

  • 1. Lecturer, NUI, Galway, Ireland
  • 1. Publisher

Description

This paper proposes a methodology that uses a large-scale employment dataset to explore which factors affect employment and how. The methodology combines predictive modelling, variable significance analysis, and VEC analysis. Modelling is based on logistic regression, linear discriminant analysis, neural networks, classification trees, and support vector machines. Following the CRISP-DM standard process model, we train binary classifiers, optimise their hyper-parameters, and measure their performance by prediction accuracy, ROC analysis, and AUC. Using sensitivity analysis, we rank variable significance in order to identify and measure the factors of employment. Using VEC analysis, we further explore how the values of those factors affect employment. The findings show that the best-performing models are neural networks and support vector machines, with a preference for the latter because of the quality of their VEC. Experiments also suggest that education and age are the primary contributors to correct classification, with specific value distributions that are discussed in the paper. All results were validated using a rigorous testing procedure that involves training, validation, and test data partitions and a combination of multiple runs along with three-fold cross-validation. This study addresses some gaps in previous research publications, which lack quantification of the conclusions made.
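The evaluation loop described in the abstract (train a binary classifier, score it by AUC, repeat over three cross-validation folds) can be illustrated with a minimal sketch. This is not the authors' code, which is not part of this record: it assumes a plain logistic regression trained by stochastic gradient descent as a stand-in for the paper's models, and it runs on synthetic two-feature data. All function names (`auc`, `fit_logistic`, `three_fold_auc`, `make_synthetic`) and all parameter values are hypothetical.

```python
import math
import random


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=50):
    """Logistic regression by stochastic gradient descent; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(w[-1] + sum(a * b for a, b in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def predict(w, X):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(w[-1] + sum(a * b for a, b in zip(w, xi))) for xi in X]


def three_fold_auc(X, y, seed=0):
    """Shuffle the row indices, split them into three folds, return one test AUC per fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::3] for k in range(3)]
    results = []
    for k in range(3):
        train = [i for j in range(3) if j != k for i in folds[j]]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = predict(w, [X[i] for i in folds[k]])
        results.append(auc(scores, [y[i] for i in folds[k]]))
    return results


def make_synthetic(n=300, seed=42):
    """Synthetic data only (an assumption for illustration, not the paper's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([a, b])
        y.append(1 if a + 0.5 * b + rng.gauss(0, 0.3) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for fold, value in enumerate(three_fold_auc(X, y), start=1):
        print(f"fold {fold}: AUC = {value:.3f}")
```

Repeating `three_fold_auc` with different `seed` values would approximate the "multiple runs" mentioned in the abstract. Hyper-parameter optimisation, and the separate validation partition it would need, are omitted here for brevity.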

Files

B3311129219.pdf (1.2 MB)
md5:88b65e871799dc8af29de2b7ca29a7af

Additional details

Related works

Is cited by
Journal article: 2249-8958 (ISSN)

Subjects

ISSN: 2249-8958
Retrieval Number: B3311129219/2019©BEIESP