There is a newer version of the record available.

Published June 4, 2024 | Version 1.0
Dataset Open

JobSet

Authors/Creators

Description

The online labour market's expansion presents unique opportunities to analyse job trends through Machine Learning (ML). However, the effectiveness of ML depends on access to well-labelled job advertisement datasets, which are often limited and require labour-intensive manual annotation. Our proposed solution, JobGen, leverages Large Language Models (LLMs) to generate synthetic Online Job Advertisements (OJAs), using real data and the ESCO taxonomy to ensure accurate representation of job market distributions. JobGen enhances data diversity and semantic alignment, addressing common issues in synthetic data generation. The resulting dataset, JobSet, provides a valuable resource for tasks like skill extraction and job matching and is openly available to the community.

Files

JobSet.csv (29.6 MB)
md5:65db1f890b283c6468ef35d2397bcdd8

Additional details

Dates

Available
2023-06

Software

Repository URL
https://anonymous.4open.science/r/JobGen-DB24/README.md
Programming language
Python
Development Status
Active