Published November 3, 2025 | Version v1
Dataset Open

Skill Extraction

  • 1. Idiap Research Institute
  • 2. EHL Hospitality Business School
  • 3. HES-SO University of Applied Sciences and Arts Western Switzerland
  • 4. Università della Svizzera italiana

Description

This dataset is a collection of hard skill entities extracted from a corpus of resumes. It is designed to benchmark the differences in skill extraction performance between human annotators and automatic systems. The resource contains two types of labels:

  1. Human-Annotated Labels: Created during an organized student workshop at the EHL Hospitality Business School. Multiple annotations per CV were collected to establish a reliable consensus for the ground truth.
  2. Automatic System Labels: Generated by a state-of-the-art supervised machine learning system and a conversational LLM (see the related paper).
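
The comparison between the two label types can be sketched as follows. This is a minimal illustration only: the record does not describe the file layout or the exact consensus procedure, so the majority-vote rule, the `min_votes` threshold, the function names, and the example skills are all assumptions rather than the authors' method.

```python
from collections import Counter


def consensus_skills(annotations, min_votes=2):
    """Return the skills marked by at least `min_votes` annotators.

    `annotations` is a list with one list of skill strings per annotator.
    Matching is case-insensitive (an assumption, not a documented rule).
    """
    votes = Counter()
    for skills in annotations:
        # A set ensures each annotator counts at most once per skill.
        votes.update({s.strip().lower() for s in skills})
    return {skill for skill, n in votes.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Score system-predicted skills against a lowercased gold set."""
    predicted = {s.strip().lower() for s in predicted}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up annotations for a single CV (not taken from the dataset).
human = [
    ["Python", "Excel", "SQL"],
    ["python", "SQL"],
    ["Excel", "Budgeting", "sql"],
]
gold = consensus_skills(human, min_votes=2)
system = ["Python", "SQL", "Tableau"]

print(sorted(gold))
print(precision_recall_f1(system, gold))
```

A stricter or looser consensus can be tried by changing `min_votes`; exact string matching after lowercasing is the simplest possible comparison and may undercount partially overlapping skill spans.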

Reference

If you use this dataset, please cite the following publication:

Vásquez-Rodríguez, L., Audrin, B., Michel, S., Galli, S., Rogenhofer, J., Negro Cusa, J., & Van Der Plas, L. (2025). A human perspective to AI-based candidate screening. Proceedings of the 58th Hawaii International Conference on System Sciences.

Files

README.md

Files (112.5 kB)

md5:f5fc7fdd78c89661a37d4e9a13e9cd6b (111.4 kB)
md5:88623d298c910dd297ab0bea0bedfda1 (1.1 kB)
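
The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the local file path is whatever name the downloaded file was saved under, since this record listing does not show file names next to the checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until fh.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:f5fc7fdd78c89661a37d4e9a13e9cd6b")` should return `True` for an uncorrupted copy of the 111.4 kB file, where `local_path` is a placeholder for the saved file's location.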

Additional details

Funding

Innosuisse – Swiss Innovation Agency
SEM24 104.069 IP-ICT