Skill Extraction

Vásquez-Rodríguez, Laura; Audrin, Bertrand; Michel, Samuel; Galli, Samuele; Rogenhofer, Julneth; Negro Cusa, Jacopo; van der Plas, Lonneke

doi:10.34777/peay-xe10

Published November 3, 2025 | Version v1

Dataset Open

1. Idiap Research Institute
2. EHL Hospitality Business School
3. HES-SO University of Applied Sciences and Arts Western Switzerland
4. Università della Svizzera italiana

Description

This dataset is a collection of hard skill entities extracted from a corpus of resumes. It is designed to benchmark the differences in skill extraction performance between human annotators and automatic systems. The resource contains two types of labels:

Human-Annotated Labels: Created during an organized student workshop at the EHL Hospitality Business School. Multiple annotations per CV were collected to establish a reliable consensus for the ground truth.
Automatic System Labels: Generated by a state-of-the-art supervised machine learning system and by a conversational LLM (see the related paper).

Reference

If you use this dataset, please cite the following publication:

Vásquez-Rodríguez, L., Audrin, B., Michel, S., Galli, S., Rogenhofer, J., Negro Cusa, J., & van der Plas, L. (2025). A human perspective to AI-based candidate screening. Proceedings of the 58th Hawaii International Conference on System Sciences.

Files

README.md

Files (112.5 kB)

Name	MD5	Size
all_systems_annotations.tsv	f5fc7fdd78c89661a37d4e9a13e9cd6b	111.4 kB
README.md	88623d298c910dd297ab0bea0bedfda1	1.1 kB
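
The MD5 checksum listed for all_systems_annotations.tsv can be used to confirm that a download is intact before working with the annotations. The following is a minimal sketch, not part of the dataset: the local file path, the UTF-8 encoding, and the helper names `md5_of_file` and `read_first_row` are assumptions made for illustration. The column layout of the TSV is not described on this page, so the sketch only prints the first row rather than assuming any column names.

```python
import csv
import hashlib
from pathlib import Path

# Checksum listed for all_systems_annotations.tsv on this record.
EXPECTED_MD5 = "f5fc7fdd78c89661a37d4e9a13e9cd6b"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_first_row(path):
    """Return the first tab-separated row as a list of strings.

    Whether this row is a header is an assumption; check README.md.
    UTF-8 encoding is also assumed.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the file was downloaded.
    tsv_path = Path("all_systems_annotations.tsv")
    if tsv_path.exists():
        print("checksum matches:", md5_of_file(tsv_path) == EXPECTED_MD5)
        print("first row:", read_first_row(tsv_path))
    else:
        print(f"{tsv_path} not found; download it from the record first")
```

If the checksum does not match, the file was likely truncated or altered during download and should be fetched again.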

Additional details

Is described by: Conference proceeding: https://publications.idiap.ch/downloads/papers/2025/Vasquez-Rodriguez_HICSS-58_2024.pdf

Innosuisse – Swiss Innovation Agency
SEM24 104.069 IP-ICT

	All versions	This version
Views	158	158
Downloads	87	87
Data volume	5.6 MB	5.6 MB
