Published November 3, 2025
Version v1 | Dataset | Open access
Skill Extraction
Description
This dataset is a collection of hard-skill entities extracted from a corpus of resumes. It is designed to benchmark skill-extraction performance of human annotators against that of automatic systems. The resource contains two types of labels:
- Human-Annotated Labels: Created during a student workshop organized at the EHL Business School. Each CV was annotated several times so that a reliable consensus could be established as the ground truth.
- Automatic System Labels: Generated by a state-of-the-art supervised machine-learning system and by a conversational LLM (see the related paper).
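The record does not describe the file format, the rule used to merge the workshop annotations, or the evaluation metric. As a rough illustration of how a human-versus-system comparison of this kind is often set up, the sketch below assumes each CV's labels are available as lists of skill strings, builds a consensus by strict majority vote after lowercasing, and scores a system's output against that consensus with set-based precision, recall, and F1. The helper names `normalize`, `consensus_skills`, and `prf` are hypothetical, and majority voting with exact string matching is an assumed scheme, not necessarily the one used by the authors.

```python
from collections import Counter
from typing import Iterable, Optional


def normalize(skill: str) -> str:
    # Assumed normalization: trim whitespace and lowercase before matching.
    return skill.strip().lower()


def consensus_skills(annotations: Iterable[Iterable[str]],
                     min_votes: Optional[int] = None) -> set[str]:
    """Merge several annotators' skill lists for one CV.

    Assumption: a skill enters the consensus if at least `min_votes`
    annotators listed it; by default a strict majority is required.
    """
    per_annotator = [{normalize(s) for s in ann} for ann in annotations]
    if min_votes is None:
        min_votes = len(per_annotator) // 2 + 1
    # Each annotator contributes at most one vote per skill (sets deduplicate).
    votes = Counter(skill for skills in per_annotator for skill in skills)
    return {skill for skill, n in votes.items() if n >= min_votes}


def prf(predicted: Iterable[str], gold: Iterable[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 of system skills against the consensus."""
    pred = {normalize(s) for s in predicted}
    ref = {normalize(s) for s in gold}
    tp = len(pred & ref)  # skills found by both the system and the consensus
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With three annotators, `consensus_skills([["Python", "SQL", "Excel"], ["python", "SQL"], ["Python", "Tableau"]])` returns `{"python", "sql"}`, and scoring a system output of `["Python", "Excel"]` against it gives precision, recall, and F1 of 0.5 each. Exact matching is strict: it gives no credit to near-variants such as "MS Excel" versus "Excel", which a fuzzier matching scheme would.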
Reference
If you use this dataset, please cite the following publication:
Vásquez-Rodríguez, L., Audrin, B., Michel, S., Galli, S., Rogenhofer, J., Negro Cusa, J., & Van Der Plas, L. (2025). A human perspective to AI-based candidate screening. Proceedings of the 58th Hawaii International Conference on System Sciences.
Files (112.5 kB)

| Name | Size | MD5 checksum |
|---|---|---|
|  | 111.4 kB | f5fc7fdd78c89661a37d4e9a13e9cd6b |
| README.md | 1.1 kB | 88623d298c910dd297ab0bea0bedfda1 |
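Each file is listed with an MD5 checksum. A minimal sketch for checking a downloaded copy against the listed value, using only Python's standard-library `hashlib`; the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with the checksum shown on the record page (case-insensitive)."""
    return md5_of(path) == expected_md5.strip().lower()
```

For example, `verify("path/to/downloaded_file", "f5fc7fdd78c89661a37d4e9a13e9cd6b")` (hypothetical path) should return `True` for an intact copy of the 111.4 kB file. MD5 is adequate for detecting corrupted downloads but not for guarding against deliberate tampering.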
Additional details
Related works
- Is described by: Conference proceeding, https://publications.idiap.ch/downloads/papers/2025/Vasquez-Rodriguez_HICSS-58_2024.pdf