TalentCLEF 2026 corpus: Skill and Job Title Intelligence for Human Capital Management
Description
TalentCLEF 2026 corpus - sample set. To check when new data will be uploaded, please consult the task calendar.
Introduction:
Following the positive response to the inaugural edition of TalentCLEF in 2025, which drew 76 registered teams, 15 submitted working notes, and 280 runs across both tasks, TalentCLEF continues to focus on developing and evaluating models that support three key objectives:
- Finding/ranking candidates for job positions based on their experience and professional skills.
- Implementing upskilling and reskilling strategies that promote the continuous development of workers.
- Detecting emerging skills and skills gaps of importance in organizations.
This year’s edition includes two tasks:
- Task A - Contextualized Job-Person Matching. Develop systems that identify and rank the most suitable candidate résumés for a given job offer. For each job description in the test set, participants must submit a ranked list of candidate profiles relevant to the position.
- Task B - Job-Skill Matching with Skill Type Classification. Develop systems that retrieve the relevant skills associated with a given job title and classify each retrieved skill as core, complementary, or transversal.
This repo contains the data for these two tasks.
File structure:
The files will be organized into two .zip archives, TaskA and TaskB, each containing training, validation, and test folders to support the different stages of model development. Until the full training set is officially released, a sample version of the data is available in sampleset_TaskA.zip and sampleset_TaskB.zip, so that users can see the type of data they will encounter during TalentCLEF.
TaskA includes language-specific subfolders within its directories, covering English and Spanish data. Each development folder contains two essential subfolders (queries and corpus_elements) and a qrels file (qrels.tsv) holding the relevance judgments used to evaluate how well a model ranks corpus elements for each query.
TaskA/
│
├── development/
│ ├── english/
│ │ ├── queries/
│ │ ├── corpus_elements/
│ │ └── qrels.tsv
│ └── spanish/
│
└── test/
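As a quick sanity check after unzipping, the layout described above can be verified programmatically. The following is a minimal sketch, not an official tool: the function name `missing_taska_entries` is hypothetical, and it assumes that the spanish/ folder mirrors the contents shown for english/, which the tree does not spell out.

```python
from pathlib import Path

# Entries the description lists under each language folder of TaskA/development.
EXPECTED_ENTRIES = ("queries", "corpus_elements", "qrels.tsv")
# Assumption: spanish/ has the same contents as english/.
LANGUAGES = ("english", "spanish")


def missing_taska_entries(root):
    """Return expected paths (relative to root) that are absent under an unzipped TaskA folder."""
    root = Path(root)
    missing = []
    for language in LANGUAGES:
        for entry in EXPECTED_ENTRIES:
            candidate = root / "development" / language / entry
            if not candidate.exists():
                missing.append(candidate.relative_to(root).as_posix())
    return missing
```

An empty list means every expected folder and file was found; otherwise the returned paths show what is missing from the local copy.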
TaskB follows a similar structure but without language-specific subfolders, providing general .tsv files for training, validation, and testing. This consistent file organization enables efficient data access and structured updates as new data versions are published. The training set (and the sample set) does not include label information; the labels will be published with the release of the training set.
TaskB/
│
├── training/
│ └── taskB_training.tsv
│
├── validation/
│
└── test/
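The .tsv files can be read with Python's standard csv module. This is a minimal sketch under stated assumptions: the helper name `read_tsv_rows` is hypothetical, the files are assumed to be UTF-8 encoded, and no column names are assumed because the description does not list them. Quote characters are treated as literal text so that quotation marks inside job titles or skills are not interpreted as field delimiters.

```python
import csv


def read_tsv_rows(path):
    """Read a tab-separated file into a list of rows, each a list of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quotation marks in the text as ordinary characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

For example, `read_tsv_rows("TaskB/training/taskB_training.tsv")` would return every line of the training file as a list of fields; whether the first row is a header should be checked against the released data.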
Resources:
- Web
- Additional resources
- More resources soon.
Files
Total size: 6.3 kB
| Checksum | Size |
|---|---|
| md5:00d7de438ce858fd29bc9ecc0515a4ab | 5.0 kB |
| md5:3e131a1944eb674b07af0230a691b16b | 1.2 kB |
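A downloaded archive can be checked against the MD5 checksums listed above. The sketch below uses only the standard hashlib module; the function names `md5_hex` and `matches_checksum` are hypothetical, and because the listing does not state which checksum belongs to which archive, the caller has to supply the pairing.

```python
import hashlib


def md5_hex(path, chunk_size=65536):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:00d7...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_hex(path) == expected
```

A call such as `matches_checksum("sampleset_TaskA.zip", "md5:...")` returns True only when the local file is byte-for-byte identical to the published one.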