TalentCLEF 2026 corpus: Skill and Job Title Intelligence for Human Capital Management

Gasco, Luis; Fabregat Marcos, Hermenegildo; Carrino, Casimiro Pio; García-Sardiña, Laura; Deniz, Daniel; Zbib, Rabih; Decorte, Jens-Joris; De Lange, Matthias; Rodrigo, Alvaro

doi:10.5281/zenodo.17625262

Published November 17, 2025 | Version 0.0.1

Dataset Open

1. Avature
2. TechWolf
3. National University of Distance Education

TalentCLEF 2026 corpus - sample set. To check when new data will be uploaded, please consult the task calendar.

Introduction:

Following the positive response to the inaugural edition of TalentCLEF in 2025, with 76 registered teams, 15 submitted working notes, and 280 runs across both tasks, TalentCLEF continues to focus on developing and evaluating models that support three key objectives:

Finding and ranking candidates for job positions based on their experience and professional skills.
Implementing upskilling and reskilling strategies that promote the continuous development of workers.
Detecting emerging skills and skill gaps that matter to organizations.

This year’s edition includes two tasks:

Task A - Contextualized Job-Person Matching. Develop systems that identify and rank the most suitable candidate résumés for a given job offer. For each job description in the test set, participants must submit a ranked list of candidate profiles relevant to the position.
Task B - Job-Skill Matching with Skill Type Classification. Develop systems that retrieve the relevant skills associated with a given job title and classify each retrieved skill as core, complementary, or transversal.

This repo contains the data for these two tasks.

File structure:

The files will be organized into two .zip archives, TaskA and TaskB, each containing training, validation, and test folders to support different stages of model development. Until the full training set is officially released, sampleset_TaskA.zip and sampleset_TaskB.zip provide a sample of the data so that users can see the kind of dataset they will work with during TalentCLEF.

TaskA includes language-specific subfolders within its directories, covering English and Spanish data. Each development folder contains two subfolders (queries and corpus_elements) and a qrels file with the relevance judgments used to evaluate retrieval results for each query.

TaskA/
│
├── development/
│   ├── english/
│   │   ├── queries/
│   │   ├── corpus_elements/
│   │   └── qrels.tsv
│   └── spanish/
│
└── test/

TaskB follows a similar structure but without language-specific subfolders, providing general .tsv files for training, validation, and testing. This consistent file organization enables efficient data access and structured updates as new data versions are published. The training set (and the sample set) does not include information about the labels, which will be provided with the release of the training set.

TaskB/
│
├── training/
│   └── taskB_training.tsv
│
├── validation/
│
└── test/
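
Because the TaskB data is distributed as .tsv files, it can be inspected with the Python standard library alone. This is a minimal sketch under stated assumptions: the helper name `read_tsv` is hypothetical, and the file is assumed to be UTF-8 encoded with a header row. The column names are not documented here, so the code does not rely on any.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by its header row.

    Assumptions: UTF-8 encoding and a header in the first line.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside job titles or skills as-is.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

For example, `rows = read_tsv("TaskB/training/taskB_training.tsv")` followed by `print(list(rows[0].keys()))` shows which columns the file actually provides before any modelling code depends on them.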

Resources:

Web
Additional resources
More resources soon.

Files (6.3 kB):

Name	MD5	Size
sampleset_TaskA.zip	00d7de438ce858fd29bc9ecc0515a4ab	5.0 kB
sampleset_TaskB.zip	3e131a1944eb674b07af0230a691b16b	1.2 kB
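
The MD5 values listed for the two archives can be used to check that a download is intact. The sketch below uses only `hashlib` from the Python standard library; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the checksums are copied from the file list of this version of the record, so they will not match archives from later versions.

```python
import hashlib

# Checksums copied from the file list of this record version.
EXPECTED_MD5 = {
    "sampleset_TaskA.zip": "00d7de438ce858fd29bc9ecc0515a4ab",
    "sampleset_TaskB.zip": "3e131a1944eb674b07af0230a691b16b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, name):
    """Compare the file at path with the checksum listed for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_listed_md5("sampleset_TaskA.zip", "sampleset_TaskA.zip")` should return `True` for an unmodified download of this version.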
