TalentCLEF 2026 corpus: Skill and Job Title Intelligence for Human Capital Management

Gasco, Luis; Fabregat Marcos, Hermenegildo; Carrino, Casimiro Pio; García-Sardiña, Laura; Deniz, Daniel; Zbib, Rabih; Decorte, Jens-Joris; De Lange, Matthias; Rodrigo, Alvaro

doi:10.5281/zenodo.18449283

Published February 2, 2026 | Version 0.1.0

Dataset Open

1. Avature
2. TechWolf
3. National University of Distance Education

🚨 Current Status: Release of the Task A development set and the Task B training set. To check when new data will be uploaded, please consult the task calendar.

TalentCLEF2026 corpus - Task A development set and Task B training set

Introduction:

Following the positive response to the inaugural edition of TalentCLEF in 2025, which drew 76 registered teams, 15 submitted working notes, and 280 runs across both tasks, TalentCLEF continues to focus on developing and evaluating models that support three key objectives:

Finding/ranking candidates for job positions based on their experience and professional skills.
Implementing upskilling and reskilling strategies that promote the continuous development of workers.
Detecting emerging skills and skills gaps of importance in organizations.

This year’s edition includes two tasks:

Task A - Contextualized Job-Person Matching. Develop systems that identify and rank the most suitable candidate résumés for a given job offer. For each job description in the test set, participants must submit a ranked list of candidate profiles relevant to the position.
Task B - Job-Skill Matching with Skill Type Classification. Develop systems that retrieve the relevant skills associated with a given job title and classify each retrieved skill as core, complementary, or transversal.

This repo contains the data for these two tasks.

File structure:

For a detailed description of the data structure, refer to the TalentCLEF2026 data description page.

The files are organized into two *.zip archives, TaskA.zip and TaskB.zip, each containing folders that support different stages of model development. So far, only the development set of Task A and the training set of Task B have been released; as the tasks progress, future releases will add data to the remaining subfolders of each task.

TaskA contains language-specific subfolders for English and Spanish. Each language's development folder holds two subfolders (queries and corpus) and a qrels.tsv file with the relevance judgments used to evaluate retrieval results for each query.

TaskA/
│
├── development/
│   ├── english/
│   │   ├── queries/
│   │   ├── corpus/
│   │   └── qrels.tsv
│   └── spanish/
│       ├── queries/
│       ├── corpus/
│       └── qrels.tsv
│
└── test/
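
As a minimal sketch of how the development layout above might be consumed, the following Python snippet reads one language's qrels.tsv into a nested dictionary. It assumes a TREC-style, tab-separated layout with four columns (query id, iteration, document id, relevance) and no header row. That column layout is an assumption and is not stated on this page, so check the data description page before relying on it. The helper names load_qrels and language_dir are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_qrels(path):
    """Read a qrels file into {query_id: {doc_id: relevance}}.

    ASSUMPTION: tab-separated, TREC-style columns
    (query_id, iteration, doc_id, relevance) with no header row.
    """
    qrels = defaultdict(dict)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) != 4:
                continue  # skip blank or malformed lines
            query_id, _iteration, doc_id, relevance = row
            if not relevance.lstrip("-").isdigit():
                continue  # skip a header row, if one is present
            qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


def language_dir(root, language):
    """Return the development folder for one language under the TaskA root."""
    return Path(root) / "development" / language
```

With the archive extracted to a local TaskA folder, load_qrels(language_dir("TaskA", "english") / "qrels.tsv") would return the English judgments, provided the assumed column layout holds.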

TaskB follows a similar structure but without language-specific subfolders, with separate folders for training, validation, and testing. The training folder currently holds one .tsv file and two .json files. This consistent file organization enables efficient data access and structured updates as new data versions are published.

TaskB/
│
├── training/
│   ├── job2skill.tsv
│   ├── jobid2terms.json
│   └── skillid2terms.json
│
├── validation/
│
└── test/
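
The training files listed above could be loaded along the following lines. This is a sketch under stated assumptions: job2skill.tsv is taken to be tab-separated and each .json file is taken to hold a single JSON document. The internal schema of the files is not described on this page, so the code returns the raw rows and parsed JSON without interpreting them. The function name load_task_b_training is hypothetical.

```python
import csv
import json
from pathlib import Path


def load_task_b_training(root):
    """Load the three files under <root>/training/.

    ASSUMPTION: job2skill.tsv is tab-separated and each JSON file holds
    one JSON document. No column names or key layout are assumed.
    """
    training = Path(root) / "training"
    with open(training / "job2skill.tsv", newline="", encoding="utf-8") as handle:
        job2skill_rows = list(csv.reader(handle, delimiter="\t"))
    with open(training / "jobid2terms.json", encoding="utf-8") as handle:
        jobid2terms = json.load(handle)
    with open(training / "skillid2terms.json", encoding="utf-8") as handle:
        skillid2terms = json.load(handle)
    return job2skill_rows, jobid2terms, skillid2terms
```

Inspecting the first few rows and keys of the returned objects is a reasonable first step before building anything on top of them.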

Tutorials:

Notebook	Link
Data Download and Load using Python	Link to Colab

Resources:

Web
Additional resources
More resources soon.

Files

Files (6.0 MB)

Name	MD5	Size
sampleset_TaskA.zip	d6e8bd54f5dc39836e0167d09675c2f1	9.3 kB
sampleset_TaskB.zip	3e131a1944eb674b07af0230a691b16b	1.2 kB
TaskA.zip	a966109f7e6c2a6c30111fe814fe0bae	1.8 MB
TaskB.zip	39f9c50f3cf2f419c57c4b9b2a52bf47	4.2 MB
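
The MD5 digests in the file listing above can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the digests are copied from the listing, while the helper names md5_of_file and is_intact and the assumption that the archives sit in the current directory are illustrative only.

```python
import hashlib

# MD5 digests copied from the file listing above.
EXPECTED_MD5 = {
    "sampleset_TaskA.zip": "d6e8bd54f5dc39836e0167d09675c2f1",
    "sampleset_TaskB.zip": "3e131a1944eb674b07af0230a691b16b",
    "TaskA.zip": "a966109f7e6c2a6c30111fe814fe0bae",
    "TaskB.zip": "39f9c50f3cf2f419c57c4b9b2a52bf47",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, is_intact("TaskA.zip", EXPECTED_MD5["TaskA.zip"]) should return True for an uncorrupted copy of this version's archive. Later versions of the record will carry different digests.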
