There is a newer version of the record available.

Published February 2, 2026 | Version 0.1.0
Dataset Open

TalentCLEF 2026 corpus: Skill and Job Title Intelligence for Human Capital Management

Description

🚨 Current Status: Release of the Task A development set and the Task B training set. To check when new data will be uploaded, please consult the calendar of the task.

TalentCLEF2026 corpus - Task A development set and Task B training set

Introduction:

Following the positive response to the inaugural edition of TalentCLEF in 2025, which drew 76 registered teams, 15 submitted working notes, and 280 runs across both tasks, TalentCLEF continues to focus on developing and evaluating models that support three key objectives:

  1. Finding/ranking candidates for job positions based on their experience and professional skills.
  2. Implementing upskilling and reskilling strategies that promote the continuous development of workers.
  3. Detecting emerging skills and skills gaps of importance in organizations.

This year’s edition includes two tasks:

  • Task A - Contextualized Job-Person Matching. Develop systems that identify and rank the most suitable candidate résumés for a given job offer. For each job description in the test set, participants must submit a ranked list of candidate profiles relevant to the position.
  • Task B - Job-Skill Matching with Skill Type Classification. Develop systems that retrieve the relevant skills associated with a given job title and classify each retrieved skill as core, complementary, or transversal.

This repo contains the data for these two tasks.

File structure: 

For a detailed description of the data structure, you can refer to the TalentCLEF2026 data description page, where it is thoroughly explained.

The files are organized into two *.zip files, TaskA.zip and TaskB.zip, each containing folders to support different stages of model development. So far, only the development set of Task A and the training set of Task B have been released; in future releases, as the tasks progress, additional data will be added to the different subfolders for each task.
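As a minimal sketch of unpacking the two archives, the helper below extracts a zip file and lists what ends up on disk. The function name `extract_archive` and the target directory are hypothetical choices for illustration; only the archive names TaskA.zip and TaskB.zip come from this record.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir.

    Returns the sorted relative paths of all files found under
    target_dir after extraction. `extract_archive` is a hypothetical
    helper, not part of any TalentCLEF tooling.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )
```

Calling `extract_archive("TaskA.zip", "data")` and `extract_archive("TaskB.zip", "data")` after downloading both archives should give a quick overview of which subfolders are populated in the current release.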

TaskA includes language-specific subfolders covering English and Spanish data. Each language folder under development contains two folders (queries, corpus) and a qrels file holding the relevance judgments used to evaluate the ranking returned for each query.

TaskA/
│
├── development/
│   ├── english/
│   │   ├── queries/
│   │   ├── corpus/
│   │   └── qrels.tsv
│   └── spanish/
│       ├── queries/
│       ├── corpus/
│       └── qrels.tsv
│
└── test/
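The relevance judgments can be read with a few lines of Python. The sketch below is only an illustration and rests on an assumption: that qrels.tsv uses tab-separated TREC-style rows (`query_id`, `iteration`, `doc_id`, `relevance`). The record does not state the column layout, so check it against the data description page before relying on this. The name `load_qrels` is hypothetical.

```python
import csv
from collections import defaultdict


def load_qrels(path):
    """Read a qrels file into {query_id: {doc_id: relevance}}.

    ASSUMPTION: tab-separated TREC-style rows
    (query_id, iteration, doc_id, relevance). Three-column rows
    (query_id, doc_id, relevance) are accepted as well.
    """
    qrels = defaultdict(dict)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) not in (3, 4):
                continue  # blank line or unexpected layout
            query_id, doc_id = row[0], row[-2]
            try:
                relevance = int(row[-1])
            except ValueError:
                continue  # non-numeric label, most likely a header row
            qrels[query_id][doc_id] = relevance
    return dict(qrels)
```

For the English development data this would be called as `load_qrels("TaskA/development/english/qrels.tsv")`, and the same call with `spanish` in the path covers the Spanish data.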

TaskB follows a similar structure but without language-specific subfolders, providing general .tsv files for training, validation, and testing. This consistent file organization enables efficient data access and structured updates as new data versions are published.

TaskB/
│
├── training/
│   ├── job2skill.tsv
│   ├── jobid2terms.json
│   └── skillid2terms.json
│
├── validation/
│
└── test/
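The three Task B training files can be loaded with the standard library alone. This is a minimal sketch that makes no claim about the columns of job2skill.tsv or the schema of the JSON files: rows are returned as raw lists of strings and the JSON content is returned exactly as parsed. The function name `load_task_b_training` is hypothetical.

```python
import csv
import json
from pathlib import Path


def load_task_b_training(training_dir):
    """Load the Task B training files from training_dir.

    Returns (job2skill_rows, job_terms, skill_terms).
    ASSUMPTION: job2skill.tsv is tab-separated; its column meaning is
    not interpreted here, and a header row (if any) is kept as-is.
    """
    training = Path(training_dir)
    with open(training / "job2skill.tsv", newline="", encoding="utf-8") as handle:
        job2skill_rows = list(csv.reader(handle, delimiter="\t"))
    with open(training / "jobid2terms.json", encoding="utf-8") as handle:
        job_terms = json.load(handle)
    with open(training / "skillid2terms.json", encoding="utf-8") as handle:
        skill_terms = json.load(handle)
    return job2skill_rows, job_terms, skill_terms
```

After extraction, `load_task_b_training("TaskB/training")` returns the three objects; inspecting the first few rows and keys is a reasonable first step before building any model input.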

Tutorials: 

Notebook Link
Data Download and Load using Python  Link to Colab

 


Files

TaskA.zip

Files (6.0 MB)

Checksum                               Size
md5:d6e8bd54f5dc39836e0167d09675c2f1   9.3 kB
md5:3e131a1944eb674b07af0230a691b16b   1.2 kB
md5:a966109f7e6c2a6c30111fe814fe0bae   1.8 MB
md5:39f9c50f3cf2f419c57c4b9b2a52bf47   4.2 MB
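The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below computes a file's MD5 digest in chunks and compares it with an expected value, with or without the `md5:` prefix. The listing here does not show which checksum belongs to which file, so the pairing has to be taken from the record page itself. The name `md5_matches` is hypothetical.

```python
import hashlib


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file at path equals expected.

    expected may be given as 'md5:<hex>' or as the bare hex digest.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return digest.hexdigest() == expected_hex
```

A mismatch usually means the download was truncated or the checksum was paired with the wrong file, so re-downloading is the sensible first response.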