TalentCLEF 2025 corpus: Skill and Job Title Intelligence for Human Capital Management
Description
TalentCLEF 2025 corpus, sample set. For the dates of upcoming data releases, please consult the task calendar.
Introduction:
The first edition of TalentCLEF aims to develop and evaluate models designed to facilitate three essential tasks:
- Finding/ranking candidates for job positions based on their experience and professional skills.
- Implementing upskilling and reskilling strategies that promote the continuous development of workers.
- Detecting emerging skills and skills gaps of importance in organizations.
With that aim, the challenge is divided into two tasks:
- Task A - Multilingual Job Title Matching. Participants develop systems that, for each job title in a provided test set, return a ranked list of the most similar job titles from a specified knowledge base.
- Task B - Job Title-Based Skill Prediction. Task B requires developing systems that can retrieve relevant skills associated with a specified job title.
This repo contains the data for these two tasks.
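As a rough illustration of what Task A asks for, the sketch below ranks a small knowledge base of titles against a query title using cosine similarity over character trigrams. This is only a toy baseline chosen for illustration, not the organizers' method or an official baseline; the function names and the example titles are hypothetical and do not come from the corpus.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return a bag of character n-grams for a lowercased, space-padded title."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_titles(query, corpus):
    """Rank corpus titles by descending similarity to the query title."""
    q_vec = char_ngrams(query)
    scored = [(title, cosine(q_vec, char_ngrams(title))) for title in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical titles for illustration only, not taken from the corpus.
corpus = ["software developer", "nurse practitioner", "senior software engineer"]
ranking = rank_titles("software engineer", corpus)
```

A surface-level measure like this cannot match titles across languages, so a real multilingual system would likely need something stronger, such as multilingual embeddings.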
File structure:
The files will be organized into two *.zip files, TaskA and TaskB, each containing training, validation and test folders to support different stages of model development. Until the official release of the full training set, users can access a sample version of the data through the sampleset_TaskA.zip and sampleset_TaskB.zip files.
TaskA includes language-specific subfolders within the training and validation directories, covering English, Spanish, German, and Chinese job title data. Each training subfolder for TaskA contains a .tsv file for its language. Validation folders include three essential files (queries, corpus_elements, and q_rels) for evaluating model relevance to search queries. TaskA’s test folder has queries and corpus_elements files for testing retrieval.
TaskA/
│
├── training/
│   ├── english/
│   │   └── taskA_training_en.tsv
│   ├── spanish/
│   │   └── taskA_training_es.tsv
│   └── german/
│       └── taskA_training_de.tsv
│
├── validation/
│   ├── english/
│   │   ├── queries
│   │   ├── corpus_elements
│   │   └── q_rels
│   ├── spanish/
│   ├── german/
│   └── chinese/
│
└── test/
    ├── queries
    └── corpus_elements
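The TaskA layout above can be walked programmatically. The following is a minimal sketch, assuming the archive has been extracted to a local folder and that the .tsv files are UTF-8 encoded and tab-separated; the column layout is not described here, so rows are returned as plain lists of strings. The helper names `read_tsv` and `training_files` are hypothetical.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file into a list of rows (each a list of strings).

    Assumes UTF-8 encoding; no assumption is made about column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def training_files(root):
    """Map each language folder under <root>/training/ to its .tsv files."""
    training_dir = Path(root) / "training"
    return {
        lang_dir.name: sorted(lang_dir.glob("*.tsv"))
        for lang_dir in sorted(training_dir.iterdir())
        if lang_dir.is_dir()
    }
```

For example, `training_files("TaskA")` would return a dictionary keyed by folder name (such as `english`), and each listed path can be passed to `read_tsv`.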
TaskB follows a similar structure but without language-specific subfolders, providing general .tsv files for training, validation, and testing. This consistent file organization enables efficient data access and structured updates as new data versions are published.
TaskB/
│
├── training/
│   └── taskB_training.tsv
│
├── validation/
│   ├── queries
│   ├── corpus_elements
│   └── q_rels
│
└── test/
    ├── queries
    └── corpus_elements
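The validation folders pair queries with relevance judgments (q_rels), which suggests a standard retrieval evaluation. The official metric and the q_rels file format are not specified in this description, so the sketch below makes two assumptions: that the judgments have already been parsed into a dictionary mapping each query id to a set of relevant element ids, and that mean average precision is a reasonable metric to monitor during development.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant_ids)


def mean_average_precision(rankings, qrels):
    """Mean of average precision over all queries that have judgments.

    rankings: dict of query id -> ranked list of element ids (system output)
    qrels:    dict of query id -> set of relevant element ids (assumed format)
    """
    scores = [
        average_precision(rankings.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Queries with judgments but no system output score zero here, which penalizes missing rankings rather than silently skipping them.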
Resources:
- Web
- More resources soon.
Files (4.9 kB)
| MD5 checksum | Size |
|---|---|
| d197e9f2388fbaae12707e8f92d238ef | 4.0 kB |
| 593a2fc935cfb69c08da13f6f3831ce5 | 870 Bytes |
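The MD5 checksums listed above can be used to check that a download is intact. The sketch below computes a file's MD5 digest with Python's standard `hashlib` module; the helper name `md5_of_file` is hypothetical, and since the listing does not say which checksum belongs to which archive, the comparison against the right value is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A download is intact when `md5_of_file(...)` for the archive equals the checksum published for that file.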