Published May 3, 2023 | Version 1.0.0
Dataset Open

MORFITT: A multi-label corpus of French scientific articles in the biomedical domain

  • 1. Avignon University
  • 2. Nantes University

Description

This article presents MORFITT, the first multi-label French corpus annotated with medical specialties. MORFITT comprises 3,624 abstracts of scientific articles from PubMed, annotated with 12 specialties for a total of 5,116 annotations. We detail the corpus, the experiments, and the preliminary results obtained with a classifier based on the pre-trained language model CamemBERT. These preliminary results demonstrate the difficulty of the task, with a weighted average F1-score of 61.78%.
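For readers unfamiliar with the metric, the following is a minimal sketch of a support-weighted average F1-score for multi-label predictions: F1 is computed per label, then averaged with each label weighted by its number of gold annotations. This is the usual definition of "weighted" averaging; the description does not say which implementation produced the reported score, so the function name `weighted_f1` and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-label F1 scores.

    y_true, y_pred: lists of label sets, one set per document.
    (Hypothetical helper, not taken from the MORFITT release.)
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        for label in gold & pred:   # predicted and correct
            tp[label] += 1
        for label in pred - gold:   # predicted but not in the gold set
            fp[label] += 1
        for label in gold - pred:   # in the gold set but missed
            fn[label] += 1

    labels = set(tp) | set(fp) | set(fn)
    total_support = sum(tp[label] + fn[label] for label in labels)
    if total_support == 0:
        return 0.0

    score = 0.0
    for label in labels:
        support = tp[label] + fn[label]  # number of gold annotations
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        score += support * f1
    return score / total_support


# Toy data with made-up specialty names, for illustration only.
gold = [{"cardiology", "genetics"}, {"cardiology"}, {"oncology"}]
pred = [{"cardiology"}, {"cardiology", "oncology"}, {"oncology"}]
print(round(weighted_f1(gold, pred), 4))
```

Because labels are weighted by support, frequent specialties influence the score more than rare ones, which matters for a corpus where one abstract can carry several annotations.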

Files

data-morfitt.zip (2.0 MB)
md5:5fbc6f999188fc33c89c2c71abeb807a
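
The MD5 checksum listed for data-morfitt.zip can be used to confirm that a download is intact. The sketch below assumes the archive has already been saved locally; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and only the checksum value comes from this record.

```python
import hashlib

# Checksum as listed on this record for data-morfitt.zip.
EXPECTED_MD5 = "5fbc6f999188fc33c89c2c71abeb807a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_checksum("data-morfitt.zip")` should return `True` for an uncorrupted copy, assuming the file sits in the current directory under that name.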