Published May 3, 2023
Version 1.0.0
Dataset (open access)
MORFITT: A multi-label corpus of French scientific articles in the biomedical domain
Authors/Creators
1. Avignon University
2. Nantes University
Description
This article presents MORFITT, the first multi-label corpus in French annotated with medical specialties. MORFITT consists of 3,624 abstracts of scientific articles from PubMed, annotated with 12 specialties for a total of 5,116 annotations. We describe the corpus, the experiments, and the preliminary results obtained with a classifier based on the pre-trained language model CamemBERT. These preliminary results show that the task is difficult: the classifier reaches a weighted average F1-score of 61.78%.
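The description reports a weighted average F1-score. As a minimal sketch, assuming the common definition used by scikit-learn's `average="weighted"` (per-label F1 averaged with weights equal to each label's number of gold occurrences), the metric can be computed for multi-label predictions as follows. The function name `weighted_f1` and the labels `A`, `B`, `C` are hypothetical placeholders, not the corpus's 12 specialties, and the authors' exact evaluation script may differ.

```python
from typing import Sequence, Set


def weighted_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Support-weighted average of per-label F1 scores (multi-label setting)."""
    weighted_sum = 0.0
    total_support = 0
    for label in labels:
        # Count per-document outcomes for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        support = tp + fn  # number of documents whose gold labels include this label
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Hypothetical toy example: four abstracts, three placeholder labels.
gold = [{"A"}, {"A", "B"}, {"C"}, {"B"}]
pred = [{"A"}, {"A"}, {"C", "B"}, {"B"}]
print(round(weighted_f1(gold, pred, ["A", "B", "C"]), 4))  # prints 0.8
```

Here `A` and `C` are predicted perfectly (F1 = 1) while `B` gets F1 = 0.5; weighting by support (2, 2, 1) gives (2 + 1 + 1) / 5 = 0.8. Because of this weighting, frequent labels influence the score more than rare ones, unlike a macro average.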
Files
| Name | Size | Checksum |
|---|---|---|
| data-morfitt.zip | 2.0 MB | md5:5fbc6f999188fc33c89c2c71abeb807a |
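The listing gives an MD5 checksum for the archive. The following is a minimal sketch for checking a downloaded copy against it. It assumes the archive was saved as `data-morfitt.zip` in the current directory. The helper name `md5_of` is hypothetical, and the file is read in chunks so that memory use stays bounded.

```python
import hashlib
from pathlib import Path

# Checksum published in the file listing for data-morfitt.zip.
EXPECTED_MD5 = "5fbc6f999188fc33c89c2c71abeb807a"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("data-morfitt.zip")
    if archive.exists():
        actual = md5_of(archive)
        print("Checksum OK" if actual == EXPECTED_MD5 else f"Checksum mismatch: {actual}")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download.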