Published September 18, 2023 | Version 0.0.1
Dataset Open

FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection

  • 1. Division of Medical Information Sciences, University Hospitals of Geneva - Department of Radiology and Medical Informatics, University of Geneva

Description

The French Annotated Resource with Semantic Information for Medical Entities Detection (FRASIMED) contains 2'051 synthetic clinical cases in French, with 24'037 annotated entities. The dataset contains two subsets:

  • CANTEMIST-FR: Originally from CANTEMIST (Miranda-Escalada et al., 2020), it contains 1'301 oncological notes with 15'978 annotations, each linked to an ICD-O-3.1 morphology code. Additionally, 15'457 of them are linked to a SNOMED-CT code.
  • DISTEMIST-FR: Originally from DISTEMIST's training set (Miranda-Escalada et al., 2022), it contains 750 clinical cases with 8'059 annotations, 5'132 of which are linked to a SNOMED-CT code.

Please cite us:

Zaghir, J., Bjelogrlic, M., Goldman, J.-P., Aananou, S., Gaudet-Blavignac, C., & Lovis, C. (2023). FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection. arXiv preprint http://arxiv.org/abs/2309.10770

Files

FRASIMED.zip (5.7 MB)
md5:8ef2068b479ee2ee438a2f908ecb2d55
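
The MD5 checksum listed for FRASIMED.zip can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `verify_archive`, and the local path used in the usage note, are illustrative assumptions and not part of the dataset or its tooling.

```python
import hashlib

# Checksum as published on this record for FRASIMED.zip.
EXPECTED_MD5 = "8ef2068b479ee2ee438a2f908ecb2d55"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not need
    to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected` (hypothetical helper)."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `verify_archive("FRASIMED.zip")` should return `True` for an intact download. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.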

Additional details

Funding

NCCR Evolving Language (phase I) 51NF40_180888
Swiss National Science Foundation