Published September 18, 2023 | Version 0.0.1
Dataset
Open
FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection
Creators
- 1. Division of Medical Information Sciences, University Hospitals of Geneva - Department of Radiology and Medical Informatics, University of Geneva
Description
The French Annotated Resource with Semantic Information for Medical Entities Detection (FRASIMED) contains 2'051 synthetic clinical cases in French, with 24'037 annotated entities. The dataset contains two subsets:
- CANTEMIST-FR: Derived from CANTEMIST (Miranda-Escalada et al., 2020), it contains 1'301 oncological notes with 15'978 annotations, each linked to an ICD-O-3.1 morphology code. Of these, 15'457 are also linked to a SNOMED-CT code.
- DISTEMIST-FR: Derived from the DISTEMIST training set (Miranda-Escalada et al., 2022), it contains 750 clinical cases with 8'059 annotations, 5'132 of which are linked to a SNOMED-CT code.
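The subset figures can be cross-checked against the overall totals with a few lines of Python. This is only a sketch: the counts are copied from the description, while the dictionary layout and function names are illustrative assumptions and not part of the dataset.

```python
# Counts copied from the dataset description; the structure and names
# below are hypothetical and only serve this consistency check.
SUBSETS = {
    "CANTEMIST-FR": {"documents": 1301, "annotations": 15978, "snomed_linked": 15457},
    "DISTEMIST-FR": {"documents": 750, "annotations": 8059, "snomed_linked": 5132},
}

REPORTED_TOTAL_DOCUMENTS = 2051
REPORTED_TOTAL_ANNOTATIONS = 24037


def totals(subsets):
    """Return (total documents, total annotations) summed over all subsets."""
    documents = sum(s["documents"] for s in subsets.values())
    annotations = sum(s["annotations"] for s in subsets.values())
    return documents, annotations


def snomed_coverage(subsets):
    """Return the share of annotations linked to a SNOMED-CT code, per subset."""
    return {
        name: s["snomed_linked"] / s["annotations"]
        for name, s in subsets.items()
    }


if __name__ == "__main__":
    documents, annotations = totals(SUBSETS)
    print("documents match:", documents == REPORTED_TOTAL_DOCUMENTS)
    print("annotations match:", annotations == REPORTED_TOTAL_ANNOTATIONS)
    for name, share in snomed_coverage(SUBSETS).items():
        print(f"{name}: {share:.1%} of annotations carry a SNOMED-CT code")
```

The coverage figures show how much less complete SNOMED-CT linking is in DISTEMIST-FR than in CANTEMIST-FR, which may matter when choosing a subset for entity-linking experiments.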
Please cite us:
Zaghir, J., Bjelogrlic, M., Goldman, J.-P., Aananou, S., Gaudet-Blavignac, C., & Lovis, C. (2023). FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection. arXiv preprint. http://arxiv.org/abs/2309.10770
Files
Name | Size | Checksum |
---|---|---|
FRASIMED.zip | 5.7 MB | md5:8ef2068b479ee2ee438a2f908ecb2d55 |
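After downloading the archive, its integrity can be checked against the MD5 checksum listed for FRASIMED.zip. The following is a minimal sketch using only the Python standard library; the local path `FRASIMED.zip` and the function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published in the file listing of this record.
EXPECTED_MD5 = "8ef2068b479ee2ee438a2f908ecb2d55"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("FRASIMED.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is used here only because it is the checksum the record publishes; it detects corrupted or truncated downloads but is not a safeguard against deliberate tampering.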
Additional details
Funding
- NCCR Evolving Language (phase I) 51NF40_180888
- Swiss National Science Foundation