Text mining processing pipeline for semi structured data D3.3

Copara, Jenny; Naderi, Nona; Kellmann, Alexander; Gosal, Gurinder; Hsiao, William; Teodoro, Douglas

doi:10.5281/zenodo.5795433

Published December 21, 2021 | Version 1

Project deliverable | Open

1. SIB
2. UMCG
3. SFU

Unstructured and semi-structured cohort data contain relevant information about a patient's health condition, such as free text describing disease diagnoses, drugs, and reasons for medication, that is often not available in structured formats. One challenge posed by medical free text is that the same concept can be mentioned in several ways. Encoding free text into unambiguous descriptors therefore allows us to leverage the value of the cohort data, in particular by facilitating its findability and interoperability across cohorts in the project.

Named entity recognition and normalization enable the automatic conversion of free text into standard medical concepts. Given the volume of available data shared in the CINECA project, the WP3 text mining working group has developed named entity normalization techniques to obtain standard concepts from unstructured and semi-structured fields available in the cohorts. In this deliverable, we present the methodology used to develop the different text mining tools created by the dedicated SFU, UMCG, EBI, and HES-SO/SIB groups for specific CINECA cohorts.
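
As an illustration only, the idea of mapping several surface forms of one concept to a single descriptor can be sketched with a small dictionary lookup. This is not the method described in the deliverable: the lexicon, the `CONCEPT:` identifiers, and the `normalize` function are all hypothetical placeholders, and a real pipeline would draw its terms and codes from a standard medical terminology.

```python
import re

# Hypothetical lexicon: surface form -> placeholder concept identifier.
# The identifiers are invented for this sketch, not real ontology codes.
LEXICON = {
    "heart attack": "CONCEPT:0001",
    "myocardial infarction": "CONCEPT:0001",
    "mi": "CONCEPT:0001",
    "high blood pressure": "CONCEPT:0002",
    "hypertension": "CONCEPT:0002",
}


def normalize(text):
    """Return (mention, concept_id) pairs in order of appearance.

    Longer surface forms are matched first so that a short term
    never claims characters already covered by a longer one.
    """
    lowered = text.lower()
    taken = [False] * len(lowered)  # characters already assigned to a mention
    found = []
    for term in sorted(LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"  # whole-word match only
        for m in re.finditer(pattern, lowered):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                found.append((m.start(), text[m.start():m.end()], LEXICON[term]))
    found.sort()  # restore left-to-right order
    return [(mention, concept) for _, mention, concept in found]


if __name__ == "__main__":
    print(normalize("History of Heart Attack; treated for hypertension."))
```

Under these assumptions, two differently worded mentions of the same condition resolve to the same placeholder identifier, which is the property that makes records comparable across cohorts.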

Files (1.3 MB)

Name	Size	Checksum
D3.3 - Text mining processing pipeline for semi structured data.pdf	1.3 MB	md5:98f0b068d97eaff21054b6201e6e6648

Additional details

Funding

European Commission
CINECA - Common Infrastructure for National Cohorts in Europe, Canada, and Africa (grant 825775)

