Standardizing NLP-related contextual metadata for clinical and translational research

Fu, Sunyang; Smith, Daniel

doi:10.5281/zenodo.17188578

Published October 24, 2024 | Version v2

Presentation Open

1. The University of Texas Health Science Center at Houston
2. Emory University

Natural language processing (NLP) has become an essential tool for extracting clinically meaningful information from unstructured electronic health record (EHR) data. However, recent reviews of NLP-assisted observational studies have highlighted critical shortcomings in reporting practices, including inconsistent documentation of data sources, annotation methods, normalization techniques, and evaluation strategies. These gaps undermine reproducibility, transparency, and the ability to assess the reliability of NLP-derived evidence.

In this session of the OHDSI NLP Working Group, we summarized current challenges in NLP reporting and presented a community-driven proposal for standardizing NLP metadata. Drawing on principles of FAIR (Findable, Accessible, Interoperable, Reusable) data and RITE (Reproducible, Implementable, Transparent, Explainable) implementation, we outlined a framework for capturing key methodological details across the lifecycle of NLP development and evaluation. This framework includes a taxonomy of metadata elements, recommendations for documenting evaluation and refinement, and a scoring system to support consistent communication of NLP quality in network studies.
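As a rough illustration of what a machine-checkable metadata record could look like, the sketch below covers only the four reporting areas named above (data sources, annotation methods, normalization techniques, evaluation strategies). The class name, field names, and the completeness fraction are hypothetical and chosen for illustration; they are not the taxonomy or the scoring system presented in the session.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class NLPReportingRecord:
    """Hypothetical record; fields mirror the four reporting areas named in the abstract."""
    data_sources: Optional[str] = None
    annotation_methods: Optional[str] = None
    normalization_techniques: Optional[str] = None
    evaluation_strategies: Optional[str] = None


def missing_elements(record: NLPReportingRecord) -> List[str]:
    """Return the names of fields that are absent or blank."""
    return [
        f.name
        for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


def completeness(record: NLPReportingRecord) -> float:
    """Fraction of fields that are documented (an assumed, illustrative measure)."""
    total = len(fields(record))
    return (total - len(missing_elements(record))) / total


# Example values are invented for illustration only.
example = NLPReportingRecord(
    data_sources="Clinical notes from a single site's EHR",
    evaluation_strategies="Held-out test set; precision and recall",
)
print(missing_elements(example))
print(completeness(example))
```

A check of this kind only flags which areas were left undocumented; it says nothing about the quality of what was reported, which is what a fuller scoring system would need to address.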

By advancing shared practices for documenting NLP implementation within OHDSI, we aim to strengthen methodological rigor, enable reproducibility across sites, and enhance confidence in NLP-enabled clinical and translational research.

Files

Name	Size
OHDSI_NLP_WG__split.pdf (md5:b1926cd1ee34548d6fea5b5ae54b9477)	3.2 MB

Additional details

Available: 2024-10-24 (presented)