Standardizing NLP-related contextual metadata for clinical and translational research
Authors/Creators
Description
Natural language processing (NLP) has become an essential tool for extracting clinically meaningful information from unstructured electronic health record (EHR) data. However, recent reviews of NLP-assisted observational studies have highlighted critical shortcomings in reporting practices, including inconsistent documentation of data sources, annotation methods, normalization techniques, and evaluation strategies. These gaps undermine reproducibility, transparency, and the ability to assess the reliability of NLP-derived evidence.
In this session of the OHDSI NLP Working Group, we summarized current challenges in NLP reporting and presented a community-driven proposal for standardizing NLP metadata. Drawing on principles of FAIR (Findable, Accessible, Interoperable, Reusable) data and RITE (Reproducible, Implementable, Transparent, Explainable) implementation, we outlined a framework for capturing key methodological details across the lifecycle of NLP development and evaluation. This framework includes a taxonomy of metadata elements, recommendations for documenting evaluation and refinement, and a scoring system to support consistent communication of NLP quality in network studies.
By advancing shared practices for documenting NLP implementation within OHDSI, we aim to strengthen methodological rigor, enable reproducibility across sites, and enhance confidence in NLP-enabled clinical and translational research.
Files
OHDSI_NLP_WG__split.pdf (3.2 MB)
md5:b1926cd1ee34548d6fea5b5ae54b9477
Additional details
Dates
- Available: 2024-10-24 (Presented)