Published October 24, 2024 | Version v2
Presentation Open

Standardizing NLP-related contextual metadata for clinical and translational research

  • 1. The University of Texas Health Science Center at Houston
  • 2. Emory University

Description

Natural language processing (NLP) has become an essential tool for extracting clinically meaningful information from unstructured electronic health record (EHR) data. However, recent reviews of NLP-assisted observational studies have highlighted critical shortcomings in reporting practices, including inconsistent documentation of data sources, annotation methods, normalization techniques, and evaluation strategies. These gaps undermine reproducibility, transparency, and the ability to assess the reliability of NLP-derived evidence.

In this session of the OHDSI NLP Working Group, we summarized current challenges in NLP reporting and presented a community-driven proposal for standardizing NLP metadata. Drawing on principles of FAIR (Findable, Accessible, Interoperable, Reusable) data and RITE (Reproducible, Implementable, Transparent, Explainable) implementation, we outlined a framework for capturing key methodological details across the lifecycle of NLP development and evaluation. This framework includes a taxonomy of metadata elements, recommendations for documenting evaluation and refinement, and a scoring system to support consistent communication of NLP quality in network studies.
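As a rough illustration of what capturing such metadata could look like in practice, the Python sketch below records the four reporting areas named in the description (data sources, annotation methods, normalization techniques, evaluation strategies) and reports which ones are undocumented. Everything in it is an assumption made for illustration: the class `NLPMetadataRecord`, its field names, and the `completeness` fraction are hypothetical and are not the taxonomy or scoring system presented in the session.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class NLPMetadataRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    data_source: Optional[str] = None
    annotation_method: Optional[str] = None
    normalization_technique: Optional[str] = None
    evaluation_strategy: Optional[str] = None


def missing_elements(record: NLPMetadataRecord) -> List[str]:
    """Return the names of fields that are empty or unset."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def completeness(record: NLPMetadataRecord) -> float:
    """Fraction of fields that are documented (an assumed, simplistic score)."""
    total = len(fields(record))
    return (total - len(missing_elements(record))) / total


# Example: a study that documents only two of the four areas.
record = NLPMetadataRecord(
    data_source="clinical notes from an EHR",
    evaluation_strategy="held-out test set with precision and recall",
)
print(missing_elements(record))
print(f"{completeness(record):.2f}")
```

A flat record like this only checks whether each area is documented at all; it says nothing about the quality of what is reported, which is the kind of question a shared scoring system would need to address.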

By advancing shared practices for documenting NLP implementation within OHDSI, we aim to strengthen methodological rigor, enable reproducibility across sites, and enhance confidence in NLP-enabled clinical and translational research.

Files

OHDSI_NLP_WG__split.pdf

Files (3.2 MB)

md5:b1926cd1ee34548d6fea5b5ae54b9477
3.2 MB

Additional details

Dates

Available
2024-10-24
Presented