Published April 5, 2022 | Version v1
Dataset Open

Structure Annotations of Assessment and Plan Sections from MIMIC-III

Description

Physicians record their detailed thought processes about diagnoses and treatments as unstructured text in a section of a clinical note called the "assessment and plan". This information is clinically richer than the structured billing codes assigned for an encounter, but it is harder to extract reliably given the complexity of clinical language and documentation habits. To structure these sections, we collected a dataset of annotations over assessment and plan sections from the publicly available and de-identified MIMIC-III dataset and developed deep-learning based models to perform this task, described in the associated paper, available as a pre-print at: https://www.medrxiv.org/content/10.1101/2022.04.13.22273438v1

When using this data please cite our paper:

@article{Stupp2022.04.13.22273438,
  author = {Stupp, Doron and Barequet, Ronnie and Lee, I-Ching and Oren, Eyal and Feder, Amir and Benjamini, Ayelet and Hassidim, Avinatan and Matias, Yossi and Ofek, Eran and Rajkomar, Alvin},
  title = {Structured Understanding of Assessment and Plans in Clinical Documentation},
  year = {2022},
  doi = {10.1101/2022.04.13.22273438},
  publisher = {Cold Spring Harbor Laboratory Press},
  URL = {https://www.medrxiv.org/content/early/2022/04/17/2022.04.13.22273438},
  journal = {medRxiv}
}

The dataset presented here contains annotations of assessment and plan sections of notes from the publicly available and de-identified MIMIC-III dataset, marking the active problems, their assessment descriptions, and plan action items. Action items are additionally labeled with one of 8 categories (listed below). The dataset contains over 30,000 annotations of 579 notes from distinct patients, annotated by 6 medical residents and students.

The dataset is divided into 4 partitions: a training set (481 notes), a validation set (50 notes), a test set (48 notes), and an inter-rater set. The inter-rater set contains each rater's annotations over the test set; rater 1 in the inter-rater set should be regarded as an intra-rater comparison (details in the paper). The labels underwent automatic normalization to expand spans to whole word boundaries and to remove flanking non-alphanumeric characters, as illustrated in the sketch below.
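The released annotations already reflect this normalization; the snippet below is only a minimal sketch of what such span cleanup looks like, using a hypothetical normalize_span helper (not the paper's pipeline code) and assuming character offsets with an exclusive end.

```python
def normalize_span(text, start, end):
    """Sketch: expand [start, end) to whole word boundaries, then strip
    flanking non-alphanumeric characters (hypothetical helper)."""
    # Expand left while the previous character is still part of a word.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Expand right while the next character is still part of a word.
    while end < len(text) and not text[end].isspace():
        end += 1
    # Strip flanking non-alphanumeric characters.
    while start < end and not text[start].isalnum():
        start += 1
    while end > start and not text[end - 1].isalnum():
        end -= 1
    return start, end

# Example: a span that starts mid-word and ends on punctuation.
note = "# Acute kidney injury: trend creatinine."
start, end = normalize_span(note, 9, 22)
print(note[start:end])  # -> "kidney injury"
```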

Code for transforming the labels into TensorFlow examples and for training the models described in the paper will be made available on GitHub: https://github.com/google-research/google-research/tree/master/assessment_plan_modeling

To use these annotations, the user additionally needs the text of the notes, found in the NOTEEVENTS table of MIMIC-III; access to MIMIC-III must be acquired independently (https://mimic.mit.edu/).

Annotations are given as character spans in a CSV file with the following schema (a minimal loading example follows the field list):

  • partition: categorical, one of [train, val, test, interrater]. The set of ratings the span belongs to.
  • rater_id: int. Unique id for each of the raters.
  • note_id: int. The note's unique id, which links to the MIMIC-III notes table (as ROW_ID).
  • span_type: categorical, one of [PROBLEM_TITLE, PROBLEM_DESCRIPTION, ACTION_ITEM]. Type of the span as annotated by raters.
  • char_start: int. Character offset of the span start from the beginning of the note.
  • char_end: int. Character offset of the span end from the beginning of the note.
  • action_item_type: categorical, one of [MEDICATIONS, IMAGING, OBSERVATIONS_LABS, CONSULTS, NUTRITION, THERAPEUTIC_PROCEDURES, OTHER_DIAGNOSTIC_PROCEDURES, OTHER]. Type of action item if the span is an action item (empty otherwise), as annotated by raters.
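As a usage illustration, the following is a minimal sketch of reading the annotations with pandas and extracting the annotated text from a locally obtained copy of the MIMIC-III NOTEEVENTS table. The file names, the join of note_id to ROW_ID, and the treatment of char_end as exclusive are assumptions based on this description, not released code.

```python
import pandas as pd

# Annotation spans released with this dataset.
annotations = pd.read_csv("ap_parsing_mimic3_annotations.csv")

# NOTEEVENTS must be obtained separately from MIMIC-III (https://mimic.mit.edu/).
notes = pd.read_csv("NOTEEVENTS.csv", usecols=["ROW_ID", "TEXT"])

# Join each span to its note text via note_id <-> ROW_ID.
merged = annotations.merge(notes, left_on="note_id", right_on="ROW_ID")

# Extract the annotated text; char_end is assumed here to be exclusive.
merged["span_text"] = merged.apply(
    lambda row: row["TEXT"][row["char_start"]:row["char_end"]], axis=1
)

print(merged.loc[merged["partition"] == "train",
                 ["note_id", "span_type", "action_item_type", "span_text"]].head())
```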

Files

ap_parsing_mimic3_annotations.csv (1.5 MB)
md5:6d6d57e18547b7b06026067c21d3b72e

Additional details

References

  • Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Scientific data. 2016 May 24;3(1):1-9.