Published December 19, 2022 | Version 1
Dataset Open

OpenChart-SE: A corpus of artificial Swedish electronic health records for imagined emergency care patients written by physicians in a crowd-sourcing project

  • 1. Department of Emergency and Internal Medicine, Skåne University Hospital, Malmö, Sweden; Department of Global Public Health, Karolinska Institute, Stockholm, Sweden
  • 2. Cell Death, Lysosomes and Artificial Intelligence Group, Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
  • 3. Department of Emergency and Internal Medicine, Skåne University Hospital, Malmö, Sweden

Description

Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs, and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue, we have created a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as the OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation, and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se), where additional batches of quality-controlled EHRs will be released periodically.

 

Dataset content

OpenChart-SE, version 1 corpus (txt files and dataset.csv)

The OpenChart-SE corpus, version 1, contains 50 artificial EHRs (note that the numbering starts at 5 because records 1-4 were test cases that were not suitable for publication). The EHRs are available in two formats: structured, as a .csv file, and as separate text files for annotation. Flaws in the data were left uncleaned so that the corpus simulates what could be encountered when working with data from different EHR systems. All charts have been checked for medical validity by a resident in Emergency Medicine at a Swedish hospital before publication.
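As a rough illustration of working with the two formats described above, the following Python sketch loads both into memory. Several details are assumptions rather than documented properties of the corpus: that the text files are named by record number (for example 5.txt), that all files are UTF-8 encoded, and that the .csv file is comma-delimited with a header row. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_text_records(folder):
    """Return {record_number: text} for every <number>.txt file in folder.

    Assumes (not confirmed by the dataset description) that files are
    named by record number and encoded as UTF-8.
    """
    records = {}
    for path in Path(folder).glob("*.txt"):
        if path.stem.isdigit():  # skip any .txt file without a numeric name
            records[int(path.stem)] = path.read_text(encoding="utf-8")
    return records


def load_csv_rows(csv_path):
    """Return the structured records as a list of dicts, one per row.

    Assumes a comma-delimited file with a header row; column names are
    taken from the file itself rather than hard-coded here.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Because the data were published without cleanup, it may be worth inspecting the returned rows for empty or inconsistent fields before any downstream processing.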

 

Codebook.xlsx

The codebook contains information about each variable used. It is in XLSForm format, which can be reused in several different applications for data collection.

 

suppl_data_1_openchart-se_form.pdf

OpenChart-SE mock emergency care EHR form.

 

suppl_data_3_openchart-se_dataexploration.ipynb

This Jupyter notebook contains the code and results from the analysis of the OpenChart-SE corpus.

 

More details about the project and information on the upcoming preprint accompanying the dataset can be found on the project website (https://github.com/Aitslab/openchart-se).

Notes

Acknowledgement: We thank all citizen scientists for contributing artificial EHRs and all members of our research groups who provided helpful comments throughout the development of this project. This study was supported by a grant to Science for Life Laboratory from the Knut and Alice Wallenberg (KAW) Foundation (S.A. 2020.0182), which was distributed through the SciLifeLab and KAW National COVID-19 Research Program. The project is conducted in the AI Lund research environment at Lund University.

Files

Files (1.0 MB)

Checksum Size
md5:93941f4a84c566965c2944ba4ff919ca 1.3 kB
md5:08e6d68d4d7ac378a0174d705d8cd5c0 1.5 kB
md5:560c14e376636c61bd9fb73838665252 1.3 kB
md5:2092f0e7a32397dc9a559865ce407d86 1.7 kB
md5:43a554bb5ef5532205dab48b3cb4d2ba 1.9 kB
md5:385633c2e96447cfdc0bad48cd63655c 1.4 kB
md5:705b680d8b9b4cb211505ffd25fe3ed5 1.5 kB
md5:b2267ade94c22b85d781f2f336557b33 1.5 kB
md5:5cfe28fc9c50798f883b95092e30938a 1.1 kB
md5:65df387e274b946f7660e9cf66efbcee 1.5 kB
md5:e119f42e87faf05494fc127908044ad1 1.7 kB
md5:3930f1f73b87e595911a009d0817c5b1 1.5 kB
md5:c307a4f95a3d6869104f96a596685b2d 1.2 kB
md5:e3ebe8ba3bb9a203da9fc766db38632a 1.4 kB
md5:991105e0b8648868c557ed6027945203 1.2 kB
md5:8859de93a27fde1a77d0d1b17d8b7c79 1.5 kB
md5:107270cb73e2359a56b78fa1a313224b 1.4 kB
md5:3ab54bfc34d3e84fb4b280208ad28952 1.5 kB
md5:81d5dd85858b6bbd979d1e6b7829004a 1.4 kB
md5:fe3cbe84080d8f3352dce3c2a9f6d554 1.6 kB
md5:a290252f25a9a32078e72f51d1df4033 1.6 kB
md5:396b2818eddb7ed52dbde3cf53995174 1.4 kB
md5:0b8002083b3b33f37534ad958cf00be8 1.6 kB
md5:c217362a3f2837218a3021ebe9533c74 1.5 kB
md5:f845d5df92ea8dfc421070d1b5c442be 1.8 kB
md5:44e44916117ca23f899e5e12439041da 1.3 kB
md5:5753345fb3ff5699eaf2c16ba766b225 1.3 kB
md5:4c8b0a66ba83f7930624d85b386279c1 1.5 kB
md5:04c6f7bee7d64475ef719a13ace8d956 1.7 kB
md5:4a4cf2ab68def2be63bd40c068649600 1.2 kB
md5:9fa51d7a1566d0dec4df9619e20b489c 1.8 kB
md5:bb1160aaba29aa911a836a304402ab03 1.9 kB
md5:e3813decd62ae7e7cbb15294e76560a8 1.6 kB
md5:4fb36324b9eab5a3c2f9129e4831efe1 2.2 kB
md5:889c175849b4b1e18bbad1605416664f 1.6 kB
md5:6faa287c88e8d33c6a4524b5760219e0 1.6 kB
md5:c0991503f8273ce682aa999e34d570cc 1.1 kB
md5:00cbb55f9e2444d70f58b1570790e561 2.1 kB
md5:ab37e7a307d89121f8bbfaecdcc7b8a8 1.8 kB
md5:ec4c8ab76d95ed4bd940bb07707e9977 1.6 kB
md5:71d6162d9d5c040c2e443d7867b743ce 2.9 kB
md5:62cbc7928e0e5f47ea4098480d77f675 1.7 kB
md5:3ab2d8100a1748afd847130739ceb93f 2.1 kB
md5:32e1c25e6b2e33f9c0f36c47b624d53e 1.7 kB
md5:398fd4c261a693b4b3af0bdb14325c28 2.8 kB
md5:54d5c6b9576a6546fee51d733746bb5a 1.4 kB
md5:7bc95f93c79d0c32a574536a53ef97ee 1.3 kB
md5:d0c59b0d733c03ac8f867acc97427364 1.4 kB
md5:4ead47703726da8cf961f04bc8475a9f 1.0 kB
md5:d990cc3c357a28051568b455a4794722 1.3 kB
md5:f072c9d1a26c471a98108a75d9498724 18.0 kB
md5:3611b2e99353c0cb49cc2601a0263a4f 62.6 kB
md5:2d367cd81d5c8921bd1dfb0a4a92d80e 18.7 kB
md5:1a5a77ad303fa473902fd10fd177a28c 776.1 kB
md5:51f117a032fece579a4da7718be2e6b1 58.2 kB
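
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and which checksum belongs to which file is not recoverable from this listing, so the expected value must be taken from the file's own entry on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as matches_checksum("dataset.csv", "md5:<value from the record page>") returns True only when the local copy is byte-identical to the published file.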

Additional details

Related works

Is derived from
Dataset: https://github.com/Aitslab/openchart-se/ (URL)