Published December 1, 2021 | Version 1.0
Dataset Open

RuMedPrimeData

Description

The dataset contains anonymized data from outpatient visits to the SSMU hospital.
Each visit has the following fields:
new_patient_id – patient identifier;
new_event_id – visit identifier;
new_event_time – date and time of the event (randomized for each patient);
symptoms – patient complaints, as recorded by the doctor;
anamnesis – patient anamnesis;
ICD10 – illness code assigned during the visit in accordance with the ICD-10 classification.

Text encoding – UTF-8.
Number of visits – 7625.
Data is stored in Tab-Separated Values (TSV) format in the file data.tsv (md5sum abc73e2b0e1fecb187e10152185b4c64).
Dataset version – 1.0.
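As an illustration of the format details above, the following minimal sketch checks the md5 checksum of data.tsv and reads it into a list of records. It assumes that the file starts with a header row naming the six fields and that it follows the default quoting conventions of Python's csv module; neither is stated in this description. The helper names md5_of and load_visits are hypothetical.

```python
import csv
import hashlib

# Checksum of data.tsv as given in the description.
EXPECTED_MD5 = "abc73e2b0e1fecb187e10152185b4c64"

# Field names listed in the description; their presence as a header row
# in data.tsv is an assumption.
FIELDS = [
    "new_patient_id",
    "new_event_id",
    "new_event_time",
    "symptoms",
    "anamnesis",
    "ICD10",
]


def md5_of(path):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_visits(path):
    """Read a UTF-8, tab-separated file into a list of dicts (one per visit).

    Assumption: the first row is a header; quoting rules are the csv defaults.
    """
    with open(path, encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))
```

With the archive unpacked, md5_of("data.tsv") == EXPECTED_MD5 would confirm an intact download, and len(load_visits("data.tsv")) should equal 7625 if the assumptions about the header and quoting hold.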

Before publication, the dataset was anonymized in two steps:
Automated removal of all personal information while extracting the data from the original raw records;
Manual checking and anonymization by trained assessors.
Specifically, each record was manually checked for the presence of personal information. The most sensitive items were removed from the texts entirely, for example phone, fax, and car license plate numbers; email addresses; insurance policy, medical record, and taxpayer identification numbers; and place of employment. Less sensitive items were replaced with mask templates; for example, any specific date or name was replaced with *ДАТА* (DATE) or *ИМЯ* (NAME).
The dataset was prepared in compliance with fundamental principles of ethics.
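When working with the symptoms and anamnesis texts, it can be useful to see how often the mask templates occur. The sketch below counts them in a string. It assumes that every template is an uppercase Cyrillic word enclosed in asterisks; only *ДАТА* and *ИМЯ* are named in this description, so other templates matching that pattern are hypothetical. The function name count_masks is likewise an illustration.

```python
import re
from collections import Counter

# Assumption: a mask template is an uppercase Cyrillic word between asterisks,
# such as *ДАТА* or *ИМЯ*. Other template names are not documented.
MASK_PATTERN = re.compile(r"\*([А-ЯЁ]+)\*")


def count_masks(text):
    """Count occurrences of each mask template found in the text."""
    return Counter(MASK_PATTERN.findall(text))
```

For example, count_masks("Обратился *ДАТА*, врач *ИМЯ*, повторно *ДАТА*") yields a count of 2 for ДАТА and 1 for ИМЯ.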

 

Files

RuMedPrimeData.zip (2.0 MB)
md5:6aa7fade7cc0c13d36fbe4f893fa0cff