Published April 26, 2023 | Version 1.0.0
Dataset Open

PolyMed: A Medical Dataset Addressing Disease Imbalance for Robust Automatic Diagnosis Systems

  • 1. Department of Applied Artificial Intelligence, Major in Bio Artificial Intelligence, Hanyang University, Ansan, Republic of Korea

Description

We introduce the PolyMed dataset, designed to address the limitations of existing medical case data for Automatic Diagnosis Systems (ADS). ADS assist doctors by predicting diseases from basic patient information such as age, gender, and symptoms. However, these systems are hampered by imbalanced disease labels and by the difficulty of accessing or collecting medical data. To address these issues, PolyMed combines medical knowledge graph data with diagnosis case data to improve the evaluation of ADS. The dataset is intended to support comprehensive evaluation, cover diverse disease information, make effective use of external knowledge, and enable tasks closer to real-world scenarios.

We have also made the data collection tools publicly available to enable researchers and other interested parties to contribute additional data in a standardized format. These tools feature a range of customizable input fields that can be selectively utilized according to the user's specific requirements, ensuring consistency and professionalism in the data collection process.

All training and test code for our data is available at https://github.com/krchanyang/PolyMed

Files

data annotation tool.zip

Files (58.2 MB)

Checksum                                Size
md5:f5b6cae4c27c51c457cfe121174e5fd6    57.4 MB
md5:df94c9c2995afba4cdc79df19c33fa66    786.0 kB
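
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python using only the standard library. The function names are illustrative and not part of the PolyMed tooling. Because this page does not show which file name belongs to which checksum, the sketch only checks whether a file matches either listed value.

```python
import hashlib

# Checksums as listed on this page. The mapping from checksum to file name
# is not shown here, so the two values are kept as an unordered set.
LISTED_MD5 = {
    "f5b6cae4c27c51c457cfe121174e5fd6",
    "df94c9c2995afba4cdc79df19c33fa66",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed=LISTED_MD5):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of_file(path) in listed
```

For example, calling `matches_listed_checksum` on the local path of a downloaded archive should return `True` if the file is intact. A result of `False` suggests the download should be repeated.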

Additional details

Related works