Medical Concept Embeddings for SNOMED-CT (Jan 2019 version)

Pattisapu, Nikhil; Patil, Sangameshwar; Palshikar, Girish; Varma, Vasudeva

doi:10.5281/zenodo.3842143

Published May 24, 2020 | Version v1

Dataset Open

1. IIIT Hyderabad
2. TCS Research

This dataset contains SNOMED-CT medical concept embeddings trained using the following text and graph embedding methods:

Averaged Word Embedding (300)
ELMo (1024)
Universal Sentence Encoder (512)
BERT (768)
DeepWalk (128)
Node2Vec (128)
HARP (128)
LINE (128)

The tar file contains eight JSON files, one for each of the embedding techniques listed above. The number in parentheses beside each embedding method is the dimensionality of its embeddings. Each JSON file contains a single object (a dictionary when loaded in Python) of the form:

SNOMED concept ID (String): Embedding (List).
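
As a minimal sketch of how one of these files might be read, the snippet below loads a single extracted JSON file, checks that every vector has the dimensionality listed above, and compares two concepts by cosine similarity. The function names are illustrative, and the file names inside the archive are not documented here, so any path passed in is an assumption to be adjusted after extraction. Note that `json.load` reads the whole file into memory, which may be substantial given the size of the dataset.

```python
import json
import math


def load_embeddings(path):
    """Load one extracted JSON file into a dict: concept ID (str) -> list of floats."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_dimension_mismatches(embeddings, expected_dim):
    """Return the concept IDs whose vector length differs from expected_dim."""
    return [cid for cid, vec in embeddings.items() if len(vec) != expected_dim]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, after loading the DeepWalk file, `find_dimension_mismatches(embeddings, 128)` should return an empty list if the file matches the description, and `cosine_similarity(embeddings[id_a], embeddings[id_b])` gives a rough relatedness score for two SNOMED concept IDs present in the file.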

If you find this resource useful in your research, please consider citing our paper:

Pattisapu, N., Patil, S., Palshikar, G. and Varma, V. (2020). Medical Concept Normalization by Encoding Target Knowledge. Machine Learning for Health (ML4H) at NeurIPS 2019, Proceedings of Machine Learning Research 116:246–259.

Warning: The dataset size is large (~12 GB). Please ensure that you have sufficient network bandwidth and disk space before requesting a download.

Files

Files (12.2 GB)

Name	Size	MD5
snomed_embeddings.tar.gz	12.2 GB	9d0a5e1d0a9261f345933cbb649487c5
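
Because the archive is large, it may be worth confirming that the download is intact before extracting it. The following is a sketch, assuming the archive has been saved locally under its listed name, that computes the MD5 digest in chunks and compares it with the checksum given above. The function names and chunk size are illustrative choices.

```python
import hashlib

# Checksum as listed for snomed_embeddings.tar.gz on this page.
EXPECTED_MD5 = "9d0a5e1d0a9261f345933cbb649487c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_archive("snomed_embeddings.tar.gz")` returns `True` only when the local file matches the published checksum. Reading in chunks keeps memory use small regardless of the file size.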

Additional details

References: Conference paper: http://proceedings.mlr.press/v116/pattisapu20a/pattisapu20a.pdf


	All versions	This version
Views	1,512	1,505
Downloads	183	183
Data volume	2.4 TB	2.4 TB