Medical Concept Embeddings for SNOMED-CT (Jan 2019 version)
- 1. IIIT Hyderabad
- 2. TCS Research
Description
This dataset contains the SNOMED-CT medical concept embeddings trained using the following text and graph embedding methods.
- Averaged Word Embedding (300)
- ELMo (1024)
- Universal Sentence Encoder (512)
- BERT (768)
- Deepwalk (128)
- Node2Vec (128)
- HARP (128)
- LINE (128)
The tar file contains eight JSON files, one for each of the embedding techniques listed above. The number in parentheses beside each method is the dimensionality of its embeddings. Each JSON file contains a single object (a Python dictionary once loaded) of the form
SNOMED concept ID (String): Embedding (List).
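As a rough illustration of how one of these files might be used, the sketch below loads a single JSON file, checks that every vector has the expected length, and compares two concepts by cosine similarity. The file name and concept IDs in the usage comment are placeholders, not names taken from the archive, and the helper functions are hypothetical. Loading a whole file with `json.load` is assumed to fit in memory, which may not hold for the larger files.

```python
import json
import math


def load_embeddings(path):
    """Load one embedding file as a dict of {concept_id (str): vector (list)}."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def check_dimension(embeddings, expected_dim):
    """Return True if every vector has the expected length (e.g. 128 for LINE)."""
    return all(len(vec) == expected_dim for vec in embeddings.values())


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Example usage (placeholder file name and concept IDs):
#   emb = load_embeddings("some_embedding_file.json")
#   assert check_dimension(emb, 128)
#   score = cosine_similarity(emb["<concept ID A>"], emb["<concept ID B>"])
```

Keys are strings, so a concept ID held as an integer needs to be converted with `str()` before lookup.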
If you find this resource useful in your research, please consider citing our paper:
"Pattisapu, N., Patil, S., Palshikar, G. and Varma, V., Medical Concept Normalization by Encoding Target Knowledge, Proceedings of Machine Learning Research 116:246–259, 2020 Machine Learning for Health (ML4H) at NeurIPS 2019"
Warning: The dataset size is large (~12 GB). Please ensure that you have sufficient network bandwidth and disk space before requesting a download.
Files
Total size: 12.2 GB
md5:9d0a5e1d0a9261f345933cbb649487c5
Additional details
Related works
- References (conference paper): http://proceedings.mlr.press/v116/pattisapu20a/pattisapu20a.pdf