Published February 20, 2025 | Version v1
Dataset Open

Graph Neural Network and Sentence Transformer Embeddings for SNOMED CT concepts

  • 1. Universidad de Murcia
  • 2. Trinity College Dublin
  • 3. Universidad de Murcia - Campus de Espinardo

Description

Embeddings for SNOMED CT concepts produced by Graph Neural Networks (GNNs) or sentence transformers. Each file is a JSON file that maps the ID of a SNOMED CT concept to its corresponding embedding.
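As a minimal sketch of how such a file might be read, the Python snippet below assumes that the top level of each JSON file is a single object whose keys are concept IDs (as strings) and whose values are lists of numbers. The helper names `load_embeddings` and `get_embedding` are hypothetical and are not part of the dataset or its repository.

```python
import json
from pathlib import Path


def load_embeddings(path):
    """Load a JSON file assumed to map concept IDs to embedding vectors."""
    with Path(path).open("r", encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: the top-level JSON value is an object keyed by concept ID.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by concept ID")
    return data


def get_embedding(embeddings, concept_id):
    """Return the vector stored for a concept ID, or None if it is absent."""
    # JSON object keys are always strings, so integer IDs are converted first.
    return embeddings.get(str(concept_id))
```

For example, `load_embeddings("base_mini_lm_sct_dict.json")` would return the whole dictionary. Since `json.load` reads the entire file into memory and each file is around 4 GB, this approach is only practical on a machine with ample RAM.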

Files base_mini_lm_dict.json and fine_tuned_mini_lm_dict.json contain the sentence transformer embeddings: the former come from the base MiniLM model and the latter from the MiniLM model fine-tuned on the concept similarity task. Files gnn_mul_sct_dict.json and gnn_sim_sct_dict.json contain the embeddings produced by a GNN on a dataset obtained by transforming the SNOMED CT ontology and on the task of concept similarity.
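Because these embeddings relate to a concept similarity task, a natural way to use them is to compare the vectors of two concepts. The sketch below uses cosine similarity, which is a common choice for embedding comparison but is an assumption here: this page does not state which similarity measure the paper uses. The function name `cosine_similarity` and the toy vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-in for a loaded embedding dictionary; the keys are placeholders,
# not real SNOMED CT concept IDs.
toy_embeddings = {
    "concept_a": [1.0, 2.0, 3.0],
    "concept_b": [2.0, 4.0, 6.0],
    "concept_c": [-3.0, 0.0, 1.0],
}

score_ab = cosine_similarity(toy_embeddings["concept_a"], toy_embeddings["concept_b"])
score_ac = cosine_similarity(toy_embeddings["concept_a"], toy_embeddings["concept_c"])
```

In this toy data the first two vectors point in the same direction and the third is orthogonal to the first, so the two scores fall at the high end and the midpoint of the cosine range respectively. Embeddings from different files should not be compared with each other, since each model defines its own vector space.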

These embeddings were generated and studied in the paper "Assessing the Effectiveness of Embedding Methods in Capturing Clinical Information from SNOMED CT". More information can be found in the following repository: https://github.com/JavierCastellD/AssessingSNOMEDEmbeddings.

Files

base_mini_lm_sct_dict.json

Files (16.6 GB)

Checksum Size
md5:132d022770bb93fc3e4b52321389f5e7 4.2 GB
md5:762eff3b209a22bef67bea5f2fa0393f 4.2 GB
md5:86397e3b7549b394a62e5f2bd3e6e622 4.1 GB
md5:2e89956b88ff69c35707be3b895feaa6 4.0 GB
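Given the size of the files, it may be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical. The listing here does not show which checksum belongs to which file name, so that pairing has to be taken from the original record.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:132d022770bb93fc3e4b52321389f5e7")` returns `True` only when the local file is byte-for-byte identical to the published one with that checksum.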

Additional details