Published December 21, 2020 | Version v1

BERT-CRel: Improved Biomedical Word Embeddings in the Transformer Era

  • University of Kentucky

Description

BERT-CRel is a transformer-based method for improving biomedical word embeddings, which are jointly fine-tuned along with concept embeddings: the embeddings are first pre-trained with fastText and then fine-tuned in a transformer (BERT) setup. The goal is to provide high-quality pre-trained biomedical embeddings that the research community can use in any downstream task. The corpus used for BERT-CRel consists of biomedical citations from PubMed, and the concepts come from the Medical Subject Headings (MeSH) terminology used to index those citations.
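For downstream use, the released vectors can be loaded with standard tooling. Below is a minimal sketch using gensim, assuming the files are distributed in the plain word2vec text format; the file name "BERT-CRel-words.vec" is a placeholder for the actual download:

    from gensim.models import KeyedVectors

    # Placeholder file name; substitute the downloaded words-only file.
    vectors = KeyedVectors.load_word2vec_format("BERT-CRel-words.vec", binary=False)

    print(vectors.vector_size)                       # embedding dimensionality
    print(vectors.most_similar("diabetes", topn=5))  # nearest neighbors, assuming the word is in vocabulary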

BERT-CRel-all

These files contain word embeddings together with embeddings for all MeSH descriptors and for the subset of MeSH supplementary concepts that meet a frequency threshold. The vocabulary is divided into three sections: (1) BERT special tokens, (2) MeSH codes, and (3) English words in descending frequency order (vocabulary size: 333,301).

BERT-CRel-MeSH

These files contain only the MeSH code embeddings (vocabulary size: 45,015).

BERT-CRel-words

These files contain only the English word embeddings (vocabulary size: 288,281).
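Because BERT-CRel-all places words and MeSH codes in one shared vocabulary, cross-type similarity queries are possible. A minimal sketch follows, again assuming word2vec text format; the file name is a placeholder, and the exact token form of MeSH codes in the vocabulary (here the bare descriptor ID "D003920" for Diabetes Mellitus) is an assumption:

    from gensim.models import KeyedVectors

    # "BERT-CRel-all.vec" is a placeholder for the actual download.
    kv = KeyedVectors.load_word2vec_format("BERT-CRel-all.vec", binary=False)

    # Words and MeSH codes share one embedding space; the descriptor ID
    # "D003920" (Diabetes Mellitus) is an assumed token form, so guard
    # the lookup before querying.
    if "D003920" in kv.key_to_index:
        print(kv.similarity("diabetes", "D003920"))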

 

More details can be found in our pre-print.

Files (3.6 GB)

MD5                                   Size
md5:6f85d34c197368796f30a126b7176578  531.0 MB
md5:1bb7e584d4503592292a8c244a698321  1.3 GB
md5:a5ac5b7a419726a3d5eb2d4e2a22dbef  71.9 MB
md5:86c150da930987fd6746e67d1f9b8449  168.0 MB
md5:f6bb738f5cf22323d0effde81eb995e9  459.1 MB
md5:dbf234c788fe1714238d0ddb95e7483e  1.1 GB