Published August 4, 2020 | Version v1.0
Dataset | Open access

Biomedical ELECTRA based deep language representation models for biomedical text mining.

Authors/Creators

  • 1. Department of Neurosciences, UCSD

Description

The gzipped tar file contains two biomedical language representation models based on the ELECTRA (Clark et al., 2020) deep transformer architecture, intended for downstream biomedical text mining tasks.

Bio-ELECTRA is pre-trained from scratch on PubMed abstracts for 1.8 million steps. Bio-ELECTRA++ is Bio-ELECTRA further pre-trained on a corpus of open-access full-text papers from PubMed.
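The record does not document the directory layout inside the archive. The minimal sketch below uses only Python's standard `tarfile` module to list the top-level entries of a gzipped tar file before extracting anything, so you can see where each of the two models lives. The function name `top_level_entries` is hypothetical and not part of any released tooling.

```python
from __future__ import annotations

import tarfile
from pathlib import PurePosixPath


def top_level_entries(archive_path: str) -> list[str]:
    """Return the sorted top-level names inside a gzipped tar archive.

    Hypothetical helper: it only reads the archive's member list and
    does not extract any files.
    """
    names = set()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            # PurePosixPath drops a leading "./"; an entry that is just "."
            # has no parts and is skipped.
            parts = PurePosixPath(member.name).parts
            if parts:
                names.add(parts[0])
    return sorted(names)
```

For example, `top_level_entries("downloaded_archive.tar.gz")` (the file name is a placeholder for wherever you saved the download) should return one entry per top-level folder or file, which you can then match to Bio-ELECTRA and Bio-ELECTRA++.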

Files

One gzipped tar file, 321.3 MB (md5:3b54d8b6f2fa9b52d2c7ac2e1332afd4)
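Because the archive is large, it is worth confirming that the download is intact before unpacking it. The sketch below assumes you have saved the file locally. It streams the file through Python's standard `hashlib.md5` in chunks and compares the result with the checksum listed above. The helper names `md5_of_file` and `is_intact` are illustrative only.

```python
import hashlib

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "3b54d8b6f2fa9b52d2c7ac2e1332afd4"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use constant regardless of the archive's 321.3 MB size. MD5 is adequate for detecting accidental corruption, but not for guarding against deliberate tampering.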