Biomedical ELECTRA based deep language representation models for biomedical text mining.

Ibrahim Burak Ozyurt

doi:10.5281/zenodo.3971235

Published August 4, 2020 | Version v1.0

Dataset Open

Ibrahim Burak Ozyurt¹

1. Department of Neurosciences, UCSD

The gzipped tar file contains two biomedical language representation models based on the ELECTRA (Clark et al., 2020) deep transformer architecture, intended for downstream biomedical text mining tasks.

Bio-ELECTRA is pre-trained from scratch on PubMed abstracts for 1.8 million steps. Bio-ELECTRA++ is Bio-ELECTRA further pre-trained on a corpus of open-access full-text papers from PubMed.

Files (321.3 MB)

Name	MD5	Size
bio_electra_models.tgz	3b54d8b6f2fa9b52d2c7ac2e1332afd4	321.3 MB
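
The MD5 checksum listed above can be used to confirm that a download is intact before unpacking it. The following is a minimal sketch, not part of the original record: it assumes the archive has already been saved as `bio_electra_models.tgz` in the working directory, and the helper names `md5_of_file` and `verify_and_list` are hypothetical. Because the record does not describe the internal layout of the archive, the sketch only lists the member names rather than loading a model.

```python
import hashlib
import tarfile
from pathlib import Path

# File name and checksum as listed in the Files table of this record.
ARCHIVE_NAME = "bio_electra_models.tgz"
EXPECTED_MD5 = "3b54d8b6f2fa9b52d2c7ac2e1332afd4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_list(archive_path, expected_md5=EXPECTED_MD5):
    """Check the archive's MD5, then return the member names it contains."""
    actual = md5_of_file(archive_path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    path = Path(ARCHIVE_NAME)  # assumed local download location
    if path.exists():
        for name in verify_and_list(path):
            print(name)
    else:
        print(f"{path} not found; download it from the record first.")
```

Reading the file in chunks keeps memory use flat for a 321.3 MB archive, and checking the digest before opening the tar file avoids unpacking a truncated or corrupted download.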

Resource type: Dataset
Publisher: Zenodo
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 7, 2020
Modified: August 7, 2020
