Published March 13, 2026 | Version v1.0.0
Software Open

bert-embeddings-doc-relevance

  • 1. ZB MED - Information Centre for Life Sciences

Description

This repository explores and assesses literature-based document-to-document (doc-2-doc) recommendations using BERT models, applied to the RELISH Corpus, an expert-curated collection of biomedical literature consisting of pairwise document assessments. The workflow has two main steps. First, BERT models are used without any fine-tuning to generate document embeddings, which are then used to assess document recommendations. Second, the BERT models are fine-tuned on a training set derived from the RELISH dataset, and the resulting models are used to generate document recommendations on a separate test set. The performance of the pretrained and fine-tuned models is then compared to assess the impact of fine-tuning on the quality of document recommendations.
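The first step can be illustrated with a minimal sketch. It assumes that document embeddings are already available as plain numeric vectors and that candidates are ranked by cosine similarity; the record does not state which similarity measure the repository uses, so that choice is an assumption. The function names and the toy vectors are hypothetical and are not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_id, embeddings):
    """Return all other document IDs, ordered from most to least similar to the query."""
    query_vec = embeddings[query_id]
    scores = {
        doc_id: cosine_similarity(query_vec, vec)
        for doc_id, vec in embeddings.items()
        if doc_id != query_id
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up 3-dimensional vectors standing in for BERT document embeddings
# (assumption: real embeddings would come from a pretrained or fine-tuned model).
toy_embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

ranking = rank_candidates("doc_a", toy_embeddings)
print(ranking)
```

Under these assumptions, the same ranking function could be applied to embeddings from both the pretrained and the fine-tuned models, so that the two rankings can be compared against the pairwise assessments in the corpus.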


This work used de.NBI resources and was therefore supported by the de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) and ELIXIR-DE (Forschungszentrum Jülich and W-de.NBI-001, W-de.NBI-004, W-de.NBI-008, W-de.NBI-010, W-de.NBI-013, W-de.NBI-014, W-de.NBI-016, W-de.NBI-022).

Files

bert-embeddings-doc-relevance-1.0.0.zip

Files (16.5 MB)

Name Size
md5:8de827ff340602864d5259b43ff1ef9e 8.2 MB
md5:72d737f8dce43574daa73d1ab973b873 8.3 MB

Additional details

Funding

Deutsche Forschungsgemeinschaft
STELLA Project 407518790
Deutsche Forschungsgemeinschaft
NFDI4DS - NFDI for Data Science and Artificial Intelligence 460234259

Software