bert-embeddings-doc-relevance
Description
This repository explores and assesses literature-based document-to-document (doc-2-doc) recommendations using BERT models, applied to the RELISH Corpus, an expert-curated collection of biomedical literature consisting of pairwise document assessments. The workflow has two main steps. First, BERT models are used without any fine-tuning to generate document embeddings, which are then used to assess document recommendations. Second, the BERT models are fine-tuned on a training set derived from the RELISH dataset, and the resulting models are used to generate document recommendations on a separate test set. The performance of the pretrained and fine-tuned models is then compared to assess the impact of fine-tuning on the quality of document recommendations.
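The first step above, assessing recommendations from document embeddings, can be illustrated with a minimal sketch. It assumes that each document has already been mapped to an embedding vector and that cosine similarity is the scoring function. Both are assumptions made for illustration, since this description does not state which similarity measure the repository uses. The function names and the toy three-dimensional vectors are hypothetical stand-ins for real BERT embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_candidates(query_id, embeddings):
    """Rank every other document by similarity to the query, best first."""
    query = embeddings[query_id]
    scores = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in embeddings.items()
        if doc_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for BERT document embeddings (made up for illustration).
toy_embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

ranking = rank_candidates("doc_a", toy_embeddings)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Under these assumptions, the same ranking routine could be applied to embeddings from both the pretrained and the fine-tuned models, so that the two sets of recommendations are compared on equal terms against the RELISH assessments.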
This work used de.NBI resources and was therefore supported by the de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) and ELIXIR-DE (Forschungszentrum Jülich and W-de.NBI-001, W-de.NBI-004, W-de.NBI-008, W-de.NBI-010, W-de.NBI-013, W-de.NBI-014, W-de.NBI-016, W-de.NBI-022).
Files
bert-embeddings-doc-relevance-1.0.0.zip
Total size: 16.5 MB
| MD5 checksum | Size |
|---|---|
| md5:8de827ff340602864d5259b43ff1ef9e | 8.2 MB |
| md5:72d737f8dce43574daa73d1ab973b873 | 8.3 MB |
Additional details
Software
- Repository URL: https://github.com/zbmed-semtec/bert-embeddings-doc-relevance
- Programming language: Python