Published March 13, 2026 | Version v1.0.0
Software Open

hybrid-doc-relevance-training

Description

hybrid-doc-relevance-training provides a set of hybrid embedding approaches for literature-based document-to-document similarity, built on the RELISH corpus. It combines embedding models (Doc2Vec, Word2Vec, and FastText) with ontology-based background knowledge to improve document similarity, relevance assessment, and recommendation. Three distinct algorithms are implemented: pre-annotation, post-annotation, and post-reduction annotation. Detailed documentation is provided, covering input data preprocessing and execution instructions.
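As a rough illustration of the general idea (not the repository's actual implementation), the sketch below annotates tokens with concept identifiers, averages word vectors into a document vector, and compares two documents by cosine similarity. The toy vectors, the concept ID `C0001`, the term mapping, and all function names are hypothetical; in practice the vectors would come from a trained Word2Vec, FastText, or Doc2Vec model and the annotations from an ontology.

```python
import math

# Hypothetical toy vectors; a real run would use vectors learned by
# Word2Vec, FastText, or Doc2Vec on the corpus.
TOY_VECTORS = {
    "C0001": [0.9, 0.1, 0.0],    # made-up concept ID shared by synonyms
    "therapy": [0.1, 0.8, 0.1],
    "growth": [0.7, 0.2, 0.1],
    "protein": [0.0, 0.2, 0.9],
}

# Hypothetical term-to-concept mapping standing in for ontology annotation.
TOY_ANNOTATIONS = {"tumor": "C0001", "neoplasm": "C0001"}


def annotate(tokens):
    """Replace tokens that have a known concept with that concept's ID."""
    return [TOY_ANNOTATIONS.get(token, token) for token in tokens]


def embed(tokens):
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(doc_a, doc_b):
    """Annotate, embed, and compare two whitespace-tokenized documents."""
    vec_a = embed(annotate(doc_a.lower().split()))
    vec_b = embed(annotate(doc_b.lower().split()))
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(similarity("tumor therapy", "neoplasm therapy"))
    print(similarity("tumor", "protein"))
```

Because both synonyms map to the same concept ID, the first pair of documents ends up with identical vectors, which is the kind of effect ontology-based annotation is meant to produce. Where the annotation step sits relative to training and dimensionality reduction is what distinguishes the three algorithms named above; this sketch does not attempt to reproduce any of them.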

This work used de.NBI resources and was supported by the de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) and ELIXIR-DE (Forschungszentrum Jülich and W-de.NBI-001, W-de.NBI-004, W-de.NBI-008, W-de.NBI-010, W-de.NBI-013, W-de.NBI-014, W-de.NBI-016, W-de.NBI-022).

Files

hybrid-doc-relevance-training-1.0.0.zip

Files (7.7 MB)

Checksum Size
md5:4a7c688f3185bd040a1d9044510be5a0 3.8 MB
md5:fb9fb7df1f821373db764d773f93b289 3.9 MB

Additional details

Funding

Deutsche Forschungsgemeinschaft
STELLA II (Infrastructure for Living Labs) 407518790
Deutsche Forschungsgemeinschaft
NFDI4DS - NFDI for Data Science and Artificial Intelligence 460234259

Software

Repository URL
https://github.com/zbmed-semtec/hybrid-doc-relevance-training
Programming language
Python
Development Status
Active