Prioritization of COVID-19-related literature via unsupervised keyphrase extraction and document representation learning

doi:10.1007/978-3-030-88942-5_16

Published November 5, 2021 | Version v1

Conference paper (open access)

1. Jožef Stefan Institute
2. University of Maribor, Maribor, Slovenia

The COVID-19 pandemic triggered a wave of novel scientific literature that is impossible to inspect and study manually in a reasonable time frame. Current machine learning methods can project such a body of literature into a vector space where similar documents lie close to each other, enabling insightful exploration of scientific papers and other knowledge sources associated with COVID-19. However, to start searching, such texts need to be appropriately annotated, which is seldom the case due to a lack of human resources. In our system, the current body of COVID-19-related literature is annotated using unsupervised keyphrase extraction, which facilitates the initial queries to the latent space containing the learned document embeddings (low-dimensional representations). The solution is accessible through a web server capable of interactive search, term ranking, and exploration of potentially interesting literature. We demonstrate the usefulness of the approach via case studies from the medicinal chemistry domain.
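The abstract describes a pipeline in which automatically extracted keyphrases seed queries into a space of document vectors. The Python sketch below illustrates that idea under explicit assumptions and is not the authors' implementation: sparse TF-IDF vectors stand in for the learned low-dimensional document embeddings, the highest-weighted single terms of each document stand in for a real keyphrase extractor, and the function names (`tfidf_vectors`, `keyphrases`, `query_by_keyphrase`), the stop-word list, and the toy corpus are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "of", "on", "such", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text and keep alphanumeric tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight), one per document, with smoothed IDF.

    Stand-in for learned document embeddings; the paper's representations differ.
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({term: (c / total) * idf[term] for term, c in counts.items()})
    return vectors


def keyphrases(vector, k=3):
    """Unsupervised annotation stand-in: the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def query_by_keyphrase(phrase, docs, k=3, top_n=5):
    """Seed a query with the documents annotated by `phrase`, then rank every document
    by cosine similarity to the centroid of the seed vectors (zero scores are dropped)."""
    vectors = tfidf_vectors(docs)
    seeds = [v for v in vectors if phrase in keyphrases(v, k)]
    if not seeds:
        return []
    centroid = defaultdict(float)
    for v in seeds:
        for term, w in v.items():
            centroid[term] += w / len(seeds)
    scored = [(cosine(centroid, v), i) for i, v in enumerate(vectors)]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [i for _, i in ranked[:top_n]]


# Hypothetical toy corpus, not data from the paper.
docs = [
    "Remdesivir trial: remdesivir shortened recovery in hospitalised patients.",
    "Antiviral drugs such as remdesivir target the viral polymerase.",
    "Face masks reduce airborne transmission in crowded spaces.",
]
```

On this toy corpus, `query_by_keyphrase("remdesivir", docs)` returns `[0, 1]`: only the first document carries "remdesivir" among its top keyphrases, and the second is retrieved because it shares that term, while the unrelated third document scores zero. Seeding the search from annotated documents rather than from raw query text mirrors the abstract's point that annotation is what makes the first query into the embedding space possible.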

Files

2110.08874.pdf (1.4 MB, md5:7bdc4faa3209e36a3f3857d1eba5a65a)

Additional details

Funding: EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (825153), European Commission

