Published July 31, 2023 | Version v2
Preprint Open

Automatizing biocurators' intuition: filtering scientific papers by analyzing titles and short summaries

  • 1. Institute for Globally Distributed Open Research and Education (IGDORE)

Description

We present a text classification task arising in the biocuration of cellular chemical reactions when searching for curatable literature. We explore the suitability of various NLP and ML methods for this task. In summary, while fine-tuned domain-specific language models show the best results, random forests are nearly as good, with a much lighter computational footprint.

Files

EXP.pdf

Files (106.1 kB)

md5:0efdb111092a73a959d607043e603561 (92.2 kB)
md5:baad0bb1347bbc6bb03a25c7c70e5631 (13.9 kB)
