Published October 2, 2024 | Version 0.0.1
Dataset Open

Pubmed Journal Recommendation System dataset

  • 1. Universidad Politécnica de Madrid
  • 2. Universidad Nacional de Educación a Distancia
  • 1. Universidad Nacional de Educación
  • 2. Universidad Politécnica de Madrid

Description

A dataset for journal recommendation. Each record includes an article's title, abstract, keywords, and journal.

The journal names and additional metadata were extracted from:

Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.

Dataset Components:

  • data_pubmed_all: The full dataset, covering all articles. Each record has the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.

  • data_pubmed: To focus on recent and relevant publications, we filtered the full dataset to articles published in the five years from January 1, 2018, to December 13, 2022 (the latest date in the dataset). We then kept only journals with more than 200 published articles, which leaves 262,870 articles from 469 different journals.

  • data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development, we partitioned 'data_pubmed' into training, validation, and test subsets using a random 60/20/20 split. The split was performed per journal, so each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions contain 157,540 articles (training), 52,571 articles (validation), and 52,759 articles (test).
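The filtering and per-journal split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used to build the dataset. The function names, the random seed, the rounding rule, and the assumption that 'publication_date' is stored as ISO `YYYY-MM-DD` are all hypothetical; it also assumes the 200-article threshold is applied after the date filter, which the description does not state.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def filter_recent_frequent(rows, start=date(2018, 1, 1), end=date(2022, 12, 13),
                           min_articles=200, parse_date=date.fromisoformat):
    """Keep rows dated within [start, end] whose journal has more than
    `min_articles` rows in that window (assumed order of operations)."""
    recent = [r for r in rows
              if start <= parse_date(r["publication_date"]) <= end]
    counts = Counter(r["journal"] for r in recent)
    return [r for r in recent if counts[r["journal"]] > min_articles]


def split_per_journal(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle each journal's rows, then cut them into train/val/test.
    The test set receives whatever remains after train and val."""
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    by_journal = defaultdict(list)
    for r in rows:
        by_journal[r["journal"]].append(r)

    train, val, test = [], [], []
    for articles in by_journal.values():
        rng.shuffle(articles)
        n_train = round(train_frac * len(articles))
        n_val = round(val_frac * len(articles))
        train += articles[:n_train]
        val += articles[n_train:n_train + n_val]
        test += articles[n_train + n_val:]
    return train, val, test
```

Rows can be any iterable of dictionaries keyed by column name, for example the output of `csv.DictReader`. Because the original seed and rounding are unknown, this sketch is not expected to reproduce the published partition sizes exactly.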

Files

data_pubmed.csv

Files (5.1 GB)

MD5 checksum                          Size
389cc5465020638358387789a6562274      491.2 MB
2c0c9666b893c86930a160a8f602e1c3      4.1 GB
ef4c1dd5ccb38f3048fc35a50b3709ff      98.6 MB
244f62e576c862d005cb1664ec028255      294.3 MB
720d8f5893095f348d29b8952d609577      98.3 MB
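
Because the files are large, it can be worth checking a download against its listed MD5 checksum before use. The sketch below shows one way to do that with Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the dataset or the LuckyLook repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so that
    multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's digest with a listed checksum.
    Accepts values with or without the 'md5:' prefix, in any letter case."""
    expected = expected_md5.lower().split(":")[-1]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("data_pubmed.csv", "md5:...")` would be called with the checksum shown for that file on the record page; the listing here does not preserve which checksum belongs to which file name.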

Additional details

Related works

Is supplement to
Software: https://github.com/oeg-upm/LuckyLook (URL)

Funding

Comunidad de Madrid
Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo
520rt0011

Software

Repository URL
https://github.com/oeg-upm/LuckyLook
Programming language
Python
Development Status
Inactive