Pubmed Journal Recommendation System dataset
- 1. Universidad Politécnica de Madrid
- 2. Universidad Nacional de Educación a Distancia
Contributors
Supervisors:
- 1. Universidad Nacional de Educación
- 2. Universidad Politécnica de Madrid
Description
Dataset for journal recommendation. Each record includes the article's title, abstract, keywords, and journal.
We extracted the journals and additional information from:
Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.
Dataset Components:
- data_pubmed_all: This dataset contains all articles, with the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.
- data_pubmed: To focus on recent publications, we filtered the dataset to articles published within the last five years, from January 1, 2018, to December 13, 2022 (the latest date in the dataset). We also retained only journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals.
- data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development, we partitioned the 'data_pubmed' dataset into training, validation, and test subsets using a random 60/20/20 split. The split was performed per journal, so each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions contain 157,540 articles in the training set, 52,571 in the validation set, and 52,759 in the test set.
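The filtering and per-journal split described above can be sketched roughly as follows. This is an illustrative sketch, not the code used to build the dataset. It assumes that rows are dictionaries with 'journal' and 'publication_date' keys, that dates are ISO strings (YYYY-MM-DD), that journal counts are taken after the date filter, and that split sizes are rounded down with the remainder going to the test set. The function names `filter_recent_frequent` and `split_per_journal` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def filter_recent_frequent(rows, start, end, min_articles):
    """Keep rows dated within [start, end] whose journal has more than min_articles rows.

    Assumption: 'publication_date' is an ISO string and journals are
    counted after the date filter has been applied.
    """
    in_range = [
        r for r in rows
        if start <= date.fromisoformat(r["publication_date"]) <= end
    ]
    counts = Counter(r["journal"] for r in in_range)
    return [r for r in in_range if counts[r["journal"]] > min_articles]


def split_per_journal(rows, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle each journal's rows and cut them into train/val/test parts.

    Assumption: train and validation sizes are rounded down, and the
    test set receives whatever remains for that journal.
    """
    rng = random.Random(seed)
    by_journal = defaultdict(list)
    for r in rows:
        by_journal[r["journal"]].append(r)

    train, val, test = [], [], []
    for journal in sorted(by_journal):
        items = by_journal[journal]
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])
    return train, val, test
```

With the thresholds given in the description, the filter would be called with `date(2018, 1, 1)`, `date(2022, 12, 13)`, and `min_articles=200`. Splitting inside each journal, rather than over the whole dataset at once, keeps every journal present in all three subsets.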
Files
data_pubmed.csv
Total size: 5.1 GB
| MD5 checksum | Size |
|---|---|
| 389cc5465020638358387789a6562274 | 491.2 MB |
| 2c0c9666b893c86930a160a8f602e1c3 | 4.1 GB |
| ef4c1dd5ccb38f3048fc35a50b3709ff | 98.6 MB |
| 244f62e576c862d005cb1664ec028255 | 294.3 MB |
| 720d8f5893095f348d29b8952d609577 | 98.3 MB |
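Because the files are large, it can be worth checking a download against the MD5 checksum listed for it. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_md5` and the chunk size are illustrative choices, not part of the dataset or its software.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `verify_md5("path/to/downloaded_file", "<checksum listed for that file>")` returns `True` only when the download is intact. The path here is a placeholder.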
Additional details
Related works
- Is supplement to: Software, https://github.com/oeg-upm/LuckyLook
Funding
- Comunidad de Madrid
- Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo
- 520rt0011
Software
- Repository URL: https://github.com/oeg-upm/LuckyLook
- Programming language: Python
- Development Status: Inactive