BIP! DB: A Dataset of Impact Measures for Scientific Publications

doi:10.5281/zenodo.4527341

Published February 10, 2021 | Version 2

Dataset | Open access


1. IMSI, ATHENA RC
2. CNR
3. OpenAIRE

This dataset contains impact measures (metrics/indicators) for 106,788,227 scientific articles. In particular, for each article we have calculated the following measures:

Citation count: This is the total number of citations, reflecting the "influence" (i.e., the total impact) of an article.
Incubation Citation Count (3-year CC): This is a time-restricted version of the citation count in which the time window is specific to each paper: only the citations received during the first 3 years after its publication are counted. This measure can be seen as an indicator of a paper's "impulse", i.e., its initial momentum directly after its publication.
PageRank score: This is a citation-based measure reflecting the "influence" (i.e., the total impact) of an article. It is based on the PageRank¹ network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.
RAM score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the RAM² citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). RAM alleviates this problem through an approach known as “time-awareness”, which makes it better suited to capturing the current “hype” of an article.
AttRank score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the AttRank³ citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). AttRank alleviates this problem by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture researchers' preference for reading papers that have received a lot of attention recently. This makes it better suited to capturing the current “hype” of an article.
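As a rough illustration of the counting-based measures, the following Python sketch computes the citation count, the 3-year CC and a RAM-style score from an in-memory list of (citing DOI, cited DOI) pairs plus a map from DOI to publication year. It is not the BIP! DB pipeline: the function names are hypothetical, the 3-year window is assumed to cover the publication year and the two following calendar years, and the RAM weighting γ^(t_c − t_citing) follows our reading of Ghosh et al.², with γ = 0.6 and t_c = 2021 taken from the RAM filename (c0.6, year2021) as an assumption.

```python
from collections import defaultdict

# Hypothetical helpers: `citations` is an iterable of (citing_doi, cited_doi)
# pairs and `pub_year` maps a DOI to its publication year.


def citation_count(citations):
    """CC: number of distinct citing papers per cited DOI."""
    cc = defaultdict(int)
    for _citing, cited in set(citations):  # drop duplicate pairs
        cc[cited] += 1
    return dict(cc)


def incubation_citation_count(citations, pub_year, window=3):
    """3-year CC: citations made within `window` years of publication.

    Assumption: the window covers the publication year and the following
    window - 1 calendar years; pairs with an unknown year are skipped."""
    icc = defaultdict(int)
    for citing, cited in set(citations):
        if citing in pub_year and cited in pub_year:
            delta = pub_year[citing] - pub_year[cited]
            if 0 <= delta < window:
                icc[cited] += 1
    return dict(icc)


def ram_score(citations, pub_year, gamma=0.6, current_year=2021):
    """RAM-style score: each citation weighted by gamma ** (age of the citing paper).

    gamma=0.6 and current_year=2021 are assumed readings of 'c0.6_year2021'."""
    ram = defaultdict(float)
    for citing, cited in set(citations):
        if citing in pub_year:
            ram[cited] += gamma ** (current_year - pub_year[citing])
    return dict(ram)
```

With citations a→c, b→c, b→a and publication years 2019, 2021 and 2018 for a, b and c, the b→c citation (made three years after c appeared) is counted by `citation_count` but falls outside the assumed 3-year window.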
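PageRank and AttRank can both be sketched as a power iteration s ← α·P·s + (1 − α)·v, where P moves each paper's score evenly to the papers it cites (papers citing nothing spread theirs uniformly over all papers) and only the teleport vector v differs: uniform for PageRank, and (β·A + γ·T)/(1 − α) for AttRank, with A the share of recent citations each paper received and T ∝ e^(ρ·(t_c − t_i)) favouring recently published papers. This rewrite relies on α + β + γ = 1, which holds for the AttRank filename (a0.2, b0.5, g0.3). The defaults are read off the filenames (a as α, b as β, g as γ, rho as ρ, year2018-2021 as the attention window, error1e-12 as an L1 convergence threshold); that reading, the uniform handling of papers without references, and the helper names are assumptions. This is a toy in-memory sketch, not the authors' implementation.

```python
import math
from collections import defaultdict


def power_iteration(citations, nodes, alpha, teleport, error=1e-12, max_iter=1000):
    """Iterate s <- alpha * P s + (1 - alpha) * teleport until the L1 change < error.

    P moves each paper's score evenly to the papers it cites; papers that
    cite nothing spread their score uniformly (an assumption). Every DOI
    appearing in `citations` must also be in `nodes`."""
    n = len(nodes)
    out_links = defaultdict(list)
    for citing, cited in set(citations):
        out_links[citing].append(cited)
    s = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(s[v] for v in nodes if v not in out_links)
        new = {v: alpha * dangling / n + (1 - alpha) * teleport[v] for v in nodes}
        for citing, targets in out_links.items():
            share = alpha * s[citing] / len(targets)
            for cited in targets:
                new[cited] += share
        diff = sum(abs(new[v] - s[v]) for v in nodes)
        s = new
        if diff < error:
            break
    return s


def pagerank(citations, nodes, alpha=0.5, error=1e-12):
    """PageRank with a uniform teleport vector (alpha=0.5 assumed from 'a0.5')."""
    uniform = {v: 1.0 / len(nodes) for v in nodes}
    return power_iteration(citations, nodes, alpha, uniform, error)


def attrank(citations, nodes, pub_year, alpha=0.2, beta=0.5, gamma=0.3,
            rho=-0.16, current_year=2021, attention=(2018, 2021), error=1e-12):
    """AttRank-style score: s = alpha*P s + beta*A + gamma*T, assuming alpha+beta+gamma = 1."""
    first, last = attention
    # Attention component A: share of citations made inside the attention window
    recent = defaultdict(int)
    for citing, cited in set(citations):
        if first <= pub_year.get(citing, first - 1) <= last:
            recent[cited] += 1
    total_recent = sum(recent.values()) or 1  # if no recent citations, A is all zero
    att = {v: recent[v] / total_recent for v in nodes}
    # Time component T: exponential decay with paper age (rho < 0); unknown year -> 0
    raw = {v: math.exp(rho * (current_year - pub_year[v])) if v in pub_year else 0.0
           for v in nodes}
    total_raw = sum(raw.values()) or 1.0
    teleport = {v: (beta * att[v] + gamma * raw[v] / total_raw) / (1 - alpha)
                for v in nodes}
    return power_iteration(citations, nodes, alpha, teleport, error)
```

When A and T each sum to 1 and α + β + γ = 1, the teleport vector sums to 1, so both score vectors keep summing to 1 at every iteration.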

We provide five compressed, tab-separated text files (one per measure), whose lines have the form “DOI \t score”, i.e., a DOI and its score separated by a tab. The configuration of each measure is encoded in the corresponding filename. More intuition about the different measures can be found in a previous extensive experimental study⁴.
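Since each file is a plain sequence of tab-separated DOI/score lines, it can be streamed without decompressing it to disk. A minimal Python sketch, assuming UTF-8 text and that any line whose score does not parse as a number (for example a possible header) can be skipped. `parse_config` is a hypothetical helper that splits a filename into its measure name and parameter strings, based only on the naming pattern visible in the file list of this record.

```python
import gzip
import heapq
import re


def parse_config(filename):
    """Split e.g. 'PR_graph_universe3_a0.5_error1e-12.txt.gz' into
    ('PR', {'a': '0.5', 'error': '1e-12'}). Values are kept as strings."""
    stem = filename[:-len(".txt.gz")] if filename.endswith(".txt.gz") else filename
    measure, _, rest = stem.partition("_graph_")
    params = {}
    for token in rest.split("_")[1:]:  # skip the 'universe3' token
        m = re.fullmatch(r"([a-z]+)(.+)", token)
        if m:
            params[m.group(1)] = m.group(2)
    return measure, params


def read_scores(path):
    """Stream (doi, score) pairs from a gzip-compressed 'DOI<TAB>score' file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            doi, sep, score = line.rstrip("\r\n").partition("\t")
            if not sep:
                continue  # no tab: not a data line
            try:
                value = float(score)
            except ValueError:
                continue  # e.g. a header line (assumption)
            yield doi, value


def top_k(path, k=10):
    """Return the k highest-scoring (doi, score) pairs without loading the whole file."""
    return heapq.nlargest(k, read_scores(path), key=lambda pair: pair[1])
```

For example, `parse_config("RAM_graph_universe3_c0.6_year2021.txt.gz")` returns `('RAM', {'c': '0.6', 'year': '2021'})`, and `top_k("PR_graph_universe3_a0.5_error1e-12.txt.gz", k=5)` would list the five articles with the highest PageRank scores.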

The citation network used to produce this dataset was compiled from (a) the OpenCitations COCI dataset (Sep-2020 version), (b) a MAG⁵,⁶ snapshot from Aug-2020, and (c) a Crossref snapshot from Mar-2020. The union of all distinct DOI-to-DOI citations found in these sources was used (entries without a DOI were omitted).
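Merging the three sources amounts to a set union over normalized (citing DOI, cited DOI) pairs. The sketch below assumes each source has already been reduced to such pairs, with a missing DOI represented as `None` or an empty string; the normalization (trimming, lower-casing, which is safe because DOI names are case-insensitive, and stripping a resolver prefix) is an illustrative assumption, not necessarily what was done for BIP! DB. Both function names are hypothetical.

```python
def normalize_doi(doi):
    """Trim, lower-case (DOI names are case-insensitive) and drop a resolver prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
                   "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def merge_citations(*sources):
    """Union of distinct (citing, cited) DOI pairs; pairs lacking a DOI are dropped."""
    merged = set()
    for source in sources:
        for citing, cited in source:
            if not citing or not cited:
                continue  # entry without a DOI on either side
            pair = (normalize_doi(citing), normalize_doi(cited))
            if pair[0] and pair[1]:
                merged.add(pair)
    return merged
```

Using a set makes the union insensitive to the order of the sources and to citations reported by more than one of them.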

The work is based on the following publications:

1. L. Page, S. Brin, R. Motwani and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
2. Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW), pp. 373–380.
3. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou. 2020. Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951.
4. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou. 2019. Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE (early access).
5. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, pp. 243–246. doi: 10.1145/2740908.2742839
6. K. Wang et al. 2019. A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data. doi: 10.3389/fdata.2019.00045

A Web user interface that uses these data to facilitate literature exploration is available as BIP! Finder, and the exact same scores can be retrieved through BIP! Finder's API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

Notes

Please cite: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi. "BIP! DB: A Dataset of Impact Measures for Scientific Publications". arXiv:2101.12001

Files (5.5 GB in total)

Name	Size	MD5 checksum
3-year_CC_indicator_graph_universe3.txt.gz	970.3 MB	c56901f37c27f62813317dbf762959ce
AttRank_graph_universe3_a0.2_b0.5_g0.3_rho-0.16_year2018-2021_error1e-12.txt.gz	1.2 GB	cd3f114250b45592cf85ba41cf439c72
CC_graph_universe3.txt.gz	859.1 MB	12825589d5a7f85cd229ff1aff8de723
PR_graph_universe3_a0.5_error1e-12.txt.gz	1.4 GB	4d944540c20e7ba1a57e2ec741ff1865
RAM_graph_universe3_c0.6_year2021.txt.gz	1.1 GB	a3e46fec5d2857dfc9d1a639f80e1adc
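After downloading, the files can be checked against the MD5 checksums listed for them. A minimal sketch using Python's standard `hashlib`, reading in 1 MiB chunks so that multi-gigabyte files never need to fit in memory; `md5sum`, `verify` and `EXPECTED_MD5` are hypothetical names, and the checksums are copied from the file list of this record.

```python
import hashlib
import os

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "3-year_CC_indicator_graph_universe3.txt.gz": "c56901f37c27f62813317dbf762959ce",
    "AttRank_graph_universe3_a0.2_b0.5_g0.3_rho-0.16_year2018-2021_error1e-12.txt.gz":
        "cd3f114250b45592cf85ba41cf439c72",
    "CC_graph_universe3.txt.gz": "12825589d5a7f85cd229ff1aff8de723",
    "PR_graph_universe3_a0.5_error1e-12.txt.gz": "4d944540c20e7ba1a57e2ec741ff1865",
    "RAM_graph_universe3_c0.6_year2021.txt.gz": "a3e46fec5d2857dfc9d1a639f80e1adc",
}


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks of `chunk_size` bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's MD5 matches the checksum listed for its file name."""
    expected = EXPECTED_MD5.get(os.path.basename(path))
    return expected is not None and md5sum(path) == expected
```

`verify` deliberately returns `False` for files whose names are not in the list, so a renamed download is flagged rather than silently accepted.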
