Dataset Open Access

BIP! DB: A Dataset of Impact Measures for Scientific Publications

Thanasis Vergoulis; Ilias Kanellos; Claudio Atzori; Andrea Mannocci; Serafeim Chatzopoulos; Sandro La Bruzzo; Natalia Manola; Paolo Manghi

This dataset contains impact measures (metrics/indicators) for ~117M scientific articles. In particular, for each article we have calculated the following measures:

  • Citation count: The total number of citations, reflecting the "influence" (i.e., the total impact) of an article.

  • Incubation Citation Count (3-year CC): This is a time-restricted version of the citation count, where the time window has the same length for all papers but starts at each paper's publication date, i.e., only citations received within the first 3 years after a paper’s publication are counted. This measure can be seen as an indicator of a paper's "impulse", i.e., its initial momentum directly after its publication.

  • PageRank score: This is a citation-based measure reflecting the "influence" (i.e., the total impact) of an article. It is based on the PageRank [1] network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.

  • RAM score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the RAM [2] method and is essentially a citation count in which recent citations are considered more important. This type of “time awareness” alleviates problems of methods like PageRank, which are biased against recently published articles (new articles need time to receive a “sufficient” number of citations). Hence, RAM is more suitable to capture the current “hype” of an article.

  • AttRank score: This is a citation network analysis-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the AttRank [3] method. AttRank alleviates PageRank’s bias against recently published papers by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher’s preference to read papers which received a lot of attention recently. This is why it is more suitable to capture the current “hype” of an article.
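To make the measures above more concrete, here is a minimal sketch on a toy citation network. It is not the code used to build the dataset: the paper identifiers, years, and function names are hypothetical, the 3-year window is assumed to be inclusive, the time-decayed count is a simplified reading of RAM (each citation weighted by gamma raised to the age of the citing paper), and the defaults alpha=0.5, gamma=0.6, and tol=1e-12 only mirror values that appear in the published filenames, on the assumption that they denote the damping factor, the decay factor, and the convergence error. AttRank is not covered.

```python
from collections import defaultdict

# Toy data (hypothetical): publication year per paper and (citing, cited) pairs.
pub_year = {"A": 2010, "B": 2012, "C": 2015, "D": 2020}
citations = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "C")]


def citation_count(citations):
    """Total number of citations received by each cited paper."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        counts[cited] += 1
    return dict(counts)


def windowed_citation_count(citations, pub_year, window=3):
    """Citations received within `window` years of the cited paper's publication.

    Assumption: the window is inclusive on both ends.
    """
    counts = defaultdict(int)
    for citing, cited in citations:
        if 0 <= pub_year[citing] - pub_year[cited] <= window:
            counts[cited] += 1
    return dict(counts)


def time_decayed_count(citations, pub_year, current_year, gamma=0.6):
    """Citation count where a citation made `age` years ago weighs gamma**age."""
    scores = defaultdict(float)
    for citing, cited in citations:
        age = current_year - pub_year[citing]
        scores[cited] += gamma ** age
    return dict(scores)


def pagerank(citations, alpha=0.5, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; alpha is the probability of following a citation."""
    nodes = sorted({paper for edge in citations for paper in edge})
    out_links = {}
    for citing, cited in citations:
        out_links.setdefault(citing, []).append(cited)
    n = len(nodes)
    rank = {paper: 1.0 / n for paper in nodes}
    for _ in range(max_iter):
        # Papers without references (dangling nodes) spread their score uniformly.
        dangling = sum(rank[p] for p in nodes if p not in out_links)
        new_rank = {p: (1.0 - alpha) / n + alpha * dangling / n for p in nodes}
        for citing, cited_list in out_links.items():
            share = alpha * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


print(citation_count(citations))
print(windowed_citation_count(citations, pub_year))
print(time_decayed_count(citations, pub_year, current_year=2021))
print(pagerank(citations))
```

In this toy network, paper A collects the most citations overall, but most of them arrive long after its publication, so its windowed count is much lower than its total count.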

We provide five compressed CSV files (one for each measure/score provided) where each line follows the format “DOI <tab> score”. The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores see our extensive experimental study [4] and the configuration of AttRank in the original paper [3].
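As an illustration of the “DOI <tab> score” format, the following sketch streams one of the compressed files and returns the highest-scoring entries. The function names are hypothetical, and the code assumes UTF-8 text with exactly one tab-separated score at the end of each line.

```python
import gzip
import heapq


def iter_scores(path):
    """Yield (doi, score) pairs from a gzip-compressed 'DOI<tab>score' file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split on the last tab so the score is always the final field.
            doi, _, score = line.rpartition("\t")
            yield doi, float(score)


def top_k(path, k=10):
    """Return the k (doi, score) pairs with the largest scores."""
    return heapq.nlargest(k, iter_scores(path), key=lambda pair: pair[1])


# Example call, assuming the file has been downloaded to the working directory:
#     top_k("CC.txt.gz", k=5)
```

Streaming line by line keeps memory use low, which matters here because each file holds scores for roughly 117M articles.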

The data used to produce the citation network on which we calculated the provided measures have been gathered from (a) the OpenCitations’ COCI dataset (Dec-2020 version), (b) a MAG [5,6] snapshot from Nov-2020, and (c) a Crossref snapshot from Jan-2021. The union of all distinct DOI-to-DOI citations that could be found in these sources has been considered (entries without a DOI were omitted).
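The merging step described above amounts to a set union over (citing DOI, cited DOI) pairs. The sketch below shows the idea on made-up data; the function names and DOIs are hypothetical, and lower-casing DOIs before deduplication is our assumption (DOIs are case-insensitive), not a documented step of the actual pipeline.

```python
def normalize_doi(doi):
    """Trim and lower-case a DOI so that case variants compare equal."""
    return doi.strip().lower()


def merge_citations(*sources):
    """Union of distinct (citing DOI, cited DOI) pairs across all sources.

    Pairs in which either side lacks a DOI are skipped.
    """
    merged = set()
    for source in sources:
        for citing, cited in source:
            if not citing or not cited:
                continue  # entries without a DOI are omitted
            merged.add((normalize_doi(citing), normalize_doi(cited)))
    return merged


# Hypothetical extracts from two sources; the second repeats one pair in a
# different letter case and contributes one new pair.
source_a = [("10.1000/A", "10.1000/b"), ("10.1000/c", "")]
source_b = [("10.1000/a", "10.1000/B"), ("10.1000/c", "10.1000/a")]
print(sorted(merge_citations(source_a, source_b)))
```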

References:

  1. L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.

  2. Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380

  3. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)

  4. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)

  5. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839

  6. K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045    

Find our academic search engine built on top of these data here. Note also that we provide all calculated scores through BIP! Finder’s API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

More details about BIP! DB can be found in our pre-print:

T. Vergoulis, I. Kanellos, C. Atzori, A. Mannocci, S. Chatzopoulos, S. La Bruzzo, N. Manola, P. Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. arXiv 2021, 2101.12001

We kindly request that any published research that makes use of BIP! DB cite the above article.

Files (5.7 GB):

  • 3-year_CC.txt.gz (912.1 MB), md5: 0613f0072d375015a57d3f7a9799f72c

  • AttRank_a0.2_b0.5_g0.3_rho-0.16_year2021-2018_error1e-12.gz (1.2 GB), md5: acfb8ff3f01f6b0830f3fe068807af99

  • CC.txt.gz (937.3 MB), md5: 4a2c58e61ff576b22f701eeb3c4f58cf

  • PR_a0.5_error1e-12.gz (1.5 GB), md5: ebaee0e9bfd10aa81985a63f1ea03181

  • RAM_c0.6_year2021.gz (1.1 GB), md5: 76eb368b8774f8b8e83af422ddbd3efe
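The MD5 checksums listed with the files can be used to check that a download completed without corruption. A minimal sketch follows, with hypothetical function names and assuming the files sit in the current directory under their original names:

```python
import hashlib

# Checksums as listed on this page.
EXPECTED_MD5 = {
    "3-year_CC.txt.gz": "0613f0072d375015a57d3f7a9799f72c",
    "AttRank_a0.2_b0.5_g0.3_rho-0.16_year2021-2018_error1e-12.gz":
        "acfb8ff3f01f6b0830f3fe068807af99",
    "CC.txt.gz": "4a2c58e61ff576b22f701eeb3c4f58cf",
    "PR_a0.5_error1e-12.gz": "ebaee0e9bfd10aa81985a63f1ea03181",
    "RAM_c0.6_year2021.gz": "76eb368b8774f8b8e83af422ddbd3efe",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()


# Example call, assuming CC.txt.gz has been downloaded:
#     verify("CC.txt.gz", EXPECTED_MD5["CC.txt.gz"])
```

Reading in chunks avoids loading a file of a gigabyte or more into memory at once.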