There is a newer version of the record available.

Published February 10, 2021 | Version 2
Dataset Open

BIP! DB: A Dataset of Impact Measures for Scientific Publications

Description

This dataset contains impact measures (metrics/indicators) for 106,788,227 scientific articles. In particular, for each article we have calculated the following measures:

  • Citation count: This is the total number of citations, reflecting the "influence" (i.e., the total impact) of an article.

  • Incubation Citation Count (3-year CC): This is a time-restricted version of the citation count, where the time window is distinct for each paper, i.e., only citations received within the first 3 years after its publication are counted. This measure can be seen as an indicator of a paper's "impulse", i.e., its initial momentum directly after its publication.

  • PageRank score: This is a citation-based measure reflecting the "influence" (i.e., the total impact) of an article. It is based on the PageRank [1] network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.

  • RAM score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the RAM [2] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). RAM alleviates this problem using an approach known as “time-awareness”. This is why it is more suitable to capture the current “hype” of an article.

  • AttRank score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the AttRank [3] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). AttRank alleviates this problem by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher’s preference to read papers which received a lot of attention recently. This is why it is more suitable to capture the current “hype” of an article.
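To make the first three measures concrete, the following is a minimal sketch on a hypothetical toy citation network. It is not the pipeline used to build the dataset: the paper identifiers, the publication years, the reading of the 3-year window as "citing year minus cited year is at most 3", and the damping factor `alpha` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: publication year per paper and (citing, cited) pairs.
pub_year = {"A": 2010, "B": 2011, "C": 2012, "D": 2016}
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B"), ("D", "C")]


def citation_count(citations):
    """Total number of citations received by each cited paper."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        counts[cited] += 1
    return dict(counts)


def incubation_citation_count(citations, pub_year, window=3):
    """Citations received within `window` years of the cited paper's publication.

    Assumption: a citation counts if the citing paper appeared at most
    `window` calendar years after the cited paper.
    """
    counts = defaultdict(int)
    for citing, cited in citations:
        if 0 <= pub_year[citing] - pub_year[cited] <= window:
            counts[cited] += 1
    return dict(counts)


def pagerank(citations, nodes, alpha=0.85, iterations=100):
    """Plain power-iteration PageRank on the citation graph.

    Papers that cite nothing (dangling nodes) spread their score uniformly.
    The value of alpha is an illustrative choice, not the dataset's setting.
    """
    nodes = list(nodes)
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for citing, cited in citations:
        out_links[citing].append(cited)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(score[v] for v in nodes if not out_links[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += alpha * score[v] / len(out_links[v])
        score = new
    return score


cc = citation_count(citations)                        # {'A': 3, 'B': 1, 'C': 1}
cc3 = incubation_citation_count(citations, pub_year)  # {'A': 2, 'B': 1}
pr = pagerank(citations, pub_year)
```

In this toy network, paper A keeps only two of its three citations under the 3-year window, because the citation from D arrives six years after A was published. RAM and AttRank are not sketched here, since their exact formulations are given in the cited publications.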

We provide five compressed CSV files (one for each measure/score provided) having lines of the form “DOI \t score”. The configuration of each measure has been captured in the corresponding filename. For more intuition about the different measures/scores, see a previous extensive experimental study [4].
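A file with lines of this form could be streamed as in the sketch below. The compression format and the presence or absence of a header line are not stated here, so the sketch assumes gzip compression and no header; the function name `iter_scores` and any path passed to it are hypothetical.

```python
import csv
import gzip


def iter_scores(path):
    """Yield (doi, score) pairs from a gzip-compressed, tab-separated file.

    Assumptions: the file is gzip-compressed, has no header line, and each
    line holds exactly two tab-separated fields.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 2:
                continue  # skip blank or malformed lines
            doi, score = row
            yield doi, float(score)
```

Reading the file as a stream rather than loading it whole keeps memory use flat, which matters for files with one line per article in a dataset of this size.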

The citation network used to produce this dataset was gathered from (a) the OpenCitations’ COCI dataset (Sep-2020 version), (b) a MAG [5, 6] snapshot from Aug-2020, and (c) a Crossref snapshot from Mar-2020. The union of all distinct DOI-to-DOI citations found in these sources was considered (entries without a DOI were omitted).
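The merging step described above amounts to a set union over citation pairs. The sketch below illustrates the idea on hypothetical in-memory lists; the function name `merge_citations` and the lowercasing of DOIs before comparison are assumptions for illustration, not a description of the actual processing.

```python
def merge_citations(*sources):
    """Union of distinct (citing DOI, cited DOI) pairs across several sources.

    Pairs where either side lacks a DOI are dropped. DOIs are lowercased
    before comparison (an assumption, so that case variants are not counted
    as different citations).
    """
    merged = set()
    for source in sources:
        for citing, cited in source:
            if not citing or not cited:
                continue  # entry without a DOI on one side
            merged.add((citing.strip().lower(), cited.strip().lower()))
    return merged


# Hypothetical sample records standing in for the three sources.
coci = [("10.1000/B", "10.1000/a"), ("10.1000/c", "10.1000/a")]
mag = [("10.1000/b", "10.1000/a"), ("10.1000/d", None)]
crossref = [("10.1000/c", "10.1000/b")]

network = merge_citations(coci, mag, crossref)
```

Here the citation from `10.1000/b` to `10.1000/a` appears in two sources but is kept once, and the pair with a missing DOI is discarded.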

The work is based on the following publications:

  1. L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.

  2. Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380

  3. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)

  4. I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)

  5. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243–246. DOI=http://dx.doi.org/10.1145/2740908.2742839

  6. K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045

A Web user interface that uses these data to facilitate literature exploration can be found here. Moreover, the exact same scores can be retrieved through BIP! Finder’s API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

Notes

Please cite: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi. "BIP! DB: A Dataset of Impact Measures for Scientific Publications". arXiv:2101.12001

Files

Files (5.5 GB)

Checksum and size of each file:

  • md5:c56901f37c27f62813317dbf762959ce (970.3 MB)
  • md5:cd3f114250b45592cf85ba41cf439c72 (1.2 GB)
  • md5:12825589d5a7f85cd229ff1aff8de723 (859.1 MB)
  • md5:4d944540c20e7ba1a57e2ec741ff1865 (1.4 GB)
  • md5:a3e46fec5d2857dfc9d1a639f80e1adc (1.1 GB)
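A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file path must be supplied by the user since filenames are not listed here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:c569...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks avoids loading files of around 1 GB into memory at once.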
