There is a newer version of this record available.

Dataset Open Access

# BIP! Finder DB: A Dataset of Impact Measures for Scientific Publications

Thanasis Vergoulis; Ilias Kanellos; Claudio Atzori; Andrea Mannocci; Sandro La Bruzzo; Natalia Manola; Paolo Manghi

### Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:creator>Thanasis Vergoulis</dc:creator>
<dc:creator>Ilias Kanellos</dc:creator>
<dc:creator>Claudio Atzori</dc:creator>
<dc:creator>Andrea Mannocci</dc:creator>
<dc:creator>Sandro La Bruzzo</dc:creator>
<dc:creator>Natalia Manola</dc:creator>
<dc:creator>Paolo Manghi</dc:creator>
<dc:date>2020-12-23</dc:date>
<dc:description>This dataset contains impact measures (metrics/indicators) for 104,769,307 scientific articles. In particular, for each article we have calculated the following measures:

PageRank score: This is a citation-based measure reflecting the influence (i.e., the total impact) of an article. It is based on the PageRank [1] network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.
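
As a rough illustration of the idea, here is a minimal power-iteration sketch of PageRank on a toy citation graph. The damping factor of 0.85, the iteration count, the function name and the toy papers are assumptions made for this example; they are not the configuration used to produce the dataset.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Toy PageRank: citations maps each citing paper to the papers it cites."""
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)
    scores = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers without outgoing references spread their score uniformly.
        dangling = sum(scores[p] for p in papers if not citations.get(p))
        new_scores = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citing, cited in citations.items():
            if cited:
                # A citing paper splits its damped score among its references.
                share = damping * scores[citing] / len(cited)
                for target in cited:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical graph: A and B cite C, and C cites D.
toy = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = pagerank(toy)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 4))
```

In this toy graph, D ranks above C even though it has a single citation, because that citation comes from a well-cited paper. This is the sense in which the score depends on centrality in the whole network rather than on raw citation counts.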

RAM score: This is a citation-based measure reflecting the popularity (i.e., the current impact) of an article. It is based on the RAM [2] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). RAM alleviates this problem using an approach known as “time-awareness”, which makes it better suited to capturing the current “hype” of an article.
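
The sketch below shows one way time-awareness can be expressed, following our reading of the RAM formulation: each citation contributes gamma raised to the age of the citing article, so recent citations weigh more than old ones. The value gamma=0.5, the reference year, the function name and the toy DOIs are assumptions for illustration only; the actual configuration is recorded in the dataset filenames.

```python
from collections import defaultdict


def ram_scores(citations, current_year, gamma=0.5):
    """citations: iterable of (citing_doi, cited_doi, citing_year) tuples."""
    scores = defaultdict(float)
    for _citing, cited, citing_year in citations:
        age = current_year - citing_year
        # Older citations are discounted exponentially.
        scores[cited] += gamma ** age
    return dict(scores)


# Hypothetical citations: a recent paper with two fresh citations and
# an older paper with three citations received long ago.
toy = [
    ("10.1/new1", "10.1/recent", 2020),
    ("10.1/new2", "10.1/recent", 2019),
    ("10.1/old1", "10.1/classic", 2005),
    ("10.1/old2", "10.1/classic", 2006),
    ("10.1/old3", "10.1/classic", 2007),
]
scores = ram_scores(toy, current_year=2020)
print(scores)
```

Under these assumptions the recent paper outranks the older one despite having fewer citations in total.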

AttRank score: This is a citation-based measure reflecting the popularity (i.e., the current impact) of an article. It is based on the AttRank [3] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). AttRank alleviates this problem by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher’s preference to read papers that have received a lot of attention recently. This makes it better suited to capturing the current “hype” of an article.

We provide three compressed CSV files (one for each measure/score provided) having lines of the form “DOI \t score”. The configuration of each measure has been captured in the corresponding filename. Regarding the different measures/scores, you can find more intuition in a previous extensive experimental study [4].
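
A minimal sketch of reading one such file is given below. It assumes gzip compression and no header line, neither of which is stated above, so the opener may need to be adapted to the actual archive format. The function name, the sample file name and the sample DOIs are made up for the demonstration.

```python
import csv
import gzip
import os
import tempfile


def read_scores(path):
    """Yield (doi, score) pairs from a gzip-compressed, tab-separated file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 2:
                continue  # skip blank or malformed lines
            doi, score = row
            yield doi, float(score)


# Demo on a tiny made-up file standing in for one of the score files.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_scores.txt.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8") as handle:
    handle.write("10.1000/example.1\t3.2e-09\n")
    handle.write("10.1000/example.2\t1.5e-08\n")

# Stream the file and keep only the highest-scoring entry.
top = max(read_scores(sample_path), key=lambda pair: pair[1])
print(top)
```

Streaming the rows through a generator avoids loading the whole file into memory, which matters for files with over a hundred million lines.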

The data of the citation network used to produce this dataset have been gathered from (a) the OpenCitations’ COCI dataset (Sep-2020 version), (b) a MAG [5, 6] snapshot from Aug-2020, and (c) a Crossref snapshot from Mar-2020. The union of all distinct DOI-to-DOI citations that could be found in these sources has been considered (entries without a DOI were omitted).
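
The merging step can be pictured as a set union over (citing DOI, cited DOI) pairs. The sketch below is only an illustration: the lowercase normalization is an assumption on our part (DOIs are case-insensitive, but the exact cleaning applied to the sources is not described here), and the function names and toy inputs are hypothetical.

```python
def normalize_doi(doi):
    """Assumed normalization: trim whitespace and lowercase the DOI."""
    return doi.strip().lower()


def union_citations(*sources):
    """Merge iterables of (citing_doi, cited_doi) pairs into one distinct set."""
    edges = set()
    for source in sources:
        for citing, cited in source:
            if not citing or not cited:
                continue  # entries without a DOI on either side are omitted
            edges.add((normalize_doi(citing), normalize_doi(cited)))
    return edges


# Hypothetical extracts standing in for the three sources.
coci = [("10.1/A", "10.1/b"), ("10.1/c", "10.1/b")]
mag = [("10.1/a", "10.1/B"), ("10.1/d", None)]
crossref = [("10.1/c", "10.1/d")]

edges = union_citations(coci, mag, crossref)
print(sorted(edges))
```

A citation that appears in several sources is kept once, and the pair with a missing DOI is dropped.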

The work is based on the following publications:

[1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.

[2] Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.

[3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)

[4] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)

[5] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839

[6] K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045

A Web user interface that uses these data to facilitate literature exploration can be found here. Moreover, the exact same scores can be gathered through BIP! Finder’s API.</dc:description>

<dc:identifier>https://zenodo.org/record/4386935</dc:identifier>
<dc:identifier>10.5281/zenodo.4386935</dc:identifier>
<dc:identifier>oai:zenodo.org:4386935</dc:identifier>
<dc:relation>doi:10.5281/zenodo.4386934</dc:relation>
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
<dc:subject>Scientometrics</dc:subject>
<dc:subject>Research assessment</dc:subject>
<dc:subject>Research impact</dc:subject>
<dc:title>BIP! Finder DB: A Dataset of Impact Measures for Scientific Publications</dc:title>
<dc:type>info:eu-repo/semantics/other</dc:type>
<dc:type>dataset</dc:type>
</oai_dc:dc>
