7665898
doi
10.5281/zenodo.7665898
oai:zenodo.org:7665898
user-eu
Ilias Kanellos
IMSI, ATHENA RC
Claudio Atzori
CNR
Andrea Mannocci
CNR
Serafeim Chatzopoulos
IMSI, ATHENA RC
Sandro La Bruzzo
CNR
Natalia Manola
OpenAIRE
Paolo Manghi
CNR
BIP! DB: A Dataset of Impact Measures for Scientific Publications
Thanasis Vergoulis
IMSI, ATHENA RC
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Scientometrics
Research assessment
Research impact
<p>This dataset contains impact measures (metrics/indicators) for ~136M distinct DOIs that correspond to scientific articles. In particular, for each article we have calculated the following measures:</p>
<ul>
<li>
<p><strong><em>Citation count:</em></strong> The total number of citations, reflecting the "influence" (i.e., the total impact) of an article.</p>
</li>
<li>
<p><strong><em>Incubation Citation Count (3-year CC):</em></strong> This is a time-restricted version of the citation count, where the time window has the same fixed length for all papers and starts at each paper's publication date, i.e., only citations received within the 3 years following each paper’s publication are counted. This measure can be seen as an indicator of a paper's "impulse", i.e., its initial momentum directly after its publication.</p>
</li>
<li>
<p><strong><em>PageRank score:</em> </strong>This is a citation-based measure reflecting the "influence" (i.e., the total impact) of an article. It is based on the PageRank<sup>1</sup> network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.</p>
</li>
<li>
<p><strong><em>RAM score:</em> </strong>This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the RAM<sup>2</sup> method and is essentially a citation count where recent citations are considered as more important. This type of “time awareness” alleviates problems of methods like PageRank, which are biased against recently published articles (new articles need time to receive a “sufficient” number of citations). Hence, RAM is more suitable to capture the current “hype” of an article.</p>
</li>
<li>
<p><strong><em>AttRank score:</em> </strong>This is a citation network analysis-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the AttRank<sup>3</sup> method. AttRank alleviates PageRank’s bias against recently published papers by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher’s preference to read papers which received a lot of attention recently. This is why it is more suitable to capture the current “hype” of an article.</p>
</li>
</ul>
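<p>As a rough illustration of the two simplest time-aware measures above, the following Python sketch computes a 3-year citation count and a RAM-style score on a toy citation list. It is not the code used to build the dataset: the function names, the toy identifiers, the inclusive treatment of the third year, and the reading of RAM as "each citation is weighted by gamma raised to the age of the citing paper" are all assumptions made for the example. The value 0.6 is borrowed from the RAM filename, on the assumption that it denotes this decay factor.</p>

```python
from collections import defaultdict


def incubation_citation_count(citations, pub_year, window=3):
    """Count citations received at most `window` years after publication.

    `citations` is an iterable of (citing_id, cited_id) pairs and
    `pub_year` maps each id to its publication year (both hypothetical inputs).
    """
    counts = defaultdict(int)
    for citing, cited in citations:
        if citing not in pub_year or cited not in pub_year:
            continue  # skip pairs with unknown publication years
        age = pub_year[citing] - pub_year[cited]
        if 0 <= age <= window:  # assumption: the window is inclusive
            counts[cited] += 1
    return dict(counts)


def time_weighted_citation_count(citations, pub_year, current_year, gamma=0.6):
    """RAM-style score: recent citations weigh more than old ones.

    Each citation contributes gamma ** (current_year - year of citing paper).
    """
    scores = defaultdict(float)
    for citing, cited in citations:
        if citing not in pub_year:
            continue
        age = max(current_year - pub_year[citing], 0)
        scores[cited] += gamma ** age
    return dict(scores)


# Toy data: A is old and cited mostly long after publication; C is recent.
pub_year = {"A": 2015, "B": 2016, "C": 2023, "D": 2024}
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "C")]

icc = incubation_citation_count(citations, pub_year)
ram = time_weighted_citation_count(citations, pub_year, current_year=2024)
print(icc)
print(ram)
```

<p>In the toy data, article A has three citations in total but only one inside its 3-year window, which is the kind of difference between "influence" and "impulse" that the measures are meant to separate.</p>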
<p>More details about the aforementioned impact measures, the way they are calculated and their interpretation can be found <a href="https://bip.imsi.athenarc.gr/site/indicators">here</a>.</p>
<p>For version 5.1 onward, the impact measures are calculated in two levels:</p>
<ul>
<li>The <strong>DOI level</strong> (assuming that each DOI corresponds to a distinct scientific article).</li>
<li>The <strong>OpenAIRE-id level</strong> (leveraging DOI synonyms based on OpenAIRE's deduplication algorithm<sup>4</sup> - each distinct article has its own OpenAIRE id). </li>
</ul>
<p>Previous versions of the dataset only provided the scores at the DOI level.</p>
<p>Also, from version 7 onward, for each article in our files we offer an impact class, which tells the user the percentile range in which the article's score falls compared to the impact scores of the rest of the articles in the database. The impact classes are: C1 (in top 0.01%), C2 (in top 0.1%), C3 (in top 1%), C4 (in top 10%), and C5 (in bottom 90%).</p>
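<p>The class boundaries can be made concrete with a small Python sketch. It assumes that classes are assigned purely by rank position after sorting the scores in descending order, and that ties are broken arbitrarily; the description does not say how the dataset handles either point, so the function below (a hypothetical helper, not part of BIP! DB) is only one plausible reading.</p>

```python
def impact_classes(scores):
    """Map each identifier to C1..C5 based on its rank among all scores.

    `scores` maps identifier -> score. An article at rank r (1 = best) out of
    n gets the first class whose share of the ranking it fits into.
    """
    # (divisor, label): rank * divisor <= n  means  rank / n <= 1 / divisor.
    # Integer arithmetic avoids floating-point issues at the boundaries.
    thresholds = [(10000, "C1"), (1000, "C2"), (100, "C3"), (10, "C4")]
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    classes = {}
    for rank, identifier in enumerate(ranked, start=1):
        label = "C5"  # bottom 90% unless a stricter threshold matches
        for divisor, candidate in thresholds:
            if rank * divisor <= n:
                label = candidate
                break
        classes[identifier] = label
    return classes


# Toy example: 10,000 hypothetical articles with distinct scores.
toy_scores = {f"doi{i}": float(i) for i in range(10000)}
toy_classes = impact_classes(toy_scores)
```

<p>With 10,000 articles, this reading puts exactly one article in C1, the next nine in C2, the next ninety in C3, the next nine hundred in C4, and the remaining nine thousand in C5.</p>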
<p>For each calculation level (DOI / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score provided) where each line follows the format “identifier &lt;tab&gt; score &lt;tab&gt; class”. The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores see our extensive experimental study<sup>5</sup> and the configuration of AttRank in the original paper.<sup>3</sup> Files for the OpenAIRE-ids case contain the keyword "openaire_ids" in the filename. </p>
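<p>A minimal Python sketch for streaming one of these files follows. It relies only on the three-column, tab-separated layout stated above; whether the files carry a header line or use any quoting is not stated, so the reader below simply skips lines that do not parse. The function name <code>read_scores</code> is a hypothetical helper introduced for this example.</p>

```python
import csv
import gzip


def read_scores(path):
    """Yield (identifier, score, impact_class) tuples from a gzipped file.

    Lines that do not have exactly three tab-separated fields, or whose
    score is not numeric (e.g. a possible header line), are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 3:
                continue
            identifier, score, impact_class = row
            try:
                yield identifier, float(score), impact_class
            except ValueError:
                continue  # non-numeric score; assumed to be a header or noise
```

<p>Because the function is a generator, a file such as <code>CC.txt.gz</code> can be processed line by line without decompressing it to disk or loading it fully into memory, which matters for files of this size.</p>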
<p>From version 9 onward, we also provide topic-specific impact classes for DOI-identified publications. In particular, we associated those articles with 2nd level concepts from OpenAlex (284 in total); for each publication we kept only the three most dominant concepts, based on their confidence score, and only those with a score greater than 0.3. Then, for each publication and impact measure, we computed its class within each of its concepts. Finally, we provide the "topic_based_impact_classes.txt" file, where each line follows the format “identifier &lt;tab&gt; concept &lt;tab&gt; pagerank_class &lt;tab&gt; attrank_class &lt;tab&gt; 3-cc_class &lt;tab&gt; cc_class”.</p>
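<p>The concept selection rule can be sketched in a few lines of Python. The input shape (a mapping from concept name to confidence score), the function name, and the concept names in the example are assumptions for illustration; only the "top three" limit and the 0.3 threshold come from the description above.</p>

```python
def dominant_concepts(concept_scores, max_concepts=3, min_confidence=0.3):
    """Return up to `max_concepts` concept names, most confident first.

    Only concepts whose confidence is strictly greater than
    `min_confidence` are eligible.
    """
    eligible = [
        (name, confidence)
        for name, confidence in concept_scores.items()
        if confidence > min_confidence
    ]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible[:max_concepts]]


# Hypothetical confidence scores for one publication.
example = {
    "Machine learning": 0.82,
    "Statistics": 0.55,
    "Algorithm": 0.41,
    "Optics": 0.35,
    "Ecology": 0.12,
}
selected = dominant_concepts(example)
print(selected)
```

<p>A publication with fewer than three concepts above the threshold keeps fewer than three, and one with none is left without a topic-specific class under this reading.</p>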
<p>The data used to produce the citation network on which we calculated the provided measures have been gathered from (a) the OpenCitations’ COCI dataset (Dec-2022 version), (b) a MAG<sup>6,7</sup> snapshot from Dec-2021, and (c) a Crossref snapshot from Jan-2023. The union of all distinct DOI-to-DOI citations that could be found in these sources has been considered (entries without a DOI were omitted). </p>
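<p>The merging step described above amounts to a set union over citation pairs. The Python sketch below shows the idea on toy data; the lowercasing of DOIs (DOIs are case-insensitive), the function names, and the in-memory representation are assumptions for illustration and say nothing about how the actual pipeline is implemented at scale.</p>

```python
def normalize_doi(doi):
    """Return a trimmed, lowercased DOI, or an empty string if missing."""
    return doi.strip().lower() if doi else ""


def merge_citations(*sources):
    """Union of distinct (citing_doi, cited_doi) pairs across all sources.

    Pairs where either side lacks a DOI are dropped.
    """
    merged = set()
    for source in sources:
        for citing, cited in source:
            citing_doi = normalize_doi(citing)
            cited_doi = normalize_doi(cited)
            if citing_doi and cited_doi:
                merged.add((citing_doi, cited_doi))
    return merged


# Toy inputs standing in for the three sources (hypothetical DOIs).
coci = [("10.1/a", "10.1/b")]
mag = [("10.1/A", "10.1/b"), (None, "10.1/b")]
crossref = [("10.1/c", "10.1/a"), ("10.1/c", "")]

network = merge_citations(coci, mag, crossref)
print(sorted(network))
```

<p>The same citation reported by two sources with different letter case collapses into one edge, and the two entries lacking a DOI are discarded.</p>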
<p>References:</p>
<ol>
<li>
<p>L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.</p>
</li>
<li>
<p>Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.</p>
</li>
<li>
<p>I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)</p>
</li>
<li>
<p>P. Manghi, C. Atzori, M. De Bonis, A. Bardi, Entity deduplication in big data graphs for scholarly communication, Data Technologies and Applications (2020).</p>
</li>
<li>
<p>I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)</p>
</li>
<li>
<p>Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839</p>
</li>
<li>
<p>K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045 </p>
</li>
</ol>
<p>Our academic search engine built on top of these data is available <a href="https://bip.imsi.athenarc.gr/">here</a>. Note that we also provide all calculated scores through <a href="https://bip-api.imsi.athenarc.gr/documentation">BIP! Finder’s API</a>. </p>
<p><em>Terms of use:</em> These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.</p>
<p>More details about BIP! DB can be found in our relevant peer-reviewed publication:</p>
<p><em>Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456-460</em></p>
<p>We kindly request that any published research that makes use of BIP! DB cite the above article.</p>
Please cite: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456-460
Zenodo
2023-02-22
info:eu-repo/semantics/other
4386934
user-eu
9
award_title=OpenAIRE-Nexus Scholarly Communication Services for EOSC users; award_number=101017452; award_identifiers_scheme=url; award_identifiers_identifier=https://cordis.europa.eu/projects/101017452; funder_id=00k4n6c32; funder_name=European Commission;
1693577817.259948
2786131624
md5:a4e77f9df11a48bbbd357a03dd1f2d9a
https://zenodo.org/records/7665898/files/3-year_CC_openaire_ids.txt.gz
1310369742
md5:b2d74f0f772c116c9b83015f5ca9a52a
https://zenodo.org/records/7665898/files/3-year_CC.txt.gz
3132731911
md5:b2d74140b9cb7d538b904821cac1f87e
https://zenodo.org/records/7665898/files/PR_a0.5_error1e-12_openaire_ids.txt.gz
2103393002
md5:cfda6c74c0be6165a5870c53279ece8b
https://zenodo.org/records/7665898/files/topic_based_impact_classes.txt.gz
2430661824
md5:a51baa0e9e80b2920c157324f04970c9
https://zenodo.org/records/7665898/files/RAM_c0.6_year2024.txt.gz
3238165281
md5:66694c3ffa04f326e25c67d21fd755eb
https://zenodo.org/records/7665898/files/AttRank_a0.2_b0.5_c0.3_rho-0.16_year2021-2024_error1e-12_openaire_ids.txt.gz
3093357687
md5:14cfdb2cd80c02e8f55be1f0efe1ceb5
https://zenodo.org/records/7665898/files/RAM_c0.6_year2024_openaire_ids.txt.gz
1758920076
md5:5e88cb7c582c196178e041b3895df7cc
https://zenodo.org/records/7665898/files/AttRank_a0.2_b0.5_c0.3_rho-0.16_year2022-2024_error1e-12.txt.gz
1985743263
md5:070c5c399f3b0f2650fbc8c3879d12b3
https://zenodo.org/records/7665898/files/PR_a0.5_error1e-12.txt.gz
2815176278
md5:a58a272f57fc196459b3a7984dd5945f
https://zenodo.org/records/7665898/files/CC_openaire_ids.txt.gz
1330928472
md5:5034d22a2e32b0617e209b941cc371de
https://zenodo.org/records/7665898/files/CC.txt.gz
public
10.5281/zenodo.4386934
isVersionOf
doi