There is a newer version of this record available.

Dataset Open Access

DBLP Article Similarities (DBLP-ArtSim) dataset

Serafeim Chatzopoulos; Thanasis Vergoulis; Ilias Kanellos; Theodore Dalamagas; Christos Tryfonopoulos

This dataset contains similarity scores among articles in AMiner's DBLP v10 dataset.

Similarities are computed with the JoinSim [1] similarity measure over the derived citation network, along the following metapaths:

  • Paper - Author - Paper (PAP_similarities.csv)
  • Paper - Topic - Paper (PTP_similarities.csv)
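To make the metapath-based scores concrete, the sketch below computes a JoinSim-style score for the Paper - Author - Paper metapath on a tiny toy example. It assumes our reading of the definition in [1]: the number of metapath instances between two papers, divided by the geometric mean of each paper's metapath instances to itself. The function name `joinsim_pap` and the toy input are hypothetical and are not taken from the dataset or from the authors' code.

```python
from collections import Counter
from math import sqrt


def joinsim_pap(paper_authors):
    """Return {(paper_x, paper_y): score} for papers sharing at least one author.

    paper_authors maps a paper id to its list of author ids (toy input).
    Assumed formula: paths(x, y) / sqrt(paths(x, x) * paths(y, y)).
    """
    papers = sorted(paper_authors)
    vectors = {p: Counter(paper_authors[p]) for p in papers}

    def paths(x, y):
        # Number of Paper-Author-Paper instances connecting x and y.
        return sum(count * vectors[y][author] for author, count in vectors[x].items())

    scores = {}
    for i, x in enumerate(papers):
        for y in papers[i + 1:]:
            shared = paths(x, y)
            if shared:  # keep only pairs that are actually connected
                scores[(x, y)] = shared / sqrt(paths(x, x) * paths(y, y))
    return scores


# Hypothetical toy data: p1 and p2 share one author, p3 shares none.
toy = {"p1": ["a1", "a2"], "p2": ["a2", "a3"], "p3": ["a4"]}
toy_scores = joinsim_pap(toy)
```

The same idea would apply to the Paper - Topic - Paper metapath by swapping author lists for topic lists. The pairwise loop is only meant for illustration and would not scale to the full network.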

The file ids.csv maps AMiner's paper ids to the internal numeric ids used in the similarity files.
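As a rough illustration of how the mapping might be used, the sketch below translates the internal ids in a similarity file back to AMiner ids. The column layout is an assumption, since this record does not document it: ids.csv is taken to hold (AMiner id, internal id) per row, the similarity files (paper id, paper id, score) per row, tab-separated and without a header. The helper names `load_id_map` and `translate` and the sample rows are hypothetical; inspect the first lines of the real files and adjust the delimiter and column order before relying on this.

```python
import csv
import io


def load_id_map(handle, delimiter="\t"):
    """Return {internal_id: aminer_id}; ASSUMES rows of (aminer_id, internal_id)."""
    mapping = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping[row[1]] = row[0]
    return mapping


def translate(handle, id_map, delimiter="\t"):
    """Yield (aminer_id_a, aminer_id_b, score); ASSUMES rows of (id_a, id_b, score)."""
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 3:
            continue
        # Fall back to the internal id when it is missing from the mapping.
        yield id_map.get(row[0], row[0]), id_map.get(row[1], row[1]), float(row[2])


# Made-up sample rows standing in for the real files.
sample_ids = io.StringIO("paper-x\t0\npaper-y\t1\n")
sample_sims = io.StringIO("0\t1\t0.5\n")
id_map = load_id_map(sample_ids)
translated = list(translate(sample_sims, id_map))
```

For the real files, pass handles opened with `open(path, newline="")` instead of the in-memory samples. Both helpers stream row by row, which matters for the multi-gigabyte similarity files, although the id mapping itself is held in memory.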

[1] Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Transactions on Knowledge and Data Engineering 27(6), 1710–1723 (2015)

We acknowledge support of this work by the project "Moving from Big Data Management to Data Science" (MIS 5002437/3) which is implemented under the Action "Reinforcement of the Research and Innovation Infrastructure", funded by the Operational Programme "Competitiveness, Entrepreneurship and Innovation" (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Files (8.1 GB)

Name                  Size      MD5
ids.csv               137.4 MB  e4dd4b9d19976e84f16d347fb192b8d6
PAP_similarities.csv  1.1 GB    72720e946d276182352be86386348ee1
PTP_similarities.csv  6.8 GB    58d4f5e24c9c90739df07be491881abe
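
Given the size of these files, it may be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib`; the checksums are copied from this record, while the helper names `md5_of_file` and `verify` and the local file paths are assumptions.

```python
import hashlib

# Checksums as published on this record.
EXPECTED_MD5 = {
    "ids.csv": "e4dd4b9d19976e84f16d347fb192b8d6",
    "PAP_similarities.csv": "72720e946d276182352be86386348ee1",
    "PTP_similarities.csv": "58d4f5e24c9c90739df07be491881abe",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the published checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("PAP_similarities.csv", "PAP_similarities.csv")` would return True for an intact download stored in the current directory.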
