Published February 27, 2021 | Version 2
Dataset Open

DBLP Article Similarities (DBLP-ArtSim) dataset

  • 1. Athena Research Center
  • 2. Univ. of the Peloponnese

Description

This dataset contains similarity scores among articles in AMiner's DBLP v10 dataset.

Similarities are calculated using the JoinSim [1] similarity measure on the derived citation network using the following metapaths: 

  • Paper - Author - Paper (PAP.csv.gz)
  • Paper - Topic - Paper (PTP.csv.gz)
  • Paper - Venue - Paper (PVP.csv.gz)
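
For readers unfamiliar with JoinSim, the following is a minimal sketch on toy data. It assumes the definition usually given for [1]: the number of metapath instances connecting two papers, divided by the geometric mean of each paper's self-path count. The toy papers and authors and the helper names path_count and joinsim are hypothetical; this is not the code used to produce the dataset.

```python
import math

# Hypothetical toy data: each paper mapped to its set of authors.
# For the Paper - Author - Paper metapath, two papers are connected
# by one path instance per author they share.
neighbors = {
    "p0": {"a0", "a1"},
    "p1": {"a0"},
    "p2": {"a1"},
}


def path_count(neighbors, x, y):
    """Count Paper - X - Paper path instances between papers x and y."""
    return len(neighbors[x] & neighbors[y])


def joinsim(neighbors, x, y):
    """JoinSim under the assumed definition:
    paths(x, y) / sqrt(paths(x, x) * paths(y, y))."""
    denom = math.sqrt(path_count(neighbors, x, x) * path_count(neighbors, y, y))
    if denom == 0:
        return 0.0  # a paper with no neighbors is similar to nothing
    return path_count(neighbors, x, y) / denom
```

In this toy network, p0 and p1 share one author, p0 has two self-paths and p1 has one, so joinsim(neighbors, "p0", "p1") evaluates to 1/sqrt(2); p1 and p2 share no author and score 0.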

The Paper to Venue relationships are also provided in PV_relationships.csv.gz.

The file aminer_ids.csv.gz contains a mapping from AMiner's ids to our internal numeric ids used in the similarities files.
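
As an illustration of how the id mapping and a similarities file might be used together, here is a minimal sketch. The column layout is an assumption, not something stated in this description: the mapping file is assumed to hold (AMiner id, internal id) per row, each similarities file (source id, target id, score) per row, with no header row and a tab delimiter. Check the first lines of the actual files and adjust the delimiter and column positions if they differ. The function names load_id_map and iter_similarities are hypothetical.

```python
import csv
import gzip


def load_id_map(path, delimiter="\t"):
    """Return {internal_id: aminer_id}.

    Assumed layout (not confirmed by the dataset description):
    column 0 = AMiner id, column 1 = internal numeric id, no header.
    """
    mapping = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh, delimiter=delimiter):
            mapping[int(row[1])] = row[0]
    return mapping


def iter_similarities(path, delimiter="\t"):
    """Yield (source_id, target_id, score) tuples one row at a time.

    Streaming avoids loading a whole similarities file into memory.
    The three-column layout is an assumption.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh, delimiter=delimiter):
            yield int(row[0]), int(row[1]), float(row[2])
```

Under these assumptions, calling load_id_map("aminer_ids.csv.gz") once and then iterating over iter_similarities("PAP.csv.gz") lets each pair of internal ids be translated back to AMiner ids through the returned dictionary.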

 

[1] Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Transactions on Knowledge and Data Engineering 27(6), 1710–1723 (2015)

Notes

We acknowledge support of this work by the project "Moving from Big Data Management to Data Science" (MIS 5002437/3) which is implemented under the Action "Reinforcement of the Research and Innovation Infrastructure", funded by the Operational Programme "Competitiveness, Entrepreneurship and Innovation" (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Files

Files (39.4 GB)

Checksum                                  Size
md5:5ef49e871a0ee2a639c2fe0bec9fbe30      71.5 MB
md5:e0588acecda7f4be9bd977f769d038ae      202.9 MB
md5:7c9a59039f6d014ad0ece558ab9c19c4      1.1 GB
md5:5cb087f904461783ec9822341e134964      11.0 MB
md5:c5c9c9c2790cf31ffaca5f0328bf0c75      38.0 GB
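
Given the size of the files, it may be worth verifying a download against its listed MD5 checksum. The following is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and matches_checksum are hypothetical. It reads the file in chunks so that even the largest file does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a listed value.

    Accepts the value with or without the "md5:" prefix.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as matches_checksum(downloaded_path, listed_md5) returns True only when the digest of the local file equals the value listed for that file.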