
Published December 8, 2023 | Version v1
Dataset Open

Graph Machine Learning Dataset SOA-SW

  • 1. Karlsruhe Institute of Technology

Contributors

Contact person:

Data collector:

  • 1. Karlsruhe Institute of Technology

Description

SOA-SW is a heterogeneous graph machine learning dataset based on the RDF knowledge graph SemOpenAlex-SemanticWeb.

SOA-SW contains six node types: works (95,575 nodes), authors (19,970 nodes), concepts (38,050 nodes), sources (10,739 nodes), institutions (5,846 nodes), and publishers (786 nodes). It also contains seven edge types.
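As a quick sanity check on the node counts listed above, the following minimal sketch sums them and computes each type's share of the graph. The counts are copied from the description; the names `NODE_COUNTS`, `total_nodes`, and `share_by_type` are illustrative and not part of the dataset or of AutoRDF2GML.

```python
# Node counts per type, copied from the dataset description.
NODE_COUNTS = {
    "works": 95_575,
    "authors": 19_970,
    "concepts": 38_050,
    "sources": 10_739,
    "institutions": 5_846,
    "publishers": 786,
}


def total_nodes(counts):
    """Return the total number of nodes across all node types."""
    return sum(counts.values())


def share_by_type(counts):
    """Return each node type's fraction of the total node count."""
    total = total_nodes(counts)
    return {node_type: n / total for node_type, n in counts.items()}


if __name__ == "__main__":
    print(f"total nodes: {total_nodes(NODE_COUNTS):,}")  # total nodes: 170,966
    for node_type, share in share_by_type(NODE_COUNTS).items():
        print(f"{node_type:>12}: {share:.1%}")
```

Works make up more than half of all nodes, which is worth keeping in mind when sampling or batching per node type.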

Each node has rich semantic node features as node representation (content-based and topology-based node features are available).

More information can be found in the README.txt and on https://github.com/davidlamprecht/AutoRDF2GML.

 

soa-sw-homogeneous-author models only the co-author network of SOA-SW.

It is a homogeneous graph containing the author node type (19,970 nodes) and the edge type author-author.

The authors' content-based features (nodes-nld) are 128-dimensional SciBERT embeddings based on the titles and abstracts of the authors' works.
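To make the shape of this homogeneous graph concrete, here is a minimal in-memory sketch: one node type, one edge type, and one 128-dimensional feature vector per author. It is not the dataset's file layout or a loader for it. The class `CoAuthorGraph`, the integer node ids, and the choice to store co-authorship as an undirected relation are all assumptions made for illustration; see README.txt for the actual format.

```python
from dataclasses import dataclass, field

NUM_AUTHORS = 19_970  # author nodes, from the description
FEATURE_DIM = 128     # content-based feature dimension, from the description


@dataclass
class CoAuthorGraph:
    """Hypothetical container for a homogeneous author-author graph."""
    num_nodes: int
    feature_dim: int
    features: dict = field(default_factory=dict)   # node id -> list of floats
    neighbors: dict = field(default_factory=dict)  # node id -> set of node ids

    def _check_node(self, node):
        if not 0 <= node < self.num_nodes:
            raise IndexError(f"node id {node} out of range")

    def set_features(self, node, vector):
        """Attach a feature vector, enforcing the expected dimension."""
        self._check_node(node)
        if len(vector) != self.feature_dim:
            raise ValueError(f"expected {self.feature_dim} values, got {len(vector)}")
        self.features[node] = list(vector)

    def add_coauthorship(self, a, b):
        """Add an author-author edge (assumed undirected here)."""
        self._check_node(a)
        self._check_node(b)
        if a == b:
            raise ValueError("self-loops are not added in this sketch")
        self.neighbors.setdefault(a, set()).add(b)
        self.neighbors.setdefault(b, set()).add(a)

    def degree(self, node):
        """Number of distinct co-authors of a node."""
        return len(self.neighbors.get(node, ()))


if __name__ == "__main__":
    graph = CoAuthorGraph(NUM_AUTHORS, FEATURE_DIM)
    graph.set_features(0, [0.0] * FEATURE_DIM)  # placeholder vector, not real data
    graph.add_coauthorship(0, 1)
    graph.add_coauthorship(0, 2)
    print(graph.degree(0))
```

Storing neighbors as sets keeps repeated co-authorships from inflating the degree. A real pipeline would more likely use a tensor-based edge index.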

Files

soa-sw-homogeneous-author.zip

Files (380.2 MB)

md5:460e8de1055acc35df98be301b34ae6b  46.2 MB
md5:3ca2c28ba62493205fdde6c9389d375a  334.0 MB