Published July 1, 2023 | Version v1
Dataset Open

SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning

  • 1. Purdue University West Lafayette
  • 2. Peking University
  • 3. Georgia Institute of Technology

Description

criteo-click contains a sample of 30 days of Criteo live traffic data, in which each record corresponds to one impression (a banner displayed to a user) and indicates whether it was clicked [1]. Each record has 9 contextual features that are aggregated into a 270-dimensional edge feature. There are 675 unique campaign banners and 6.1M users, forming a bipartite graph with 16.5M edges: 97% of the edges are used for training, and the remainder is split evenly between validation and testing in temporal order. The task is to predict which campaign the user is most likely to click among 651 candidates.
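The temporal split described above can be illustrated with a minimal sketch. It assumes each edge carries a timestamp and that the earliest 97% of edges form the training set; the function name `temporal_split`, the rounding rule, and the synthetic timestamps are hypothetical and are not taken from the dataset files.

```python
import numpy as np

def temporal_split(timestamps, train_frac=0.97):
    """Split edge indices into train/valid/test by ascending timestamp.

    Assumption: the earliest `train_frac` of edges are used for training and
    the remaining edges are divided evenly, validation first, then test.
    """
    order = np.argsort(timestamps, kind="stable")  # oldest edge first
    n = len(order)
    n_train = int(round(train_frac * n))
    n_valid = (n - n_train) // 2
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    test = order[n_train + n_valid:]
    return train, valid, test

# Synthetic timestamps (seconds within a 30-day window), for illustration only.
rng = np.random.default_rng(0)
ts = rng.integers(0, 30 * 24 * 3600, size=1000)
train_idx, valid_idx, test_idx = temporal_split(ts)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Sorting before slicing keeps every validation edge no earlier than any training edge, and every test edge no earlier than any validation edge, which is the property a temporal split is meant to guarantee.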

twitter-2010 is an industry-scale social network with 1.5B user-following relations [2]. An edge (𝑖, 𝑗) of this network indicates that user 𝑖 is followed by user 𝑗. For evaluation, 1% of the Twitter users who follow between 10 and 1000 accounts are randomly sampled. The task is to recommend which account they will most likely follow among 1001 candidates.
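As a rough illustration of the evaluation sampling described above, the sketch below counts how many accounts each user follows and draws 1% of the eligible users. It assumes edges are stored as an integer array of (𝑖, 𝑗) pairs in which 𝑗 follows 𝑖, that the bounds 10 and 1000 are inclusive, and that the 1% is taken over the eligible users; the function `sample_eval_users` and the toy graph are hypothetical.

```python
import numpy as np

def sample_eval_users(edges, num_users, lo=10, hi=1000, frac=0.01, seed=0):
    """Sample a fraction of the users whose following count lies in [lo, hi].

    edges[:, 0] is the followed user i and edges[:, 1] is the follower j,
    so the number of accounts a user follows is its count in column 1.
    """
    following = np.bincount(edges[:, 1], minlength=num_users)
    eligible = np.flatnonzero((following >= lo) & (following <= hi))
    k = int(round(frac * len(eligible)))
    if k == 0:
        return np.empty(0, dtype=eligible.dtype)
    rng = np.random.default_rng(seed)
    return rng.choice(eligible, size=k, replace=False)

# Toy graph: users 0..199 follow 20 accounts each, users 200..299 follow 5 each.
num_users = 500
pairs = []
for j in range(300):
    degree = 20 if j < 200 else 5
    for step in range(1, degree + 1):
        pairs.append(((j + step) % num_users, j))  # (followed i, follower j)
toy_edges = np.array(pairs, dtype=np.int64)

sampled = sample_eval_users(toy_edges, num_users)
print(sampled)
```

Only users 0..199 fall inside the [10, 1000] range in this toy graph, so the sample is drawn from those 200 users.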

Files

criteo_click.zip

Files (10.2 GB)

  • 1.3 GB, md5:cfc984058ed3a5a86a65e188aa580298
  • 9.0 GB, md5:d713b2efb8670ea14499a857255bcccf
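After downloading, a file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listed_md5(path, listed):
    """Compare a file's digest with a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant, which matters for multi-gigabyte archives such as the ones listed here.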

Additional details

Additional titles

Alternative title
SGRL Large-Scale Dataset

References

  • [1] Eustache Diemert, Julien Meynet, Pierre Galland, and Damien Lefortier. 2017. Attribution modeling increases efficiency of bidding in display advertising. In Proceedings of the AdKDD and TargetAd Workshop. ACM, 1–6.
  • [2] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. 591–600.