SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning
Description
criteo-click
contains a sample of 30 days of Criteo live traffic data, where each record corresponds to one impression (a banner) displayed to a user and indicates whether it was clicked [1]. Each record has 9 contextual features that are aggregated into a 270-dimensional edge feature. There are 675 unique campaign banners and 6.1M users, forming a bipartite graph of 16.5M edges: 97% of the edges are used for training, and the rest are evenly split between validation and testing in temporal order. The task is to predict which campaign the user is most likely to click among 651 candidates.
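The chronological split described above can be illustrated with a minimal sketch. It assumes each edge carries a timestamp in a NumPy array; the function name `temporal_split` and the toy data are hypothetical and are not taken from the dataset or the SUREL+ repository, whose actual preprocessing may differ.

```python
import numpy as np

def temporal_split(timestamps, train_frac=0.97):
    """Return (train, valid, test) edge indices for a chronological split.

    Assumption: `timestamps` holds one value per edge. The earliest
    `train_frac` of the edges go to training, and the remainder is
    split evenly between validation and testing.
    """
    order = np.argsort(timestamps, kind="stable")  # oldest edge first
    n = len(order)
    n_train = int(n * train_frac)
    n_valid = (n - n_train) // 2
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    test = order[n_train + n_valid:]
    return train, valid, test

# Toy example with 8 edges; a 50% training share keeps the numbers small.
ts = np.array([5, 1, 4, 2, 3, 7, 6, 0])
train, valid, test = temporal_split(ts, train_frac=0.5)
```

With the default `train_frac=0.97`, the same call would reproduce the 97% / 1.5% / 1.5% proportions stated in the description, up to rounding.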
twitter-2010
is an industry-level social network with 1.5B user following relations [2]. An edge (𝑖, 𝑗) of this network indicates that user 𝑖 is followed by user 𝑗. 1% of Twitter users who follow 10 to 1000 accounts are randomly sampled for evaluation. The task is to recommend which account they will most likely follow among 1001 candidates.
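Because an edge (𝑖, 𝑗) means that 𝑗 follows 𝑖, the number of accounts a user follows is the number of edges in which that user appears in the second position. The sketch below shows one way to select evaluation users under that convention. It is illustrative only: the function `sample_eval_users`, the inclusive bounds, and the reading that 1% is taken from the eligible users are assumptions, not the authors' code.

```python
import numpy as np

def sample_eval_users(edges, num_users, frac=0.01, lo=10, hi=1000, seed=0):
    """Randomly sample users who follow between `lo` and `hi` accounts.

    Assumption: `edges` is an integer array of shape (E, 2) where row
    (i, j) means user i is followed by user j.
    """
    followers = edges[:, 1]  # j follows i, so count occurrences of j
    n_following = np.bincount(followers, minlength=num_users)
    eligible = np.flatnonzero((n_following >= lo) & (n_following <= hi))
    if len(eligible) == 0:
        return eligible
    rng = np.random.default_rng(seed)
    k = max(1, int(len(eligible) * frac))
    return rng.choice(eligible, size=k, replace=False)

# Toy graph: user 1 follows 2 accounts, user 2 follows 4, user 3 follows 1.
toy_edges = np.array([[0, 1], [0, 2], [3, 1], [3, 2], [0, 3], [1, 2], [4, 2]])
sampled = sample_eval_users(toy_edges, num_users=5, frac=1.0, lo=2, hi=3)
```

In the toy graph only user 1 falls inside the range, so it is the only user that can be sampled.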
Files
criteo_click.zip
Total size: 10.2 GB
Checksum | Size |
---|---|
md5:cfc984058ed3a5a86a65e188aa580298 | 1.3 GB |
md5:d713b2efb8670ea14499a857255bcccf | 9.0 GB |
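The md5 values listed under Files can be used to check a download for corruption. A minimal sketch using Python's standard `hashlib` module follows; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the listing here does not show which file name belongs to which checksum, so that pairing has to be confirmed on the record page.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex

# Usage (the path is a placeholder for wherever the archive was saved):
# matches_checksum("path/to/archive.zip", "md5:cfc984058ed3a5a86a65e188aa580298")
```

Reading in chunks keeps memory use flat, which matters for archives of several gigabytes.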
Additional details
Additional titles
- Alternative title
- SGRL Large-Scale Dataset
Software
- Repository URL
- https://github.com/Graph-COM/SUREL_Plus
References
- [1] Eustache Diemert, Julien Meynet, Pierre Galland, and Damien Lefortier. 2017. Attribution modeling increases efficiency of bidding in display advertising. In Proceedings of the AdKDD and TargetAd Workshop. ACM, 1–6.
- [2] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. 591–600.