Published June 4, 2024 | Version v2
Dataset | Open Access

Towards Neural Scaling Laws for Foundation Models on Temporal Graphs

  • 1. McGill University
  • 2. Mila - Quebec Artificial Intelligence Institute
  • 3. University of Manitoba
  • 4. The University of Texas at Dallas
  • 5. Université de Montréal
  • 6. Canadian Institute for Advanced Research
  • 7. University of Central Florida

Description

The datasets in this repository are introduced in the paper: Towards Neural Scaling Laws for Foundation Models on Temporal Graphs

  • Each .csv file contains all transactions of the token network that shares its name (<tokenname>.csv).
  • Each row in a file corresponds to one transaction.
  • Each transaction has:
    • blockNumber: the ID of the Ethereum block that includes this transaction
    • timestamp: the time the transaction was made, in UNIX timestamp format
    • tokenAddress: the address that uniquely identifies an ERC20 token
    • from: address of the sender
    • to: address of the receiver
    • value: the amount transferred in the transaction
    • fileBlock: we split the full range of block numbers into 35 buckets and assign each transaction its bucket ID to trace the blocks
  • To reproduce the setting described in the paper, we also include an edge list and a label file containing the node interactions and labels for each snapshot of each token network.
    • Each transaction in the edge list also has "from", "to", and "amount" fields, plus an additional "snapshot" field indicating the index of the snapshot the transaction belongs to.
    • Each row in the label file gives the ground-truth label of the snapshot whose index corresponds to the row index (e.g., the first row is the label of the first snapshot).
    • The script used to generate the edge lists and label files is available in the following GitHub repository: https://github.com/benjaminnNgo/ScalingTGNs/blob/main/script/utils/TGS.py
  • We also provide the raw .csv files so that edge lists and labels can be regenerated with different settings.
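The per-transaction schema above can be read directly with Python's standard csv module. Below is a minimal sketch of loading a token-network .csv and grouping its transactions into fixed-length time snapshots. The column names (blockNumber, timestamp, tokenAddress, from, to, value, fileBlock) come from the description above; the one-week snapshot length is an illustrative assumption, not necessarily the setting used in the paper, so adjust it (or use the provided TGS.py script) to match the published setup.

```python
import csv
from collections import defaultdict

def load_snapshots(path, snapshot_seconds=604_800):
    """Group a token network's transactions into time snapshots.

    Returns a dict mapping snapshot index -> list of (from, to, value)
    edges. Snapshot boundaries are measured from the earliest timestamp
    in the file; snapshot_seconds (one week here) is an assumed default.
    """
    snapshots = defaultdict(list)
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return {}
    t0 = min(int(r["timestamp"]) for r in rows)
    for r in rows:
        idx = (int(r["timestamp"]) - t0) // snapshot_seconds
        snapshots[idx].append((r["from"], r["to"], float(r["value"])))
    return dict(snapshots)
```

Each snapshot's edge list can then be written out alongside a label file whose i-th row is the label of snapshot i, mirroring the format described above.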



Abstract

The field of temporal graph learning aims to learn from evolving network data to forecast future interactions. Given a collection of observed temporal graphs, is it possible to predict the evolution of an unseen network from the same domain?
To answer this question, we first present the Temporal Graph Scaling(TGS) dataset, a large collection of temporal graphs, consisting of eighty-four ERC20 token transaction networks collected from 2017 to 2023. Next, we evaluate the transferability of Temporal Graph Neural Networks(TGNNs) for the temporal graph property prediction task by pre-training on a collection of up to sixty-four token transaction networks and then evaluate their downstream performance on twenty unseen token types. We observe that the neural scaling law observed in NLP and Computer Vision also applies in temporal graph learning where pre-training on more networks and more parameters leads to better downstream performance. To the best of our knowledge, this is the first time that the transferability of temporal graphs has been shown empirically. Notably, on the downstream token networks, the largest pre-trained model outperforms fine-tuned TGNNs on thirteen unseen test networks. Therefore, we believe this is a promising first step towards building foundation models for temporal graphs. The code and datasets are publicly available at https://github.com/benjaminnNgo/ScalingTGNs under MIT licence.

 

Files

TGS_edgelists.zip

Files (3.5 GB)

  • md5:8693d0f8ed1ab9ccc83650f2969d29ca (2.9 GB)
  • md5:8a5f0e0d8670628904cdbea211782f7b (21.2 kB)
  • md5:2660de4f0bca8e49950240bd11a76cc5 (602.2 MB)

Additional details

Dates

Submitted
2024-06-05
NeurIPS 2024 Datasets and Benchmarks Track

Software

Repository URL
https://github.com/benjaminnNgo/ScalingTGNs
Programming language
Python
Development Status
Active