Towards Neural Scaling Laws for Foundation Models on Temporal Graphs
Creators
- Razieh Shirzadkhani (Researcher)1, 2
- Bao Ngo (Researcher)3
- Kiarash Shamsi (Researcher)3
- Shenyang Huang (Researcher)1, 2
- Farimah Poursafaei (Researcher)1, 2
- Poupak Azad (Data collector)3
- Reihaneh Rabbany (Researcher)1, 2
- Baris Coskunuzer (Researcher)4
- Guillaume Rabusseau (Researcher)5, 2, 6
- Cuneyt Gurcan Akcora (Researcher)7
Description
The datasets provided in this repository are introduced in the paper: Towards Neural Scaling Laws for Foundation Models on Temporal Graphs
- Each .csv file contains all transactions of the token network that shares its name with the file (<tokenname>.csv).
- Each row in a file corresponds to one transaction.
- Each transaction has the following fields:
- blockNumber: the Ethereum block number of the block that includes this transaction
- timestamp: the time the transaction was made, as a UNIX timestamp
- tokenAddress: the address identifying a unique ERC20 token
- from: the address of the sender
- to: the address of the receiver
- value: the amount transferred in the transaction
- fileBlock: we split the full block range into 35 buckets and assign each transaction its bucket ID to make blocks easier to trace
- To reproduce the setting described in the paper, we include an edge list and a label file for each token network, containing the node interactions and label of each snapshot.
- Each transaction in the edge list also has "from", "to" and "amount" fields, plus an additional "snapshot" field indicating the index of the snapshot the transaction belongs to.
- Each row in the label file gives the ground-truth label of the snapshot whose index matches the row index (e.g., the first row holds the label of the first snapshot).
- The script used to generate the edge lists and label files is available in the following GitHub repository: https://github.com/benjaminnNgo/ScalingTGNs/blob/main/script/utils/TGS.py
- We also provide the raw .csv files so that edge lists and labels can be regenerated with different settings.
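As a minimal sketch of how a token .csv with the columns listed above can be parsed, the snippet below uses only the Python standard library; the two sample rows (addresses, values, and bucket IDs) are synthetic placeholders, not taken from the dataset:

```python
import csv
import io
from collections import defaultdict

# Synthetic stand-in for a <tokenname>.csv file, using the column
# names described above. Real files are read with open(path) instead.
sample = """blockNumber,timestamp,tokenAddress,from,to,value,fileBlock
15000000,1658000000,0xTOKEN,0xA,0xB,10.5,1
15000010,1658000600,0xTOKEN,0xB,0xC,3.2,1
"""

# Each row is one transaction; DictReader keys match the header row.
transactions = list(csv.DictReader(io.StringIO(sample)))

# Group transactions by their fileBlock bucket (the dataset uses 35
# buckets over the full block range), keeping (from, to, value) edges.
edges_by_bucket = defaultdict(list)
for tx in transactions:
    edges_by_bucket[tx["fileBlock"]].append(
        (tx["from"], tx["to"], float(tx["value"]))
    )

print(len(transactions))        # 2 transactions parsed
print(edges_by_bucket["1"][0])  # first edge in bucket "1"
```

The same pattern applies to the provided edge lists, except that rows carry a "snapshot" field instead of "fileBlock", and labels are looked up by row index in the matching label file.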
Abstract
The field of temporal graph learning aims to learn from evolving network data to forecast future interactions. Given a collection of observed temporal graphs, is it possible to predict the evolution of an unseen network from the same domain?
To answer this question, we first present the Temporal Graph Scaling (TGS) dataset, a large collection of temporal graphs consisting of eighty-four ERC20 token transaction networks collected from 2017 to 2023. Next, we evaluate the transferability of Temporal Graph Neural Networks (TGNNs) for the temporal graph property prediction task by pre-training on a collection of up to sixty-four token transaction networks and then evaluating their downstream performance on twenty unseen token types. We observe that the neural scaling law observed in NLP and computer vision also applies to temporal graph learning: pre-training on more networks with more parameters leads to better downstream performance. To the best of our knowledge, this is the first time that the transferability of temporal graphs has been shown empirically. Notably, on the downstream token networks, the largest pre-trained model outperforms fine-tuned TGNNs on thirteen unseen test networks. Therefore, we believe this is a promising first step towards building foundation models for temporal graphs. The code and datasets are publicly available at https://github.com/benjaminnNgo/ScalingTGNs under the MIT license.
Files
TGS_edgelists.zip
Additional details
Dates
- Submitted
- 2024-06-05 (NeurIPS 2024 Datasets and Benchmarks Track)
Software
- Repository URL
- https://github.com/benjaminnNgo/ScalingTGNs
- Programming language
- Python
- Development Status
- Active