Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning
Authors/Creators
- 1. University of California, Merced
- 2. Microsoft Research
Description
Betty introduces two novel techniques, redundancy-embedded graph (REG) partitioning and memory-aware partitioning, to effectively mitigate redundancy and load-imbalance issues across partitions. REG partitioning is implemented in `graph_partitioner.py`; memory-aware partitioning builds on memory estimation, with details in `block_dataloader.py`.
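To illustrate the idea behind memory-aware partitioning, the sketch below greedily assigns nodes to the micro-batch partition with the smallest estimated memory so far. This is a simplified stand-in, not Betty's actual implementation: the real memory estimator in `block_dataloader.py` models GNN-specific costs, and `node_costs` here is a hypothetical per-node memory estimate.

```python
def memory_aware_partition(node_costs, num_parts):
    """Greedy load balancing: place each node (heaviest first) into the
    partition whose estimated memory load is currently smallest.

    node_costs: dict mapping node id -> estimated memory cost (illustrative).
    Returns (partitions, per-partition load estimates).
    """
    parts = [[] for _ in range(num_parts)]
    loads = [0.0] * num_parts
    # Sorting by descending cost makes the greedy balance tighter.
    for node, cost in sorted(node_costs.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))  # least-loaded partition
        parts[i].append(node)
        loads[i] += cost
    return parts, loads
```

In Betty itself the partitions are micro-batches whose subgraphs must fit in GPU memory; the greedy balancing above only conveys why equalizing estimated memory across partitions avoids one micro-batch hitting the memory wall while others are underutilized.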
In the artifact evaluation, Figure 2 illustrates the out-of-memory (OOM) behavior of current advanced GNN training systems, and Figure 10 shows that Betty breaks this memory wall. Figure 12 shows the trend of peak memory consumption and per-epoch training time as the number of micro-batches increases. Finally, Figure 13 shows that Betty's micro-batch training does not affect model convergence.
Betty is developed on top of DGL (PyTorch backend). Requirements: PyTorch >= 1.7 and DGL >= 0.7. Other software dependencies include sortedcontainers, pyvis, pynvml, tqdm, and pymetis.
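Since the stated minimums (PyTorch >= 1.7, DGL >= 0.7) are version strings, comparing them lexically is unsafe (e.g. `"1.13" < "1.7"` as strings). A small helper for checking them numerically, written here as an illustrative utility rather than part of the Betty codebase:

```python
def parse_version(v: str) -> tuple:
    """Convert a version string like '1.7.1' to (1, 7, 1).

    Simplified parser: non-numeric components (e.g. '+cu118' suffixes)
    are skipped, which is sufficient for a minimum-version check.
    """
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version satisfies the required minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

For example, `meets_minimum(torch.__version__, "1.7")` and `meets_minimum(dgl.__version__, "0.7")` would verify the environment before running the artifact.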
The experimental results reported in the paper were collected on a machine with an RTX 6000 GPU (24 GB memory) and an Intel(R) Xeon(R) Gold 6126 CPU @ 2.60 GHz. You can use a different configuration with at least one GPU.
Files
| Name | Size | md5 |
|---|---|---|
| Betty-master.zip | 477.4 kB | 41da81514a8a04ff6bcd05126281d3de |
Additional details
References
- Wang, M. Y. (2019, January). Deep graph library: Towards efficient and scalable deep learning on graphs. In ICLR workshop on representation learning on graphs and manifolds.