
Published March 25, 2023 | Version v1

Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning

  • 1. University of California, Merced
  • 2. Microsoft Research

Contributors

Producer:

  • 1. University of California, Merced

Description

Betty introduces two novel techniques, redundancy-embedded graph (REG) partitioning and memory-aware partitioning, to effectively mitigate the redundancy and load-imbalance issues across partitions. REG partitioning is implemented in 'graph_partitioner.py'; memory-aware partitioning is built on memory estimation, with the details in 'block_dataloader.py'.
In the artifact evaluation, Figure 2 illustrates the out-of-memory (OOM) failures of current advanced GNN training systems, and Figure 10 shows how Betty breaks this memory wall. Figure 12 shows the trend of peak memory consumption and training time per epoch as the number of micro-batches increases. Finally, Figure 13 shows that Betty's micro-batch training does not impact model convergence.
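The actual REG construction and memory estimation live in 'graph_partitioner.py' and 'block_dataloader.py'; the snippet below is only an illustrative sketch of the underlying idea, with hypothetical names and a greedy heuristic standing in for the real algorithm. It groups output nodes that share many input neighbors into the same micro-batch partition, since neighbors shared across partitions would otherwise be loaded redundantly:

```python
def greedy_reg_partition(neighbors, num_parts):
    """Toy greedy stand-in for REG partitioning (not Betty's actual
    algorithm): place each output node in the partition whose members
    already share the most input neighbors with it, breaking ties
    toward the smaller partition. Shared neighbors kept inside one
    partition are inputs that no longer need to be replicated."""
    parts = [set() for _ in range(num_parts)]        # output nodes per partition
    part_inputs = [set() for _ in range(num_parts)]  # union of their input neighbors
    for out_node, neigh in neighbors.items():
        best = max(
            range(num_parts),
            # primary key: overlap with this partition's existing inputs;
            # secondary key: prefer partitions with fewer output nodes
            key=lambda p: (len(part_inputs[p] & neigh), -len(parts[p])),
        )
        parts[best].add(out_node)
        part_inputs[best] |= neigh
    return parts

# Example: outputs 0 and 1 share inputs {10, 11}; grouping them in one
# micro-batch avoids fetching those inputs twice.
neighbors = {0: {10, 11}, 1: {10, 11, 12}, 2: {20, 21}, 3: {21, 22}}
parts = greedy_reg_partition(neighbors, 2)
```

On this toy input the heuristic co-locates the overlapping pairs, yielding `[{0, 1}, {2, 3}]`; Betty's real REG partitioner additionally embeds the redundancy weights into an auxiliary graph and balances partitions by estimated memory.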

The Betty framework is developed on top of DGL (PyTorch backend). Requirements: PyTorch >= 1.7 and DGL >= 0.7. Other software dependencies include sortedcontainers, pyvis, pynvml, tqdm, and pymetis.
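A setup sketch for the dependencies above (the exact PyTorch and DGL wheels depend on your CUDA toolkit, so consult their official install pages rather than taking these pins verbatim):

```shell
# Core framework: PyTorch >= 1.7 and DGL >= 0.7 (CUDA builds may
# require an index URL specific to your CUDA version).
pip install "torch>=1.7" "dgl>=0.7"

# Remaining dependencies listed in the record.
pip install sortedcontainers pyvis pynvml tqdm pymetis
```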

The experimental results reported in the paper were collected on a machine with an RTX 6000 GPU (24 GB memory) and an Intel(R) Xeon(R) Gold 6126 CPU @ 2.60 GHz. You can use a different configuration with at least one GPU.

Files

Betty-master.zip (477.4 kB)
md5:41da81514a8a04ff6bcd05126281d3de

Additional details

References

  • Wang, M. Y. (2019, January). Deep graph library: Towards efficient and scalable deep learning on graphs. In ICLR workshop on representation learning on graphs and manifolds.