Published April 25, 2022 | Version v1
Journal article Open

Federated Learning of Molecular Properties with Graph Neural Networks in a Heterogeneous Setting

  • 1. University of Rochester

Description

Chemistry experiments incur high material and computational costs. Institutions therefore consider chemical data valuable, and there have been few efforts to construct large public datasets for machine learning. Another challenge is that different institutions are interested in different classes of molecules, creating heterogeneous data that cannot be easily combined by conventional distributed training. This work introduces federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets to produce heterogeneously distributed client data. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align the local training across heterogeneous clients by improving the performance for uncertain samples. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about sharing valuable chemical data.
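As a rough illustration of how scaffold grouping and a Dirichlet prior can be combined to simulate heterogeneous clients, the sketch below assigns molecules to clients using only the Python standard library. It is not the FedChem implementation: the function names `dirichlet` and `partition_by_scaffold`, the `alpha` parameter, and the assumption that each molecule already carries a precomputed scaffold label are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Extremely small alpha can underflow; fall back to uniform weights.
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_by_scaffold(scaffolds, num_clients, alpha, seed=0):
    """Assign molecule indices to clients, one Dirichlet draw per scaffold group.

    scaffolds: list of scaffold labels, one per molecule (assumed precomputed).
    alpha: concentration; smaller values give more skewed, heterogeneous clients.
    """
    rng = random.Random(seed)

    # Group molecule indices by their scaffold label.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    clients = [[] for _ in range(num_clients)]
    for key in sorted(groups):  # sorted for reproducibility
        weights = dirichlet(alpha, num_clients, rng)
        for idx in groups[key]:
            # Each molecule follows its scaffold's client distribution.
            client = rng.choices(range(num_clients), weights=weights)[0]
            clients[client].append(idx)
    return clients


if __name__ == "__main__":
    labels = ["benzene"] * 10 + ["pyridine"] * 10 + ["indole"] * 5
    for cid, members in enumerate(partition_by_scaffold(labels, 3, alpha=0.5)):
        print(cid, len(members))
```

With a small `alpha`, most molecules sharing a scaffold tend to land on the same client, which mimics institutions that each focus on particular classes of molecules; a large `alpha` approaches a uniform split.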

Files

fedchem-main.zip (377.2 kB)
md5:3f0e3714d2a1fac4cb2aff030aba683b