Published June 2, 2021 | Version v1

Decentralized Word2Vec Using Gossip Learning

  • 1. KTH Royal Institute of Technology
  • 2. RISE Research Institutes of Sweden

Description

Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is therefore useful for a few large public and private organizations to pool their corpora during training. However, factors such as legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we therefore investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores as low as 4.3%. Furthermore, the results are up to 54.8% better than independent local training.
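To make the idea of gossip learning concrete, the following is a minimal sketch of one common formulation: each node keeps its data private, pushes only its model parameters to a random peer, and the receiving peer averages the two models before taking a local training step. This is not the paper's implementation and does not train Word2Vec; the function names (`merge`, `local_step`, `gossip_round`, `run`), the toy update rule, and all parameter values are assumptions chosen for illustration.

```python
import random


def merge(local, received):
    """Average two parameter vectors element-wise (assumed merge rule)."""
    return [(a + b) / 2.0 for a, b in zip(local, received)]


def local_step(model, data, lr=0.1):
    """Toy stand-in for local training: move the model toward the
    per-dimension mean of the node's private data."""
    target = [sum(col) / len(data) for col in zip(*data)]
    return [w + lr * (t - w) for w, t in zip(model, target)]


def gossip_round(models, datasets, rng):
    """Each node pushes its model to one random peer. The peer merges the
    received model with its own, then trains on its private data.
    Only model parameters are exchanged, never the data."""
    n = len(models)
    for sender in range(n):
        receiver = rng.choice([i for i in range(n) if i != sender])
        merged = merge(models[receiver], models[sender])
        models[receiver] = local_step(merged, datasets[receiver])
    return models


def run(num_nodes=4, dim=3, rounds=200, seed=0):
    """Simulate nodes whose private data come from different distributions."""
    rng = random.Random(seed)
    datasets = [
        [[rng.gauss(node, 1.0) for _ in range(dim)] for _ in range(20)]
        for node in range(num_nodes)
    ]
    models = [[0.0] * dim for _ in range(num_nodes)]
    for _ in range(rounds):
        gossip_round(models, datasets, rng)
    return models, datasets


if __name__ == "__main__":
    final_models, _ = run()
    for node_id, model in enumerate(final_models):
        print(node_id, [round(w, 3) for w in model])
```

In this toy setting the node models end up much closer to one another than the nodes' private data means are, which illustrates how knowledge can spread across organizations without any corpus leaving its owner. A real deployment would replace `local_step` with Word2Vec training on the local corpus.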

Files

nodalida_camera_ready.pdf (1.4 MB)
md5:55476f5b3f063d6188206fd30474eed1

Additional details

Funding

European Commission
RAIS: Real-time Analytics for the Internet of Sports (813162)