3516093
doi
10.5281/zenodo.3516093
oai:zenodo.org:3516093
user-eu
Aldinucci, Marco
University of Torino
Deep Learning at Scale with Nearest Neighbours Communications
Viviani, Paolo
University of Torino
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
HPC
Deep Learning
Machine Learning
Distributed Training
<p>As deep learning techniques become increasingly popular, there is a need to move these applications from the data scientist’s Jupyter notebook to efficient and reliable enterprise solutions. Moreover, distributed training of deep learning models will increasingly take place outside the well-known borders of cloud and HPC infrastructure, moving to <em>edge</em> and mobile platforms. Current techniques for distributed deep learning have drawbacks in both of these scenarios, limiting their long-term applicability.</p>
<p>After a critical review of the established techniques for Data Parallel training from both a distributed computing and a deep learning perspective, a novel approach based on nearest-neighbour communications is presented to overcome some of the issues of mainstream approaches, such as global communication patterns. To validate the proposed strategy, the Flexible Asynchronous Scalable Training (FAST) framework is introduced, which allows the nearest-neighbour communications approach to be applied to a deep learning framework of choice.</p>
<p>Finally, a relevant use case is deployed on a medium-scale infrastructure to demonstrate both the framework and the methodology. Training convergence and scalability results are discussed in comparison to a baseline obtained with the state-of-the-art distributed training tools provided by a well-known deep learning framework.</p>
Zenodo
2019-09-01
info:eu-repo/semantics/doctoralThesis
3516092
user-eu
1.0
award_title=Factories of the Future Resources, Technology, Infrastructure and Services for Simulation and Modelling 2; award_number=680481; award_identifiers_scheme=url; award_identifiers_identifier=https://cordis.europa.eu/projects/680481; funder_id=00k4n6c32; funder_name=European Commission;
1579540995.95285
2341498
md5:be2bd5fc4c1682c68606e529aace7183
https://zenodo.org/records/3516093/files/20190910_final_pdf.pdf
public
10.5281/zenodo.3516092
isVersionOf
doi