
Optimization Methods for Large Scale Distributed Deep Learning

Presenter
Rio Yokota, Tokyo Institute of Technology
September 28, 2018
Abstract
As deep neural networks increase in size, the amount of data and the time required to train them become prohibitively large for a single compute node. Distributed deep learning on thousands of GPUs forces mini-batch stochastic gradient descent methods to operate in a regime where the increasing batch size starts to have a detrimental effect on convergence and generalization. We investigate the possibility of using second-order optimization methods with proper regularization as an alternative to conventional stochastic gradient descent methods.
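To illustrate the contrast the abstract draws, the following is a minimal sketch, not the speaker's method: a plain SGD step versus a second-order step that preconditions the gradient with a regularized (damped) curvature matrix, shown on a simple quadratic loss. All symbols here (A, b, lr, damping) are illustrative assumptions, not quantities from the talk.

```python
import numpy as np

# Quadratic toy loss f(w) = 0.5 * w^T A w - b^T w, whose true curvature is A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A @ A.T + np.eye(4)           # symmetric positive-definite "curvature"
b = rng.standard_normal(4)
w = np.zeros(4)

grad = A @ w - b                  # gradient of the quadratic loss at w

# First-order (SGD-style) step: follow the raw gradient with a small learning rate.
lr = 0.1
w_sgd = w - lr * grad

# Second-order step with regularization: precondition the gradient with the
# inverse of the damped curvature (A + damping * I). The damping term is the
# kind of regularization that keeps a mini-batch curvature estimate well-conditioned.
damping = 1e-3
w_second_order = w - np.linalg.solve(A + damping * np.eye(4), grad)

print("SGD step:         ", w_sgd)
print("Second-order step:", w_second_order)
```

In practice, second-order methods for deep networks approximate the curvature (e.g., via Fisher or Gauss-Newton matrices) rather than forming it exactly; the damping term above stands in for the "proper regularization" the abstract refers to.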
Supplementary Materials