
Optimization Methods for Large Scale Distributed Deep Learning

Presenter
Rio Yokota, Tokyo Institute of Technology
September 28, 2018
Abstract
As deep neural networks increase in size, the amount of data and the time required to train them become prohibitively large for a single compute node. Distributed deep learning on thousands of GPUs forces mini-batch stochastic gradient descent methods to operate in a regime where the increasing batch size starts to have a detrimental effect on convergence and generalization. We investigate the possibility of using second-order optimization methods with proper regularization as an alternative to conventional stochastic gradient descent methods.
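To illustrate the contrast the abstract draws, the following is a minimal sketch, not the speaker's method: a plain SGD step versus a second-order step that preconditions the gradient with a regularized (damped) curvature matrix, shown on a simple quadratic loss. All symbols here (A, b, lr, damping) are illustrative assumptions, not quantities from the talk.

```python
import numpy as np

# Quadratic toy loss f(w) = 0.5 * w^T A w - b^T w, whose true curvature is A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A @ A.T + np.eye(4)           # symmetric positive-definite "curvature"
b = rng.standard_normal(4)
w = np.zeros(4)

grad = A @ w - b                  # gradient of the quadratic loss at w

# First-order (SGD-style) step: follow the raw gradient with a small learning rate.
lr = 0.1
w_sgd = w - lr * grad

# Second-order step with regularization: precondition the gradient with the
# inverse of the damped curvature (A + damping * I). The damping term is the
# kind of regularization that keeps a mini-batch curvature estimate well-conditioned.
damping = 1e-3
w_second_order = w - np.linalg.solve(A + damping * np.eye(4), grad)

print("SGD step:         ", w_sgd)
print("Second-order step:", w_second_order)
```

In practice, second-order methods for deep networks approximate the curvature (e.g., via Fisher or Gauss-Newton matrices) rather than forming it exactly; the damping term above stands in for the "proper regularization" the abstract refers to.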
Supplementary Materials