[Moved Online] Hot Topics: Optimal Transport And Applications To Machine Learning And Statistics - From stochastic gradient descent to Wasserstein gradient flows

Presenter

Andrea Montanari

May 8, 2020

MSRI

Keywords:

Neural networks
Mean field
Wasserstein gradient flow

MSC:

35Q68
60K35

Abstract

Modern neural networks contain millions of parameters, and training them requires optimizing a highly non-convex objective. Despite the apparent complexity of this task, practitioners successfully train such models using simple first-order methods such as stochastic gradient descent (SGD). I will survey recent efforts to understand this surprising phenomenon using tools from the theory of partial differential equations. Namely, I will discuss a mean-field limit in which the number of neurons becomes large and the SGD dynamics are approximated by a certain Wasserstein gradient flow. [Joint work with Adel Javanmard, Song Mei, Theodor Misiakiewicz, Marco Mondelli, Phan-Minh Nguyen]
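
For orientation, the mean-field limit mentioned in the abstract can be sketched for a two-layer network under squared loss. The notation below (activation $\sigma_*$, potentials $V$ and $U$, step-size schedule $\xi$, neuron distribution $\rho_t$) is assumed here for illustration and is not taken from the talk itself; the talk may use a different or more general setting.

```latex
% Two-layer network with N neurons and parameters \theta_1, \dots, \theta_N
% (assumed notation, squared loss, data (x, y)):
\begin{align*}
  \hat f_N(x;\theta) &= \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i), \\
  R_N(\theta) &= \mathbb{E}\big[(y - \hat f_N(x;\theta))^2\big]
     = R_\# + \frac{2}{N}\sum_{i=1}^{N} V(\theta_i)
       + \frac{1}{N^2}\sum_{i,j=1}^{N} U(\theta_i,\theta_j),
\end{align*}
% where
\begin{align*}
  R_\# = \mathbb{E}[y^2], \qquad
  V(\theta) = -\mathbb{E}\big[y\,\sigma_*(x;\theta)\big], \qquad
  U(\theta_1,\theta_2) = \mathbb{E}\big[\sigma_*(x;\theta_1)\,\sigma_*(x;\theta_2)\big].
\end{align*}
% Replacing the empirical measure of the neurons by a distribution \rho
% gives the mean-field risk and the limiting evolution:
\begin{align*}
  R(\rho) &= R_\# + 2\int V(\theta)\,\rho(d\theta)
     + \iint U(\theta_1,\theta_2)\,\rho(d\theta_1)\,\rho(d\theta_2), \\
  \partial_t \rho_t &= 2\,\xi(t)\,\nabla_\theta\cdot\big(\rho_t\,\nabla_\theta\Psi(\theta;\rho_t)\big),
  \qquad
  \Psi(\theta;\rho) = V(\theta) + \int U(\theta,\theta')\,\rho(d\theta').
\end{align*}
```

Since the first variation of $R$ at $\rho$ is $2\Psi(\theta;\rho)$, the last equation is, up to the time change induced by $\xi$, the gradient flow of $R$ in Wasserstein-2 distance on the space of neuron distributions.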
