Communication Efficient Algorithms for Distributed Machine Learning

Presenter
May 16, 2016
Keywords:
  • distributed machine learning
  • communication-efficient algorithms
Abstract
We consider the problem of learning from distributed data and analyze the fundamental algorithmic and communication complexity questions involved. Broadly, we consider a framework in which information is distributed among several locations, and our goal is to learn a low-error hypothesis with respect to the overall data using as little communication, and as few rounds of communication, as possible. As an example, suppose k research groups around the world have collected large scientific datasets, such as genomic sequence data or sky survey data, and we wish to perform learning over the union of all these datasets without too much communication. In this talk, I will first discuss a general statistical (PAC-style) framework for analyzing the communication complexity issues that arise in distributed supervised machine learning, i.e., learning from annotated data distributed across multiple locations. I will also discuss algorithms with good communication complexity for unsupervised learning and dimensionality reduction problems, with interesting connections to efficient distributed coreset construction.
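
To make the communication model concrete, here is a minimal sketch in Python/NumPy of the naive one-round baseline against which communication-efficient protocols are measured: each of the k sites ships a small labeled subsample to a coordinator, which trains a single hypothesis on the pooled sample. Communication then scales with the subsample sizes rather than with the full distributed dataset. The function names, the perceptron learner, and the toy data are illustrative assumptions, not the protocols from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def perceptron(X, y, epochs=20):
    """Plain perceptron, standing in for any local/central learner."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:   # mistake: update
                w += yi * xi
    return w

def baseline_protocol(sites, m):
    """Naive one-round protocol: every site ships m labeled examples to a
    coordinator, which learns one hypothesis on the pooled sample.
    Communication is k*m examples, independent of the full data size."""
    pooled_X, pooled_y = [], []
    for X, y in sites:
        idx = rng.choice(len(X), size=m, replace=False)
        pooled_X.append(X[idx])
        pooled_y.append(y[idx])
    return perceptron(np.vstack(pooled_X), np.concatenate(pooled_y))

def make_site(n=1000):
    """Toy site data: two linearly separable label blobs plus a bias feature."""
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]
    return np.hstack([X, np.ones((n, 1))]), y

sites = [make_site() for _ in range(4)]          # k = 4 locations
w = baseline_protocol(sites, m=50)               # ship only 50 examples per site
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
print(f"error on the union: {np.mean(np.sign(X_all @ w) != y_all):.3f}")
```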
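
The coreset connection also admits a compact sketch. The following Python/NumPy code illustrates the cost-proportional sampling idea behind distributed coreset construction for k-means: each site summarizes its data with a rough local clustering and its cost, the coordinator splits a global sample budget among sites in proportion to those costs, and each site returns an inverse-probability-weighted sample. The random local centers (standing in for any constant-factor local approximation) and the simplified weighting are assumptions for illustration; the full construction also retains the local centers with residual weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_cost(points, centers):
    """Squared distance from each point to its nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1)

def local_summary(points, k):
    """Each site picks k rough centers (random points here, standing in for
    any constant-factor local k-means approximation) and reports its cost."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    return centers, local_cost(points, centers)

def distributed_coreset(sites, k, t):
    """Cost-proportional sampling: the coordinator splits a global budget t
    among sites by local cost; each site samples points with probability
    proportional to their cost contribution and weights them inversely."""
    summaries = [local_summary(P, k) for P in sites]
    site_costs = np.array([c.sum() for _, c in summaries])
    budgets = np.maximum(1, np.round(t * site_costs / site_costs.sum())).astype(int)

    coreset_pts, coreset_wts = [], []
    for (centers, costs), P, t_i in zip(summaries, sites, budgets):
        p = costs / costs.sum()                   # per-point sampling distribution
        idx = rng.choice(len(P), size=t_i, p=p)
        coreset_pts.append(P[idx])
        coreset_wts.append(1.0 / (t_i * p[idx]))  # inverse-probability weights
    return np.vstack(coreset_pts), np.concatenate(coreset_wts)

# toy run: 4 sites with 500 points each; coreset of ~200 weighted points
sites = [rng.normal(loc=5 * s, size=(500, 2)) for s in range(4)]
pts, wts = distributed_coreset(sites, k=3, t=200)
print(pts.shape, wts.sum())  # total weight ~ total number of points
```

The point of the weighting is that the weighted coreset cost is an unbiased estimator of the true k-means cost for any candidate set of centers, so the coordinator can cluster the small weighted set in place of the full distributed data.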