Architectural Implications of Communication-Avoiding Algorithms

Presenter
James Demmel, University of California, Berkeley (UC Berkeley)
November 27, 2018
Abstract
Algorithms have two costs: arithmetic and communication, i.e., moving data between levels of a memory hierarchy or between processors over a network. Communication costs (measured in time or energy per operation) already greatly exceed arithmetic costs, and the gap is growing with technological trends. Our goal is therefore to design algorithms that minimize communication. We give an overview of recent results in this area, with more detail on results that have architectural implications:

(1) We show that some algorithms can achieve perfect strong scaling (in time and energy), subject to topology requirements on the interconnect network, while others cannot.

(2) Since writes can be more expensive than reads in some emerging nonvolatile memory technologies, we extend our communication lower bounds and optimal algorithms to minimize writes and reads independently; in some cases it is possible to do asymptotically fewer writes than reads, but not in others.

(3) The next IEEE 754 Floating Point Standard addresses the increasing demand for bitwise reproducibility in numerical computation by including new instructions intended to make floating point summation associative at low cost, so that algorithms, including much of linear algebra, can be bitwise reproducible independent of data layout, the number of processors, etc. We show how our communication lower bounds and optimal algorithms can be extended to include reproducibility.

Finally, we outline the mathematical ideas, based on recent extensions of the Hölder-Brascamp-Lieb inequality, that extend these results to general algorithms expressible as nested loops accessing arrays.
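To make "minimizing communication" concrete: the canonical example in this line of work (not spelled out in the abstract, but standard) is cache-blocked matrix multiplication, which attains the Θ(n³/√M) lower bound on words moved between a fast memory of size M and slow memory. The Python sketch below is an illustration only, not material from the talk; the tile size b = 64 is an arbitrary assumption rather than a tuned value.

```python
import numpy as np

def blocked_matmul(A, B, b=64):
    # Cache-blocked (tiled) matrix multiply: working on b-by-b tiles that
    # fit in fast memory cuts the slow/fast traffic of the naive triple
    # loop from O(n^3) words to O(n^3 / b), matching the communication
    # lower bound Theta(n^3 / sqrt(M)) when b is on the order of sqrt(M/3).
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, b):
        for j in range(0, n, b):
            for k in range(0, n, b):
                # NumPy slices clip at the array edge, so ragged boundary
                # tiles are handled automatically.
                C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
    return C

# Sanity check against NumPy's own multiply.
A = np.random.rand(300, 300)
B = np.random.rand(300, 300)
assert np.allclose(blocked_matmul(A, B), A @ B)
```

The same blocking idea, applied recursively across all levels of the memory hierarchy and across processors, is what the lower bounds in the talk make precise.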
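On point (3): floating point addition is not associative, so the bitwise result of a sum normally depends on the reduction order (and hence on data layout and processor count). The sketch below is my own illustration, heavily simplified relative to the pre-rounding reproducible summation of Demmel and Nguyen, and it is not the new IEEE 754 operations the abstract refers to. It demonstrates the non-associativity and one way to obtain an order-independent result: pre-round every summand to a common grid coarse enough that each subsequent addition is exact.

```python
import math
import random

def reproducible_sum(xs):
    # Simplified sketch in the spirit of reproducible summation (not the
    # actual Demmel-Nguyen algorithm): round every summand to a common
    # grid so all partial sums are exact in double precision; exact
    # additions associate, so the result is order-independent.
    xs = list(xs)
    n = len(xs)
    if n == 0:
        return 0.0
    m = max(abs(x) for x in xs)
    if m == 0.0:
        return 0.0
    # With m < 2^e, taking shift = 2^(e + ceil(log2 n)) makes each rounded
    # summand a multiple of ulp(shift) while every partial sum stays below
    # shift, so all additions in the loop are exact.
    shift = 2.0 ** (math.frexp(m)[1] + max(1, math.ceil(math.log2(n))))
    total = 0.0
    for x in xs:
        total += (x + shift) - shift   # rounds x to the common grid
    return total

random.seed(0)
xs = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
      for _ in range(100000)]

print(sum(xs) == sum(reversed(xs)))                            # usually False
print(reproducible_sum(xs) == reproducible_sum(reversed(xs)))  # True
```

The price is accuracy: pre-rounding discards low-order bits of each summand. The techniques discussed in the talk recover both reproducibility and high accuracy at low cost.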