Improving Communication Performance in Dense Linear Algebra via Topology Aware Collectives

SESSION: Optimizing Communication Performance


TIME: 4:30PM - 5:00PM

AUTHOR(S): Edgar Solomonik, Abhinav Bhatele, James Demmel


Recent results have shown that topology aware mapping reduces network contention in communication-intensive kernels on massively parallel machines. We demonstrate that on mesh interconnects, topology aware mapping allows for utilization of highly efficient topology aware collectives. We map novel 2.5D dense linear algebra algorithms to cuboid partitions allocated by a Blue Gene/P supercomputer. Our mappings allow the algorithms to exploit optimized line multicasts and reductions. Commonly used 2D algorithms cannot be mapped in this fashion. On 65,536 cores of Blue Gene/P, 2.5D algorithms with rectangular collectives are 2.6x and 2.7x faster for matrix multiply and LU factorization, respectively. For LU, communication time drops by up to 92%. We derive a novel performance model based on the LogP model for rectangular broadcasts and reductions. We model performance on a hypothetical exascale architecture. Our study evaluates the benefits of topology aware collectives for high-performance algorithms.

Edgar Solomonik - University of California, Berkeley

Abhinav Bhatele - University of Illinois at Urbana-Champaign

James Demmel - University of California, Berkeley

The full paper can be found in the ACM Digital Library.

