Efficient Data Race Detection for Distributed Memory Parallel Programs

SESSION: Debugging


TIME: 4:00PM - 4:30PM

AUTHOR(S): Chang-Seo Park, Paul Hargrove, Costin Iancu, Koushik Sen


We present a precise data race detection technique for distributed-memory programs. Our technique, called Active Testing, builds on our previous work on race detection for shared-memory Java and C programs and handles programs written using shared-memory approaches as well as bulk communication. Active Testing works in two phases. Phase 1 performs a dynamic analysis of one execution of the program and finds potential data races that could occur if the program were executed under a different thread schedule. Phase 2 re-executes the program while actively controlling the thread schedule to confirm the data races reported in phase 1. A key feature of our technique is that it scales to distributed programs that use bulk communication and both single- and split-phase barriers. We implement the framework for UPC and demonstrate scalability up to a thousand cores for both fine-grained and bulk MPI-style programs. The tool confirms previously known bugs and uncovers several previously unknown ones.
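To make the phase 1 idea concrete, here is a minimal sketch of how potential races might be flagged from a recorded trace of shared-memory accesses. It is not the authors' implementation. It assumes a simplified model: every access is tagged with its thread, the number of single-phase barriers that thread has passed (its "epoch"), the byte range it touches (so a bulk transfer is one wide access), and the set of locks held. Two accesses are reported as a potential race when they come from different threads, fall in the same barrier epoch, overlap in memory, include at least one write, and share no lock. The names `Access`, `epoch`, and `potential_races` are hypothetical, and split-phase barriers, which widen the window of concurrency, are deliberately not modeled.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    """One shared-memory access in a recorded trace (hypothetical model)."""
    thread: int                     # thread that issued the access
    epoch: int                      # single-phase barriers this thread has passed
    start: int                      # first byte of the shared address range
    length: int                     # bytes touched; bulk transfers span many bytes
    is_write: bool                  # True for a store or bulk put into the range
    locks: frozenset = frozenset()  # locks held while performing the access


def overlaps(a: Access, b: Access) -> bool:
    """True if the half-open byte ranges [start, start + length) intersect."""
    return a.start < b.start + b.length and b.start < a.start + a.length


def potential_races(trace):
    """Return pairs of accesses that could race under another schedule.

    Accesses in the same barrier epoch are not ordered by a barrier, so if
    they conflict (overlap, at least one write) and no common lock protects
    them, some other thread schedule may let them execute concurrently.
    """
    races = []
    for a, b in combinations(trace, 2):
        if (a.thread != b.thread
                and a.epoch == b.epoch
                and (a.is_write or b.is_write)
                and overlaps(a, b)
                and not (a.locks & b.locks)):
            races.append((a, b))
    return races
```

In this sketch, phase 2 would then take each reported pair, re-run the program, and pause the thread that reaches its access first until the other thread arrives at the conflicting access, confirming the race only if both accesses can actually execute back to back. The pairwise scan is quadratic in trace length; a scalable tool would need a smarter representation, which this sketch does not attempt.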

Chair/Author Details:

Chang-Seo Park - University of California, Berkeley

Paul Hargrove - Lawrence Berkeley National Laboratory

Costin Iancu - Lawrence Berkeley National Laboratory

Koushik Sen - University of California, Berkeley

