BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20111116T011500Z
DTEND:20111116T030000Z
LOCATION:WSCC North Galleria 2nd/3rd Floors
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Faults have become the norm rather than the exception for high-end computing on clusters with tens to hundreds of thousands of cores. Exacerbating this situation, some of these faults go undetected and manifest as silent errors that corrupt memory while applications continue to run and report incorrect results. This poster introduces RedMPI, an MPI library that resides in the MPI profiling layer. RedMPI performs online detection and correction of soft errors in MPI applications without requiring any modification to the application source. By running redundant copies of each MPI process, RedMPI transparently detects corrupt messages sent by processes that become faulted during execution. With triple redundancy, RedMPI can also "vote" out the messages of a faulted process, replacing corrupted results with correct results from the unfaulted processes. We present an experimental evaluation of RedMPI on a range of applications that demonstrates the effectiveness of this approach.
SUMMARY:Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing
PRIORITY:3
END:VEVENT
END:VCALENDAR