When viewing the Technical Program schedule, you will see a column
labeled "PLANNER" on the far right-hand side. Use this planner to build
your own schedule. To add an event to your personal schedule, select it
and click the calendar icon of your choice (Outlook, iCal, or Google
Calendar); the event will be stored there. As you add events in this
manner, you will build your own schedule to guide you through the week.
You can also create your personal schedule on the SC11 app (Boopsie) on your smartphone. Simply select a session you want to attend and "add" it to your plan. Continue in this manner until you have created your own personal schedule. All your events will appear under "My Event Planner" on your smartphone.
Log Analysis for Fault Management in Large-scale Systems
SESSION: Doctoral Research Showcase (2 of 2)
EVENT TYPE: Doctoral Research Showcase
TIME: 4:15PM - 4:30PM
SESSION CHAIR: Volodymyr Kindratenko
PRESENTER(S): Ziming Zheng
ROOM: TCC LL1
ABSTRACT: With the increasing scale and complexity of high performance
computing (HPC) systems, reliability is becoming critical for these
systems. System logs are the primary source of information to
understand and analyze system problems. Nevertheless, manual log
processing is time-consuming, error-prone, and not scalable.
So far, little work has been done on automated log analysis for
practical use in HPC systems. In this study, we present a log
analysis infrastructure that exploits data mining and statistical
learning technologies. Our work can be broadly divided into four
parts: log pre-processing, online failure prediction, automatic root
cause diagnosis, and reliability modeling. We evaluate our preliminary
results using system logs collected from production HPC systems. The
work can greatly improve our understanding of faults and failures
arising from hardware and software components and their interactions
in HPC systems. It can further facilitate resilience research for
HPC systems.