BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20111114T163000Z
DTEND:20111114T200000Z
LOCATION:
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Datasets are growing larger each year. This tutorial introduces some of the tools and techniques that can be used for managing, analyzing, and transporting large datasets.=0A=0A1) We will give an introduction to managing scientific datasets using distributed file systems, such as the Hadoop Distributed File System (HDFS), and NoSQL databases, such as HBase.=0A=0A2) We will give an introduction to parallel programming frameworks, such as MapReduce, Hadoop Streaming, and related techniques.=0A=0A3) We will give an introduction to some of the specialized tools used for transporting large datasets, such as GridFTP and UDT.=0A=0AWe will illustrate these technologies and techniques using several case studies, including the management and analysis of the large datasets produced by next-generation sequencing devices and the analysis of the high-volume data streams and large datasets that arise with NetFlow data.
SUMMARY:M01: An Introduction to Data Intensive Computing
PRIORITY:3
END:VEVENT
END:VCALENDAR