Joint-lab workshop Nov. 19-21 2012

The workshop will take place at Argonne National Laboratory.

This event is supported by INRIA, ANL, UIUC and NCSA, as well as by EDF.

Schedule under construction

Main Topics	Schedule	Speakers	Types of presentation	Topic	Download
	Sunday Nov. 18th 19:00	Dinner	Giordano's 641 PLAINFIELD RD WILLOWBROOK, IL 60521 (630) 325-6710	http://www.giordanos.com/ http://maps.google.com/maps?f=q&hl=en&q=641%20PLAINFIELD%20RD.,+WILLOWBROOK,+IL+60527+US&ie=UTF8&z=15&om=1&iwloc=A
Workshop Day 1 (Room 1416, TCS conference center)	Monday Nov. 19th
	07:30-08:30	Transportation: Guest House to TCS (building 240)		(Entrance of the conference center)
	08:00	Continental Breakfast and Registration		Food available in Room 1407, Lunch seating in room 1416 (second half)
Welcome and Introduction	08:30	Franck Cappello, INRIA & UIUC, Marc Snir ANL	Opening	Welcome, formal opening and workshop details
	08:40	Marc Snir	Opening	ANL presentation and vision of the collaboration
	08:50	Bill Gropp	Opening	UIUC/NCSA update and vision of the collaboration
	09:00	Frederic Desprez	Opening	INRIA update on HPC strategy and vision of the collaboration
Big Apps, Big DATA - Big I/O chair: Rajeev Thakur	09:15	Robert Jacob	Trends in HPC	Climate simulation at extreme scale
	09:45	Rob Ross, ANL	Trends in HPC	Trends in HPC I/O and File systems
	10:15	Break
	10:45	Rob Pennington, NCSA	Trends in HPC	Big Data
	11:15	Andrew Chien, ANL	Potential collaboration	Big Data
	11:45	Matthieu Dorier, INRIA	Joint Results	Visualization
	12:15	Lunch
Programming Models/Runtime chair: Sanjay Kale	13:30	Wen-Mei Hwu, UIUC	TBA	Accelerators
	14:00	Pavan Balaji, ANL	Potential collaboration	MPI3 and Unified Runtime
	14:30	Andra Hugo, Raymond Namyst, INRIA	Potential collaboration	Composing multiple StarPU applications over heterogeneous machines: a supervised approach
	15:00	Jean-François Mehaut, INRIA	Potential collaboration	Optimizations for modern NUMA
	15:30	Break
Numerical algorithms and Methods Chair: Paul Hovland	16:00	TBA, ANL	TBA	TBA
	16:30	Laura Grigori	Results	Communication avoiding
	17:00	Bill Gropp, UIUC	Results	Hybrid Scheduling
	17:30	Laurent Hascoet, INRIA	Early Results	TBA
	18:00	Adjourn
	19:00	Dinner	Jameson's Woodridge 1001 W. 75th Street Woodridge, IL 60517 630.910.9700	http://www.jamesons-charhouse.com/index.html

Workshop Day 2 (Main room)	Tuesday Nov. 20th

Big Systems Chair: Jean François Mehaut	08:30	Pete Beckman, ANL	Trends	New Directions in Extreme-Scale Operating Systems and Runtime Software
	09:00	Bill Kramer, UIUC/NCSA	Trends	Blue Waters update
Cloud Chair: Gabriel Antoniu	09:30	Ian Foster, ANL	Potential collaboration	TBA
	10:00	Christine Morin, INRIA	Potential collaboration	Contrail
	10:30	Break
	11:00	Frederic Desprez, INRIA	Potential collaboration	TBA
Resilience: Chair: Christine Morin	11:30	Mohamed Slim Bouguerra, INRIA	Early Result	Performance modeling of checkpointing under failure prediction
	12:00	Rinku Gupta, ANL	Potential collaboration	Interlayer error notification, coordination and CIFTS
	12:30	Ana Gainaru, UIUC	Early Results	Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems.
	13:00	Lunch		Food buffet in Room 1407, Lunch seating in room 1416 (second half)
				Parallel Session
Mini workshop on Numerical libraries Chair: Paul Hovland (room 1406, TCS conference center)	08:30	Stefan Wild, ANL	Potential collaboration	TBA
	09:00	Bill Gropp, UIUC	Potential collaboration	TBA
	09:30	Laura Grigori, INRIA	Potential collaboration	TBA
	10:00	Break		TBA
	10:30	Anshu Dubey, ANL	Potential collaboration	TBA
	11:00	Discussion
	12:00	Adjourn
	13:00	Lunch
				Parallel Sessions
Mini workshop on Performance Modeling and simulation Chair: Marc Snir	14:30	Sanjay Kale, UIUC	Early Results	BigSim
	15:00	Arnaud Legrand, INRIA		SimGrid
	15:30	Torsten Hoefler, ETH	Early Results	TBA
	16:00	Break
	16:30	Yves Robert, INRIA	Early Results	TBA
	17:00	Discussion
	18:00	Adjourn
	19:00	Dinner	Maggiano's 240 Oakbrook Center Oak Brook, IL 60523	http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1

Mini workshop on Cloud Chair: Kate Keahey	14:30	Kate Keahey, ANL	Potential collaboration	TBA
	15:00	Narayan Desai, ANL	Potential collaboration	TBA
	15:30	Jonathan Rouzaud, INRIA	Potential collaboration	TBA
	16:00	Break
	16:30	Michael Wilde	Potential collaboration	Swift: simpler parallel programming for cloud and HPC domains http://www.ci.uchicago.edu/swift (Swift for clouds and clusters) http://www.mcs.anl.gov/exm (Swift for extreme-scale domains)
	17:00	Discussion
	18:00	Adjourn
	19:00	Dinner	Maggiano's 240 Oakbrook Center Oak Brook, IL 60523	http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1

Workshop Day 3 (Main room)	Wednesday Nov 21st
				Parallel Sessions
Mini workshop on Programming models/runtime Chair: Pavan Balaji	08:30	Emmanuel Jeannot, INRIA	Results	TBA
	09:00	Sanjay Kale, UIUC		Charm++ update
	09:30	Christian Perez, INRIA		TBA
	10:00	Break
	10:30	Jim Dinan		One sided communication
	11:00	Sebastien Fourestier	Potential collaboration	Parallel repartitioning and re-mapping in Scotch
	11:30	Discussion
	12:30	Closing
	13:00	Lunch

Mini workshop on Resilience Chair: Franck Cappello	08:30	TBA	TBA	TBA
	09:00	Peter Brune, ANL	TBA	TBA
	09:30	Bogdan Nicolae, IBM	Results	Optimizing checkpoint image pages storage
	10:00	Break
	10:30	Tatiana Martsinkevich, INRIA	Results	Fully distributed recovery for send-determinism applications
	11:00	Amina Guermouche, INRIA	Results	TBA
	11:30	Discussion
	12:30	Closing
	13:00	Lunch		Box Lunches

Abstracts

Robert Ross, ANL

Trends in HPC I/O and File systems

All aspects of HPC systems are undergoing change as we move into petascale and towards exascale computing. The traditional "I/O software stack" is no exception: the layers, capabilities, and abstractions in the stack are all in flux as we consider how to best support future HPC applications. This talk will discuss these developmental trends, using ongoing work at Argonne as examples of some directions of study.

Andra Hugo, INRIA

Composing multiple StarPU applications over heterogeneous machines: a supervised approach

Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a single runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention.
In this talk, I will present an extension to the StarPU runtime system that enables multiple StarPU kernels to run simultaneously over the same CPU+GPU architecture. I will then present experimental results showing the improvements our solution brings to the efficiency of parallel applications that compose several parallel libraries (e.g., libraries for dense linear algebra or fluid mechanics). Finally, I will give some insights into the main challenges of the composability problem and present the main topics we are interested in for future work.

Pete Beckman, ANL

New Directions in Extreme-Scale Operating Systems and Runtime Software

For more than a decade, extreme-scale operating systems and runtime software have been evolving very slowly. Today's large-scale systems use slightly retooled "node" operating systems glued together with ad hoc local agents to handle I/O, job launch, and management. These extreme-scale systems are only slightly more tightly integrated than are generic Linux clusters with InfiniBand. As we look forward to a new era for large-scale HPC systems, we see that power and fault management will become key design issues. Software management of power and support for resilience must now be part of the whole-system design. Extreme-scale operating systems and runtime software will not be simply today's node code with a few control interfaces, but rather a tightly integrated "global OS" that spans the entire platform and works cooperatively across portions of the machine in order to manage power and provide resilience.

Sebastien Fourestier, INRIA

Parallel repartitioning and re-mapping in Scotch

Scotch is a software package for sequential and parallel graph partitioning, static mapping, sparse matrix block ordering, clustering and sequential mesh and hypergraph ordering. As a research project, it is subject to continuous improvement, resulting from several on-going research tasks. Our talk will address several new features we have recently added to Scotch. We will present some threaded algorithms for shared-memory coarsening and refinement. We will also show early results regarding its parallel repartitioning and sequential remapping functionalities.

Michael Wilde, ANL

Swift: simpler parallel programming for cloud and HPC domains

Ana Gainaru, UIUC

Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems.

A large percentage of computing capacity in today’s large high-performance computing systems is wasted due to failures and recoveries. One way to reduce this overhead is to combine recovery strategies with failure avoidance methods. Failure avoidance is based on a prediction model that detects fault occurrences ahead of time and allows preventive measures to be taken, such as task migration or checkpointing the application. This talk presents the implementation and results of a prototype of proactive checkpointing based on the ELSA toolkit, coupled with periodic multi-level checkpointing based on FTI. The proactive checkpointing is implemented as a level zero (L0) in a four-level scheme, providing the fastest checkpoint, which is necessary to act quickly between the failure prediction and the moment of the failure. We evaluate the proposed approach on the TSUBAME system and show that the overhead in comparison with a preventive checkpoint execution represents only 2% to 6%.
