Joint-lab workshop Nov. 21-23 2011

This event is supported by INRIA, UIUC and NCSA, the French Ministry of Foreign Affairs, as well as by EDF.

Main Topics	Schedule	Speaker	Affiliation	Type of presentation	Title (tentative)	Download
	Sunday Nov. 20th	Dinner at ...

Workshop Day 1	Monday Nov. 21st
					ALL TITLES ARE TEMPORARY
Registration	08:00
Welcome and Introduction	08:30	Marc Snir + Franck Cappello	INRIA&UIUC	Background	Welcome, workshop objectives and organization
	08:40	Danny Powell	NCSA	Background	NCSA 5 year Strategy
	08:50	Claude Kirchner / Thierry Priol / Jean Roman	INRIA	Background	Update on INRIA and HPC
Sustained Petascale Chair: Marc Snir	09:00	Bill Kramer	NCSA	Background	Blue Waters
	09:30	Bill Gropp	UIUC	Background	Application challenges for sustained Petascale
	10:00	Break
	10:30	Michelle Butler and Bill Kramer	NCSA	Background	Storage system issues for sustained petascale systems
	11:00	Wen-Mei Hwu	UIUC	Background	Sustained petascale systems and Accelerators
From Petascale to Exascale Chair: Franck Cappello	11:30	Marc Snir	ANL & UIUC	Background	Potential extension of the collaboration to ANL and BG/Q
	12:00	Lunch
	13:30	Rajeev Thakur	ANL	Background	Challenges in Scaling MPI to Exascale
	14:00	Robert Ross	ANL	Background	Key I/O challenges for Petascale and Beyond
	14:30	Paul Hovland	ANL	Background	TBA
	15:00	George Bosilca	UTK/ICL	Background	ICL Research on Resilience and Numerical Algorithms
	15:30	Break
System software Chair: Thierry Priol	16:00	Franck Cappello	INRIA&UIUC	Joint Results	Introduction of the activities in System + talk
	16:30	Ana Gainaru	UIUC & NCSA	Joint Results	Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems
	17:00	Thomas Ropars	EPFL	Joint Results	On Distributed Recovery for Send-Deterministic-Aware MPI Applications
	17:30	Leonardo Bautista Gomez	Titech	Joint Results	Hierarchical groups for multilevel checkpoints and partial restart

		Dinner at ...

Workshop Day 2	Tuesday Nov. 22nd

System Software cont. Chair: Torsten Hoefler	08:30	Olivier Gluck	INRIA	Joint Results	Reducing energy consumption of fault tolerance algorithms
	09:00	Gabriel Antoniu & Matthieu Dorier	INRIA	Joint Results	Update on DAMARIS: Making CM1 scale linearly up to 10,000 cores
Numerical Library Chair: Jean Roman	09:30	Bill Gropp	UIUC	Joint Results	Introduction of the activity in Numerical Algorithms and Libraries + talk
	10:00	Luc Giraud	INRIA	Joint Results	Fault tolerant Numerical Methods
	10:30	Break
	11:00	Laura Grigori	INRIA	Joint Early Results	Hybrid scheduling and communication avoiding for CALU
	11:30	Sébastien Fourestier	INRIA	Joint Early Results	Last improvements in Scotch and ongoing collaborations.
	12:00	Yves Robert	INRIA	Background	Linear algebra kernels on petascale/exascale platforms: scheduling issues
	12:30	Lunch

Numerical Lib. Cont. Chair: Bill Gropp	14:00	Marc Baboulin	INRIA	Joint Early Results	A parallel tiled solver for dense symmetric indefinite systems on multicore architectures
	14:30	Daisuke Takahashi & Alex Yee	U. Tsukuba	Joint Results	A Scalable Parallel Algorithm for 3-D FFT
Programming environments Chair: Rajeev Thakur	15:00	Sanjay Kale	UIUC	Joint Early Results	Introduction of the activities in Programming Models + talk
	15:30	Julien Bigot / Christian Perez	INRIA	Joint Early Results	Modularizing an FFT library with Charm++ & HLCM: combining performance and portability
	16:00	Break
	16:30	Alexandre Duchateau	UIUC	Joint Early Results	Generation and Tuning of parallel solutions for linear algebra equations
	17:00	Jean-François Méhaut	INRIA	Joint Early Results	TBA
	17:30	Emmanuel Jeannot	INRIA	Joint Early Results	TBA
	18:00	Franck Cappello & Marc Snir	INRIA & UIUC & ANL		Preparation of the working groups

	19:00	Banquet

Workshop Day 3	Wednesday Nov. 23rd

	8:30	Franck Cappello & Marc Snir	INRIA & UIUC & ANL		Indications for working groups
Working groups	9:00 - 10:30	Bill Gropp			Numerical libraries 3 groups (Laura Grigori, Yves Robert, Sébastien Fourestier + Paul Hovland + Wen-Mei Hwu, ...)
	9:00 - 10:30	Marc Snir			I/O (Bill Kramer + Gabriel Antoniu + Matthieu Dorier + Michelle Butler + Brett Bode + Rajeev Thakur + Rob Ross + Pavan Balaji + ...)
	10:30	Break
	11:00 - 12:30	Sanjay Kale			Programming models 4 groups (Jean-François Méhaut, Sébastien Fourestier, Christian Perez, Emmanuel Jeannot, Pavan Balaji + Wen-Mei Hwu ...)
	11:00 - 12:30	Franck Cappello			Resilience 2 groups: resilient algorithms (Bill Gropp, George Bosilca, Yves Robert, Laura Grigori + ...) and resilient systems (Bill Kramer, Marc Snir, George Bosilca, Ana Gainaru, Leonardo Bautista, Yves Robert + Rajeev Thakur + ...)
	12:30	Adjourn
	13:00	Lunch
	14:30 - 18:00				Informal working groups
	19:00	Dinner at ...

Abstracts

Rajeev Thakur: Challenges in Scaling MPI to Exascale

This talk will discuss challenges in using MPI effectively at exascale. I will describe ongoing research at Argonne aimed at addressing these challenges. I will also give an update on recent activities of the MPI Forum and what new features are being considered for inclusion in MPI-3.

Ana Gainaru: Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems

This talk will present a novel way of characterizing the normal and faulty behavior of a system by using signal analysis concepts. Together, the analysis modules form ELSA (Event Log Signal Analyzer), a toolkit for modeling the normal flow of each state event during an HPC system's lifetime and how that flow is affected when a failure hits the system. Current event mining approaches do not take into consideration the specific behavior of each type of event and, as a consequence, fail to analyze events according to their characteristics. We will show that our models provide an accurate view of the system output, which improves the effectiveness of proactive fault tolerance algorithms. Specifically, we implemented a filtering algorithm and a short-term fault prediction methodology based on the extracted model and tested them against real failure traces from a large-scale system. We show that by analyzing each event according to its specific behavior, we get a more realistic overview of the entire system.

Thomas Ropars: On Distributed Recovery for Send-Deterministic-Aware MPI Applications

The send-deterministic execution model states that in any correct execution of an application, the processes send the same sequence of messages for a given set of input parameters. Many large-scale MPI HPC applications comply with this model. Send-determinism makes it possible to design new rollback-recovery protocols that: i) can rely on uncoordinated checkpointing without suffering from the domino effect; ii) can provide failure containment with a limited performance overhead. One major challenge remains: how to make recovery efficient and scalable?
In this talk, we first give a brief overview of the principles and the performance of HydEE, our hybrid rollback-recovery protocol based on send-determinism. Then we discuss the problems related to recovery performance, and we show how recovery could be made fully distributed in such a protocol if the application were able to express its send-determinism.
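As an illustration of the send-deterministic model described above, the following minimal sketch compares the per-process send sequences recorded in two executions with the same input. The trace format (a dict mapping a process rank to an ordered list of (destination, payload) tuples) and the function name `is_send_deterministic` are assumptions made for this example; they are not part of HydEE or of MPI. Agreement between two runs is only evidence consistent with send-determinism, not a proof of it.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical trace format: rank -> ordered list of (destination rank, payload) sends.
Trace = Dict[int, List[Tuple[int, Hashable]]]


def is_send_deterministic(run_a: Trace, run_b: Trace) -> bool:
    """Return True if every process sent the same message sequence in both runs.

    Only the order of sends per process is compared; the order in which
    messages are received is deliberately ignored.
    """
    if run_a.keys() != run_b.keys():
        return False
    return all(run_a[rank] == run_b[rank] for rank in run_a)


# Two runs with the same input: rank 0 sends to rank 1, then to rank 2.
run_1 = {0: [(1, "x"), (2, "y")], 1: [(0, "ack")], 2: [(0, "ack")]}
run_2 = {0: [(1, "x"), (2, "y")], 1: [(0, "ack")], 2: [(0, "ack")]}
# A third run in which rank 0 swaps the order of its two sends.
run_3 = {0: [(2, "y"), (1, "x")], 1: [(0, "ack")], 2: [(0, "ack")]}

print(is_send_deterministic(run_1, run_2))  # True
print(is_send_deterministic(run_1, run_3))  # False
```

The check ignores reception order on purpose: the model constrains only what each process sends, so two runs may deliver messages in different orders and still agree.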

Olivier Gluck: Reducing energy consumption of fault tolerance algorithms

Over the past few years, the energy consumption of supercomputers has become a major issue. To meet the performance needs expressed by scientists in various fields, supercomputers are growing very fast. They involve more and more computing nodes, which increases both their total energy consumption and their probability of experiencing a failure. In particular, to ensure the transition to the exascale era by 2018, which will involve millions of cores, we need to address these two challenges by providing efficient fault tolerance mechanisms while reducing total energy consumption.
In this talk, we first present some techniques used to reduce the energy consumption of large-scale distributed systems, and of future supercomputers in particular. Then, we present our current research on reducing the energy cost of fault tolerance algorithms in exascale supercomputers.

Yves Robert: Linear algebra kernels on petascale/exascale platforms: scheduling issues

Future exascale machines will likely be massively parallel architectures with 100K to 1000K processors, each processor itself being equipped with 1K to 10K cores. At the node level, the architecture is a shared-memory machine, running many parallel threads on the cores. At the machine level, the architecture is a distributed-memory machine. This additional level of hierarchy, together with massive parallelism at the node level, dramatically complicates the design of new versions of the standard numerical linear algebra algorithms that are at the heart of many scientific applications. On exascale platforms, resilience is a key challenge. Failures are much more likely to occur during the execution of parallel jobs that enroll increasingly large numbers of processors. The design of efficient fault-tolerant scheduling strategies will be key to high performance. Such strategies can involve checkpointing, task replication, dynamic task re-execution, or any combination of these. But they all incur large overheads in terms of performance and energy consumption. The main goal of the talk is to survey the challenges faced in designing linear algebra algorithms on exascale architectures, and to provide a few examples of algorithms and scheduling techniques that constitute a first step toward solving these challenges. Joint work with Marin Bougeret, Henri Casanova, Jack Dongarra, Thomas Hérault, Julien Langou, Mathieu Faverge, and Frédéric Vivien.

Sébastien Fourestier: Last improvements in Scotch and ongoing collaborations.

Scotch is a software package for sequential and parallel graph partitioning, static mapping, sparse matrix block ordering, and sequential mesh and hypergraph ordering. As a research project, it is subject to continuous improvement, resulting from several ongoing research tasks. Our talk will focus on the latest improvements we have made in Scotch and the ongoing collaborations within the joint laboratory. We will also briefly present other ongoing work, in the context of our new roadmap.

Marc Baboulin: A parallel tiled solver for dense symmetric indefinite systems on multicore architectures

We present an efficient and innovative parallel tiled algorithm for solving symmetric indefinite systems on multicore architectures. This solver avoids the communication overhead due to pivoting by using symmetric randomization. This randomization is computationally inexpensive and requires very little storage. Following randomization, a tiled LDL^T factorization is used that reduces synchronization by using static or dynamic scheduling. We compare the Gflop/s performance of our solver with other types of factorizations on a current multicore machine and we provide tests on accuracy using LAPACK test cases.

Daisuke Takahashi and Alex Yee: A Scalable Parallel Algorithm for 3-D FFT

In this talk, a scalable parallel algorithm for 3-D fast Fourier transform (FFT) is presented. A typical decomposition for performing a parallel 3-D FFT is slab-wise. In this case, for an N^3-point FFT, N must be greater than or equal to the number of MPI processes. Our proposed parallel 3-D FFT algorithm allows up to N^(3/2) MPI processes for an N^3-point FFT. Moreover, this scheme requires only one all-to-all communication for transposed-order output. Performance results of parallel 3-D FFTs on clusters of multi-core processors are reported.
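To make the process-count bounds stated above concrete, the sketch below evaluates them for a few values of N. The function names are hypothetical, and the code only restates the two limits given in the abstract (N for a slab-wise decomposition, N^(3/2) for the proposed algorithm); it does not implement the FFT itself.

```python
import math


def max_procs_slab(n: int) -> int:
    """Slab-wise decomposition: at most N MPI processes for an N^3-point FFT."""
    return n


def max_procs_proposed(n: int) -> int:
    """Limit quoted for the proposed algorithm: N^(3/2) processes, rounded down."""
    # isqrt(n^3) is the integer floor of n^(3/2), with no floating-point error.
    return math.isqrt(n ** 3)


for n in (256, 1024, 4096):
    print(n, max_procs_slab(n), max_procs_proposed(n))
```

For N = 1024, for instance, the slab-wise limit is 1024 processes while the N^(3/2) limit is 32768.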

Julien Bigot: Modularizing an FFT library with Charm++ & HLCM: combining performance and portability

When designing a high-performance application, one usually has to handle two kinds of decomposition. The first one is dictated by the parallelism of the hardware platform. The second one follows the logical modules that form the application. In order to combine high performance with a high level of code reusability, the code should reflect both.
Programming models such as Charm++ offer good support for parallelism. Charm++ encourages a philosophy of over-decomposition. Applications are decomposed into chares, objects that communicate by exchanging messages. They are executed in parallel on the available processors. Object-oriented languages do, however, lack intrinsic support for modular decomposition. The paradigm of component-based software engineering has been proposed to tackle this problem. Components are pieces of code that can be externally assembled to form the whole application. When combining these two kinds of decomposition, care should be taken as they can interfere. For example, replacing a given component with an implementation relying on a different parallel decomposition can lead to inefficient data redistribution at the interface between components.
The HLCM component assembly model has been designed to support the efficient combination of both forms of decomposition. It supports user-defined interactions that can be optimized for various kinds of hardware platforms and is based on a compilation approach to prevent any overhead at runtime. We present an implementation that enables the use of HLCM to assemble Charm++ components. We show how this has been used to modularize an FFT library with minimal modification to the code. We evaluate this by showing that the modularized code behaves similarly to the initial one with respect to performance while easing the replacement of some of its modules with code optimized for specific hardware.

Alexandre Duchateau: Generation and Tuning of parallel solutions for linear algebra equations

We present an auto-tuning system and methodology for algorithm exploration for a class of linear algebra problems. Starting with a description of the equations, the system automatically finds divide-and-conquer algorithms to solve them, with the main objective of exposing parallelism. The same strategy can be used to improve cache locality.
