This event is supported by INRIA, UIUC, NCSA, the French Ministry of Foreign Affairs, and EDF.
Main Topics |
Schedule |
Speaker |
Affiliation |
Type of presentation |
Title (tentative) |
Download |
||
|
Sunday Nov. 20th |
Dinner at ... |
|
|
|
|
||
|
|
|
|
|
|
|
||
Workshop Day 1 |
Monday Nov. 21st |
|
|
|
|
|
||
|
|
|
|
|
ALL TITLES ARE TEMPORARY |
|
||
Registration |
08:00 |
|
|
|
|
|
||
Welcome and Introduction |
08:30 |
Marc Snir + Franck Cappello |
INRIA & UIUC |
Background |
Welcome; workshop objectives and organization |
|
||
|
08:40 |
Danny Powell |
NCSA |
Background |
NCSA 5 year Strategy |
|
||
|
08:50 |
Claude Kirchner / Thierry Priol / Jean Roman |
INRIA |
Background |
Update on INRIA and HPC |
|
||
Sustained Petascale |
09:00 |
Bill Kramer |
NCSA |
Background |
Blue Waters |
|
||
|
09:30 |
Bill Gropp |
UIUC |
Background |
Application challenges for sustained Petascale |
|
||
|
10:00 |
Break |
|
|
|
|
||
|
10:30 |
Michelle Butler and Bill Kramer |
NCSA |
Background |
Storage system issues for sustained petascale systems |
|
||
|
11:00 |
Wen-Mei Hwu |
UIUC |
Background |
Sustained petascale systems and Accelerators |
|
||
From Petascale to Exascale |
11:30 |
Marc Snir |
ANL & UIUC |
Background |
Potential extension of the collaboration to ANL and BG/Q |
|
||
|
12:00 |
Lunch |
|
|
|
|
||
|
13:30 |
Rajeev Thakur |
ANL |
Background |
MPI challenges for sustained Petaflops and Exascale |
|
||
|
14:00 |
Robert Ross |
ANL |
Background |
Key I/O challenges for Petascale and Beyond |
|
||
|
14:30 |
Paul Hovland |
ANL |
Background |
TBA |
|
||
|
15:00 |
George Bosilca |
UTK/ICL |
Background |
ICL Research on Resilience and Numerical Algorithms |
|
||
|
15:30 |
Break |
|
|
|
|
||
System software |
16:00 |
Franck Cappello |
INRIA & UIUC |
Joint Results |
Introduction of the activities in System + talk |
|
||
|
16:30 |
Ana Gainaru |
UIUC & NCSA |
Joint Results |
Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems |
|
||
|
17:00 |
Thomas Ropars |
EPFL |
Joint Results |
On Distributed Recovery for Send-Deterministic-Aware MPI Applications |
|
||
|
|
|
|
|
|
|
||
|
|
Dinner at ... |
|
|
|
|
||
|
|
|
|
|
|
|
||
Workshop Day 2 |
Tuesday Nov. 22nd |
|
|
|
|
|
||
|
|
|
|
|
|
|
||
System Software cont. |
08:30 |
Leonardo Bautista Gomez |
Titech |
Joint Results |
Hierarchical groups for multilevel checkpoints and partial restart |
|
||
|
09:00 |
Olivier Gluck |
INRIA |
Joint Results |
Energy consumption of fault tolerance |
|
||
|
09:30 |
Gabriel Antoniu & Matthieu Dorier |
INRIA |
Joint Results |
Update on DAMARIS: Making CM1 scale linearly up to 10,000 cores |
|
||
Numerical Library |
10:00 |
Bill Gropp |
UIUC |
Joint Results |
Introduction of the activity in Numerical Algorithms and Libraries + talk |
|
||
|
10:30 |
Break |
|
|
|
|
||
|
11:00 |
Luc Giraud |
INRIA |
Joint Results |
Fault tolerant Numerical Methods |
|
||
|
11:30 |
Laura Grigori |
INRIA |
Joint Early Results |
Hybrid scheduling and communication avoiding for CALU |
|
||
|
12:00 |
Sébastien Fourestier |
INRIA |
Joint Early Results |
TBA |
|
||
|
12:30 |
Yves Robert |
INRIA |
Background |
Linear Algebra kernels on Petascale/exascale platforms: scheduling issues |
|
||
|
13:00 |
Lunch |
|
|
|
|
||
Numerical Lib. Cont. |
14:30 |
Marc Baboulin |
INRIA |
Joint Early Results |
A parallel tiled solver for dense symmetric indefinite systems on multicore architectures |
|
||
|
15:00 |
Daisuke Takahashi & Alex Yee |
U. Tsukuba |
Joint Results |
Early results on 1 All2all 3D FFT |
|
||
Programming environments |
15:30 |
Sanjay Kale |
UIUC |
Joint Early Results |
Introduction of the activities in Programming Models + talk |
|
||
|
16:00 |
Break |
|
|
|
|
||
|
16:30 |
Julien Bigot / Christian Perez |
INRIA |
Joint Early Results |
TBA |
|
||
|
17:00 |
Jean-François Méhaut |
INRIA |
Joint Early Results |
TBA |
|
||
|
17:30 |
Emmanuel Jeannot |
INRIA |
Joint Early Results |
TBA |
|
||
|
18:00 |
Franck Cappello & Marc Snir |
INRIA & UIUC & ANL |
|
Preparation of the working groups |
|
||
|
|
|
|
|
|
|
||
|
19:00 |
Banquet |
|
|
|
|
||
|
|
|
|
|
|
|
||
Workshop Day 3 |
Wednesday Nov. 23rd |
|
|
|
|
|
||
|
|
|
|
|
|
|
||
|
8:30 |
Franck Cappello & Marc Snir |
INRIA & UIUC & ANL |
|
Indications for working groups |
|
||
Working groups |
9:00- 10:30 |
Bill Gropp |
|
|
Numerical libraries 3 groups (Laura Grigori, Yves Robert, Sébastien Fourestier + Paul Hovland + Wen-Mei Hwu, ...) |
|
||
|
9:00 - 10:30 |
Marc Snir |
|
|
I/O (Bill Kramer + Gabriel Antoniu + Matthieu Dorier + Michelle Butler + Brett Bode + Rajeev Thakur) |
|
||
|
10:30 |
Break |
|
|
|
|
||
|
11:00 - 12:30 |
Sanjay Kale |
|
|
Programming models 4 groups (Jean-François Méhaut, Sébastien Fourestier, |
|
||
|
11:00 - 12:30 |
Franck Cappello |
|
|
Resilience 2 groups: resilient algorithms (Bill Gropp, George Bosilca, Yves Robert, Laura Grigori + ...) |
|
||
|
12:30 |
Adjourn |
|
|
|
|
||
|
13:00 |
Lunch |
|
|
|
|
|
|
|
14:30 - 18:00 |
|
|
|
Informal working groups |
|
||
|
19:00 |
Dinner at ... |
|
|
|
|
Abstracts
Ana Gainaru: Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems
This talk presents a novel way of characterizing the normal and faulty behavior of a system using signal analysis concepts. Together, the analysis modules form ELSA (Event Log Signal Analyzer), a toolkit that models the normal flow of each state event during an HPC system's lifetime and how that flow is affected when a failure hits the system. Current event mining approaches do not take into account the specific behavior of each type of event and, as a consequence, fail to analyze events according to their characteristics. We will show that our models provide an accurate view of the system output, which improves the effectiveness of proactive fault tolerance algorithms. Specifically, we implemented a filtering algorithm and a short-term fault prediction methodology based on the extracted model and tested them against real failure traces from a large-scale system. We show that by analyzing each event according to its specific behavior, we get a more realistic overview of the entire system.
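As a rough illustration of treating one event type as a signal, the sketch below bins the timestamps of a single event type into fixed windows and flags windows whose count deviates strongly from the median. This is not ELSA's algorithm, which the abstract does not detail; the function names, the window size, and the median/MAD threshold are all assumptions chosen for the example.

```python
from statistics import median

def event_signal(timestamps, window, horizon):
    """Count occurrences of one event type in consecutive time windows.

    Hypothetical helper: `window` and `horizon` are in the same time unit
    as the timestamps (e.g. seconds).
    """
    counts = [0] * (horizon // window)
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts

def anomalous_windows(counts, threshold=3.0):
    """Return indices of windows whose count is far from the median.

    Distance is measured in units of the median absolute deviation (MAD);
    a MAD of zero falls back to 1.0 so a perfectly regular signal still
    exposes bursts.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts) or 1.0
    return [i for i, c in enumerate(counts) if abs(c - med) / mad > threshold]

# A periodic event (one every 10 s) with a burst of 20 extra events near t = 50 s.
timestamps = list(range(0, 100, 10)) + [50 + 0.4 * k for k in range(20)]
counts = event_signal(timestamps, window=10, horizon=100)
print(anomalous_windows(counts))
```

Modeling each event type separately, as here, is what lets a periodic event and a normally silent event be judged against different baselines.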
Thomas Ropars: On Distributed Recovery for Send-Deterministic-Aware MPI Applications
The send-deterministic execution model states that in any correct execution of an application, each process sends the same sequence of messages for a given set of input parameters. Many large-scale MPI HPC applications comply with this model. Send-determinism makes it possible to design new rollback-recovery protocols that (i) can rely on uncoordinated checkpointing without suffering from the domino effect and (ii) can provide failure containment with limited performance overhead. One major challenge remains: how to make recovery efficient and scalable?
In this talk, we first give a brief overview of the principles and performance of HydEE, our hybrid rollback-recovery protocol based on send-determinism. We then discuss the problems related to recovery performance and show how recovery could be made fully distributed in such a protocol if the application were able to express its send-determinism.
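The send-determinism property itself can be illustrated with a small check over recorded traces. The sketch below is not part of HydEE; it assumes a hypothetical trace format of `(sender, destination, payload)` tuples in global observation order, and only tests whether two runs are consistent with the property. Agreement between two runs does not prove that an application is send-deterministic.

```python
from collections import defaultdict

def send_sequences(trace):
    """Group send events by sender, preserving each sender's own order."""
    seqs = defaultdict(list)
    for sender, dest, payload in trace:
        seqs[sender].append((dest, payload))
    return dict(seqs)

def consistent_with_send_determinism(trace_a, trace_b):
    """True if every process emits the same message sequence in both runs.

    The global interleaving of sends from different processes may differ;
    only the per-process order matters.
    """
    return send_sequences(trace_a) == send_sequences(trace_b)

run1 = [(0, 1, "a"), (1, 0, "b"), (0, 1, "c")]
run2 = [(1, 0, "b"), (0, 1, "a"), (0, 1, "c")]  # different interleaving only
run3 = [(0, 1, "c"), (1, 0, "b"), (0, 1, "a")]  # process 0 reorders its sends

print(consistent_with_send_determinism(run1, run2))  # True
print(consistent_with_send_determinism(run1, run3))  # False
```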
Marc Baboulin: A parallel tiled solver for dense symmetric indefinite systems on multicore architectures
We present an efficient and innovative parallel tiled algorithm for solving symmetric indefinite systems on multicore architectures. The solver avoids the communication overhead of pivoting by using symmetric randomization, which is computationally inexpensive and requires very little storage. Following randomization, a tiled LDL^T factorization is used that reduces synchronization by using static or dynamic scheduling. We compare the Gflop/s performance of our solver with other types of factorizations on a current multicore machine, and we provide accuracy tests using LAPACK test cases.
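The idea of randomizing first and then factorizing without pivoting can be sketched on a tiny dense example. The abstract does not name the randomization, so this sketch assumes a depth-1 random butterfly matrix U and solves A x = b through (U^T A U) y = U^T b, x = U y. It is neither tiled nor parallel, all function names are hypothetical, and generic `numpy.linalg.solve` calls stand in for triangular solves for brevity.

```python
import numpy as np

def random_butterfly(n, rng):
    """Depth-1 random butterfly matrix of even order n (assumed form)."""
    h = n // 2
    R0 = np.diag(np.exp(rng.uniform(-0.05, 0.05, h)))
    R1 = np.diag(np.exp(rng.uniform(-0.05, 0.05, h)))
    return np.block([[R0, R1], [R0, -R1]]) / np.sqrt(2.0)

def ldlt_nopiv(A):
    """Unpivoted LDL^T: unit lower-triangular L and diagonal d with A = L diag(d) L^T."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

def solve_randomized(A, b, rng):
    """Solve A x = b via symmetric randomization and unpivoted LDL^T."""
    U = random_butterfly(A.shape[0], rng)
    Ar = U.T @ A @ U                 # randomized matrix, still symmetric
    L, d = ldlt_nopiv(Ar)
    z = np.linalg.solve(L, U.T @ b)  # L z = U^T b
    y = np.linalg.solve(L.T, z / d)  # L^T y = D^{-1} z
    return U @ y

# A has a zero leading pivot, so unpivoted LDL^T on A itself would divide by zero.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([1.0, 2.0])
x = solve_randomized(A, b, np.random.default_rng(0))
print(np.allclose(A @ x, b))
```

For this 2x2 matrix the transformed system is diagonal with nonzero entries, which is why no pivoting is needed after randomization. A single butterfly level is not sufficient for every matrix, so this shows the mechanism only.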