
This event is supported by INRIA, UIUC and NCSA, the French Ministry of Foreign Affairs, and EDF.

Main Topics

Schedule

             Speaker

Affiliation

Type of presentation

Title (tentative)

Download

 

Sunday Nov. 20th

Dinner at ...

 

 

 

 

 

 

 

 

 

 

 

Workshop Day 1

Monday Nov. 21st

 

 

 

 

 

 

 

 

 

 

 ALL TITLES ARE TEMPORARY 

 

Registration

08:00

 

 

 

 

 

Welcome and Introduction

08:30

Marc Snir + Franck Cappello

INRIA&UIUC

Background

Welcome, workshop objectives and organization

 

 

08:40

Danny Powell

NCSA

Background

NCSA 5 year Strategy

 

 

08:50

Claude Kirchner / Thierry Priol / Jean Roman

INRIA

Background

Update on INRIA and HPC

 

Sustained Petascale
Chair: Marc Snir

09:00

Bill Kramer

NCSA

Background

Blue Waters

 

 

09:30

Bill Gropp

UIUC

Background

Application challenges for sustained Petascale

 

 

10:00

Break

 

 

 

 

 

10:30

Michelle Butler and Bill Kramer

NCSA

Background

Storage system issues for sustained petascale systems

 

 

11:00

Wen-Mei Hwu

UIUC

Background

Sustained petascale systems and Accelerators

 

From Petascale to Exascale
Chair: Franck Cappello

11:30

Marc Snir

ANL & UIUC

Background

Potential extension of the collaboration to ANL and BG/Q

 

 

12:00

Lunch

 

 

 

 

 

13:30

Rajeev Thakur

ANL

Background

MPI challenges for sustained Petaflops and Exascale

 

 

14:00

Robert Ross

ANL

Background

Key I/O challenges for Petascale and Beyond

 

 

14:30

Paul Hovland

ANL

Background

TBA

 

 

15:00

George Bosilca

UTK/ICL

Background

ICL Research on Resilience and Numerical Algorithms

 

 

15:30

Break

 

 

 

 

System software
Chair: Thierry Priol

16:00

Franck Cappello

INRIA&UIUC

Joint Results

Introduction of the activities in System + talk

 

 

16:30

Ana Gainaru

UIUC & NCSA

Joint Results

Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems

 

 

17:00

Thomas Ropars

EPFL

Joint Results

On Distributed Recovery for Send-Deterministic-Aware MPI Applications

 

 

17:30

Leonardo Bautista Gomez

Titech

Joint Results

Hierarchical groups for multilevel checkpoints and partial restart

 

 

 

 

 

 

 

 

 

 

Dinner at ...

 

 

 

 

 

 

 

 

 

 

 

Workshop Day 2

Tuesday Nov. 22nd

 

 

 


 

 

 

 

 

 

 

 

System Software cont.
Chair: Torsten Hoefler

08:30

Olivier Gluck

INRIA

Joint Results

Energy consumption of fault tolerance

 

 

09:00

Gabriel Antoniu & Matthieu Dorier

INRIA

Joint Results

Update on DAMARIS: Making CM1 scale linearly up to 10,000 cores

 

Numerical Library
Chair: Jean Roman

09:30

Bill Gropp

UIUC

Joint Results

Introduction of the activity in Numerical Algorithms and Libraries + talk

 


10:00

Luc Giraud

INRIA

Joint Results

Fault tolerant Numerical Methods

 

 

10:30

Break

 

 

 

 

 

11:00

Laura Grigori

INRIA

Joint Early Results

Hybrid scheduling and communication avoiding for CALU

 

 

11:30

Sébastien Fourestier

INRIA

Joint Early Results

TBA

 

 

12:00

Yves Robert

INRIA

Background

Linear Algebra kernels on Petascale/exascale platforms: scheduling issues

 

 

12:30

Lunch



 

 

 





 

 

Numerical Lib. Cont.
Chair: Bill Gropp

14:00

Marc Baboulin

INRIA

Joint Early Results

A parallel tiled solver for dense symmetric indefinite systems on multicore architectures

 

 

14:30

Daisuke Takahashi & Alex Yee

U. Tsukuba

Joint Results

Early results on 1 All2all 3D FFT

 

Programming environments
Chair: Rajeev Thakur

15:00

Sanjay Kale

UIUC

Joint Early Results

Introduction of the activities in Programming Models + talk

 

 

15:30

Julien Bigot / Christian Perez

INRIA

Joint Early Results

TBA

 

 

16:00

Break

 

 

 

 

 

16:30

Alexandre Duchateau

UIUC

Joint Early Results

TBA

 

 

17:00

Jean-François Méhaut

INRIA

Joint Early Results

TBA

 

 

17:30

Emmanuel Jeannot

INRIA

Joint Early Results

TBA

 

 

18:00

Franck Cappello & Marc Snir

INRIA & UIUC & ANL

 

Preparation of the working groups

 

 

 

 

 

 

 

 

 

19:00

Banquet

 

 


 

 

 

 

 

 

 

 

Workshop Day 3

Wednesday Nov. 23rd

 

 

 

 

 

 

 

 

 

 

 

 


8:30

Franck Cappello & Marc Snir

INRIA & UIUC & ANL

 

Indications for working groups

 

Working groups

9:00- 10:30

Bill Gropp

 

 

Numerical libraries, 3 groups (Laura Grigori, Yves Robert, Sébastien Fourestier + Paul Hovland + Wen-Mei Hwu, ...)

 

 

9:00 - 10:30

Marc Snir

 

 

I/O (Bill Kramer + Gabriel Antoniu + Matthieu Dorier + Michelle Butler + Brett Bode + Rajeev Thakur
+ Rob Ross + Pavan Balaji + ...)

 

 

10:30

Break

 

 

 

 

 

11:00 - 12:30

Sanjay Kale

 

 

Programming models, 4 groups (Jean-François Méhaut, Sébastien Fourestier,
Christian Perez, Emmanuel Jeannot, Pavan Balaji + Wen-Mei Hwu ...)

 


11:00 - 12:30

Franck Cappello

 

 

Resilience 2 groups: resilient algorithms (Bill Gropp, George Bosilca, Yves Robert, Laura Grigori + ...)
and resilient systems (Bill Kramer, Marc Snir, George Bosilca, Ana Gainaru, Leonardo Bautista,
Yves Robert + Rajeev Thakur + ...)

 

 

12:30

Adjourn

 

 

 

 

 

13:00

Lunch


 

 

 

 

 

 

14:30 - 18:00


 

 

Informal working groups

 

 

19:00

Dinner at ...

 

 

 

 

Abstracts

Ana Gainaru: Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems

This talk will present a novel way of characterizing the normal and faulty behavior of a system using signal analysis concepts. The analysis modules together form ELSA (Event Log Signal Analyzer), a toolkit that models the normal flow of each state event during an HPC system's lifetime and how that flow is affected when a failure hits the system. Current event mining approaches do not take into consideration the specific behavior of each type of event and, as a consequence, fail to analyze events according to their characteristics. We will show that our models provide an accurate view of the system output, which improves the effectiveness of proactive fault tolerance algorithms. Specifically, we implemented a filtering algorithm and a short-term fault prediction methodology based on the extracted model and tested them against real failure traces from a large-scale system. We show that by analyzing each event according to its specific behavior, we get a more realistic overview of the entire system.

Thomas Ropars: On Distributed Recovery for Send-Deterministic-Aware MPI Applications

The send-deterministic execution model states that in any correct execution of an application, the processes send the same sequence of messages for a given set of input parameters. Many large-scale MPI HPC applications comply with this model. Send-determinism allows the design of new rollback-recovery protocols that: i) can rely on uncoordinated checkpointing without suffering from the domino effect; ii) can provide failure containment with a limited performance overhead. One major challenge remains: how to make recovery efficient and scalable?
In this talk, we first give a brief overview of the principles and the performance of HydEE, our hybrid rollback-recovery protocol based on send-determinism. Then we discuss the problems related to recovery performance, and we show how recovery could be made fully distributed in such a protocol if the application were able to express its send-determinism.
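
As a rough illustration of the send-determinism property described above, the following sketch compares two recorded executions and checks that each process emitted the same ordered sequence of sends, regardless of how the sends interleave globally. The trace format (tuples of sender, receiver, payload) and the function names are assumptions made for this example; they are not part of HydEE or of MPI, and agreement between two runs is only evidence of send-determinism, not a proof.

```python
from collections import defaultdict


def send_sequences(trace):
    """Group send events by sender, preserving each sender's own order.

    `trace` is a hypothetical list of (sender, receiver, payload) tuples
    in the global order in which the sends were observed.
    """
    per_process = defaultdict(list)
    for sender, receiver, payload in trace:
        per_process[sender].append((receiver, payload))
    return dict(per_process)


def same_send_behavior(trace_a, trace_b):
    """True if every process sends the same message sequence in both runs."""
    return send_sequences(trace_a) == send_sequences(trace_b)


# Two runs whose global interleaving differs but whose per-process
# send order is identical.
run_1 = [(0, 1, "a"), (1, 2, "b"), (0, 2, "c")]
run_2 = [(1, 2, "b"), (0, 1, "a"), (0, 2, "c")]

# A run in which process 0 reorders its own sends.
run_3 = [(0, 2, "c"), (0, 1, "a"), (1, 2, "b")]

print(same_send_behavior(run_1, run_2))
print(same_send_behavior(run_1, run_3))
```

The first comparison holds because only the interleaving between processes changed; the second fails because process 0 sent its two messages in a different order.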

Marc Baboulin: A parallel tiled solver for dense symmetric indefinite systems on multicore architectures

We present an efficient and innovative parallel tiled algorithm for solving symmetric indefinite systems on multicore architectures. This solver avoids the communication overhead due to pivoting by using symmetric randomization. This randomization is computationally inexpensive and requires very little storage. Following randomization, a tiled LDL^T factorization is used that reduces synchronization by using static or dynamic scheduling. We compare the Gflop/s performance of our solver with other types of factorizations on a current multicore machine, and we provide accuracy tests using LAPACK test cases.
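
The idea of replacing pivoting with symmetric randomization can be sketched on a small dense example. The code below is a minimal, non-tiled, sequential illustration and not the solver presented in the talk: it assumes a single-level random butterfly matrix U, assumes diagonal entries of the form exp(r/10) with r uniform in [-0.5, 0.5], forms U^T A U, and factors it as LDL^T with no pivoting. All function names are hypothetical.

```python
import math
import random


def butterfly(n, rng):
    """Single-level random butterfly (1/sqrt 2) [[R0, R1], [R0, -R1]], n even."""
    h = n // 2
    # Assumed distribution for the random diagonal entries.
    r0 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = 1 / math.sqrt(2)
    U = [[0.0] * n for _ in range(n)]
    for i in range(h):
        U[i][i], U[i][h + i] = s * r0[i], s * r1[i]
        U[h + i][i], U[h + i][h + i] = s * r0[i], -s * r1[i]
    return U


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]


def ldlt_no_pivot(A):
    """Unpivoted LDL^T; raises ZeroDivisionError when a pivot is exactly zero."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / d[j]
    return L, d


def solve_randomized(A, b, seed=0):
    """Solve A x = b via (U^T A U) y = U^T b, then x = U y."""
    n = len(A)
    U = butterfly(n, random.Random(seed))
    Ut = transpose(U)
    L, d = ldlt_no_pivot(matmul(matmul(Ut, A), U))
    c = matvec(Ut, b)
    z = [0.0] * n
    for i in range(n):  # forward substitution with unit lower-triangular L
        z[i] = c[i] - sum(L[i][k] * z[k] for k in range(i))
    w = [z[i] / d[i] for i in range(n)]  # diagonal solve
    y = [0.0] * n
    for i in reversed(range(n)):  # backward substitution with L^T
        y[i] = w[i] - sum(L[k][i] * y[k] for k in range(i + 1, n))
    return matvec(U, y)


# Symmetric indefinite matrix with a zero diagonal: unpivoted LDL^T
# applied directly to A would hit a zero pivot at the first step.
A = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
x_true = [1.0, -2.0, 3.0, -4.0]
x = solve_randomized(A, matvec(A, x_true))
print(x)
```

The transformed matrix has nonzero pivots in this example, so the factorization proceeds without any row or column exchanges, which is the property that removes the pivoting communication in a parallel setting.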
