...

Main Topics

Schedule

Speaker

Affiliation

Type of presentation

Title (tentative)

Download

Dinner Before the Workshop

19:00

Only for people registered for the dinner

Workshop Day 1

Monday Nov. 25

TITLES ARE TENTATIVE (except those in bold font)

Registration

08:00

Welcome and Introduction

Amphitheatre

Chair: Franck Cappello

08:30

Marc Snir + Franck Cappello

INRIA & UIUC & ANL

Background

Welcome, Workshop objectives and organization

08:45

Peter Schiffer

UIUC

Background

Welcome from the UIUC Vice Chancellor for Research

09:00

Ed Seidel

UIUC

Background

NCSA update and vision of the collaboration

09:15

Michel Cosnard

INRIA

Background

INRIA updates and vision of the collaboration

09:30

Marc Snir

ANL

Background

Argonne updates and vision of the collaboration

09:45

Franck Cappello

ANL

Background

Joint-Lab, New Joint-Lab, PUF articulation

10:15

Break

Extreme Scale Systems and Infrastructures

Amphitheatre

Chair: Marc Snir

10:45

Pete Beckman

ANL

 

Extreme Scale Computing & Co-design Challenges

11:15

John Towns

UIUC

 

Plenary talk

11:45

Gabriel Antoniu

INRIA

Plenary talk

12:15

Lunch

Plenary talk

13:45

Bill Kramer

UIUC

Blue Waters

BW Observations and new challenges

14:15

Marc Snir

UIUC

 

G8 ECS and international collaboration toward extreme scale climate simulation

14:45

Rob Ross

ANL

 

Thinking Past POSIX: Persistent Storage in Extreme Scale Systems

15:15

François Pellegrini

INRIA

Plenary talk

15:45

Break

16:15

Yves Robert

INRIA

 

Assessing the impact of ABFT & Checkpoint composite strategies

16:15

Pavan Balaji

ANL

Conflict

16:45

Wen-mei Hwu

UIUC

Plenary talk

17:15

Adjourn

18:45

Bus for dinner

Workshop Day 2


Tuesday Nov. 26

Applications, I/O, Visualization, Big data

Amphitheatre

Chair: Rob Ross

08:30

Greg Bauer

UIUC

Applications and their challenges on Blue Waters

09:00

Matthieu Dorier

INRIA

Joint-result, submitted

CALCioM: Mitigating I/O Interferences in HPC Systems through Cross-Application Coordination

09:30

Dries Kimpe

ANL

 

Plenary talk

10:00

Venkat Vishwanath

ANL

 

Plenary talk

10:30

Break

11:00

Babak Behzad

UIUC

ACM/IEEE SC13

Taming Parallel I/O Complexity with Auto-Tuning

11:30

Kenton Guadron McHenry

UIUC

 

NSF CIF21 DIBBs: Brown Dog

12:00

Lunch

Mini Workshop 1

Resilience

Room 1030

Chair: Yves Robert

13:30

Leonardo

ANL

Joint-result

14:00

Tatiana

INRIA

Joint-result

14:30

Mohamed Slim Bouguerra

INRIA

Joint-result, submitted

15:00

Ana Gainaru

UIUC

Joint-result, submitted

15:30

Break

16:00

Sheng Di

INRIA

Joint-result, submitted

16:30

Frédéric Vivien

INRIA

17:00

Wesley Bland

ANL

 

Fault Tolerant Runtime Research at ANL

17:30

Adjourn

19:00

Bus for dinner

Mini Workshop 2

Numerical Algorithms

Room 1040

Chair: Bill Gropp

13:30

Luke Olson

UIUC

14:00

Prasanna Balaprakash

ANL

Active-Learning-based Surrogate Models for Empirical Performance Tuning

14:30

Yushan Wang

INRIA

 

Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems.

15:00

Jed Brown

ANL

15:30

Break

16:00

Pierre Jolivet

INRIA

Best Paper nominee, ACM/IEEE SC13

16:30

Vincent Baudoui

Total & ANL

17:00

TBD

TBD

17:30

Adjourn

19:00

Bus for dinner

Workshop Day 3


Wednesday Nov. 27

Mini Workshop 3

Programming models, compilation and runtime

Room 1030

Chair: Marc Snir

08:30

Grigori Fursin

INRIA

09:00

Maria Garzaran

UIUC

09:30

Jean-François Mehaut

INRIA

10:00

Break

10:30

Pavan Balaji

ANL

 

Can only talk on Monday

11:00

Rafael Tesser

INRIA

Joint-result, PDP 2013

11:30

Emmanuel Jeannot

INRIA

Joint-result, IEEE Cluster 2013

Communication and Topology-aware Load Balancing in Charm++ with TreeMatch

12:00

Closing

12:30

Lunch

18:00

Bus for dinner

Mini Workshop 4

Large scale systems and their simulators

Room 1040

Chair: Bill Kramer

08:30

Sanjay Kale

09:00

Arnaud Legrand

SMPI: Toward Better Simulation of MPI Applications

09:30

Kate Keahey

10:00

Break

10:30

Gilles Fedak

11:00

Jeremy Enos

11:30

TBD

12:00

Closing

12:30

Lunch

18:00

Bus for dinner

...

Failure prediction: what to do with unpredicted failures?

As large parallel systems grow in size and complexity, failures become inevitable and exhibit complex dynamics in both space and time. Several key results have demonstrated that recent advances in event log analysis can provide accurate failure prediction. State-of-the-art failure predictors achieve a precision (the ratio of correctly predicted failures to all predicted failures) above 90% and a recall (the fraction of all failures in a system that are predicted) of around 50%. However, a large fraction of failures is still not predicted; these are false negatives. Developing efficient fault tolerance strategies therefore requires a good understanding of the characteristics of failure prediction. To understand the properties of false negative alerts, we conducted a statistical analysis of their probability distribution and of their impact on fault tolerance techniques, using failure logs from several HPC production systems. We show that (i) the distribution of false negatives is of the same type as the failure distribution, and (ii) after adding failure prediction, we were able to infer statistical models describing the inter-arrival time between false negative alerts, so current fault tolerance techniques can still be applied to these systems. Moreover, we show that failure inter-arrival times in current failure traces are highly correlated, which can be exploited to improve failure prediction mechanisms. Another important result is that checkpoint intervals for unpredicted failures can be computed with Daly's existing higher-order formula. We show how the proposed statistical model can be applied to combine proactive migration and preventive checkpoints. Trace-based simulations show that this combination improves the useful work of the execution by more than 13% with a recall of only 45%.
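The sketch below illustrates how a checkpoint interval for unpredicted failures could be derived from Daly's formula. It is a minimal illustration, not the authors' method. Daly's higher-order estimate, with checkpoint cost δ and mean time between failures M, is τ = sqrt(2δM)·[1 + (1/3)·sqrt(δ/(2M)) + (1/9)·δ/(2M)] − δ for δ < 2M, and τ = M otherwise. The sketch assumes, as a simplification, that predicted failures are handled proactively (for example by migration) and that each failure is predicted independently with probability equal to the recall r. Under that assumption unpredicted failures occur with mean time between failures M/(1 − r), and the interval is Daly's formula evaluated at that value. The function names `daly_interval` and `unpredicted_mtbf` and the inputs (10-minute checkpoints, one-day MTBF) are hypothetical; only the 45% recall comes from the abstract.

```python
import math


def daly_interval(delta: float, mtbf: float) -> float:
    """Daly's higher-order estimate of the optimal compute time between checkpoints.

    delta is the time to write one checkpoint and mtbf the mean time between
    failures, both in the same time unit.
    """
    if delta <= 0 or mtbf <= 0:
        raise ValueError("delta and mtbf must be positive")
    if delta >= 2 * mtbf:
        # Outside the formula's range of validity, Daly's estimate is the MTBF itself.
        return mtbf
    x = delta / (2 * mtbf)
    return math.sqrt(2 * delta * mtbf) * (1 + math.sqrt(x) / 3 + x / 9) - delta


def unpredicted_mtbf(mtbf: float, recall: float) -> float:
    """MTBF of the failures the predictor misses.

    Hypothetical model: each failure is predicted independently with
    probability `recall`, so only a (1 - recall) fraction of failures
    reaches the periodic checkpointing layer.
    """
    if not 0 <= recall < 1:
        raise ValueError("recall must be in [0, 1)")
    return mtbf / (1 - recall)


if __name__ == "__main__":
    delta = 600.0        # hypothetical checkpoint cost: 10 minutes, in seconds
    mtbf = 24 * 3600.0   # hypothetical platform MTBF: 1 day, in seconds
    recall = 0.45        # recall value quoted in the abstract

    t_all = daly_interval(delta, mtbf)
    t_unpredicted = daly_interval(delta, unpredicted_mtbf(mtbf, recall))
    print(f"interval, all failures:         {t_all / 3600:.2f} h")
    print(f"interval, unpredicted failures: {t_unpredicted / 3600:.2f} h")
```

Because unpredicted failures are rarer than all failures, the computed interval is longer (roughly by a factor sqrt(1/(1 − r)) when δ is much smaller than M), so fewer preventive checkpoints are taken.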