Main Topics

Schedule

Speakers

Types of presentation

Titles (tentative)

 

 

 

 

 

Workshop Day 1 (Auditorium)

Monday Nov. 22nd

 


 

Welcome and Introduction

08:30

Franck Cappello, INRIA & UIUC, France and Thom Dunning, NCSA, USA

Background

Workshop details

Post PetaScale and Exascale Systems 

08:45

Mitsuhisa Sato, U. Tsukuba, Japan

Trends in HPC

Next Gen and Exascale initiative in Japan

 

09:15

Marc Snir, UIUC, USA

Trends in HPC

Exascale Challenges

 

09:45

Wen-mei Hwu, UIUC, USA

Trends in HPC

Exascale and Accelerators

 

10:15

Arun Rodrigues, Sandia, USA

Trends in HPC

X-Caliber (DARPA UHPC)

 

10:45

Break

 

 

Post Petascale Applications and System Software

11:15

Pete Beckman, ANL, USA

Trends in HPC

Exascale Software Center

 

11:45

Michael Norman, SDSC, USA

Trends in HPC

ENZO

 

12:15

Eric Bohm, UIUC, USA

Trends in HPC

NAMD

 

12:30

Lunch

 

 

 

 

 

 

 

 

 

 

 

 

BLUE WATERS

14:00

Bill Kramer, NCSA, USA

Overview

Update on Blue Waters

Collaborations on System Software

14:30

Ana Gainaru, NCSA, USA

Early Results

A Framework for System Event Analysis

 

15:00

Thomas Ropars, INRIA, France

Results

Uncoordinated checkpointing without domino effect for send-deterministic applications

 

15:30

Esteban Meneses, UIUC, USA

Results/International collaboration with China

Clustering for Performance and Fault tolerance

 

16:00

Break

 

 

Collaborations on System Software

16:30

Leonardo Bautista, Titech, Japan

Results/International collaboration with Japan

Transparent low-overhead checkpoint for GPU-accelerated clusters

 

17:00

Gabriel Antoniu, INRIA/IRISA, France

Results

Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores

 

17:30

Mathias Jacquelin, INRIA/ENS Lyon, France

Results

Vertical vs Horizontal parity for tape archives

 

18:00

Olivier Richard, INRIA/U. Grenoble, France

Early Results

I/O aware Resource Management Software

 

18:30

Torsten Hoefler, NCSA, USA

Potential collaboration

TBA

 

 

 

 

 

Workshop Day 2 (Auditorium)

Tuesday Nov. 23rd

 

 

 

 

 

 

 

 

Collaborations on System Software

08:30

Frédéric Vivien, INRIA/ENS Lyon, France

Potential collaboration

On Scheduling Checkpoints of Exascale Applications

Collaborations on Programming models

09:00

Thierry Gautier

Early Results

TBA

 

09:30

Jean François Méhaut, INRIA/U. Grenoble, France

Early Results

TBA

 

10:00

Emmanuel Jeannot, INRIA/U. Bordeaux, France

Early Results

TBA

 

10:30

Break

 

 

 

11:00

Raymond Namyst, INRIA/U. Bordeaux, France

Early Results

TBA

 

11:30

Brian Amedro, INRIA/U. Nice, France

Potential collaboration

TBA

 

12:00

Christian Perez, INRIA/ENS Lyon, France

Early Results

High Performance Component with Charm++ and OpenAtom

 

12:30

Lunch

 

 

Collaborations on Numerical Algorithms and Libraries

14:00

Bill Gropp, UIUC, USA

Early Results

TBA

 

14:30

Simplice Donfack, INRIA/U. Paris Sud, France

Early Results

TBA

 

15:00

Désiré Nuentsa, INRIA/IRISA, France

Early Results

TBA

 

15:30

Sebastien Fourestier, INRIA/U. Bordeaux, France

Early Results

TBA

 

16:00

Break

 

 

 

16:30

Marc Baboulin, INRIA/U. Paris Sud, France

Early Results

Accelerating linear algebra computations with hybrid GPU-multicore systems

 

17:00

Daisuke Takahashi, U. Tsukuba, Japan

Results/International collaboration with Japan

Optimization of a Parallel 3-D FFT with 2-D Decomposition

 

17:30

Alex Yee, UIUC, USA

Early Results

A Single-Transpose implementation of the Distributed out-of-order 3D-FFT

 

17:50

Jeongnim Kim, NCSA, USA

Early Results

Toward petaflop 3D FFT on clusters of SMP

 

 

 

 

 

 

 

 

 

 

Workshop Day 3 (Auditorium)

Wednesday Nov. 24th

 

 

 

 

 

 

 

 

Break out sessions introduction

8:30

Cappello, Snir

Overview

Objectives of Break-out, expected results
Collaboration mechanisms (internships, visits, etc.)

Topics

 

Participants

Other NCSA participants

 

Break out session 1

9:00-10:30

 

 

 

Routing, topology mapping, scheduling, perf. modeling

 

Snir, Hoefler, Vivien, Jeannot, Kale

 

Room

3D-FFT

 

Cappello, Takahashi, Yee, Kim

 

Room

Libraries

 

Gropp, Baboulin, Nuentsa, Donfack, Fourestier

 

Room

 

 

 

 

 

 

10:15

Break

 

 

Break out session 2

10:30-12:00

 

 

 

Resilience

 

Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista

 

Room

Programming models / GPU

 

Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot

 

Room

I/O

 

Snir, Vivien, Jacquelin, Antoniu, Richard

 

 

Break out session report

12:00

Speakers: Snir, Cappello, Gropp, Kramer, Kale

 

Auditorium

Closing

12:30

Cappello, Snir

 

Auditorium

 

13:00

Lunch

 

 

...

Checkpointing is one of the tools used to provide resilience to applications run on failure-prone platforms. It is usually claimed that checkpoints should occur periodically, as such a policy is optimal. However, most of the existing proofs rely on approximations. One such approximation is the assumption that the probability that a fault occurs during the execution of an application is very small, which no longer holds in the context of exascale platforms. We have begun studying this problem in a fully general context. We have established that, when failures follow a Poisson law, the periodic checkpointing policy is optimal. We have also shown an unexpected result: in some cases, when the platform is sufficiently large, checkpointing sufficiently expensive, or failures frequent enough, one should limit the application parallelism and duplicate tasks rather than fully parallelize the application on the whole platform.
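The periodic-checkpointing claim can be illustrated with a minimal sketch. It is not taken from the talk: it assumes exponentially distributed failures (a Poisson process of rate `rate`), a failure-free downtime, a recovery that can itself be interrupted by a failure, and the commonly used closed form E[T] = e^(rate*R) * (1/rate + D) * (e^(rate*(w + C)) - 1) for the expected time to complete a chunk of work w followed by a checkpoint of cost C. All function names and the numeric values in the usage note are hypothetical.

```python
import math


def expected_chunk_time(work, ckpt, downtime, recovery, rate):
    """Expected time to run `work` then a checkpoint of cost `ckpt`.

    Assumes exponential failures of rate `rate`; each failure costs a
    downtime plus a recovery before the chunk is retried from its start.
    """
    return (math.exp(rate * recovery)
            * (1.0 / rate + downtime)
            * math.expm1(rate * (work + ckpt)))


def expected_makespan(total_work, n_chunks, ckpt, downtime, recovery, rate):
    """Expected makespan when the work is cut into n equal chunks (periodic policy)."""
    chunk = total_work / n_chunks
    return n_chunks * expected_chunk_time(chunk, ckpt, downtime, recovery, rate)


def best_chunk_count(total_work, ckpt, downtime, recovery, rate, max_chunks=1000):
    """Brute-force search for the number of equal chunks minimizing the makespan."""
    return min(
        range(1, max_chunks + 1),
        key=lambda n: expected_makespan(total_work, n, ckpt, downtime, recovery, rate),
    )


def two_chunk_makespan(total_work, first_fraction, ckpt, downtime, recovery, rate):
    """Expected makespan for two chunks of possibly unequal size."""
    first = total_work * first_fraction
    second = total_work - first
    return (expected_chunk_time(first, ckpt, downtime, recovery, rate)
            + expected_chunk_time(second, ckpt, downtime, recovery, rate))
```

For example, with hypothetical values in hours (100 h of work, a 0.1 h checkpoint, a 0.05 h downtime, a 0.1 h recovery, and one failure per 24 h on average), `best_chunk_count(100.0, 0.1, 0.05, 0.1, 1 / 24)` returns the chunk count with the lowest expected makespan, and `two_chunk_makespan` shows that an even split is cheaper than an uneven one under this model.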


Christian Perez, INRIA/ENS Lyon, France

High Performance Component with Charm++ and OpenAtom

Software component models appear as a solution to handle the complexity and the evolution of applications. They turn out to be a powerful abstraction mechanism for dealing with parallel and heterogeneous machines, as they enable the structure of an application to be manipulated, and hence specialized. HLCM is a hierarchical component model with support for genericity and connectors, which makes it possible to adapt an application to the available resources as well as to input parameters. HLCM is an abstract model in that it does not depend on a particular primitive component implementation. This talk will present our ongoing work on defining and implementing HLCM/Charm++, a specialization of HLCM with primitive components expressed in Charm++. It will also provide information on a study of the benefits HLCM/Charm++ can bring to OpenAtom.


...