Main Topics | Schedule | Speakers | Types of presentation | Titles (tentative) |
|
Workshop Day 1 (Auditorium) | Monday Nov. 22nd |
|
| |
Welcome and Introduction | 08:30 | Franck Cappello, INRIA & UIUC, France and Thom Dunning, NCSA, USA | Background | Workshop details |
Post PetaScale and Exascale Systems | 08:45 | Mitsuhisa Sato, U. Tsukuba, Japan | Trends in HPC | Next Gen and Exascale initiative in Japan |
| 09:15 | Marc Snir, UIUC, USA | Trends in HPC | Exascale Challenges |
| 09:45 | Wen-mei Hwu, UIUC, USA | Trends in HPC | Exascale and Accelerators |
| 10:15 | Arun Rodrigues, Sandia, USA | Trends in HPC | X-Caliber (DARPA UHPC) |
| 10:45 | Break |
|
|
Post Petascale Applications and System Software | 11:15 | Pete Beckman, ANL, USA | Trends in HPC | Exascale Software Center |
| 11:45 | Michael Norman, SDSC, USA | Trends in HPC | ENZO |
| 12:15 | Eric Bohm, UIUC, USA | Trends in HPC | NAMD |
| 12:30 | Lunch |
|
BLUE WATERS | 14:00 | Bill Kramer, NCSA, USA | Overview | Update on Blue Waters |
Collaborations on System Software | 14:30 | Ana Gainaru, NCSA, USA | Early Results | A Framework for System Event Analysis |
| 15:00 | Thomas Ropars, INRIA, France | Results | Uncoordinated checkpointing without domino effect for send-deterministic applications |
| 15:30 | Esteban Meneses, UIUC, USA | Results/International collaboration with China | Clustering for Performance and Fault tolerance |
| 16:00 | Break |
|
|
Collaborations on System Software | 16:30 | Leonardo Bautista, Titech, Japan | Results/International collaboration with Japan | Transparent low-overhead checkpoint for GPU-accelerated clusters |
| 17:00 | Gabriel Antoniu, INRIA/IRISA, France | Results | Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores |
| 17:30 | Mathias Jacquelin, INRIA/ENS Lyon, France | Results | Vertical vs Horizontal parity for tape archives |
| 18:00 | Olivier Richard, INRIA/U. Grenoble, France | Early Results | I/O aware Resource Management Software |
| 18:30 | Torsten Hoefler, NCSA, USA | Potential collaboration | TBA |
|
Workshop Day 2 (Auditorium) | Tuesday Nov. 23rd |
|
Collaborations on System Software | 08:30 | Frédéric Vivien, INRIA/ENS Lyon, France | Potential collaboration | |
Collaborations on Programming models | 09:00 | Thierry Gautier | Early Results | TBA |
| 09:30 | Jean-François Méhaut, INRIA/U. Grenoble, France | Early Results | TBA |
| 10:00 | Emmanuel Jeannot, INRIA/U. Bordeaux, France | Early Results | TBA |
| 10:30 | Break |
|
|
| 11:00 | Raymond Namyst, INRIA/U. Bordeaux, France | Early Results | TBA |
| 11:30 | Brian Amedro, INRIA/U. Nice, France | Potential collaboration | TBA |
| 12:00 | Christian Perez, INRIA/ENS Lyon, France | Early Results | TBA |
| 12:30 | Lunch |
|
|
Collaborations on Numerical Algorithms and Libraries | 14:00 | Bill Gropp, UIUC, USA | Early Results | TBA |
| 14:30 | Simplice Donfack, INRIA/U. Paris Sud, France | Early Results | TBA |
| 15:00 | Désiré Nuentsa, INRIA/IRISA, France | Early Results | TBA |
| 15:30 | Sébastien Fourestier, INRIA/U. Bordeaux, France | Early Results | TBA |
| 16:00 | Break |
|
|
| 16:30 | Marc Baboulin, INRIA/U. Paris Sud, France | Early Results | Accelerating linear algebra computations with hybrid GPU-multicore systems |
| 17:00 | Daisuke Takahashi, U. Tsukuba, Japan | Results/International collaboration with Japan | |
| 17:30 | Alex Yee, UIUC, USA | Early Results | A Single-Transpose implementation of the Distributed out-of-order 3D-FFT |
| 17:50 | Jeongnim Kim, NCSA, USA | Early Results | |
|
Workshop Day 3 (Auditorium) | Wednesday Nov. 24th |
|
Break out sessions introduction | 8:30 | Cappello, Snir | Overview | Objectives of Break-out, expected results |
Topics |
| Participants | Other NCSA participants |
|
Break out session 1 | 9:00-10:30 |
|
|
|
Routing, topology mapping, scheduling, perf. modeling |
| Snir, Hoefler, Vivien, Jeannot, Kale |
| Room |
3D-FFT |
| Cappello, Takahashi, Yee, Kim |
| Room |
Libraries |
| Gropp, Baboulin, Désiré, Simplice, Sébastien Fourestier |
| Room |
|
| 10:15 | Break |
|
|
Break out session 2 | 10:30-12:00 |
|
|
|
Resilience |
| Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista |
| Room |
Programming models / GPU |
| Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot |
| Room |
I/O |
| Snir, Vivien, Jacquelin, Antoniu, Richard |
|
|
Break out session report | 12:00 | Speakers: Snir, Cappello, Gropp, Kramer, Kale |
| Auditorium |
Closing | 12:30 | Cappello, Snir |
| Auditorium |
| 13:00 | Lunch |
|
|
...
Checkpointing is one of the tools used to provide resilience to applications run on failure-prone platforms. It is usually claimed that checkpoints should be taken periodically because such a policy is optimal. However, most existing proofs rely on approximations. One such approximation assumes that the probability of a fault occurring during the execution of an application is very small, an assumption that no longer holds for exascale platforms. We have begun studying this problem in a fully general context. We have established that, when failures follow a Poisson law, the periodic checkpointing policy is optimal. We have also shown an unexpected result: in some cases, when the platform is sufficiently large, checkpointing sufficiently expensive, or failures sufficiently frequent, one should limit the application parallelism and duplicate tasks rather than fully parallelize the application over the whole platform.
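To make the periodic-checkpointing claim concrete, here is a minimal Python sketch. It is not taken from the talk. It assumes a simple model in which failures arrive with exponentially distributed inter-arrival times of rate `lam`, each failure costs a downtime `down` followed by a recovery `recov` (during which further failures may strike), and every chunk of work ends with a checkpoint of cost `ckpt`. Under those assumptions, the expected time to complete one chunk of `work` seconds has the commonly used closed form e^(lam*recov) * (1/lam + down) * (e^(lam*(work+ckpt)) - 1). All function and parameter names, and the numbers in the usage example, are hypothetical.

```python
import math


def expected_chunk_time(work, ckpt, down, recov, lam):
    """Expected time to finish `work` seconds of computation plus one
    checkpoint of `ckpt` seconds, with exponential failures of rate `lam`.

    Assumed model (not from the abstract): after each failure the platform
    is down for `down` seconds, then a recovery of `recov` seconds is
    attempted; failures can also strike during recovery.
    """
    return math.exp(lam * recov) * (1.0 / lam + down) * math.expm1(lam * (work + ckpt))


def expected_makespan(total_work, n_chunks, ckpt, down, recov, lam):
    """Expected makespan when the work is cut into `n_chunks` equal chunks,
    each followed by a checkpoint (i.e. a periodic policy)."""
    chunk = total_work / n_chunks
    return n_chunks * expected_chunk_time(chunk, ckpt, down, recov, lam)


def best_chunk_count(total_work, ckpt, down, recov, lam, max_chunks=10000):
    """Brute-force search for the chunk count minimizing the expected makespan."""
    return min(
        range(1, max_chunks + 1),
        key=lambda k: expected_makespan(total_work, k, ckpt, down, recov, lam),
    )


def young_period(ckpt, lam):
    """Young's first-order approximation of the checkpointing period,
    valid only when failures are rare relative to the period."""
    return math.sqrt(2.0 * ckpt / lam)


if __name__ == "__main__":
    # Hypothetical numbers: 10 days of work, 10-minute checkpoints and
    # recoveries, 1-minute downtime, platform MTBF of one day.
    lam = 1.0 / 86400.0
    k = best_chunk_count(864000.0, 600.0, 60.0, 600.0, lam)
    print("best chunk count:", k)
    print("resulting period (s):", 864000.0 / k)
    print("Young's approximation (s):", young_period(600.0, lam))
```

Comparing the period found by the search with Young's approximation shows how far the first-order formula drifts as the failure rate grows, which is the regime the abstract is concerned with. The sketch covers only the periodic policy; it does not model the task duplication trade-off mentioned at the end of the abstract.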
Christian Perez INRIA/ENS Lyon
High Performance Component with Charm++ and OpenAtom
Software component models appear as a solution for handling the complexity and the evolution of applications. They turn out to be a powerful abstraction mechanism for dealing with parallel and heterogeneous machines, as they enable the structure of an application to be manipulated and hence specialized. HLCM is a hierarchical component model with support for genericity and connectors, which makes it possible to adapt an application to the available resources as well as to its input parameters. HLCM is an abstract model in that it does not depend on a particular primitive component implementation. This talk will present our ongoing work on defining and implementing HLCM/Charm++, a specialization of HLCM with primitive components expressed in Charm++. It will also report on a study of the benefits HLCM/Charm++ can bring to OpenAtom.
...