Main Topics | Schedule | Speakers | Types of presentation | Titles (tentative) |
Workshop Day 1 (Auditorium) | Monday Nov. 22nd |
|
| |
Welcome and Introduction | 08:30 | Franck Cappello, INRIA & UIUC, France and Thom Dunning, NCSA, USA | Background | Workshop details |
Post PetaScale and Exascale Systems | 08:45 | Mitsuhisa Sato, U. Tsukuba, Japan | Trends in HPC | Next Gen and Exascale initiative in Japan |
| 09:15 | Marc Snir, UIUC, USA | Trends in HPC | |
| 09:45 | Wen-mei Hwu, UIUC, USA | Trends in HPC | Exascale and Accelerators |
| 10:15 | Arun Rodrigues, Sandia, USA | Trends in HPC | X-Caliber (DARPA UHPC) |
| 10:45 | Break |
|
|
Post Petascale Applications and System Software | 11:15 | Pete Beckman, ANL, USA | Trends in HPC | Exascale Software Center |
| 11:45 | Michael Norman, SDSC, USA | Trends in HPC | ENZO |
| 12:15 | Eric Bohm, UIUC, USA | Trends in HPC | NAMD |
| 12:30 | Lunch |
BLUE WATERS | 14:00 | Bill Kramer, NCSA, USA | Overview | Update on Blue Waters |
Collaborations on System Software | 14:30 | Ana Gainaru, NCSA, USA | Early Results | A Framework for System Event Analysis |
| 15:00 | Thomas Ropars, INRIA, France | Results | Uncoordinated checkpointing without domino effect for send-deterministic applications |
| 15:30 | Esteban Meneses, UIUC, USA | Early Results | Clustering Message Passing Applications to Enhance Fault Tolerance Protocols |
| 16:00 | Break |
|
|
Collaborations on System Software | 16:30 | Leonardo Bautista, Titech, Japan | Results/International collaboration with Japan | Transparent low-overhead checkpoint for GPU-accelerated clusters |
| 17:00 | Gabriel Antoniu, INRIA/IRISA, France | Results | Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores |
| 17:30 | Mathias Jacquelin, INRIA/ENS Lyon, France | Results | Vertical vs Horizontal parity for tape archives |
| 18:00 | Olivier Richard, INRIA/U. Grenoble, France | Early Results | I/O aware Resource Management Software |
| 18:30 | Torsten Hoefler, NCSA, USA | Potential collaboration | TBA |
Workshop Day 2 (Auditorium) | Tuesday Nov. 23rd |
Collaborations on System Software | 08:30 | Frédéric Vivien, INRIA/ENS Lyon, France | Potential collaboration | |
Collaborations on Programming models | 09:00 | Thierry Gautier | Early Results | TBA |
| 09:30 | Jean-François Méhaut, INRIA/U. Grenoble, France | Early Results | Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing |
| 10:00 | Emmanuel Jeannot, INRIA/U. Bordeaux, France | Early Results | TBA |
| 10:30 | Break |
|
|
| 11:00 | Raymond Namyst, INRIA/U. Bordeaux, France | Early Results | TBA |
| 11:30 | Brian Amedro, INRIA/U. Nice, France | Potential collaboration | TBA |
| 12:00 | Christian Perez, INRIA/ENS Lyon, France | Early Results | |
| 12:30 | Lunch |
|
|
Collaborations on Numerical Algorithms and Libraries | 14:00 | Bill Gropp, UIUC, USA | Early Results | TBA |
| 14:30 | Simplice Donfack, INRIA/U. Paris Sud, France | Early Results | TBA |
| 15:00 | Désiré Nuentsa, INRIA/IRISA, France | Early Results | Parallel Implementation of deflated GMRES in the PETSc package |
| 15:30 | Sébastien Fourestier, INRIA/U. Bordeaux, France | Early Results | TBA |
| 16:00 | Break |
|
|
| 16:30 | Marc Baboulin, INRIA/U. Paris Sud, France | Early Results | Accelerating linear algebra computations with hybrid GPU-multicore systems |
| 17:00 | Daisuke Takahashi, U. Tsukuba, Japan | Results/International collaboration with Japan | |
| 17:30 | Alex Yee, UIUC, USA | Early Results | A Single-Transpose implementation of the Distributed out-of-order 3D-FFT |
| 17:50 | Jeongnim Kim, NCSA, USA | Early Results | |
Workshop Day 3 (Auditorium) | Wednesday Nov. 24th |
Break out sessions introduction | 08:30 | Cappello, Snir | Overview | Objectives of Break-out, expected results |
Topics |
| Participants | Other NCSA participants |
|
Break out session 1 | 9:00-10:30 |
|
|
|
Routing, topology mapping, scheduling, perf. modeling |
| Snir, Hoefler, Vivien, Jeannot, Kale |
| Room |
3D-FFT |
| Cappello, Takahashi, Yee, Kim |
| Room |
Libraries |
| Gropp, Baboulin, Nuentsa, Donfack, Fourestier |
| Room |
| 10:15 | Break |
|
|
Break out session 2 | 10:30-12:00 |
|
|
|
Resilience |
| Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista |
| Room |
Programming models / GPU |
| Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot |
| Room |
I/O |
| Snir, Vivien, Jacquelin, Antoniu, Richard |
|
|
Break out session report | 12:00 | Speakers: Snir, Cappello, Gropp, Kramer, Kale |
| Auditorium |
Closing | 12:30 | Cappello, Snir |
| Auditorium |
| 13:00 | Lunch |
|
|
...
Checkpointing is one of the tools used to provide resilience to applications running on failure-prone platforms. It is usually claimed that checkpoints should be taken periodically because such a policy is optimal. However, most of the existing proofs rely on approximations. One such approximation is the assumption that the probability of a fault occurring during the execution of an application is very small, which is no longer valid in the context of exascale platforms. We have begun studying this problem in a fully general context. We have established that, when failures follow a Poisson law, the periodic checkpointing policy is optimal. We have also shown an unexpected result: in some cases, when the platform is sufficiently large, checkpointing sufficiently expensive, or failures frequent enough, one should limit the application parallelism and duplicate tasks rather than fully parallelize the application across the whole platform.
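As a rough illustration of the periodic-checkpointing claim in this abstract, the Python sketch below uses the classical expected time to complete a segment of length x under exponentially distributed failures with rate lam, restarting the segment after each failure: (exp(lam * x) - 1) / lam. This is a simplified model chosen for illustration, not the speakers' own formulation: downtime and recovery costs are assumed to be zero, and the function names and parameter values are hypothetical.

```python
import math


def expected_chunk_time(work, ckpt, lam):
    """Expected time to run `work` followed by a checkpoint of cost `ckpt`.

    Assumes exponential failures with rate `lam`, a restart from the
    beginning of the chunk after each failure, and zero downtime and
    recovery cost (simplifying assumptions for this sketch).
    """
    return (math.exp(lam * (work + ckpt)) - 1.0) / lam


def expected_makespan(chunks, ckpt, lam):
    """Expected total time when a checkpoint is taken after every chunk."""
    return sum(expected_chunk_time(w, ckpt, lam) for w in chunks)


def best_periodic(total_work, ckpt, lam, max_chunks=200):
    """Search for the number of equal-sized chunks minimizing the makespan."""
    best_k, best_t = 1, expected_makespan([total_work], ckpt, lam)
    for k in range(2, max_chunks + 1):
        t = expected_makespan([total_work / k] * k, ckpt, lam)
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t


if __name__ == "__main__":
    lam, ckpt, total = 0.01, 1.0, 100.0  # hypothetical values
    print("equal chunks  :", expected_makespan([50.0, 50.0], ckpt, lam))
    print("unequal chunks:", expected_makespan([20.0, 80.0], ckpt, lam))
    print("best (k, time):", best_periodic(total, ckpt, lam))
```

Because the per-chunk cost is convex in the chunk length, splitting a fixed amount of work into equal chunks never does worse than an unequal split with the same number of checkpoints, which is the intuition behind the optimality of a periodic policy in this simplified setting.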
Jean-François Méhaut INRIA/U. Grenoble
Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing
Abstract: Cache-coherent Non-Uniform Memory Access (ccNUMA) platforms based on multi-core chips are now a common resource in High Performance Computing. To overcome scalability issues in such platforms, the shared memory is physically distributed among several memory banks. As a result, memory access costs may vary depending on the distance between processing units and data. The main challenge of a ccNUMA platform is to efficiently manage threads, data distribution and communication over all the machine nodes. Charm++ is a parallel programming system that provides a portable programming model for platforms based on shared and distributed memory. In this work, we revisit some of the implementation decisions currently featured in Charm++ in the context of ccNUMA platforms. First, we study the impact of the new shared-memory-based inter-object communication scheme used by Charm++. We show how this shared-memory approach can impact the performance of Charm++ on ccNUMA machines. Second, we conduct a performance evaluation of the CPU and memory affinity mechanisms provided by Charm++ on ccNUMA platforms. Results show that SMP optimizations and affinity support can improve the overall performance of our benchmarks by up to 75%. Finally, in light of these studies, we have designed and implemented a NUMA-aware load balancing algorithm that addresses the issues found. The performance evaluation of our prototype showed results as good as those obtained by GreedyLB and significant improvements compared to GreedyCommLB.
Christian Perez INRIA/ENS Lyon
...