Main Topics | Schedule | Speakers | Types of presentation | Titles (tentative) |
Workshop Day 1 (Auditorium) | Monday Nov. 22nd |
Welcome and Introduction | 08:30 | Franck Cappello, INRIA & UIUC, France and Thom Dunning, NCSA, USA | Background | Workshop details |
Post PetaScale and Exascale Systems | 08:45 | Mitsuhisa Sato, U. Tsukuba, Japan | Trends in HPC | Next Gen and Exascale initiative in Japan |
| 09:15 | Marc Snir, UIUC, USA | Trends in HPC | Exascale Challenges |
| 09:45 | Wen-mei Hwu, UIUC, USA | Trends in HPC | Exascale and Accelerators |
| 10:15 | Arun Rodrigues, Sandia, USA | Trends in HPC | X-Caliber (DARPA UHPC) |
| 10:45 | Break |
Post Petascale Applications and System Software | 11:15 | Pete Beckman, ANL, USA | Trends in HPC | Exascale Software Center |
| 11:45 | Michael Norman, SDSC, USA | Trends in HPC | ENZO |
| 12:15 | Eric Bohm, UIUC, USA | Trends in HPC | NAMD |
| 12:30 | Lunch |
BLUE WATERS | 14:00 | Bill Kramer, NCSA, USA | Overview | Update on Blue Waters |
Collaborations on System Software | 14:30 | Ana Gainaru, NCSA, USA | Early Results | A Framework for System Event Analysis |
| 15:00 | Thomas Ropars, INRIA, France | Results | Uncoordinated checkpointing without domino effect for send-deterministic applications |
| 15:30 | Esteban Meneses, UIUC, USA | Results/International collaboration with China | Clustering for Performance and Fault tolerance |
| 16:00 | Break |
Collaborations on System Software | 16:30 | Leonardo Bautista, Titech, Japan | Results/International collaboration with Japan | Transparent low-overhead checkpoint for GPU-accelerated clusters |
| 17:00 | Gabriel Antoniu, INRIA/IRISA, France | Results | Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores |
| 17:30 | Mathias Jacquelin, INRIA/ENS Lyon, France | Results | Vertical vs Horizontal parity for tape archives |
| 18:00 | Olivier Richard, INRIA/U. Grenoble, France | Early Results | I/O aware Resource Management Software |
| 18:30 | Torsten Hoefler, NCSA, USA | Potential collaboration | TBA |
Workshop Day 2 (Auditorium) | Tuesday Nov. 23rd |
Collaborations on System Software | 08:30 | Frédéric Vivien, INRIA/ENS Lyon, France | Potential collaboration | |
Collaborations on Programming models | 09:00 | Thierry Gautier | Early Results | TBA |
| 09:30 | Jean François Méhaut, INRIA/U. Grenoble, France | Early Results | TBA |
| 10:00 | Emmanuel Jeannot, INRIA/U. Bordeaux, France | Early Results | TBA |
| 10:30 | Break |
| 11:00 | Raymond Namyst, INRIA/U. Bordeaux, France | Early Results | TBA |
| 11:30 | Brian Amedro, INRIA/U. Nice, France | Potential collaboration | TBA |
| 12:00 | Christian Perez, INRIA/ENS Lyon, France | Early Results | TBA |
| 12:30 | Lunch |
Collaborations on Numerical Algorithms and Libraries | 14:00 | Bill Gropp, UIUC, USA | Early Results | TBA |
| 14:30 | Simplice Donfack, INRIA/U. Paris Sud, France | Early Results | TBA |
| 15:00 | Désiré Nuentsa, INRIA/IRISA, France | Early Results | TBA |
| 15:30 | Sébastien Fourestier, INRIA/U. Bordeaux, France | Early Results | TBA |
| 16:00 | Break |
| 16:30 | Marc Baboulin, INRIA/U. Paris Sud, France | Early Results | Accelerating linear algebra computations with hybrid GPU-multicore systems |
| 17:00 | Daisuke Takahashi, U. Tsukuba, Japan | Results/International collaboration with Japan | Optimization of a Parallel 3-D FFT with 2-D Decomposition |
| 17:30 | Alex Yee, UIUC, USA | Early Results | A Single-Transpose implementation of the Distributed out-of-order 3D-FFT |
| 17:50 | Jeongnim Kim, NCSA, USA | Early Results | Toward petaflop 3D FFT on clusters of SMP |
Workshop Day 3 (Auditorium) | Wednesday Nov. 24th |
Break out sessions introduction | 08:30 | Cappello, Snir | Overview | Objectives of Break-out, expected results |
Topics | Participants | Other NCSA participants |
Break out session 1 | 9:00-10:30 |
Routing, topology mapping, scheduling, perf. modeling | Snir, Hoefler, Vivien, Jeannot, Kale | Room |
3D-FFT | Cappello, Takahashi, Yee, Kim | Room |
Libraries | Gropp, Baboulin, Nuentsa, Donfack, Fourestier | Room |
| 10:15 | Break |
Break out session 2 | 10:30-12:00 |
Resilience | Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista | Room |
Programming models / GPU | Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot | Room |
I/O | Snir, Vivien, Jacquelin, Antoniu, Richard |
Break out session report | 12:00 | Speakers: Snir, Cappello, Gropp, Kramer, Kale | Auditorium |
Closing | 12:30 | Cappello, Snir | Auditorium |
| 13:00 | Lunch |
...
We describe how hybrid multicore+GPU systems can be used to enhance performance of linear algebra libraries in high performance computing.
We illustrate this approach with the solution of general linear systems based on a hybrid LU factorization, in which we split the computation between a multicore processor and a graphics processor and use statistical techniques to reduce the amount of pivoting and communication between the hybrid components. We also show how mixed-precision algorithms can be used to accelerate performance.
Jeongnim Kim, NCSA, UIUC
Toward petaflop 3D FFT on clusters of SMP
A wide range of scientific applications employ 3D FFT. Sustained petaflop performance of 3D FFT is necessary to meet the NSF Direct Numerical Simulation (DNS) turbulence benchmark on Blue Waters, which represents the current generation of HPC platforms: clusters of multi/many-core SMPs. I present an analysis of 3D FFT implementations and optimization strategies on Blue Waters. Also discussed is the design of a parallel 3D FFT library that can meet the diverse requirements of applications using 3D FFT.
Daisuke Takahashi, U. Tsukuba
...
Optimization of a Parallel 3-D FFT with 2-D Decomposition
In this talk, an optimization method for the parallel 3-D fast Fourier transform (FFT) with 2-D decomposition is presented. The 2-D decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. Another way to reduce the communication overhead is to overlap communication and computation. An overlapping method for the parallel 3-D FFT is also presented. Performance results of parallel 3-D FFTs on clusters of multi-core processors are reported.
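As a rough illustration of what a 2-D (pencil) decomposition means, the sketch below assigns each rank of a process grid a block of the y and z index ranges of an n x n x n array while keeping the x axis whole on every rank. The function names (`block_range`, `pencil_layout`), the row-major rank numbering, and the choice of x as the undivided axis are assumptions made for this example and are not taken from the talk. A 1-D (slab) decomposition can use at most n ranks, whereas this layout can use up to n x n.

```python
def block_range(n, parts, i):
    """Half-open index range of block i when n points are split
    into `parts` near-equal contiguous blocks."""
    base, extra = divmod(n, parts)
    start = i * base + min(i, extra)
    return start, start + base + (1 if i < extra else 0)


def pencil_layout(n, prow, pcol):
    """Map each rank of a prow x pcol process grid (hypothetical,
    row-major numbering) to the (y-range, z-range) it owns.
    The x axis is not split, so every rank holds full x pencils."""
    layout = {}
    for r in range(prow):
        for c in range(pcol):
            rank = r * pcol + c
            layout[rank] = (block_range(n, prow, r), block_range(n, pcol, c))
    return layout


if __name__ == "__main__":
    # 8^3 grid on a 2 x 4 process grid: each rank owns an 8 x 4 x 2 pencil block.
    for rank, (yr, zr) in sorted(pencil_layout(8, 2, 4).items()):
        print(rank, "y in", yr, "z in", zr)
```

With this layout the 1-D transforms along x need no communication; the data exchanges then happen within rows or columns of the process grid rather than among all ranks at once.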
Alex Yee, UIUC
A Single-Transpose implementation of the Distributed out-of-order 3D-FFT
The classic approach to computing the distributed in-order 3D-FFT requires up to 3 expensive all-to-all communication transpose steps. Given the memory-bound nature of the FFT, these transposes are dominant factors in the total run-time. Here we present a new approach that reduces the number of transposes to 2 for the in-order transform, and 1 for the out-of-order transform.
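The classic in-order scheme mentioned in the abstract can be sketched on a single process as three rounds of "transform along the last axis, then transpose". This is only an illustration of why three transposes appear, not the new method of the talk: the naive `dft1d` stands in for an optimized 1-D FFT, `rotate_axes` stands in for the distributed all-to-all transpose, and all function names are hypothetical.

```python
import cmath


def dft1d(v):
    """Naive O(n^2) 1-D DFT; a stand-in for an optimized 1-D FFT kernel."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transform_last_axis(x):
    """Apply the 1-D transform to every pencil along the last axis."""
    return [[dft1d(row) for row in plane] for plane in x]


def rotate_axes(x):
    """Transpose so that y[c][a][b] == x[a][b][c].
    In a distributed code this step is the all-to-all exchange."""
    n0, n1, n2 = len(x), len(x[0]), len(x[0][0])
    return [[[x[a][b][c] for b in range(n1)] for a in range(n0)]
            for c in range(n2)]


def dft3d_in_order(x):
    """Three (transform, transpose) rounds; after the third transpose
    the axes are back in their original order."""
    for _ in range(3):
        x = rotate_axes(transform_last_axis(x))
    return x


def dft3d_direct(x):
    """Reference 3-D DFT computed straight from the definition."""
    n0, n1, n2 = len(x), len(x[0]), len(x[0][0])

    def w(j, k, n):
        return cmath.exp(-2j * cmath.pi * j * k / n)

    return [[[sum(x[a][b][c] * w(a, p, n0) * w(b, q, n1) * w(c, r, n2)
                  for a in range(n0) for b in range(n1) for c in range(n2))
              for r in range(n2)] for q in range(n1)] for p in range(n0)]
```

Skipping the final `rotate_axes` call would leave the result with permuted axes, which is the sense in which an out-of-order transform can save a transpose.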