...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download | ||
| Sunday Nov. 20th | Dinner at ... |
| ||
Workshop Day 1 | Monday Nov. 21st |
| ||
| ALL TITLES ARE TEMPORARY |
| ||
Registration | 08:00 |
| ||
Welcome and Introduction | 08:30 | Marc Snir + Franck Cappello | INRIA&UIUC | Background | Welcome Workshop objectives and organization |
| ||
| 08:40 | Danny Powell | NCSA | Background | NCSA 5 year Strategy |
| ||
| 08:50 | Claude Kirchner / Thierry Priol / Jean Roman | INRIA | Background | Update on INRIA and HPC |
| ||
Sustained Petascale | 09:00 | Bill Kramer | NCSA | Background | Blue Waters |
| ||
| 09:30 | Bill Gropp | UIUC | Background | Application challenges for sustained Petascale |
| ||
| 10:00 | Break |
| ||
| 10:30 | Michelle Butler and Bill Kramer | NCSA | Background | Storage system issues for sustained petascale systems |
| ||
| 11:00 | Wen-Mei Hwu | UIUC | Background | Sustained petascale systems and Accelerators |
| ||
From Petascale to Exascale | 11:30 | Marc Snir | ANL & UIUC | Background | Potential extension of the collaboration to ANL and BG/Q |
| ||
| 12:00 | Lunch |
| ||
| 13:30 | Rajeev Thakur | ANL | Background | MPI challenges for sustained Petaflops and Exascale |
| ||
| 14:00 | Robert Ross | ANL | Background | Key I/O challenges for Petascale and Beyond |
| ||
| 14:30 | Paul Hovland | ANL | Background | TBA |
| ||
| 15:00 | George Bosilca | UTK/ICL | Background | ICL Research on Resilience and Numerical Algorithms |
| ||
| 15:30 | Break |
| ||
System software | 16:00 | Franck Cappello | INRIA&UIUC | Joint Results | Introduction of the activities in System + talk |
| ||
| 16:30 | Ana Gainaru | UIUC & NCSA | Joint Results | Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems |
| ||
| 17:00 | Thomas Ropars | EPFL | Joint Results | On Distributed Recovery for Send-Deterministic-Aware MPI Applications |
| ||
| 17:30 | Leonardo Bautista Gomez | Titech | Joint Results | Hierarchical groups for multilevel checkpoints and partial restart |
| ||
| Dinner at ... |
| ||
Workshop Day 2 | Tuesday Nov. 22nd |
| ||
System Software cont. | 08:30 | Olivier Gluck | INRIA | Joint Results | Reducing energy consumption of fault tolerance algorithms |
| ||
| 09:00 | Gabriel Antoniu & Matthieu Dorier | INRIA | Joint Results | Update on DAMARIS: Making CM1 scale linearly up to 10,000 cores |
| ||
Numerical Library | 09:30 | Bill Gropp | UIUC | Joint Results | Introduction of the activity in Numerical Algorithms and Libraries + talk |
| ||
| 10:00 | Luc Giraud | INRIA | Joint Results | Fault tolerant Numerical Methods |
| |||
| 10:30 | Break |
| ||
| 11:00 | Laura Grigori | INRIA | Joint Early Results | Hybrid scheduling and communication avoiding for CALU |
| ||
| 11:30 | Sébastien Fourestier | INRIA | Joint Early Results | TBA |
| ||
| 12:00 | Yves Robert | INRIA | Background | Linear Algebra kernels on Petascale/exascale platforms: scheduling issues |
| ||
| 12:30 | Lunch |
| ||
Numerical Lib. Cont. | 14:00 | Marc Baboulin | INRIA | Joint Early Results | A parallel tiled solver for dense symmetric indefinite systems on multicore architectures |
| ||
| 14:30 | Daisuke Takahashi & Alex Yee | U. Tsukuba | Joint Early Results | A Scalable Parallel Algorithm for 3-D FFT |
| ||
Programming environments | 15:00 | Sanjay Kale | UIUC | Joint Early Results | Introduction of the activities in Programming Models + talk |
| ||
| 15:30 | Julien Bigot / Christian Perez | INRIA | Joint Early Results | TBA |
| ||
| 16:00 | Break |
| ||
| 16:30 | Alexandre Duchateau | UIUC | Joint Early Results | TBA |
| ||
| 17:00 | Jean-François Méhaut | INRIA | Joint Early Results | TBA |
| ||
| 17:30 | Emmanuel Jeannot | INRIA | Joint Early Results | TBA |
| ||
| 18:00 | Franck Cappello & Marc Snir | INRIA & UIUC & ANL | | Preparation of the working groups |
| ||
| 19:00 | Banquet |
| ||
Workshop Day 3 | Wednesday Nov. 23rd |
| ||
| 8:30 | Franck Cappello & Marc Snir | INRIA & UIUC & ANL | | Indications for working groups |
| |||
Working groups | 9:00 - 10:30 | Bill Gropp | | | Numerical libraries 3 groups (Laura Grigori, Yves Robert, Sébastien Fourestier + Paul Hovland + Wen-Mei Hwu, ...) |
| ||
| 9:00 - 10:30 | Marc Snir | | | I/O (Bill Kramer + Gabriel Antoniu + Matthieu Dorier + Michelle Butler + Brett Bode + Rajeev Thakur) |
| ||
| 10:30 | Break |
| ||
| 11:00 - 12:30 | Sanjay Kale | | | Programming models 4 groups (Jean-François Méhaut, Sébastien Fourestier,
| ||
| 11:00 - 12:30 | Franck Cappello | | | Resilience 2 groups: resilient algorithms (Bill Gropp, George Bosilca, Yves Robert, Laura Grigori + ...) |
| |||
| 12:30 | Adjourn |
| ||
| 13:00 | Lunch |
| ||
| 14:30 - 18:00 | | | | Informal working groups |
| |||
| 19:00 | Dinner at ... |
...
We present an efficient and innovative parallel tiled algorithm for solving symmetric indefinite systems on multicore architectures. This solver avoids the communication overhead due to pivoting by using symmetric randomization. This randomization is computationally inexpensive and requires very little storage. Following randomization, a tiled LDLT factorization is used that reduces synchronization by using static or dynamic scheduling. We compare Gflop/s performance of our solver with other types of factorizations on a current multicore machine and we provide tests on accuracy using LAPACK test cases.
Daisuke Takahashi and Alex Yee: A Scalable Parallel Algorithm for 3-D FFT
In this talk, a scalable parallel algorithm for the 3-D fast Fourier transform (FFT) is presented. A typical decomposition for performing a parallel 3-D FFT is slab-wise. In this case, for an N^3-point FFT, N must be greater than or equal to the number of MPI processes. Our proposed parallel 3-D FFT algorithm allows up to N^(3/2) MPI processes for an N^3-point FFT. Moreover, this scheme requires only one all-to-all communication for transposed-order output. Performance results of parallel 3-D FFTs on clusters of multi-core processors are reported.
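To make the process-count bounds in the abstract concrete, the short sketch below compares the two limits it states: at most N processes for a slab-wise decomposition, and at most N^(3/2) for the proposed scheme. The function names are hypothetical and chosen only for illustration; the code evaluates the bounds and does not implement the FFT algorithm itself.

```python
import math


def max_procs_slab(n: int) -> int:
    """Upper bound on MPI processes for a slab-wise N^3-point FFT.

    Each process needs at least one plane of the grid, so the bound is N.
    (Hypothetical helper name, for illustration only.)
    """
    return n


def max_procs_proposed(n: int) -> int:
    """Upper bound N^(3/2) quoted in the abstract for the proposed scheme.

    floor(N^(3/2)) is computed exactly as the integer square root of N^3.
    (Hypothetical helper name, for illustration only.)
    """
    return math.isqrt(n ** 3)


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, max_procs_slab(n), max_procs_proposed(n))
```

For a 256^3-point transform, for instance, the slab-wise bound is 256 processes while the N^(3/2) bound is 4096.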