...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download | ||
| Sunday Nov. 20th | Dinner at ... |
| ||
Workshop Day 1 | Monday Nov. 21st |
| ALL TITLES ARE TEMPORARY |
| ||
Registration | 08:00 |
| ||
Welcome and Introduction | 08:30 | Marc Snir + Franck Cappello | INRIA & UIUC | Background | Welcome, workshop objectives and organization |
| ||
| 08:40 | Danny Powell | NCSA | Background | NCSA 5 year Strategy |
| ||
| 08:50 | Claude Kirchner / Thierry Priol / Jean Roman | INRIA | Background | Update on INRIA and HPC |
| ||
Sustained Petascale | 09:00 | Bill Kramer | NCSA | Background | Blue Waters |
| ||
| 09:30 | Bill Gropp | UIUC | Background | Application challenges for sustained Petascale |
| ||
| 10:00 | Break |
| ||
| 10:30 | Michelle Butler and Bill Kramer | NCSA | Background | Storage system issues for sustained petascale systems |
| ||
| 11:00 | Wen-Mei Hwu | UIUC | Background | Sustained petascale systems and Accelerators |
| ||
From Petascale to Exascale | 11:30 | Marc Snir | ANL & UIUC | Background | Potential extension of the collaboration to ANL and BG/Q |
| ||
| 12:00 | Lunch |
| ||
| 13:30 | Rajeev Thakur | ANL | Background | MPI challenges for sustained Petaflops and Exascale |
| ||
| 14:00 | Robert Ross | ANL | Background | Key I/O challenges for Petascale and Beyond |
| ||
| 14:30 | Paul Hovland | ANL | Background | TBA |
| ||
| 15:00 | George Bosilca | UTK/ICL | Background | ICL Research on Resilience and Numerical Algorithms |
| ||
| 15:30 | Break |
| ||
System software | 16:00 | Franck Cappello | INRIA & UIUC | Joint Results | Introduction of the activities in System + talk |
| ||
| 16:30 | Ana Gainaru | UIUC & NCSA | Joint Results | Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems |
| ||
| 17:00 | Thomas Ropars | EPFL | Joint Results | On Distributed Recovery for Send-Deterministic-Aware MPI Applications |
| ||
| 17:30 | Leonardo Bautista Gomez | Titech | Joint Results | Hierarchical groups for multilevel checkpoints and partial restart |
| ||
| Dinner at ... |
| ||
Workshop Day 2 | Tuesday Nov. 22nd |
| ||
System Software cont. | 08:30 | Olivier Gluck | INRIA | Joint Results | Reducing energy consumption of fault tolerance algorithms |
| ||
| 09:00 | Gabriel Antoniu & Matthieu Dorier | INRIA | Joint Results | Update on DAMARIS: Making CM1 scale linearly up to 10,000 cores |
| ||
Numerical Library | 09:30 | Bill Gropp | UIUC | Joint Results | Introduction of the activity in Numerical Algorithms and Libraries + talk |
| ||
| 10:00 | Luc Giraud | INRIA | Joint Results | Fault tolerant Numerical Methods |
| |||
| 10:30 | Break |
| ||
| 11:00 | Laura Grigori | INRIA | Joint Early Results | Hybrid scheduling and communication avoiding for CALU |
| ||
| 11:30 | Sébastien Fourestier | INRIA | Joint Early Results | TBA |
| ||
| 12:00 | Yves Robert | INRIA | Background | Linear algebra kernels on petascale/exascale platforms: scheduling issues |
| ||
| 12:30 | Lunch |
| ||
Numerical Lib. Cont. | 14:00 | Marc Baboulin | INRIA | Joint Early Results | A parallel tiled solver for dense symmetric indefinite systems on multicore architectures |
| ||
| 14:30 | Daisuke Takahashi & Alex Yee | U. Tsukuba | Joint Results | A Scalable Parallel Algorithm for 3-D FFT |
| ||
Programming environments | 15:00 | Sanjay Kale | UIUC | Joint Early Results | Introduction of the activities in Programming Models + talk |
| ||
| 15:30 | Julien Bigot / Christian Perez | INRIA | Joint Early Results | TBA |
| ||
| 16:00 | Break |
| ||
| 16:30 | Alexandre Duchateau | UIUC | Joint Early Results | TBA |
| ||
| 17:00 | Jean-François Méhaut | INRIA | Joint Early Results | TBA |
| ||
| 17:30 | Emmanuel Jeannot | INRIA | Joint Early Results | TBA |
| ||
| 18:00 | Franck Cappello & Marc Snir | INRIA & UIUC & ANL | Preparation of the working groups |
| ||
| 19:00 | Banquet |
| ||
Workshop Day 3 | Wednesday Nov. 23rd |
| ||
| 8:30 | Franck Cappello & Marc Snir | INRIA & UIUC & ANL | Indications for working groups |
| |||
Working groups | 9:00 - 10:30 | Bill Gropp | Numerical libraries 3 groups (Laura Grigori, Yves Robert, Sébastien Fourestier + Paul Hovland + Wen-Mei Hwu, ...) |
| ||
| 9:00 - 10:30 | Marc Snir | I/O (Bill Kramer + Gabriel Antoniu + Matthieu Dorier + Michelle Butler + Brett Bode + Rajeev Thakur) |
| ||
| 10:30 | Break |
| ||
| 11:00 - 12:30 | Sanjay Kale | Programming models 4 groups (Jean-François Méhaut, Sébastien Fourestier,
| ||
| 11:00 - 12:30 | Franck Cappello | Resilience 2 groups: resilient algorithms (Bill Gropp, George Bosilca, Yves Robert, Laura Grigori + ...) |
| |||
| 12:30 | Adjourn |
| ||
| 13:00 | Lunch |
| ||
| 14:30 - 18:00 | Informal working groups |
| |||
| 19:00 | Dinner at ... |
...
Over the past few years, the energy consumption of supercomputers has become a major issue. To meet the performance needs expressed by scientists in various fields, supercomputers are growing very fast. They involve more and more computing nodes, which increases both their total energy consumption and their probability of experiencing a failure. In particular, to ensure the transition to the exascale era by 2018, which will involve millions of cores, we need to address these two challenges by providing efficient fault tolerance mechanisms while reducing total energy consumption.
In this talk, we first present some techniques used to reduce the energy consumption of large-scale distributed systems, particularly future supercomputers. Then, we present our current research on reducing the energy cost of fault tolerance algorithms in exascale supercomputers.
Yves Robert: Linear algebra kernels on petascale/exascale platforms: scheduling issues
Future exascale machines will likely be massively parallel architectures, with 100K to 1000K processors, each processor itself being equipped with 1K to 10K cores. At the node level, the architecture is a shared-memory machine, running many parallel threads on the cores. At the machine level, the architecture is a distributed-memory machine. This additional level of hierarchy, together with massive parallelism at the node level, dramatically complicates the design of new versions of the standard numerical linear algebra algorithms that are at the heart of many scientific applications. On exascale platforms, resilience is a key challenge. Failures are much more likely to occur during the execution of parallel jobs that enroll increasingly larger numbers of processors. The design of efficient fault-tolerant scheduling strategies will be key to high performance. Such strategies can involve checkpointing, task replication, dynamic task re-execution, or any combination of these. But they all incur large overheads in terms of performance and energy consumption. The main goal of the talk is to survey the challenges faced in designing linear algebra algorithms on exascale architectures, and to provide a few examples of algorithms and scheduling techniques that constitute a first step toward solving these challenges. Joint work with Marin Bougeret, Henri Casanova, Jack Dongarra, Thomas Hérault, Julien Langou, Mathieu Faverge, and Frédéric Vivien.
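To make the scale in this abstract concrete, here is a minimal Python sketch that is not taken from the talk. It multiplies out the processor and core ranges quoted above, and it estimates how the platform-wide mean time between failures (MTBF) shrinks as processors are added. The estimate assumes independent, exponentially distributed failures, in which case the platform MTBF is the per-processor MTBF divided by the number of processors. The 10-year per-processor MTBF and the function names are illustrative assumptions only.

```python
HOURS_PER_YEAR = 24 * 365


def total_cores(processors: int, cores_per_processor: int) -> int:
    """Total core count of a machine with identical processors."""
    return processors * cores_per_processor


def platform_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Platform MTBF, assuming independent exponential failures per node.

    Under that assumption the failure rates add up, so the platform
    MTBF is the per-node MTBF divided by the number of nodes.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return node_mtbf_hours / num_nodes


if __name__ == "__main__":
    # Assumed value for illustration only: each processor fails once per 10 years.
    node_mtbf = 10 * HOURS_PER_YEAR

    # Ranges quoted in the abstract: 100K to 1000K processors, 1K to 10K cores each.
    for processors in (100_000, 1_000_000):
        for cores in (1_000, 10_000):
            print(f"{processors} processors x {cores} cores = "
                  f"{total_cores(processors, cores)} cores")
        mtbf = platform_mtbf_hours(node_mtbf, processors)
        print(f"{processors} processors: platform MTBF ~ {mtbf * 60:.1f} minutes")
```

Under these assumptions a tenfold increase in processor count divides the platform MTBF by ten, which is why the overhead of checkpointing, replication, or re-execution becomes a first-order scheduling concern.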
Marc Baboulin: A parallel tiled solver for dense symmetric indefinite systems on multicore architectures
...