...
Main Topics | Schedule | Speakers | Types of presentation | Topic | Download | |
Sunday Nov. 18th | Dinner | Giordano's | http://www.giordanos.com/ |
| ||
Workshop Day 1 (Room 1416, TCS conference center) | Monday Nov. 19th |
|
|
| ||
| 07:30-08:30 | Transportation: Guest House to TCS (building 240) |
| (Entrance of the conference center) |
| |
| 08:00 | Continental Breakfast and Registration |
| Food available in Room 1407, Lunch seating in room 1416 (second half) |
| |
Welcome and Introduction | 08:30 | Franck Cappello, INRIA & UIUC, Marc Snir, ANL | Opening | Welcome, formal opening and workshop details |
| |
| 08:40 | Marc Snir | Opening | ANL presentation and vision of the collaboration |
| |
| 08:50 | Bill Gropp | Opening | UIUC/NCSA update and vision of the collaboration |
| |
| 09:00 | Frederic Desprez | Opening | INRIA update on HPC strategy and vision of the collaboration |
| |
Big Apps, Big DATA - Big I/O | 09:15 | Robert Jacob | Trends in HPC | Climate simulation at extreme scale | ||
| 09:45 | Rob Ross, ANL | Trends in HPC | Trends in HPC I/O and File systems |
| |
| 10:15 | Break |
|
|
| |
| 10:45 | Rob Pennington, NCSA | Trends in HPC | Big Data | ||
| 11:15 | Andrew Chien, ANL | Potential collaboration | Big Data | ||
| 11:45 | Matthieu Dorier, INRIA | Joint Results | Visualization | ||
| 12:15 | Lunch |
|
|
| |
Programming Models/Runtime chair: Sanjay Kale | 13:30 | Wen-Mei Hwu, UIUC | TBA | Accelerators | ||
| 14:00 | Pavan Balaji, ANL | Potential collaboration | MPI3 and Unified Runtime | ||
| 14:30 | Andra Hugo, Raymond Namyst, INRIA | Potential collaboration | Composing multiple StarPU applications over heterogeneous machines: a supervised approach | ||
| 15:00 | Jean-François Mehaut, INRIA | Potential collaboration | Optimizations for modern NUMA |
| |
| 15:30 | Break |
|
|
| |
Numerical algorithms and Methods | 16:00 | TBA, ANL | TBA | TBA | ||
16:30 | Laura Grigori | Results | Communication avoiding | |||
| 17:00 | Bill Gropp, UIUC | Results | Hybrid Scheduling | ||
| 17:30 | Laurent Hascoet, INRIA | Early Results | TBA | ||
18:00 | Adjourn |
| ||||
19:00 | Dinner | Jameson's |
| |||
Workshop Day 2 (Main room) | Tuesday Nov. 20th |
| |
Big Systems | 08:30 | Pete Beckman, ANL | Trends | New Directions in Extreme-Scale Operating Systems and Runtime Software |
| |
| 09:00 | Bill Kramer, UIUC/NCSA | Trends | Blue Waters update |
| |
Cloud | 09:30 | Ian Foster, ANL | Potential collaboration | TBA | ||
| 10:00 | Christine Morin, INRIA | Potential collaboration | Contrail | ||
| 10:30 | Break |
|
|
| |
11:00 | Frederic Desprez, INRIA | Potential collaboration | TBA | |||
Resilience: | 11:30 | Mohamed Slim Bouguerra, INRIA | Early Result | Performance modeling of checkpointing under failure prediction | ||
| 12:00 | Rinku Gupta, ANL | Potential collaboration | Interlayer error notification, coordination and CIFTS |
| |
| 12:30 | Ana Gainaru, UIUC | Early Results | Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems. |
| |
| 13:00 | Lunch |
| Food buffet in Room 1407, Lunch seating in room 1416 (second half) |
| |
| Parallel Sessions |
| |
Mini workshop on Numerical libraries | 08:30 | Stefan Wild, ANL | Potential collaboration | TBA | ||
| 09:00 | Bill Gropp, UIUC | Potential collaboration | TBA | ||
| 09:30 | Laura Grigori, INRIA | Potential collaboration | TBA | ||
| 10:00 | Break | TBA | |||
| 10:30 | Anshu Dubey, ANL | Potential collaboration | Optimizing Scientific Codes While Retaining Portability |
| |
| 11:00 | Discussion |
|
|
| |
| 12:00 | Adjourn |
|
|
| |
| 13:00 | Lunch |
|
|
| |
| Parallel Sessions |
| |
Mini workshop on Performance Modeling and simulation | 14:30 | Sanjay Kale, UIUC | Early Results | BigSim |
| |
| 15:00 | Arnaud Legrand, INRIA |
| SimGrid |
| |
| 15:30 | Torsten Hoefler, ETH | Early Results | TBA |
| |
| 16:00 | Break |
|
|
| |
| 16:30 | Yves Robert, INRIA | Early Results | TBA |
| |
| 17:00 | Discussion |
|
|
| |
| 18:00 | Adjourn |
|
|
| |
| 19:00 | Dinner | Maggiano's | http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1 |
| |
Mini workshop on Cloud | 14:30 | Kate Keahey, ANL | Potential collaboration | TBA |
| |
| 15:00 | Narayan Desai, ANL | Potential collaboration | TBA |
| |
| 15:30 | Jonathan Rouzaud, INRIA | Potential collaboration | TBA |
| |
| 16:00 | Break |
|
|
| |
| 16:30 | Michael Wilde | Potential collaboration | Swift: simpler parallel programming for cloud and HPC domains http://www.ci.uchicago.edu/swift (Swift for clouds and clusters) |
| |
| 17:00 | Discussion |
|
|
| |
| 18:00 | Adjourn |
|
|
| |
| 19:00 | Dinner | Maggiano's | http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1 |
| |
Workshop Day 3 (Main room) | Wednesday Nov. 21st |
| |
| Parallel Sessions |
| |
Mini workshop on Programming models/runtime | 08:30 | Emmanuel Jeannot, INRIA | Results | TBA |
| |
09:00 | Sanjay Kale, UIUC | Charm++ update |
| |||
09:30 | Christian Perez, INRIA |
| TBA |
| ||
10:00 | Break |
|
| |||
10:30 | Jim Dinan |
| One sided communication |
| ||
11:00 | Sebastien Fourestier | Potential collaboration | Parallel repartitioning and re-mapping in Scotch |
| ||
| 11:30 | Discussion |
|
|
| |
| 12:30 | Closing |
|
|
| |
| 13:00 | Lunch |
|
|
| |
| |
Mini workshop on Resilience | 08:30 | Mohamed Slim Bouguerra TBA | TBA | TBA |
| |
| 09:00 | Peter Brune, ANL; Amina Guermouche, INRIA | TBA | TBA |
| |
| 09:30 | Bogdan Nicolae, IBM | Results | Optimizing checkpoint image pages storage |
| |
| 10:00 | Break |
|
|
| |
| 10:30 | Tatiana Martsinkevich, INRIA | Results | Fully distributed recovery for send-determinism applications |
| |
| 11:00 | Amina Guermouche, INRIA; Peter Brune, ANL | Results | Trends | Multilevel Resiliency for PDE Simulations TBA |
|
| 11:30 | Discussion |
|
| ||
|
| 12:30 | Closing |
|
|
|
| 13:00 | Lunch |
| Box Lunches |
|
...
A large percentage of computing capacity in today’s large high-performance computing systems is wasted due to failures and recoveries. One way to reduce the overhead induced by these strategies is to combine them with failure avoidance methods. Failure avoidance relies on a prediction model that detects fault occurrences ahead of time and allows preventive measures to be taken, such as task migration or checkpointing the application. This talk presents the implementation and results of a prototype of proactive checkpointing based on the ELSA toolkit, coupled with periodic multi-level checkpointing based on FTI. The proactive checkpoint is implemented as level zero (L0) in a four-level scheme, providing the fastest checkpoint, which is necessary to act quickly between the failure prediction and the moment of the failure. We evaluate the proposed approach on the TSUBAME system and show that the overhead, compared with an execution that uses preventive checkpointing only, is just 2% to 6%.
Peter Brune
Multilevel Resiliency for PDE Simulations
Co-Authors: Mark Adams, Jed Brown, Peter Brune (speaking), Barry Smith
Multilevel methods for the solution of partial differential equations are the de facto fast algorithms for large-scale computations. Using these methods requires progressively smaller approximations of the solution to the problem, potentially on a smaller subset of the machine. These algorithms present a tempting target for enabling efficient extreme-scale resiliency, as the multilevel structure may be used to efficiently compress the PDE solution and check for algorithmic correctness. We discuss the components of multilevel methods and their use for resilient computation. We speculate on possibilities for the integration of these methods into simulations.
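As a rough illustration of how a multilevel hierarchy could compress a PDE solution, the sketch below restricts a smooth 1D grid function to a grid with half as many intervals and interpolates it back. This is not the authors' method: the function names (`restrict`, `prolong`, `max_error`), the full-weighting stencil, the grid size, and the test function are all assumptions chosen for illustration.

```python
import math

def restrict(fine):
    """Full-weighting restriction: 2n+1 fine points -> n+1 coarse points."""
    n = (len(fine) - 1) // 2
    coarse = [fine[0]]  # keep the left boundary value
    for i in range(1, n):
        # Weighted average of the fine point and its two neighbours
        coarse.append(0.25 * fine[2 * i - 1] + 0.5 * fine[2 * i] + 0.25 * fine[2 * i + 1])
    coarse.append(fine[-1])  # keep the right boundary value
    return coarse

def prolong(coarse):
    """Linear interpolation: n+1 coarse points -> 2n+1 fine points."""
    fine = []
    for i in range(len(coarse) - 1):
        fine.append(coarse[i])
        fine.append(0.5 * (coarse[i] + coarse[i + 1]))
    fine.append(coarse[-1])
    return fine

def max_error(a, b):
    """Largest pointwise difference between two equal-length grid functions."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical smooth "solution" on a 129-point grid over [0, 1]
n_fine = 129
xs = [i / (n_fine - 1) for i in range(n_fine)]
u = [math.sin(math.pi * x) for x in xs]

u_coarse = restrict(u)     # compressed copy, roughly half the storage
u_back = prolong(u_coarse) # approximate reconstruction on the fine grid

print(len(u), len(u_coarse), max_error(u, u_back))
```

For a smooth solution the reconstruction error is small relative to the solution, which is what makes a coarse copy attractive as a cheap, lossy checkpoint; a rough or oscillatory solution would lose more information under the same restriction.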