...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download |
|
Dinner Before the Workshop | 7:30 PM | Only people registered for the dinner |
|
Workshop Day 1 | Wednesday June 12th |
|
| TITLES ARE TEMPORARY (except if in bold font) |
|
Registration | 08:00 |
|
Welcome and Introduction | 08:30 | Marc Snir + Franck Cappello | INRIA&UIUC&ANL | Background | Welcome, Workshop objectives and organization |
|
| 08:45 | Bill Kramer | UIUC | Background | NCSA updates and vision of the collaboration |
|
| 09:00 | Marc Snir | ANL | Background | ANL updates and vision of the collaboration |
|
| 09:15 | Frederic Desprez | Inria | Background | INRIA updates and vision of the collaboration |
|
Big systems | 09:30 | Bill Kramer | UIUC | Background | Update on Blue Waters |
|
| 10:00 | Break |
|
| 10:30 | Mitsuhisa Sato | U. Tsukuba & AICS | Background | AICS and the K computer |
|
| 11:00 | Paul Gibbon | Juelich | Background | TBA |
|
Resilience&fault tolerance and simulation | 11:30 | Marc Snir | ANL&UIUC | Report | ICIS report on Resilience |
|
| 12:00 | Lunch |
|
Numerical Algorithms | 13:30 | Bill Gropp | UIUC | Background | Topics for Collaboration in Numerical Libraries |
|
| 14:00 | Paul Hovland | ANL | Background | TBA |
|
| 14:30 | Frederic Nataf | INRIA&P6 | Background | Toward black-box adaptive domain decomposition methods |
|
| 15:00 | Luke Olson | UIUC | Background | Opportunities in developing a more robust and scalable multigrid solver |
|
| 15:30 | Break |
|
| 16:00 | Marc Baboulin | INRIA | Background | Using condition numbers to assess numerical quality in high-performance computing applications |
|
Resilience&fault tolerance and simulation (Chair: Franck Cappello) | 16:30 | Vincent Baudoui | Total & ANL | Joint Result | Round-off error and silent soft error propagation in exascale applications |
|
| 17:00 | Bogdan Nicolae | IBM | Joint Result | AI-Ckpt: Leveraging Memory Access Patterns for Adaptive Asynchronous Incremental Checkpointing |
|
| 17:30 | Martin Quinson | INRIA | Result | Improving Simulations of MPI Applications Using A Hybrid Network Model with Topology and Contention Support |
|
| 18:00 | Adjourn |
|
| 19:00 | Dinner |
|
Workshop Day 2 | Thursday June 13th |
|
Programming Models (cont.) | 08:30 | Jean-François Mehaut | INRIA | Result | Progresses in the European FP7 Mont-Blanc 1 project and objectives of its follow up: Mont-Blanc 2 |
|
| 09:00 | Rajeev Thakur | ANL | Background | TBA |
|
| 09:30 | Andra Ecaterina Hugo | INRIA | Results | TBA |
|
| 10:00 | Celso Mendes | UIUC | Background | TBA |
|
| 10:30 | Break |
|
Big Data, I/O, Visualization | 11:00 | Dries Kimpe | ANL | Results | TBA |
|
| 11:30 | Gilles Fedak | INRIA | Result | Active Data: A Programming Model to Manage Data Life Cycle Across Heterogeneous Systems and Infrastructures |
|
| 12:00 | Matthieu Dorier | INRIA | Joint Result | Data Analysis of Ensemble Simulations: an In Situ Approach using Damaris |
|
| 12:30 | Ian Foster | ANL | Background | TBA |
|
| 13:00 | Lunch |
|
Mini Workshop1 |
|
Resilience | 14:00 | Ana Gainaru | UIUC | Results | Failure prediction on Blue Waters |
|
| 14:30 | Xiang Ni | UIUC | Results | TBA |
|
| 15:00 | Tatiana | INRIA & ANL | Result | TBA |
|
| 15:30 | Mohamed Slim Bouguerra | INRIA & ANL | Result | TBA |
|
| 16:00 | Break |
|
| 16:30 | Amina Guermouche | UVSQ | Result | Multi-criteria Checkpointing Strategies: Response-time versus Resource Utilization |
|
| 17:00 | Thomas Ropars | EPFL | Result | TBA |
|
| 17:30 | Mehdi Diouri | INRIA | Result | ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols for HPC Applications |
|
| 18:00 | Adjourn |
|
Mini Workshop2 |
|
Numerical Algorithms and Libraries | 14:00 | Laura Grigori | INRIA | Result | TBA |
|
| 14:30 | Stefan Wild | ANL | Result | TBA |
|
| 15:00 | Frederic Hecht | INRIA/P6 | Result | TBA |
|
| 15:30 | Jed Brown | ANL | Result | TBA |
|
| 16:00 | Break |
|
| 16:30 | Yushan Wang | INRIA P11 | Result | TBA |
|
| 17:00 | Jean Utke | ANL | Result | Designing and implementing a tool-independent, adjoinable MPI wrapper library |
|
| 17:30 | Laurent Hascoet | INRIA | Result | The adjoint of MPI one-sided communications |
|
| 18:00 | Adjourn |
|
| 19:00 | Banquet |
|
| Lyon |
|
Workshop Day 3 | Friday June 14th |
|
Mini Workshop1 (cont.) |
|
Resilience | 08:30 | Di Sheng | INRIA | Result | TBA |
|
| 09:00 | Guillaume Aupy | INRIA | Result | TBA |
|
| 09:30 | Discussion |
|
| 10:00 | Break |
|
Mini Workshop3 | 10:30 | Guillaume Mercier | INRIA | Result | TBA |
|
Programming and Scheduling | 11:00 | Vincent Lanore | INRIA | Result | TBA |
|
| 11:30 | Anne Benoit | INRIA | Result | Energy-efficient scheduling |
|
| 12:00 | François Tessier | INRIA | Result | TBA |
|
| 12:30 | Discussions |
|
| 13:00 | Closing and Lunch |
|
Mini Workshop2 (cont.) |
|
Numerical Algorithms and Libraries | 08:30 | François Pellegrini | INRIA | Result | Shared memory parallel algorithms in Scotch 6 |
|
| 09:00 | Luc Giraud | INRIA | Result | TBA |
|
| 09:30 | Discussions |
|
| 10:00 | Break |
|
Mini Workshop4 | 10:30 | Kate Keahey | ANL | Result | TBA |
|
Clouds | 11:00 | Gabriel Antoniu | INRIA | Result | TBA |
|
| 11:30 | Christian Perez | INRIA | Result | TBA |
|
| 12:00 | Eddy Caron | INRIA | Result | TBA |
|
| 12:30 | Discussions |
|
| 13:00 | Closing and Lunch |
|
...
Future exascale computers will open up new perspectives in numerical simulation, but they will also experience more errors because of their massive scale. We will focus here on round-off errors and on silent soft errors, whose propagation needs to be studied in order to ensure the accuracy of results. Round-off errors stem from the finite precision of numerical calculations and can lead to catastrophic losses of significant digits when they accumulate. We will discuss the limits of existing error bounds when facing large-scale problems. Soft hardware errors can also perturb computations by randomly flipping memory bits. Some of these errors are automatically corrected, but others can propagate silently through the calculations. We will present some strategies to determine the sensitive sections of an application as part of future research work.
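As a rough illustration of the two error classes named in this abstract (it is not taken from the talk), the Python sketch below shows round-off accumulating in a naive sum and the effect of flipping a single bit in a double-precision value. The helper names `naive_sum` and `flip_bit` are hypothetical and exist only for this example.

```python
import math
import struct


def naive_sum(values):
    """Left-to-right floating-point summation; each addition may round."""
    total = 0.0
    for v in values:
        total += v
    return total


def flip_bit(x, bit):
    """Return x with one bit of its IEEE 754 binary64 encoding flipped.

    bit 0 is the least significant mantissa bit, bit 63 is the sign bit.
    This only mimics a soft error; it is an assumption-based toy model.
    """
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


values = [0.1] * 10

# Round-off: 0.1 is not exactly representable, so the naive sum drifts.
print(naive_sum(values) == 1.0)   # False
# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
print(math.fsum(values) == 1.0)   # True

# Soft error in a low mantissa bit: a tiny, easily overlooked perturbation.
print(flip_bit(1.0, 0) - 1.0)     # 2.220446049250313e-16
# Soft error in the top exponent bit: the value becomes infinite.
print(flip_bit(1.0, 62))          # inf
```

The contrast between the last two lines is why the sensitivity of an application depends heavily on where an error lands: some flips are lost in the round-off noise, while others corrupt the result outright.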
Bogdan Nicolae
AI-Ckpt: Leveraging Memory Access Patterns for Adaptive Asynchronous Incremental Checkpointing
With the increasing scale and complexity of supercomputing and cloud computing architectures, faults are becoming a frequent occurrence, which makes reliability a difficult challenge. Although for some applications it is enough to restart failed tasks, there is a large class of applications where tasks run for a long time or are tightly coupled, making a restart from scratch infeasible. Checkpoint-Restart (CR), the main method to survive failures for such applications, faces additional challenges in this context: not only does it need to minimize the performance overhead that checkpointing imposes on the application, but it also needs to operate with scarce resources. To this end, this paper contributes a novel approach that leverages both current and past memory access patterns to optimize the order in which memory pages are flushed to stable storage during asynchronous checkpointing. Large-scale experiments show up to 60% improvement compared to state-of-the-art checkpointing approaches, achievable with an extra memory requirement of less than 5% of the total application memory.
Bill Gropp
Topics for Collaboration in Numerical Libraries
This talk will discuss some open problems in numerical libraries for extreme-scale systems, including issues facing some of the application teams that are currently using the Blue Waters sustained petascale system.
Luke Olson
Opportunities in developing a more robust and scalable multigrid solver
Multigrid methods have increased in robustness in recent years due to new algorithmic advances and new theoretical developments. The result is a more robust multilevel framework leading to improved convergence for a wider range of non-elliptic problems. Yet many of these developments have not been adapted at scale despite their intended use, while many of the optimizations could be strengthened by considering high-performance computing architectures more directly. In this talk, we discuss a particular example of these recent optimizations in multigrid, namely defining optimal interpolation, that moves toward a more general framework, and highlight some focused directions for collaboration in this respect. In addition, recent trends in high-throughput computing have motivated algorithmic changes in multigrid design. We will also highlight some directions to further advance multigrid solvers at scale based on this work, with collaboration through the Joint Lab.