...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download |
|
Dinner Before the Workshop | 7:30 PM | Only people registered for the dinner |
|
Workshop Day 1 | Wednesday June 12th |
|
| TITLES ARE TEMPORARY (except if in bold font) |
|
Registration | 08:00 |
|
Welcome and Introduction Amphitheatre | 08:30 | Marc Snir + Franck Cappello | INRIA&UIUC&ANL | Background | Welcome, Workshop objectives and organization |
|
| 08:45 | Bill Kramer | UIUC | Background | NCSA updates and vision of the collaboration |
|
| 09:00 | Marc Snir | ANL | Background | ANL updates and vision of the collaboration |
| 09:15 | Frederic Desprez | Inria | Background | INRIA updates and vision of the collaboration | |
Big systems | 9:30 | Bill Kramer | UIUC | Background | Update on BlueWaters |
|
| 10:00 | Break |
|
| 10:30 | Mitsuhisa Sato | U. Tsukuba & AICS | Background | AICS and the K computer | |
CANCELED | 11:00 | Paul Gibbon | Juelich | Background | Meeting the Exascale Challenge at the Juelich Supercomputing Centre. |
|
Resilience&fault tolerance and simulation | 11:00 | Marc Snir | ANL&UIUC | Report | ICIS report on Resilience |
|
| 11:30 | Vincent Baudoui | Total & ANL | Joint-Results | Round-off error and silent soft error propagation in exascale applications |
| 12:00 | Lunch |
|
Numerical Algorithms | 13:30 | Bill Gropp | UIUC | Background | Topics for Collaboration in Numerical Libraries |
|
| 14:00 | Paul Hovland | ANL | Background | Argonne strategic plan in applied math |
|
| 14:30 | Marc Baboulin | INRIA | Background | Using condition numbers to assess numerical quality in high-performance computing applications |
|
| 15:00 | Luke Olson | UIUC | Background | Opportunities in developing a more robust and scalable multigrid solver |
|
| 15:30 | Break |
| 16:00 | Frederic Nataf | INRIA&P6 | Background | Toward black-box adaptive domain decomposition methods |
|
Resilience&fault tolerance and simulation Chair: Franck Cappello | 16:30 | Bogdan Nicolae | IBM | Joint Result | AI-Ckpt: Leveraging Memory Access Patterns for Adaptive Asynchronous Incremental Checkpointing | |
| 17:00 | Martin Quinson | INRIA | Result | Improving Simulations of MPI Applications Using A Hybrid Network Model with Topology and Contention Support |
| 17:30 | Adjourn |
|
| 18:45 | Bus for Dinner |
|
Workshop Day 2 | Thursday June 13th |
|
Programming Models | 08:30 | Jean-François Mehaut | INRIA | Result | Progress in the European FP7 Mont-Blanc 1 project and objectives of its follow-up: Mont-Blanc 2 |
|
| 09:00 | Rajeev Thakur | ANL | Background | Update on MPI and OS/R Activities at Argonne |
|
| 09:30 | Andra Ecaterina Hugo | INRIA | Results | Composing multiple StarPU applications over heterogeneous machines: a supervised approach |
|
| 10:00 | Celso Mendes | UIUC | Background | Dynamic Load Balancing for Weather Models via AMPI |
|
| 10:30 | Break |
|
Big Data, I/O, Visualization | 11:00 | Dries Kimpe | ANL | Results | Triton: Exascale Storage |
|
| 11:30 | Gilles Fedak | INRIA | Result | Active Data: A Programming Model to Manage Data Life Cycle Across Heterogeneous Systems and Infrastructures |
|
| 12:00 | Matthieu Dorier | INRIA | Joint Result | Data Analysis of Ensemble Simulations: an In Situ Approach using Damaris |
|
| 12:30 | Ian Foster | ANL | Background | Compiler optimization for distributed dynamic data flow programs |
|
| 13:00 | Lunch |
|
Mini Workshop1 Amphitheatre |
|
Resilience | 14:00 | Ana Gainaru | UIUC | Results | Challenges in predicting failures on the Blue Waters system. |
|
| 14:30 | Xiang Ni | UIUC | Results | ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection. |
|
| 15:00 | Tatiana Martsinkevich | INRIA & ANL | Result | On the feasibility of message logging in hybrid hierarchical FT protocols |
|
| 15:30 | Mohamed Slim Bouguerra | INRIA & ANL | Result | Investigating the probability distribution of false negative failure alerts in HPC systems |
|
| 16:00 | Break |
|
| 16:30 | Amina Guermouche | UVSQ | Result | Multi-criteria Checkpointing Strategies: Response-time versus Resource Utilization |
|
| 17:00 | Thomas Ropars | EPFL | Result | Towards efficient replication of HPC applications to deal with crash failures |
|
| 17:30 | Mehdi Diouri | INRIA | Result | ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols for HPC Applications |
|
| 18:00 | Adjourn |
|
Mini Workshop2 Room: Saint Maur |
|
Numerical Algorithms and Libraries | 14:00 | Jean Utke | ANL | Result | Designing and implementing a tool-independent, adjoinable MPI wrapper library |
|
| 14:30 | Laurent Hascoet | INRIA | Result | The adjoint of MPI one-sided communications |
|
| 15:00 | Stefan Wild | ANL | Result | Loud computations? Noise in iterative solvers |
|
| 15:30 | Jed Brown | ANL | Result | Vectorization, communication aggregation, and reuse in stochastic and temporal dimensions |
|
| 16:00 | Break |
|
| 16:30 | Yushan Wang | INRIA P11 | Result | Accelerating incompressible fluid flow simulations using SIMD or GPU computing |
|
| 17:00 | Frederic Hecht | INRIA/P6 | Result | FreeFem++, a user language to solve PDE. |
|
| 18:00 | Adjourn |
|
| 18:45 | Bus for dinner |
|
| Lyon |
|
Workshop Day 3 | Friday June 14th |
|
Mini Workshop1 (cont.) Room: Les essarts |
|
Resilience | 08:30 | Di Sheng | INRIA | Result | Optimization of Google Cloud Task Processing with Checkpoint-Restart Mechanism |
|
| 09:00 | Guillaume Aupy | INRIA | Result | On the Combination of Silent Error Detection and Checkpointing |
|
Mini Workshop3 | 09:30 | Guillaume Mercier | INRIA | Result | Topology Management and MPI Implementations Improvements |
|
| 10:00 | Break |
Programming and Scheduling | 10:30 | Vincent Lanore | INRIA | Result | Static 2D FFT adaptation through a component model based on Charm++ |
|
| 11:00 | Anne Benoit | INRIA | Result | Energy-efficient scheduling |
|
| 11:30 | François Tessier | INRIA | Result | Communication-aware load balancing with TreeMatch in Charm++ |
|
| 12:00 | Closing |
|
| 12:30 | Lunch |
|
Mini Workshop2 (cont.) Room: Saint Maur |
|
Numerical Algorithms and Libraries | 08:30 | François Pellegrini | INRIA | Result | Shared memory parallel algorithms in Scotch 6 |
|
| 09:00 | Luc Giraud, Abdou Guermouche | INRIA | Result / TBA | Towards resilient parallel linear Krylov solvers |
|
Mini Workshop4 | 09:30 | Kate Keahey | ANL | Result | Research Topics and Collaboration Opportunities in the Nimbus Team |
|
Clouds | 10:00 | Break |
|
| 10:30 | Jonathan Rouzaud-Cornabas | CNRS&INRIA | Result | SimGrid Cloud Broker: Simulation of Public and Private Clouds |
|
| 11:00 | Christian Perez | INRIA | Result | On Component Models to Deploy Application on Clouds |
|
| 11:30 | Eddy Caron | INRIA | Result | Seed4C: Secured Embedded Element and Data privacy for Cloud Federation |
|
| 12:00 | Closing |
|
| 12:30 | Lunch |
|
...
The advent of IaaS cloud computing promises acquisition and management of customized on-demand resources. What is the best way to leverage those resources? What new applications are emerging in this context? How will they change our work patterns? What new technical approaches need to be developed to support them? What new opportunities will they lead to? In this talk, I will describe tools the Nimbus team is developing, among others, in the context of the Ocean Observatory Initiative project, that focus on answering these questions. I will describe our approach and tools, the problems we are trying to address, as well as the interaction patterns associated with scientific applications currently driving our approach.
Abdou Guermouche
Towards resilient parallel linear Krylov solvers
The advent of exascale machines will require the use of parallel resources at an unprecedented scale, probably leading to a high rate of hardware faults. High Performance Computing (HPC) applications that aim to exploit all these resources will thus need to be resilient, i.e., able to compute a correct solution even in the presence of faults. In this work, we investigate possible remedies in the framework of the solution of large sparse linear systems, which is often the innermost numerical kernel in many scientific and engineering applications and also one of the most time-consuming parts. More precisely, we present recovery-then-restart strategies in the framework of Krylov subspace solvers, where lost entries of the iterate are interpolated to define a new initial guess before restarting. In particular, we consider two interpolation policies that preserve key numerical properties of well-known solvers. We assess the impact of the recovery method, the fault rate and the number of processors on the robustness of the resulting linear solvers. We consider experiments with CG, GMRES and Bi-CGStab.
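The recovery-then-restart idea can be illustrated with a small sketch. The abstract does not name its two interpolation policies, so the one used here is an assumption chosen for illustration: the lost entries of the iterate are rebuilt by solving the local block system that zeroes the residual on the lost rows. The function names `interpolate_lost` and `cg_with_recovery`, the dense NumPy matrix, and the single simulated fault are all hypothetical and are not taken from the authors' work.

```python
import numpy as np


def interpolate_lost(A, b, x, lost):
    """Rebuild the lost entries of x from the surviving ones.

    Assumed policy (hypothetical, for illustration only): choose x[lost]
    so that the residual rows indexed by `lost` vanish, i.e. solve
    A[lost, lost] @ x[lost] = b[lost] - A[lost, kept] @ x[kept].
    """
    lost = np.asarray(lost)
    kept = np.setdiff1d(np.arange(b.size), lost)
    rhs = b[lost] - A[np.ix_(lost, kept)] @ x[kept]
    x_new = x.copy()
    x_new[lost] = np.linalg.solve(A[np.ix_(lost, lost)], rhs)
    return x_new


def cg_with_recovery(A, b, x0, tol=1e-10, max_iter=500,
                     fault_iter=None, lost=None):
    """Conjugate gradient with one simulated fault at iteration `fault_iter`.

    At the fault, the entries x[lost] are wiped, interpolated from the
    surviving entries, and CG is restarted from the repaired iterate.
    Returns the final iterate and the number of iterations performed.
    """
    x = np.array(x0, dtype=float)
    lost = None if lost is None else np.asarray(lost)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            return x, k
        if fault_iter is not None and k == fault_iter:
            x[lost] = 0.0                      # simulate the data loss
            x = interpolate_lost(A, b, x, lost)
            r = b - A @ x                      # restart: fresh residual
            p = r.copy()                       # and fresh search direction
            rs = r @ r
            continue
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    # 1D Laplacian: a small symmetric positive definite test matrix.
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x, iters = cg_with_recovery(A, b, np.zeros(n),
                                fault_iter=10, lost=np.arange(10, 20))
    print(iters, np.linalg.norm(b - A @ x))
```

In this sketch the restart discards the Krylov basis built before the fault, so the solver pays extra iterations after recovery. A real distributed implementation would work with sparse, partitioned data and would only need the rows of `A` and `b` owned by the failed process, which are assumed here to be recoverable.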