...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download |
Sunday Nov. 24th | 7:00 PM | Only people registered for the dinner |
Workshop Day 1 | Monday Nov. 25th |
| TITLES ARE TEMPORARY (except if in bold font) |
Registration | 08:00 |
Welcome and Introduction Auditorium 1122 Chair: Franck | 08:30 | Marc Snir + Franck Cappello | INRIA&UIUC&ANL | Background | Welcome, Workshop objectives and organization | |
| 08:45 | Peter Schiffer | UIUC | Background | Welcome from UIUC Vice Chancellor for Research | |
| 09:00 | Ed Seidel | UIUC | Background | NCSA update and vision of the collaboration | |
| 09:15 | Michel Cosnard | INRIA | Background | INRIA updates and vision of the collaboration | |
| 09:30 | Marc Snir | ANL | Background | Argonne updates and vision of the collaboration | |
| 09:45 | Franck Cappello | ANL | Background | Joint-Lab, New Joint-Lab, PUF articulation | |
| 10:15 | Break |
Extreme Scale Systems and infrastructures Auditorium 1122 Chair: Marc Snir | 10:45 | Pete Beckman | ANL | | Extreme Scale Computing & Co-design Challenges | |
| 11:15 | John Towns | UIUC | | Applications Challenges in the XSEDE Environment | |
| 11:45 | Gabriel Antoniu | INRIA | Plenary talk | | |
| 12:15 | Lunch |
| 13:45 | Bill Kramer | UIUC | Blue Waters | Is Petascale Completely Done? What Should We Do Now? | |
| 14:15 | Marc Snir | UIUC | | G8 ECS and international collaboration toward extreme scale climate simulation | |
| 14:45 | Rob Ross | ANL | | Thinking Past POSIX: Persistent Storage in Extreme Scale Systems | |
| 15:15 | François Pellegrini | INRIA | Plenary talk | | |
| 15:45 | Break |
| 16:15 | Pavan Balaji | ANL | | | |
| 16:45 | Wen-mei Hwu | UIUC | Plenary talk | | |
| 17:15 | Adjourn |
| 18:45 | Bus for dinner |
Workshop Day 2 | Tuesday Nov. 26 |
Applications, I/O, Visualization, Big data Auditorium 1122 Chair: Rob Ross | 08:30 | Greg Bauer | UIUC | | Applications and their challenges on Blue Waters | |
| 09:00 | Matthieu Dorier | INRIA | Joint-result, submitted | CALCioM: Mitigating I/O Interferences in HPC Systems through Cross-Application Coordination | |
| 09:30 | Dries Kimpe | ANL | | Mercury: Enabling Remote Procedure Call for High-Performance Computing | |
| 10:00 | Venkat Vishwanath | ANL | Plenary talk | | |
| 10:30 | Break |
| 11:00 | Babak Behzad | UIUC | ACM/IEEE SC13 | Taming Parallel I/O Complexity with Auto-Tuning | |
| 11:30 | McHenry, Kenton Guadron | UIUC | | NSF CIF21 DIBBs: Brown Dog | |
| 12:00 | Lunch |
Mini Workshop1 Resilience Room 1030 Chair: Yves Robert |
| 13:30 | Leonardo | ANL | Joint-result | | |
| 14:00 | Tatiana Martsinkevich | INRIA | Joint-result | On the feasibility of message logging in hybrid hierarchical FT protocols | |
| 14:30 | Mohamed Slim Bouguerra | INRIA | Joint-result, submitted | Failure prediction: what to do with unpredicted failures? | |
| 15:00 | Ana Gainaru | UIUC | Joint-result, submitted | Topology and behaviour aware failure prediction for Blue Waters. | |
| 15:30 | Break |
| 16:00 | Sheng Di | INRIA | Joint-result, submitted | Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications | |
| 16:30 | Yves Robert | INRIA | | Assessing the impact of ABFT & Checkpoint composite strategies | |
| 17:00 | Wesley Bland | ANL | | Fault Tolerant Runtime Research at ANL | |
| 17:30 | Adjourn |
| 19:00 | Bus for dinner |
Mini Workshop2 Numerical Algorithms Room 1040 Chair: Bill Gropp |
| 13:30 | Luke Olson | UIUC | | | |
| 14:00 | Prasanna Balaprakash | ANL | | Active-Learning-based Surrogate Models for Empirical Performance Tuning | |
| 14:30 | Yushan Wang | INRIA | | Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems. | |
| 15:00 | Jed Brown | ANL | | Fast solvers for implicit Runge-Kutta systems | |
| 15:30 | Break |
| 16:00 | Pierre Jolivet | INRIA | Best Paper nominee, IEEE, ACM SC13 | Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems | |
| 16:30 | Vincent Baudoui | Total&ANL | | Round-off error propagation and non-determinism in parallel applications | |
| 17:00 | TBD | TBD | | | |
| 17:30 | Adjourn |
| 19:00 | Bus for dinner |
Workshop Day 3 | Wednesday Nov. 27 |
Mini Workshop3 |
Programming models, compilation and runtime. Room 1030 Chair: Marc Snir | 08:30 | Grigori Fursin | INRIA | | | |
| 09:00 | Maria Garzaran | UIUC | | Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication | |
| 09:30 | Jean-François Mehaut | INRIA | | From Multicores to Manycores Processors: Challenging Programming Issues with the MPPA/KALRAY | |
| 10:00 | Break |
| 10:30 | Frederic Vivien | INRIA | | Scheduling tree-shaped task graphs to minimize memory and makespan | |
| 11:00 | Rafael Tesser | INRIA | Joint result PDP 2013 | | |
| 11:30 | Emmanuel Jeannot | INRIA | Joint-result, IEEE Cluster2013 | Communication and Topology-aware Load Balancing in Charm++ with TreeMatch | |
| 12:00 | Closing |
| 12:30 | Lunch |
| 18:00 | Bus for dinner |
Mini Workshop4 Large scale systems and their simulators Room 1040 Chair: Bill Kramer |
| 08:30 | Sanjay Kale | | | | |
| 09:00 | Arnaud Legrand | | | SMPI: Toward Better Simulation of MPI Applications | |
| 09:30 | Kate Keahey | | | | |
| 10:00 | Break |
| 10:30 | Gilles Fedak | | | | |
| 11:00 | Jeremy Enos | | | Application Runtime Consistency and Performance Challenges on a shared 3D torus. | |
| 11:30 | TBD | | | | |
Auditorium 1122 | 12:00 | Closing |
| 12:30 | Lunch |
| 18:00 | Bus for dinner |
...
Round-off errors arising from the finite precision of numerical calculations can lead to a catastrophic loss of significant digits when they accumulate. Their propagation throughout a computation needs to be studied in order to ensure the accuracy of results. We present a round-off error estimation method based on first-order derivatives that can help follow error propagation in an execution graph and identify the sensitive sections of a code. It has been tested on well-known LU decomposition algorithms. In a second part, we focus on the effects of non-determinism in parallel applications where messages exchanged between processes are received in random order, possibly leading to different round-off error accumulations and subsequently to different results at each execution. We study the impact of this non-reproducibility on the convergence of stencil computations after a failure and recovery event.
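The order dependence described in this abstract can be illustrated with a minimal Python sketch. It is not the speaker's method: the function name `accumulate` and the four partial results are hypothetical, chosen only to show that adding the same double-precision values in two different arrival orders can give two different sums.

```python
import math


def accumulate(values):
    """Sum values left to right, as a process would when adding
    partial results in the order their messages arrive."""
    total = 0.0
    for value in values:
        total += value
    return total


# Hypothetical partial results sent by four processes (illustrative values).
# Both lists hold the same numbers; only the arrival order differs.
arrival_order_a = [1e16, 1.0, -1e16, 1.0]
arrival_order_b = [1.0, 1.0, 1e16, -1e16]

result_a = accumulate(arrival_order_a)
result_b = accumulate(arrival_order_b)

# math.fsum tracks the lost low-order bits and returns the correctly
# rounded sum, so it serves as an order-independent reference here.
reference = math.fsum(arrival_order_a)

print(result_a, result_b, reference)
```

With IEEE 754 double precision, the first order yields 1.0 because the first small term is absorbed by the large one, while the second order and the `math.fsum` reference both yield 2.0.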
Jeremy Enos
Application Runtime Consistency and Performance Challenges on a shared 3D torus.
Early testing on Blue Waters revealed varied performance for some applications, making required walltimes unpredictable. Many potential causes were investigated, ultimately indicating that poor placement onto compute resources within the 3D torus network was a chief aggravating factor. Multiple thrusts of effort were launched to improve both application performance and consistency: a long-term topology-aware placement development plan, improved high-speed network monitoring, and immediate "stop gap" measures available within existing tools and methods.
Ana Gainaru
Topology and behaviour aware failure prediction for Blue Waters.
Failure prediction has made substantial progress over the last five years, and current studies have shown that failure avoidance techniques can yield substantial benefits when combined with classical fault tolerance protocols. Understanding the properties of a prediction module and exploiting them to enhance fault tolerance approaches and scheduling decisions is crucial for providing scalable solutions to deal with failures on future HPC systems.
Recently, we presented a novel methodology for truly online failure prediction for the Blue Waters system. In this talk, we describe the main bottlenecks and limitations faced in applying failure prediction on a petascale system and propose a couple of solutions that use topology-level information.
Moreover, we will show that on a real system, system failures do not often translate into application failures. We will present how this influences application-level failure prediction and future system performance degradation analysis.