...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download |
| Sunday June 8th |
Dinner Before the Workshop | 7:30 PM | Only people registered for the dinner (included) |
|
| Mercure Hotel |
Workshop Day 1 | Monday June 9th |
| TITLES ARE TEMPORARY (except if in bold font) |
|
Registration | 08:00 | At Inria Sophia Antipolis |
Welcome and Introduction Amphitheatre | 08:30 | Franck Cappello + Marc Snir + Yves Robert + Bill Kramer + Jesus Labarta | INRIA&UIUC&ANL&BSC | Background | Welcome, Workshop objectives and organization | |
Plenary Amphitheatre Chair: Franck Cappello | 09:00 | Jesus Labarta | BSC | Background | Presentation of BSC activities |
|
Mini Workshop Math app. Room 1 | ||||||
Chair: Paul Hovland | 09:30 | Bill Gropp | UIUC | Advancing Toward Exascale: Some Results and Opportunities | ||
10:00 | Jed Brown | ANL | Next-generation multigridding: adaptivity and communication avoidance | |||
| 10:30 | Break |
11:00 | Ian Masliah | Inria |
| Automatic generation of dense linear system solvers on CPU/GPU architectures |
| |
11:30 | Luke Olson | UIUC | Reducing Complexity in Algebraic Solvers | |||
12:00 | Lunch | |||||
Chair: Bill Gropp | 13:30 | Vincent Baudoui | Inria |
| Round-off error propagation in large-scale applications |
|
| 14:00 | Paul Hovland | ANL |
| Checkpointing with Multiple Goals |
|
14:30 | Stephane Lanteri | Inria | C2S@Exa: a multi-disciplinary initiative for high performance computing in computational sciences | |||
Mini Workshop I/O and BigData Room 1 | ||||||
Chair: Rob Ross | 15:00 | Wolfgang Frings | JSC | |||
15:30 | Break | |||||
16:00 | Jonathan Jenkins | ANL |
| Towards Simulating Extreme-scale Distributed Systems |
| |
16:30 | Matthieu Dorier | Inria | Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction | |||
17:00 | Dave Mattson Kenton Guadron McHenry, | NCSA | The NCSA Image and Spatial Data Analysis Division | |||
17:30 | Adjourn | |||||
18:30 | Bus for dinner (dinner included) | |||||
Mini Workshop Runtime Room 2 |
Chair: Jesus Labarta | 9:30 | Pavan Balaji | ANL | |||
10:00 | Augustin Degomme | Inria | Status Report on the Simulation of MPI Applications with SMPI/SimGrid | |||
| 10:30 | Break |
| 11:00 | Ronak Buch | UIUC |
| ||
| 11:30 | Victor Lopez | BSC |
12:00 | Lunch | |||||
Chair: Rajeev Thakur | 13:30 | Xin Zhao | ANL | |||
14:00 | Brice Videau | Inria | ||||
14:30 | Pieter Bellens | BSC | ||||
15:00 | Martin Quinson and Lucas Nussbaum | Inria | Evaluating exascale HPC runtimes through emulation with Distem | |||
15:30 | Break | |||||
Chair: Sanjay Kale | 16:00 | Francois Tessier | Inria | |||
16:30 | Jean-François Mehaud | Inria | ||||
17:00 | Juan González | Inria | Performance Analytics: Understanding Parallel Applications using Cluster Analysis and Sequence Analysis. | |||
17:30 | Adjourn | |||||
18:30 | Bus for dinner (dinner included) | |||||
Workshop Day 2 | Tuesday June 10th | |||||
Formal opening Amphitheatre Chair: Bill Kramer | 08:30 | Marc Snir + Franck Cappello | INRIA&UIUC&ANL | Background | ||
| 08:40 | Claude Kirchner | Inria | Background | Inria updates and vision of the collaboration | TBD |
| 08:50 | Marc Snir | ANL | Background | ANL updates and vision of the collaboration | TBD |
Plenary Amphitheatre | 09:00 | Wolfgang Frings | JSC | Background | JSC activities in HPC | TBD |
Mini Workshop I/O and Big Data Room 1 | ||||||
Chair: Gabriel Antoniu | 09:30 | Rob Ross | ANL |
| Understanding and Reproducing I/O Workloads |
|
| 10:00 | Guillaume Aupy | Inria |
| Scheduling the I/O of HPC applications under congestion | |
10:30 | Break | |||||
11:00 | Lokman Rahmani | Inria | ||||
| 11:30 | Anthony Simonet | Inria |
| Using Active Data to Provide Smart Data Surveillance to E-Science Users |
|
| 12:00 | Lunch |
Mini Workshop Runtime | ||||||
Chair: Jean François Mehaud | 09:30 | Sanjay Kale | UIUC | Temperature, Power and Energy: How an Adaptive Runtime can optimize them | ||
10:00 | Florentino Sainz | BSC | DEEP Collective offload | |||
10:30 | Break | |||||
11:00 | Arnaud Legrand | Inria | Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures | |||
11:30 | Grigori Fursin | Inria | ||||
12:00 | Lunch | |||||
Formal encouragements Amphitheatre Chair: Franck Cappello | 13:45 | Ed Seidel | UIUC | Background | NCSA updates and vision of the collaboration | |
Plenary Amphitheatre Chair: Wolfgang Frings | 14:00 | Yves Robert | Inria | |||
14:30 | Marc Snir | ANL | ||||
15:00 | Break | |||||
Mini Workshop Resilience | ||||||
Chair: Franck Cappello | 15:30 | Luc Jaulmes | BSC | Checkpointless exact recovery techniques for Krylov-based iterative methods | ||
16:00 | Ana Gainaru | UIUC | ||||
16:30 | Tatiana Martsinkevich | Inria | ||||
17:00 | Adjourn | |||||
Mini Workshop Cloud & Cyber-infrastructure Room 2 | ||||||
Chair: Kate Keahey | 15:30 | Justin Wozniak | ANL | |||
16:00 | Shaowen Wang | UIUC | CyberGIS @ Scale | |||
16:30 | Christine Morin | Inria | ||||
17:00 | Adjourn | |||||
| 18:30 | Bus for Dinner (dinner included) |
Workshop Day 3 | Wednesday June 11th |
Plenary Amphitheatre Chair: Jesus Labarta | 8:30 | Bill Kramer | NCSA |
| Blue Waters - A year of results and insights |
|
Mini Workshop Resilience | ||||||
Chair: Yves Robert | 9:00 | Leonardo Bautista Gomez | ANL | |||
9:30 | Slim Bougera | Inria | ||||
10:00 | Break | |||||
10:30 | Vincent Baudoui | ANL | Round-off error propagation in large-scale applications | |||
Plenary Amphitheatre | 11:00 | Closing | ||||
12:00 | Lunch (included) | |||||
Mini Workshop Cloud & Cyber-infrastructure | ||||||
Chair: Christine Morin | 09:00 | Kate Keahey | ANL | |||
09:30 | Radu Tudoran | Inria | JetStream: Enabling High Performance Event Streaming across Cloud Data-Centers | |||
10:00 | Break | |||||
10:30 | Timothy Armstrong
| ANL | Towards Dynamic Dataflow Composition for Extreme-Scale Applications with Heterogeneous Tasks | |||
Plenary Amphitheatre | 11:00 | Closing | ||||
12:00 | Lunch (included) |
...
Luc Jaulmes
Checkpointless exact recovery techniques for Krylov-based iterative methods
Lokman Rahmani
Smart In Situ Visualization for Climate Simulations
Lucas Nussbaum
Evaluating exascale HPC runtimes through emulation with Distem
The Exascale era will require the HPC software stack to face important challenges such as platform heterogeneity and evolution during execution, or reliability issues. We propose a framework to evaluate key aspects of a central part of this software stack: the HPC runtimes. Starting from Distem, which is a versatile emulator for studying distributed systems, we designed an emulator suitable for the evaluation of HPC runtimes, enabling specifically: (1) emulation of a very large scale platform on top of a regular cluster; (2) introduction of heterogeneity and dynamic imbalance among the computing resources; (3) introduction of failures. Those features provide runtime designers with the ability to experiment with their prototypes under a large range of conditions, to discover performance gaps, understand future bottlenecks, and evaluate fault tolerance and load balancing mechanisms. We validate the usefulness of this approach with experiments on two HPC runtimes: Charm++ and OpenMPI.
Sanjay Kale
Temperature, Power and Energy: How an Adaptive Runtime can optimize them.
Jonathan Jenkins
Towards Simulating Extreme-scale Distributed Systems
Simulating future extreme-scale parallel/distributed systems can be an important component in understanding these systems at a scale that prototyping cannot feasibly reach. For HPC, big-data/cloud, or other computing/analysis platforms, the design decisions for developing systems that scale beyond current-generation systems are multi-dimensional in nature. For example, these decisions encompass distributed storage software/hardware solutions, network topologies within and between computing centers, algorithms for data analysis and compute services in heterogeneous software/hardware environments, etc., each of which can be a rich target for exploration via a simulation-based approach. This talk will examine our ongoing work in developing a simulation model framework using parallel discrete event simulation to examine various design aspects of extreme-scale distributed systems. As an exemplar, simulation of protocols used in distributed storage systems will be examined in detail.
Timothy Armstrong
Towards Dynamic Dataflow Composition for Extreme-Scale Applications with Heterogeneous Tasks
Parallel applications are increasingly built from heterogeneous software components that use diverse programming models, such as message-passing, threads, CUDA, and OpenCL, on heterogeneous hardware resources such as CPUs and GPUs. Getting these components to interoperate is a challenge in itself, which is further complicated by complex cross-cutting concerns such as scheduling, overlapping of communication and computation, fault-tolerance, and energy efficiency. Parallel execution models offer the hope of making these challenges more manageable for application programmers by unifying heterogeneous components into a more uniform framework. One such model is data-driven task parallelism, in which massive numbers of tasks are dynamically assigned to compute resources and communication and synchronization are based on explicit data dependencies. Swift is a high-level scripting language that provides a simple yet powerful way of expressing data-driven task parallelism. This talk discusses our current progress and future challenges on a compiler and runtime system that allows Swift to scale to hundreds of thousands of cores.
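As a rough illustration of data-driven task parallelism in general, the sketch below launches each task as soon as the results it depends on are available. It is written in Python rather than Swift and is not the compiler or runtime discussed in the talk. The `run_dataflow` helper, the task-table layout, and the example tasks are all hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_dataflow(tasks, max_workers=4):
    """Run tasks given as name -> (function, [input task names]).

    Hypothetical helper: a task is submitted to the pool as soon as all of
    its inputs have been produced, so ordering follows data dependencies.
    """
    results, pending, running = {}, dict(tasks), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Launch every pending task whose inputs are all available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            for name in ready:
                func, deps = pending.pop(name)
                future = pool.submit(func, *(results[d] for d in deps))
                running[future] = name
            if not running:
                # Nothing is running and nothing became ready: missing input or cycle.
                raise ValueError("unsatisfiable dependencies: %s" % sorted(pending))
            # Block until at least one running task finishes, then record its result.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results

# Illustrative task graph: "a" and "b" are independent, "c" needs both, "d" needs "c".
tasks = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "c": (lambda a, b: a * b, ["a", "b"]),
    "d": (lambda c: c + 1, ["c"]),
}
results = run_dataflow(tasks)
```

Here `a` and `b` may run concurrently, while `c` and `d` wait only on the data they consume. A production runtime would distribute tasks across nodes rather than threads in one process.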
Juan González
Performance Analytics: Understanding Parallel Applications using Cluster Analysis and Sequence Analysis.
Due to the increasing complexity of HPC systems and applications, it is strictly necessary to maximize the insight gained from the performance data extracted from an application execution. This is the mission of the Performance Analytics field. In this talk we introduce two Performance Analytics techniques. First, we demonstrate how it is possible to capture the computation structure of parallel applications at fine grain by using density-based cluster algorithms. Second, we introduce the use of multiple sequence alignment algorithms to assess the quality of this computation structure.
Jed Brown
Next-generation multigridding: adaptivity and communication avoidance
An alternate interpretation of the Full Approximation Scheme (FAS) multigrid method creates relationships between levels that can be exploited to eliminate communication on fine grids, avoid storage of fine grids, avoid "visiting" fine grids away from active nonlinearities, accelerate recomputation from checkpoints, and use fine-to-coarse compatibility to check for silent data corruption in fine grid state. This talk will present the algorithmic structure, new results with ultra-low-communication parallel multigrid, and directions for future research.
Luke Olson
Reducing Complexity in Algebraic Solvers
Algebraic multigrid solvers can be designed to handle a large range of problem types, yielding high convergence with minimal tuning of parameters. Yet, in many situations these robust methods also yield complexities in the sparse matrix cycling that inhibit performance, particularly in parallel. The multigrid solution cycle is modeled effectively through the structure of the sparse matrices in the multigrid hierarchy. In this talk, we highlight a couple of recent strategies that target reducing the solver complexity (particularly in parallel) while attempting to retain the convergence of the iterative solver. The coarse-level sparse matrix operations are defined through the Galerkin product R A P, i.e., restriction, operator, and interpolation. Consequently, we look at two methods that reduce this complexity: an approach that filters P and a method that builds a coarse level through a non-Galerkin construction. To this end we first introduce a root-node based approach to multigrid, which can be viewed as a hybrid of classical and aggregation based multigrid methods. We give an overview and show how the complexity and convergence of the multigrid cycle can be controlled through selective filtering in a root-node setting. In addition, we look at a non-Galerkin algebraic framework where we are able to model the performance and note the performance gains in selectively filtering coarse-level operators.
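To make the Galerkin product R A P concrete, here is a minimal sketch on a toy problem. It does not reproduce the root-node or non-Galerkin methods of the talk. The 1D Poisson stencil, the linear interpolation operator, the choice R = Pᵀ, and all function names are assumptions made for illustration. The script also computes an operator complexity, taken here as the total number of nonzeros over both levels divided by the nonzeros of the fine operator.

```python
def matmul(X, Y):
    """Dense product of two list-of-lists matrices (toy sizes only)."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def poisson_1d(n):
    """Assumed fine-level operator A: tridiagonal [-1, 2, -1] stencil."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

def linear_interpolation(n_coarse):
    """Assumed interpolation P from n_coarse points to 2*n_coarse + 1 fine points."""
    n_fine = 2 * n_coarse + 1
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        i = 2 * j + 1              # fine point that coincides with coarse point j
        P[i - 1][j] += 0.5         # left fine neighbour
        P[i][j] = 1.0
        P[i + 1][j] += 0.5         # right fine neighbour
    return P

def nnz(X):
    """Number of nonzero entries, a simple proxy for per-cycle work."""
    return sum(1 for row in X for v in row if v != 0.0)

n_coarse = 3
A = poisson_1d(2 * n_coarse + 1)
P = linear_interpolation(n_coarse)
R = transpose(P)                       # restriction chosen as the transpose of P
A_coarse = matmul(R, matmul(A, P))     # Galerkin coarse-level operator R A P
operator_complexity = (nnz(A) + nnz(A_coarse)) / nnz(A)
```

In this toy case the coarse operator stays tridiagonal. In general algebraic settings the triple product tends to add nonzeros on coarse levels, which is the growth that filtering P or a non-Galerkin construction aims to limit.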
Vincent Baudoui
Round-off error propagation in large-scale applications
Round-off errors arising from finite-precision numerical calculation can lead to catastrophic loss of significant digits when they accumulate. They will become increasingly important as problem sizes grow with the refinement of numerical simulations. Existing analytical bounds for round-off errors are known to scale poorly and become of little use for large problems. That is why the propagation of round-off errors throughout a computation needs to be better understood in order to ensure the accuracy of large-scale application results. We study here a round-off error estimation method based on first-order derivatives computed with algorithmic differentiation techniques. It can help follow the error propagation through a computational graph and identify the sensitive sections of a code. It has been tested on well-known LU decomposition algorithms that are widely used to solve linear systems. We will present some examples as well as challenges that need to be tackled in future research in order to set up a strategy for analyzing round-off error propagation in large-scale problems.
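One common first-order model, sketched below, treats each floating-point result z as carrying a relative error of at most the unit round-off u. The output error is then estimated as u times the sum of |∂f/∂z · z| over all operation results. The code is a minimal illustration of that idea with a tiny reverse-mode differentiation tape. It is not the authors' tool, and the `Var` class and `roundoff_estimate` function are hypothetical names.

```python
import sys

UNIT_ROUNDOFF = sys.float_info.epsilon / 2   # 2**-53 for IEEE double precision

class Var:
    """Node of a computational graph recorded on a shared tape (illustrative)."""
    def __init__(self, value, tape, parents=(), rounded=False):
        self.value = value
        self.parents = parents    # tuples of (parent node, local partial derivative)
        self.rounded = rounded    # True when the value comes from a rounded operation
        self.adjoint = 0.0
        self.tape = tape
        tape.append(self)

    def __add__(self, other):
        return Var(self.value + other.value, self.tape,
                   ((self, 1.0), (other, 1.0)), rounded=True)

    def __mul__(self, other):
        return Var(self.value * other.value, self.tape,
                   ((self, other.value), (other, self.value)), rounded=True)

def roundoff_estimate(output):
    """First-order estimate: u * sum(|adjoint * value|) over operation results."""
    tape = output.tape
    for node in tape:
        node.adjoint = 0.0
    output.adjoint = 1.0
    # Reverse sweep: the tape is in creation order, so reversing it visits
    # every node after all nodes that depend on it.
    for node in reversed(tape):
        for parent, local in node.parents:
            parent.adjoint += node.adjoint * local
    return UNIT_ROUNDOFF * sum(abs(n.adjoint * n.value) for n in tape if n.rounded)

tape = []
x, y = Var(3.0, tape), Var(4.0, tape)
f = x * y + x                      # two rounded operations: the product and the sum
estimate = roundoff_estimate(f)
```

The individual terms |adjoint × value| indicate which intermediate results contribute most to the estimate, which is one way to locate sensitive sections of a computation. Being first-order, the estimate ignores terms of order u² and is not a rigorous bound in general.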
Bill Gropp
Advancing Toward Exascale: Some Results and Opportunities
In this talk, I will discuss some results in addressing problems in extreme scale computing that came about from collaborations within the Joint Laboratory on Petascale Computing. I will follow that with a summary of some of my ongoing research projects and the challenges they address in extreme scale computing, and close with some suggestions for future collaborations.