
Main Topics

Schedule

Each entry lists: time, speaker, affiliation, type of presentation, title (tentative), and download link where available.

Sunday June 8th

Dinner Before the Workshop
7:30 PM - Only people registered for the dinner (included) - Mercure Hotel

Workshop Day 1
Monday June 9th

TITLES ARE TEMPORARY (except if in bold font)

Registration
08:00 - At Inria Sophia Antipolis

Welcome and Introduction - Amphitheatre
08:30 - Franck Cappello + Marc Snir + Yves Robert + Bill Kramer + Jesus Labarta (INRIA & UIUC & ANL & BSC) - Background - Welcome, Workshop objectives and organization

Plenary - Amphitheatre - Chair: Franck Cappello
09:00 - Jesus Labarta (BSC) - Background - Presentation of BSC activities

Mini Workshop: Math app. - Room 1 - Chair: Paul Hovland
09:30 - Bill Gropp (UIUC) - Advancing Toward Exascale: Some Results and Opportunities
10:00 - Jed Brown (ANL) - Next-generation multigridding: adaptivity and communication avoidance
10:30 - Break
11:00 - Ian Masliah (Inria) - Automatic generation of dense linear system solvers on CPU/GPU architectures
11:30 - Luke Olson (UIUC) - Reducing Complexity in Algebraic Solvers
12:00 - Lunch

Chair: Bill Gropp
13:30 - Vincent Baudoui (Inria) - Round-off error propagation in large-scale applications
14:00 - Paul Hovland (ANL) - Checkpointing with Multiple Goals
14:30 - Stephane Lanteri (Inria) - C2S@Exa: a multi-disciplinary initiative for high performance computing in computational sciences

Mini Workshop: I/O and Big Data - Room 1 - Chair: Rob Ross
15:00 - Wolfgang Frings (JSC)
15:30 - Break
16:00 - Jonathan Jenkins (ANL) - Towards Simulating Extreme-scale Distributed Systems
16:30 - Matthieu Dorier (Inria) - Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction
17:00 - Dave Mattson and Kenton Guadron McHenry (NCSA) - The NCSA Image and Spatial Data Analysis Division
17:30 - Adjourn
18:30 - Bus for dinner (dinner included)

Mini Workshop: Runtime - Room 2 - Chair: Jesus Labarta
09:30 - Pavan Balaji (ANL)
10:00 - Augustin Degomme (Inria) - Status Report on the Simulation of MPI Applications with SMPI/SimGrid
10:30 - Break
11:00 - Ronak Buch (UIUC)
11:30 - Victor Lopez (BSC)
12:00 - Lunch

Chair: Rajeev Thakur
13:30 - Xin Zhao (ANL)
14:00 - Brice Videau (Inria)
14:30 - Pieter Bellens (BSC)
15:00 - Martin Quinson and Lucas Nussbaum (Inria) - Evaluating exascale HPC runtimes through emulation with Distem
15:30 - Break

Chair: Sanjay Kale
16:00 - Francois Tessier (Inria)
16:30 - Jean-François Mehaud (Inria)
17:00 - Juan González (Inria) - Performance Analytics: Understanding Parallel Applications using Cluster Analysis and Sequence Analysis.
17:30 - Adjourn
18:30 - Bus for dinner (dinner included)

Workshop Day 2
Tuesday June 10th

Formal opening - Amphitheatre - Chair: Bill Kramer
08:30 - Marc Snir + Franck Cappello (INRIA & UIUC & ANL) - Background
08:40 - Claude Kirchner (Inria) - Background - Inria updates and vision of the collaboration - Download: TBD
08:50 - Marc Snir (ANL) - Background - ANL updates and vision of the collaboration - Download: TBD

Plenary - Amphitheatre
09:00 - Wolfgang Frings (JSC) - Background - JSC activities in HPC - Download: TBD

Mini Workshop: I/O and Big Data - Room 1 - Chair: Gabriel Antoniu
09:30 - Rob Ross (ANL) - Understanding and Reproducing I/O Workloads
10:00 - Guillaume Aupy (Inria) - Scheduling the I/O of HPC applications under congestion
10:30 - Break
11:00 - Lokman Rahmani (Inria)
11:30 - Anthony Simonet (Inria) - Using Active Data to Provide Smart Data Surveillance to E-Science Users
12:00 - Lunch

Mini Workshop: Runtime - Room 2 - Chair: Jean-François Mehaud
09:30 - Sanjay Kale (UIUC) - Temperature, Power and Energy: How an Adaptive Runtime can optimize them
10:00 - Florentino Sainz (BSC) - DEEP Collective offload
10:30 - Break
11:00 - Arnaud Legrand (Inria) - Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures
11:30 - Grigori Fursin (Inria)
12:00 - Lunch

Formal encouragements - Amphitheatre - Chair: Franck Cappello
13:45 - Ed Seidel (UIUC) - Background - NCSA updates and vision of the collaboration

Plenary - Amphitheatre - Chair: Wolfgang Frings
14:00 - Yves Robert (Inria)
14:30 - Marc Snir (ANL)
15:00 - Break

Mini Workshop: Resilience - Room 1 - Chair: Franck Cappello
15:30 - Luc Jaulmes (BSC) - Checkpointless exact recovery techniques for Krylov-based iterative methods
16:00 - Ana Gainaru (UIUC)
16:30 - Tatiana Martsinkevich (Inria)
17:00 - Adjourn

Mini Workshop: Cloud & Cyber-infrastructure - Room 2 - Chair: Kate Keahey
15:30 - Justin Wozniak (ANL)
16:00 - Shaowen Wang (UIUC) - CyberGIS @ Scale
16:30 - Christine Morin (Inria)
17:00 - Adjourn

18:30 - Bus for dinner (dinner included)

Workshop Day 3
Wednesday June 11th

Plenary - Amphitheatre - Chair: Jesus Labarta
08:30 - Bill Kramer (NCSA) - Blue Waters - A year of results and insights

Mini Workshop: Resilience - Room 1 - Chair: Yves Robert
09:00 - Leonardo Bautista Gomez (ANL)
09:30 - Slim Bougera (Inria)
10:00 - Break
10:30 - Vincent Baudoui (ANL) - Round-off error propagation in large-scale applications

Mini Workshop: Cloud & Cyber-infrastructure - Room 2 - Chair: Christine Morin
09:00 - Kate Keahey (ANL)
09:30 - Radu Tudoran (Inria) - JetStream: Enabling High Performance Event Streaming across Cloud Data-Centers
10:00 - Break
10:30 - Timothy Armstrong (ANL) - Towards Dynamic Dataflow Composition for Extreme-Scale Applications with Heterogeneous Tasks

Plenary - Amphitheatre
11:00 - Closing
12:00 - Lunch (included)

Abstracts

 
Vincent Baudoui
 
Round-off error propagation in large-scale applications
 
Round-off errors, which stem from the finite precision of numerical calculations, can lead to catastrophic losses of significant digits when they accumulate. They will become increasingly important in the future as problem sizes grow with the refinement of numerical simulations. Existing analytical bounds for round-off errors are known to scale poorly and become of little use for large problems. That is why the propagation of round-off errors throughout a computation needs to be better understood in order to ensure the accuracy of large-scale application results. We study a round-off error estimation method based on first-order derivatives computed with algorithmic differentiation techniques. It can help follow error propagation through a computational graph and identify the sensitive sections of a code. It has been tested on well-known LU decomposition algorithms that are widely used to solve linear systems. We will present some examples, as well as challenges that need to be tackled in future research work in order to set up a strategy for analyzing round-off error propagation in large-scale problems.
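
As a rough illustration of this derivative-based idea (a sketch only, not the authors' implementation), the code below propagates a first-order round-off error estimate alongside each value: every elementary operation contributes its own rounding term and scales the incoming error estimates by the partial derivatives of the operation.

```python
EPS = 2.0 ** -53  # unit round-off for IEEE double precision

class ErrVal:
    """Value together with a first-order estimate of accumulated round-off error.

    For z = f(x, y) the estimate is propagated as
        err_z ~= |df/dx| * err_x + |df/dy| * err_y + EPS * |z|,
    mirroring the derivative-based propagation described in the abstract.
    """
    def __init__(self, value, err=0.0):
        self.value = value
        self.err = err

    def __add__(self, other):
        z = self.value + other.value
        return ErrVal(z, self.err + other.err + EPS * abs(z))

    def __sub__(self, other):
        z = self.value - other.value
        return ErrVal(z, self.err + other.err + EPS * abs(z))

    def __mul__(self, other):
        z = self.value * other.value
        return ErrVal(z, abs(other.value) * self.err
                         + abs(self.value) * other.err + EPS * abs(z))

    def __truediv__(self, other):
        z = self.value / other.value
        return ErrVal(z, self.err / abs(other.value)
                         + abs(z / other.value) * other.err + EPS * abs(z))

# Toy example: error estimate for one elimination step of an LU factorization.
a, b, pivot = ErrVal(1.0), ErrVal(1e-8), ErrVal(3.0)
updated = a - (b / pivot) * a
print(updated.value, updated.err)
```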

 

Luc Jaulmes

Checkpointless exact recovery techniques for Krylov-based iterative methods

By exploiting inherent redundancy in iterative solvers, especially Krylov-subspace methods, we can recover from non-silent errors in data without resorting to techniques like checkpointing. We implemented this recovery scheme for the Conjugate Gradient (CG) method and its preconditioned variant (PCG) and show near-zero overheads in the absence of faults, as well as fast recoveries that preserve all convergence properties of the solver. Using the asynchronous task-based programming model OmpSs, these overheads are minimized even further.
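
A minimal sketch of the underlying idea, assuming the residual vector survives the fault: because r = b - Ax holds at every CG iteration, a lost block of the iterate can be recomputed exactly by solving the corresponding block of that identity. This is only an illustration of the principle, not the OmpSs implementation described in the talk.

```python
import numpy as np

def recover_lost_block(A, b, x, r, lost):
    """Recover x[lost] exactly from the invariant r = b - A @ x.

    A    : SPD matrix (dense here for simplicity)
    b, r : right-hand side and current residual
    x    : iterate whose entries in `lost` were corrupted or lost
    lost : indices of the lost block
    """
    lost = np.asarray(lost)
    kept = np.setdiff1d(np.arange(A.shape[0]), lost)
    # A[lost, lost] x_lost = b_lost - r_lost - A[lost, kept] x_kept
    rhs = b[lost] - r[lost] - A[np.ix_(lost, kept)] @ x[kept]
    x = x.copy()
    x[lost] = np.linalg.solve(A[np.ix_(lost, lost)], rhs)
    return x

# Tiny demo: corrupt two entries of the iterate and recover them exactly.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); A = A @ A.T + 6 * np.eye(6)   # SPD
x_true = rng.standard_normal(6)
b = A @ x_true
x = x_true.copy()
r = b - A @ x                      # residual known before the fault
x[[1, 4]] = np.nan                 # simulate a lost block
x = recover_lost_block(A, b, x, r, [1, 4])
assert np.allclose(x, x_true)
```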

 

Lokman Rahmani

Smart In Situ Visualization for Climate Simulations

The increasing gap between computational power and I/O performance in new supercomputers has started to drive a shift from an offline approach to data analysis to an inline approach, termed in situ visualization (ISV). While most visualization frameworks now provide ISV, they typically visualize large dumps of unstructured data by rendering everything at the highest possible resolution. This often negatively impacts the performance of simulations that support ISV, in particular when ISV is performed interactively, since it requires synchronization with the simulation. In this work, we advocate a smarter way of performing ISV. Our approach is data-driven: it aims to detect potentially interesting regions in the generated dataset in order to feed ISV frameworks with "the interesting" subset of the data produced by the simulation. While this method mitigates the load on ISV frameworks by making them more efficient and more interactive, it also helps scientists focus on the relevant part of their data. We investigate smart ISV in the context of a climate simulation, with a set of generic filters derived from information theory, statistics and image processing, and show the tradeoff between performance and quality of visualization.
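
As a purely illustrative sketch of the data-driven filtering idea (the actual filters and ISV framework used in the work are not shown here), the code below scores blocks of a 2D field by their Shannon entropy and forwards only the blocks above a threshold to a hypothetical rendering backend.

```python
import numpy as np

def block_entropy(block, bins=32):
    """Shannon entropy (in bits) of the value distribution inside one block."""
    hist, _ = np.histogram(block, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_interesting_blocks(field, block=32, threshold=3.0):
    """Yield (origin, sub-array) for blocks whose entropy exceeds `threshold`."""
    nx, ny = field.shape
    for i in range(0, nx, block):
        for j in range(0, ny, block):
            sub = field[i:i + block, j:j + block]
            if block_entropy(sub) > threshold:
                yield (i, j), sub

# Example: a smooth field containing one turbulent-looking patch.
rng = np.random.default_rng(1)
field = np.zeros((256, 256))
field[96:160, 96:160] = rng.standard_normal((64, 64))
kept = list(select_interesting_blocks(field))
print(f"forwarding {len(kept)} of {(256 // 32) ** 2} blocks to the visualization backend")
```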

 

Lucas Nussbaum

Evaluating exascale HPC runtimes through emulation with Distem

The Exascale era will require the HPC software stack to face important challenges such as platform heterogeneity, evolution during execution, and reliability issues. We propose a framework to evaluate key aspects of a central part of this software stack: the HPC runtimes. Starting from Distem, a versatile emulator for studying distributed systems, we designed an emulator suitable for the evaluation of HPC runtimes, enabling specifically: (1) emulation of a very large scale platform on top of a regular cluster; (2) introduction of heterogeneity and dynamic imbalance among the computing resources; (3) introduction of failures. These features give runtime designers the ability to experiment with their prototypes under a large range of conditions, to discover performance gaps, understand future bottlenecks, and evaluate fault tolerance and load balancing mechanisms. We validate the usefulness of this approach with experiments on two HPC runtimes: Charm++ and OpenMPI.


Sanjay Kale

Temperature, Power and Energy: How an Adaptive Runtime can optimize them.


 

Jonathan Jenkins

Towards Simulating Extreme-scale Distributed Systems

Simulating future extreme-scale parallel and distributed systems can be an important component in understanding these systems at a scale that prototyping cannot feasibly reach. For HPC, big-data/cloud, or other computing/analysis platforms, the design decisions for developing systems that scale beyond current-generation systems are multi-dimensional in nature. For example, these decisions encompass distributed storage software/hardware solutions, network topologies within and between computing centers, algorithms for data analysis and compute services in heterogeneous software/hardware environments, etc., each of which can be a rich target for exploration via a simulation-based approach. This talk will examine our ongoing work in developing a simulation model framework using parallel discrete event simulation to examine various design aspects of extreme-scale distributed systems. As an exemplar, simulation of protocols used in distributed storage systems will be examined in detail.
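
To make the simulation approach concrete, here is a toy sequential discrete-event loop; the actual work relies on parallel discrete event simulation and detailed storage and network models, which this sketch does not attempt to reproduce. Events sit in a time-ordered queue, and each handler may schedule further events; the client/server model and latency constants are purely hypothetical.

```python
import heapq

def simulate(events, handlers):
    """Minimal discrete-event loop: pop the earliest event, run its handler,
    and push whatever events the handler schedules in return."""
    queue = list(events)
    heapq.heapify(queue)
    while queue:
        time, kind, payload = heapq.heappop(queue)
        for new_event in handlers[kind](time, payload):
            heapq.heappush(queue, new_event)

# Toy model: a client sends requests to a storage server with fixed latencies.
LINK_LATENCY, SERVICE_TIME = 2.0, 5.0

def on_send(t, req):
    print(f"t={t:5.1f}  request {req} leaves client")
    return [(t + LINK_LATENCY, "arrive", req)]

def on_arrive(t, req):
    return [(t + SERVICE_TIME, "complete", req)]

def on_complete(t, req):
    print(f"t={t:5.1f}  request {req} completed by server")
    return []

simulate([(0.0, "send", 1), (1.0, "send", 2)],
         {"send": on_send, "arrive": on_arrive, "complete": on_complete})
```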

 

Timothy Armstrong

Towards Dynamic Dataflow Composition for Extreme-Scale Applications with Heterogeneous Tasks
Parallel applications are increasingly built from heterogeneous software components that use diverse programming models, such as message passing, threads, CUDA, and OpenCL, on heterogeneous hardware resources such as CPUs and GPUs. Getting these components to interoperate is a challenge in itself, which is further complicated by complex cross-cutting concerns such as scheduling, overlapping of communication and computation, fault tolerance, and energy efficiency. Parallel execution models offer the hope of making these challenges more manageable for application programmers by unifying heterogeneous components into a more uniform framework. One such model is data-driven task parallelism, in which massive numbers of tasks are dynamically assigned to compute resources and communication and synchronization are based on explicit data dependencies. Swift is a high-level scripting language that provides a simple yet powerful way of expressing data-driven task parallelism. This talk discusses our current progress and future challenges on a compiler and runtime system that allows Swift to scale to hundreds of thousands of cores.
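
A minimal illustration of data-driven task parallelism in general (not of Swift itself or its runtime): tasks are submitted as soon as they are defined, and each one starts only when the futures holding its inputs have resolved. The `load`/`analyze`/`combine` functions are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, Future

pool = ThreadPoolExecutor(max_workers=8)

def task(fn, *inputs):
    """Submit fn as a task; it runs once all of its input futures have resolved."""
    def run():
        args = [i.result() if isinstance(i, Future) else i for i in inputs]
        return fn(*args)
    return pool.submit(run)

def load(name):
    return list(range(10))      # stand-in for reading an input dataset

def analyze(data):
    return sum(data)            # stand-in for an expensive kernel

def combine(a, b):
    return a + b

# The data-dependency graph, not program order, drives execution: analyze("A")
# and analyze("B") run as soon as their respective loads finish, independently.
fa, fb = task(load, "A"), task(load, "B")
ra, rb = task(analyze, fa), task(analyze, fb)
total = task(combine, ra, rb)
print(total.result())           # 45 + 45 = 90
pool.shutdown()
```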

 

Juan González

Performance Analytics: Understanding Parallel Applications using Cluster Analysis and Sequence Analysis

Due to the increasing complexity of HPC systems and applications, it is essential to maximize the insight gained from the performance data extracted from an application execution. This is the mission of the Performance Analytics field. In this talk we introduce two Performance Analytics techniques. First, we demonstrate how the computation structure of parallel applications can be captured at fine grain using density-based clustering algorithms. Second, we introduce the use of multiple sequence alignment algorithms to assess the quality of this computation structure.
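
A hedged sketch of the first technique, density-based clustering of computation bursts, assuming scikit-learn is available and that per-burst hardware-counter features (here synthetic: instructions completed and IPC) have already been extracted from a trace.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical per-burst features extracted from a trace:
# column 0 = instructions completed, column 1 = IPC.
rng = np.random.default_rng(42)
bursts = np.vstack([
    rng.normal([2e8, 1.8], [1e7, 0.05], size=(200, 2)),   # compute-bound phase
    rng.normal([5e7, 0.4], [5e6, 0.05], size=(200, 2)),   # memory-bound phase
])

# Density-based clustering groups bursts with similar behaviour and leaves
# outliers (label -1) unassigned, exposing the application's structure.
features = StandardScaler().fit_transform(bursts)
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(features)
print("clusters found:", sorted(set(labels) - {-1}),
      "noise points:", int((labels == -1).sum()))
```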


Jed Brown

Next-generation multigridding: adaptivity and communication avoidance
 

An alternate interpretation of the Full Approximation Scheme (FAS) multigrid method creates relationships between levels that can be exploited to eliminate communication on fine grids, avoid storage of fine grids, avoid "visiting" fine grids away from active nonlinearities, accelerate recomputation from checkpoints, and use fine-to-coarse compatibility to check for silent data corruption in fine grid state. This talk will present the algorithmic structure, new results with ultra-low-communication parallel multigrid, and directions for future research.
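
For reference, the textbook FAS coarse-level equation that this reinterpretation starts from can be written (in standard notation; this is not the alternate formulation presented in the talk) as

    A_H(u_H) = A_H(\hat{I}_h^H u_h) + I_h^H (f_h - A_h(u_h)),

where A_h and A_H are the fine- and coarse-level (nonlinear) operators, \hat{I}_h^H restricts the current solution, I_h^H restricts the residual, and the coarse result corrects the fine grid via u_h <- u_h + I_H^h (u_H - \hat{I}_h^H u_h).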

 

Luke Olson
 

Reducing Complexity in Algebraic Solvers

Algebraic multigrid solvers can be designed to handle a large range of problem types, yielding rapid convergence with minimal tuning of parameters. Yet, in many situations these robust methods also yield complexities in the sparse matrix cycling that inhibit performance, particularly in parallel. The multigrid solution cycle is modeled effectively through the structure of the sparse matrices in the multigrid hierarchy. In this talk, we highlight a couple of recent strategies that target reducing the solver complexity (particularly in parallel) while attempting to retain the convergence of the iterative solver.

The coarse-level sparse matrix operators are defined through the Galerkin product R A P, i.e., restriction, operator, and interpolation. Consequently, we look at two methods that reduce this complexity: an approach that filters P, and a method that builds a coarse level through a non-Galerkin construction. To this end we first introduce a root-node based approach to multigrid, which can be viewed as a hybrid of classical and aggregation-based multigrid methods. We give an overview and show how the complexity and convergence of the multigrid cycle can be controlled through selective filtering in a root-node setting. In addition, we look at a non-Galerkin algebraic framework where we are able to model the performance and note the performance gains in selectively filtering coarse-level operators.
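
A small sketch of the Galerkin triple product and of the kind of interpolation filtering discussed above, using SciPy sparse matrices. This is illustrative only; the root-node and non-Galerkin constructions in the talk are more involved, and the 1D Poisson example and drop tolerance are arbitrary choices.

```python
import numpy as np
import scipy.sparse as sp

def galerkin_coarse_operator(A, P, drop_tol=0.0):
    """Form the coarse operator A_c = R A P with R = P^T, optionally filtering
    small entries of P first to reduce the resulting operator complexity."""
    if drop_tol > 0.0:
        P = P.copy()
        P.data[np.abs(P.data) < drop_tol * np.abs(P.data).max()] = 0.0
        P.eliminate_zeros()
    R = P.T.tocsr()
    return (R @ A @ P).tocsr()

# 1D Poisson matrix and a piecewise-constant (aggregation-style) interpolation.
n = 64
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
rows = np.arange(n)
P = sp.csr_matrix((np.ones(n), (rows, rows // 2)), shape=(n, n // 2))

# Smoothing the interpolation improves accuracy but densifies A_c;
# filtering small entries of P recovers some of the lost sparsity.
P_s = ((sp.identity(n, format="csr") - (1.0 / 3.0) * A) @ P).tocsr()
print("coarse nnz, plain P:   ", galerkin_coarse_operator(A, P).nnz)
print("coarse nnz, smoothed P:", galerkin_coarse_operator(A, P_s).nnz)
print("coarse nnz, filtered P:", galerkin_coarse_operator(A, P_s, drop_tol=0.6).nnz)
```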

 


 

Paul Hovland

Checkpointing with Multiple Goals

Bill Gropp

Advancing Toward Exascale: Some Results and Opportunities
 

In this talk, I will discuss some results in addressing problems in extreme-scale computing that came about from collaborations within the Joint Laboratory on Petascale Computing. I will follow that with a summary of some of my ongoing research projects and the extreme-scale computing challenges they address, and close with some suggestions for future collaborations.