...
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download |
| Sunday June 8th |
Dinner Before the Workshop | 7:30 PM | Only people registered for the dinner (included) | Mercure Hotel |
Workshop Day 1 | Monday June 9th | |||||
| TITLES ARE TEMPORARY (except if in bold font) |
Registration | 08:00 | At Inria Sophia Antipolis |
Welcome and Introduction Amphitheatre | 08:30 | Franck Cappello + Marc Snir + Yves Robert + Bill Kramer + Jesus Labarta | INRIA&UIUC&ANL&BSC | Background | Welcome, Workshop objectives and organization | |
Plenary Amphitheatre Chair: Franck Cappello | 09:00 | Jesus Labarta | BSC | Background | Presentation of BSC activities |
Mini Workshop Applied Maths. Amphitheatre | ||||||
Chair: Paul Hovland | 09:30 | Bill Gropp | UIUC | Advancing Toward Exascale: Some Results and Opportunities | ||
10:00 | Jed Brown | ANL | Next-generation multigridding: adaptivity and communication avoidance | |||
10:30 | Break | |||||
11:00 | Ian Masliah | Inria | Automatic generation of dense linear system solvers on CPU/GPU architectures | |||
11:30 | Luke Olson | UIUC | Reducing Complexity in Algebraic Solvers | |||
12:00 | Lunch | |||||
Chair: Bill Gropp | 13:30 | Vincent Baudoui | Inria | Round-off error propagation in large-scale applications | ||
14:00 | Paul Hovland | ANL | Checkpointing with Multiple Goals | |||
14:30 | Stephane Lanteri | Inria | C2S@Exa: a multi-disciplinary initiative for high performance computing in computational sciences | |||
Mini Workshop I/O and BigData Amphitheatre | ||||||
Chair: Rob Ross | 15:00 | Wolfgang Frings | JSC | HPC I/O at Large Scale with SIONlib and Spindle | ||
15:30 | Break | |||||
16:00 | Jonathan Jenkins | ANL | Towards Simulating Extreme-scale Distributed Systems | |||
16:30 | Matthieu Dorier | Inria | Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction | |||
17:00 | Dave Mattson Kenton Guadron McHenry, | NCSA | The NCSA Image and Spatial Data Analysis Division | |||
17:30 | Adjourn | |||||
18:30 | Bus for dinner (dinner included) | |||||
Mini Workshop Runtime Room Gilles Kahn | ||||||
Chair: Jesus Labarta | 9:30 | Pavan Balaji | ANL | VOCL: A Virtualization Infrastructure for Accelerators | ||
10:00 | Augustin Degomme and Arnaud Legrand | Inria | Status Report on the Simulation of MPI Applications with SMPI/SimGrid | |||
10:30 | Break | |||||
11:00 | Ronak Buch | UIUC | Advanced Techniques in Parallel Performance Analysis | |||
11:30 | Victor Lopez | BSC | DLB: Dynamic Load Balancing Library | |||
12:00 | Lunch | |||||
Chair: Rajeev Thakur | 13:30 | Xin Zhao | ANL | Programming Runtime Support for Irregular Computations | ||
14:00 | Luka Stanisic and Arnaud Legrand | Inria | Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures | |||
14:30 | Pieter Bellens | BSC | Quantifying the effect of rectangular blocks in the dense QR factorization | |||
15:00 | Lucas Nussbaum | Inria | Evaluating exascale HPC runtimes through emulation with Distem | |||
15:30 | Break | |||||
Chair: Sanjay Kale | 16:00 | Francois Tessier | Inria | Distributed communication-aware load balancing with TreeMatch in Charm++ | ||
16:30 | Jean-François Mehaut | Inria | Saving Energy by Exploiting Residual Imbalance on Iterative Applications | |||
17:00 | Juan González | Inria | Performance Analytics: Understanding Parallel Applications using Cluster Analysis and Sequence Analysis. | |||
17:30 | Adjourn | |||||
18:30 | Bus for dinner (dinner included) | |||||
Workshop Day 2 | Tuesday June 10th | |||||
Formal opening Amphitheatre Chair: Bill Kramer | 08:30 | Marc Snir + Franck Cappello | INRIA&UIUC&ANL | Background | ||
| 08:40 | Claude Kirchner | Inria | Background | Inria updates and vision of the collaboration | TBD |
| 08:50 | Marc Snir | ANL | Background | ANL updates and vision of the collaboration | TBD |
Plenary Amphitheatre | 09:00 | Wolfgang Frings | JSC | Background | JSC activities in HPC | TBD |
Mini Workshop I/O and Big Data Amphitheatre | ||||||
Chair: Gabriel Antoniu | 09:30 | Rob Ross | ANL | Understanding and Reproducing I/O Workloads | ||
10:00 | Guillaume Aupy | Inria | Scheduling the I/O of HPC applications under congestion | |||
10:30 | Break | |||||
11:00 | Lokman Rahmani | Inria | Smart In Situ Visualization for Climate Simulations | |||
11:30 | Anthony Simonet | Inria | Using Active Data to Provide Smart Data Surveillance to E-Science Users | |||
12:00 | Lunch | |||||
Mini Workshop Runtime | ||||||
Chair: Jean François Mehaut | 09:30 | Sanjay Kale | UIUC | Temperature, Power and Energy: How an Adaptive Runtime can optimize them | ||
10:00 | Florentino Sainz | BSC | DEEP Collective offload | |||
10:30 | Break | |||||
11:00 | Brice Videau | Inria | Porting HPC applications to the Mont-Blanc prototype using BOAST | |||
11:30 | Grigori Fursin | Inria | Collective Mind: bringing reproducible research to the masses | |||
12:00 | Lunch | |||||
Plenary Amphitheatre Chair: Franck Cappello | 13:45 | Ed Seidel | UIUC | Background | NCSA updates and vision of the collaboration | |
Plenary Amphitheatre Chair: Wolfgang Frings | 14:00 | Yves Robert | Inria | Algorithms for coping with silent errors | ||
14:30 | Marc Snir | ANL | Runtime and OS research at DoE | |||
15:00 | Break | |||||
Mini Workshop Resilience | ||||||
Chair: Franck Cappello | 15:30 | Luc Jaulmes | BSC | Checkpointless exact recovery techniques for Krylov-based iterative methods | ||
16:00 | Ana Gainaru | UIUC | The road to failure prediction on Blue Waters: latest details and future directions | |||
16:30 | Tatiana Martsinkevich | Inria | Using dedicated resources to alleviate memory limitation for message logging protocols | |||
17:00 | Adjourn | |||||
Mini Workshop Cloud & Cyber-infrastructure Room Gilles Kahn | ||||||
Chair: Kate Keahey | 15:30 | Justin Wozniak | ANL | Case Studies in Big Data and HPC from X-ray Crystallography | ||
16:00 | Shaowen Wang | UIUC | CyberGIS @ Scale | |||
16:30 | Christine Morin | Inria | Contrail: Interoperability and Dependability in a Cloud Federation | |||
17:00 | Adjourn | |||||
18:30 | Bus for Dinner (dinner included) | |||||
Workshop Day 3 | Wednesday June 11th | |||||
Plenary Amphitheatre Chair: Marc Snir | 8:30 | Bill Kramer | NCSA | Blue Waters - A year of results and insights | ||
Mini Workshop Resilience | ||||||
Chair: Yves Robert | 9:00 | Leonardo Bautista Gomez | ANL | Fault Tolerance Interface new features and new developments | ||
9:30 | Slim Bouguerra | Inria | Energy-Performance Tradeoffs in Multilevel Checkpoint Strategies | |||
10:00 | Break | |||||
10:30 | Martin Quinson | Inria | Formal verification of unmodified MPI applications with SimGrid | |||
Plenary Amphitheatre | 11:00 | Closing | ||||
12:00 | Lunch (included) | |||||
Mini Workshop Cloud & Cyber-infrastructure | ||||||
Chair: Justin Wozniak | 09:00 | Kate Keahey | ANL | |||
09:30 | Radu Tudoran | Inria | JetStream: Enabling High Performance Event Streaming across Cloud Data-Centers | |||
10:00 | Break | |||||
10:30 | Timothy Armstrong | ANL | Towards Dynamic Dataflow Composition for Extreme-Scale Applications with Heterogeneous Tasks | |||
Plenary Amphitheatre | 11:00 | Closing | ||||
12:00 | Lunch (included) |
...
Luc Jaulmes
Checkpointless exact recovery techniques for Krylov-based iterative methods
Lokman Rahmani
Smart In Situ Visualization for Climate Simulations
Lucas Nussbaum
Evaluating exascale HPC runtimes through emulation with Distem
The Exascale era will require the HPC software stack to face important challenges such as platform heterogeneity and evolution during execution, or reliability issues. We propose a framework to evaluate key aspects of a central part of this software stack: the HPC runtimes. Starting from Distem, which is a versatile emulator for studying distributed systems, we designed an emulator suitable for the evaluation of HPC runtimes, enabling specifically: (1) emulation of a very large scale platform on top of a regular cluster; (2) introduction of heterogeneity and dynamic imbalance among the computing resources; (3) introduction of failures. Those features provide runtime designers with the ability to experiment with their prototypes under a large range of conditions, to discover performance gaps, understand future bottlenecks, and evaluate fault tolerance and load balancing mechanisms. We validate the usefulness of this approach with experiments on two HPC runtimes: Charm++ and OpenMPI.
Sanjay Kale
Temperature, Power and Energy: How an Adaptive Runtime can optimize them.
Jonathan Jenkins
Towards Simulating Extreme-scale Distributed Systems
Simulating future extreme-scale parallel/distributed systems can be an important component in understanding these systems at a scale at which prototyping cannot feasibly reach. For HPC, big-data/cloud, or other computing/analysis platforms, the design decisions for developing systems that scale beyond current-generation systems are multi-dimensional in nature. For example, these decisions encompass distributed storage software/hardware solutions, network topologies within and between computing centers, algorithms for data analysis and compute services in heterogeneous software/hardware environments, etc., each of which can potentially be rich targets for exploring via a simulation-based approach. This talk will examine our ongoing work in developing a simulation model framework using parallel discrete event simulation to examine various design aspects of extreme-scale distributed systems. As an exemplar, simulation of protocols used in distributed storage systems will be examined in detail.
Timothy Armstrong
Towards Dynamic Dataflow Composition for Extreme-Scale Applications with Heterogeneous Tasks
Parallel applications are increasingly built from heterogeneous software components that use diverse programming models, such as message-passing, threads, CUDA, and OpenCL, on heterogeneous hardware resources such as CPUs and GPUs. Getting these components to interoperate is a challenge in itself, which is further complicated by complex cross-cutting concerns such as scheduling, overlapping of communication and computation, fault-tolerance, and energy efficiency. Parallel execution models offer the hope of making these challenges more manageable for application programmers by unifying heterogeneous components into a more uniform framework. One such model is data-driven task parallelism, in which massive numbers of tasks are dynamically assigned to compute resources, and communication and synchronization are based on explicit data dependencies. Swift is a high-level scripting language that provides a simple yet powerful way of expressing data-driven task parallelism. This talk discusses our current progress and future challenges on a compiler and runtime system that allows Swift to scale to hundreds of thousands of cores.
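To make the idea of data-driven task parallelism concrete, here is a minimal sequential sketch in Python. It is not Swift and not the compiler or runtime discussed in the talk; the function `run_dataflow` and the task names are hypothetical, chosen only for illustration. A task becomes runnable as soon as all the data it depends on has been produced, and no other ordering is imposed.

```python
from collections import deque


def run_dataflow(tasks):
    """Run tasks in an order dictated only by their data dependencies.

    `tasks` maps a task name to (function, list of input task names).
    Hypothetical helper for illustration; a real runtime would dispatch
    ready tasks to many workers instead of running them one by one.
    """
    # Number of inputs each task is still waiting for.
    pending = {name: len(deps) for name, (_, deps) in tasks.items()}
    # Reverse edges: who consumes the output of each task.
    consumers = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in deps:
            consumers[dep].append(name)

    ready = deque(name for name, count in pending.items() if count == 0)
    results, order = {}, []
    while ready:
        name = ready.popleft()
        func, deps = tasks[name]
        results[name] = func(*(results[d] for d in deps))
        order.append(name)
        # Producing this value may unblock downstream tasks.
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    return results, order


tasks = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "sum": (lambda x, y: x + y, ["a", "b"]),
    "square": (lambda s: s * s, ["sum"]),
}
results, order = run_dataflow(tasks)
```

Here "a" and "b" have no inputs and could run concurrently, while "sum" and "square" wait for their data. That is the property a parallel runtime exploits.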
Juan González
Performance Analytics: Understanding Parallel Applications using Cluster Analysis and Sequence Analysis.
Due to the increasing complexity of HPC systems and applications, it is essential to maximize the insight gained from the performance data extracted from an application execution. This is the mission of the Performance Analytics field. In this talk we introduce two Performance Analytics techniques. First, we demonstrate how it is possible to capture the computation structure of parallel applications at fine grain by using density-based cluster algorithms. Second, we introduce the use of multiple sequence alignment algorithms to assess the quality of this computation structure.
Jed Brown
Next-generation multigridding: adaptivity and communication avoidance
An alternate interpretation of the Full Approximation Scheme (FAS) multigrid method creates relationships between levels that can be exploited to eliminate communication on fine grids, avoid storage of fine grids, avoid "visiting" fine grids away from active nonlinearities, accelerate recomputation from checkpoints, and use fine-to-coarse compatibility to check for silent data corruption in fine grid state. This talk will present the algorithmic structure, new results with ultra-low-communication parallel multigrid, and directions for future research.
Luke Olson
Reducing Complexity in Algebraic Solvers
Algebraic multigrid solvers can be designed to handle a large range of problem types, yielding high convergence with minimal tuning of parameters. Yet, in many situations these robust methods also yield complexities in the sparse matrix cycling that inhibit performance, particularly in parallel. The multigrid solution cycle is modeled effectively through the structure of the sparse matrices in the multigrid hierarchy. In this talk, we highlight a couple of recent strategies that target reducing the solver complexity (particularly in parallel) while attempting to retain the convergence of the iterative solver. The coarse-level sparse matrix operations are defined through the Galerkin product, R A P (restriction, operator, and interpolation). Consequently, we look at two methods that reduce this complexity: an approach that filters P and a method that builds a coarse level through a non-Galerkin construction. To this end we first introduce a root-node based approach to multigrid, which can be viewed as a hybrid of classical and aggregation-based multigrid methods. We give an overview and show how the complexity and convergence of the multigrid cycle can be controlled through selective filtering in a root-node setting. In addition, we look at a non-Galerkin algebraic framework where we are able to model the performance and note the performance gains in selectively filtering coarse-level operators.
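As a small illustration of the Galerkin product R A P mentioned above, the following Python sketch builds the coarse operator for a 1D Poisson matrix. The model problem, the linear interpolation operator, and the choice R = Pᵀ are assumptions made for this example only; the abstract does not specify them, and the filtering and non-Galerkin techniques of the talk are not reproduced here.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, used here as an assumed model problem."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def linear_interpolation(n_coarse):
    """Interpolation P from n_coarse points to 2 * n_coarse + 1 fine points."""
    n_fine = 2 * n_coarse + 1
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        P[2 * j][j] = 0.5
        P[2 * j + 1][j] = 1.0
        P[2 * j + 2][j] = 0.5
    return P


def galerkin_coarse_operator(A, P):
    """Coarse operator R A P with the assumed restriction R = P^T."""
    R = transpose(P)
    return matmul(R, matmul(A, P))


A = poisson_1d(7)
P = linear_interpolation(3)
A_coarse = galerkin_coarse_operator(A, P)
```

For this model problem the coarse operator is again tridiagonal, with 1 on the diagonal and -0.5 next to it. For less structured problems the triple product tends to produce denser coarse matrices, which is the complexity growth that filtering P or a non-Galerkin construction aims to limit.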
Vincent Baudoui
Round-off error propagation in large-scale applications
Round-off errors arising from finite-precision arithmetic can accumulate and cause catastrophic loss of significant digits. They will become increasingly important as problem sizes grow with the refinement of numerical simulations. Existing analytical bounds for round-off errors are known to scale poorly and become of little use for large problems. The propagation of round-off errors throughout a computation therefore needs to be better understood in order to ensure the accuracy of large-scale application results. We study here a round-off error estimation method based on first-order derivatives computed with algorithmic differentiation techniques. It can help follow error propagation through a computational graph and identify the sensitive sections of a code. It has been tested on well-known LU decomposition algorithms that are widely used to solve linear systems. We will present some examples as well as challenges that need to be tackled in future research in order to set up a strategy for analyzing round-off error propagation in large-scale problems.
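The abstract does not give the estimator itself, so the following Python sketch only illustrates the general idea under a stated assumption: each arithmetic result v_i carries a relative perturbation of at most the unit round-off u, and the effect on the output is linearized as u · Σ |∂y/∂v_i · v_i|, with the derivatives obtained by reverse-mode algorithmic differentiation. The `Node` class and `roundoff_estimate` function are hypothetical names.

```python
import sys

UNIT_ROUNDOFF = sys.float_info.epsilon / 2  # 2**-53 for IEEE double precision


class Node:
    """A value in the computational graph, recorded on a shared tape."""

    def __init__(self, tape, value, parents=()):
        self.tape = tape
        self.value = value
        self.parents = parents  # pairs (parent node, local partial derivative)
        self.adjoint = 0.0
        tape.append(self)       # creation order is a topological order

    def __add__(self, other):
        return Node(self.tape, self.value + other.value,
                    ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Node(self.tape, self.value - other.value,
                    ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Node(self.tape, self.value * other.value,
                    ((self, other.value), (other, self.value)))


def roundoff_estimate(tape, output):
    """First-order round-off estimate for `output` (assumed error model)."""
    for node in tape:
        node.adjoint = 0.0
    output.adjoint = 1.0
    # Reverse sweep: propagate d(output)/d(node) back to the parents.
    for node in reversed(tape):
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial
    # Only results of operations (nodes with parents) are rounded.
    return UNIT_ROUNDOFF * sum(abs(n.adjoint * n.value)
                               for n in tape if n.parents)


tape = []
a = Node(tape, 1e16)
b = Node(tape, 1.0)
result = (a + b) - a          # exact answer is 1.0
estimate = roundoff_estimate(tape, result)
```

In this example the addition loses the contribution of `b` entirely, so the computed result is 0.0 instead of 1.0. The estimate is dominated by the intermediate sum, which flags that operation as the sensitive part of the computation.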
Wolfgang Frings
HPC I/O at Large Scale with SIONlib and Spindle
Parallel applications often store data in multiple task-local files, for example, to create checkpoints, to circumvent memory limitations, or to record performance data. When operating at very large processor configurations, such applications often experience scalability limitations: the simultaneous creation of thousands of files causes metadata-server contention, large file counts complicate file management, and operations on those files can even destabilize the file system.
In the first part of the talk we will cover the design principles of SIONlib, a parallel I/O library, which addresses this problem by transparently mapping a large number of task-local files onto a small number of physical files via internal metadata handling and block alignment to ensure high performance.
Dynamic linking has many advantages for managing large code bases, but dynamically linked applications have not typically scaled well on high performance computing systems at large scale. Launching an executable that depends on many dynamic shared objects (DSOs) causes a flood of file system operations at program start-up, when each process in the parallel application loads its dependencies. At large scales, this operation has an effect similar to a site-wide denial-of-service attack, as even large parallel file systems struggle to service so many simultaneous requests.
In the second part of this talk we will present Spindle, a novel approach to parallel loading, which coordinates, transparently to user applications, simultaneous file system operations with a scalable network of cache server processes.
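The mapping of many task-local files onto one physical file with block alignment, described in the first part of the abstract, can be pictured with the short Python sketch below. This is not the SIONlib API; `aligned_offsets` is a hypothetical helper, and the layout (one contiguous, block-aligned region per task) is a simplifying assumption.

```python
def aligned_offsets(chunk_sizes, block_size):
    """Assign each task a block-aligned region inside one shared file.

    Returns the start offset of every task's region and the total file size.
    Hypothetical layout for illustration, not SIONlib's actual file format.
    """
    offsets = []
    position = 0
    for size in chunk_sizes:
        offsets.append(position)
        blocks = -(-size // block_size)       # ceiling division
        position += max(blocks, 1) * block_size
    return offsets, position


# Three tasks writing 100, 5000 and 4096 bytes with 4096-byte blocks.
offsets, total_size = aligned_offsets([100, 5000, 4096], 4096)
```

Because every region starts on a block boundary, no two tasks write into the same file system block, and only one file has to be created instead of one per task.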
Pavan Balaji
VOCL: A Virtualization Infrastructure for Accelerators
Abstract: In this talk I’ll present a light-weight virtualization infrastructure for accelerators called VOCL (Virtual OpenCL). The VOCL framework provides an implementation of OpenCL-1.1 and internally manages accelerators from different vendors using their native OpenCL implementations. It provides transparent access to both local and remote accelerators internally using MPI communication for data movement. This talk will focus on various capabilities such an infrastructure provides including: (1) automatic load balancing capabilities, (2) automatic global system power management, (3) transparent protection from double-bit errors, and (4) utilization of heterogeneous collections of accelerators.
Ronak Buch
Advanced Techniques in Parallel Performance Analysis
Abstract: Analyzing the performance of HPC applications is difficult and often unintuitive. Techniques from the world of serial programming, such as profilers and wall-clock timers, do not fully reveal the properties of parallel programs. To provide an incisive view into performance, tools must be designed with parallelism in mind. This talk will present some advanced analysis techniques tailored specifically for parallelism. These capabilities, including multirun analysis and processor clustering, will be demonstrated using the Projections performance analysis tool.
Jean-François Mehaut
Saving Energy by Exploiting Residual Imbalance on Iterative Applications
Parallel scientific applications have been influencing the way science is done in the last decades. These applications have ever increasing demands in performance and resources due to their greater complexity and larger datasets. To meet these demands, the performance of supercomputers has been growing exponentially, which leads to an exponential growth in power consumption too. In this context, saving power has become one of the main concerns of current HPC platform designs, as future Exascale systems need to consider power demand and energy consumption constraints. Whereas some scientific applications have regular designs that lead to well balanced load distributions, others are more imbalanced due to the fact that they have tasks with different processing demands, which makes it difficult to provide an efficient use of the available resources at the hardware level. In this case, a challenge lies in reducing the energy consumption of the application while maintaining a similar performance. In our work, we focus on reducing the energy consumption of imbalanced applications through a combination of load balancing and Dynamic Voltage and Frequency Scaling (DVFS). Our strategy employs an Energy Daemon Tool to gather power information and a load balancing module that benefits from the load balancing framework available with the CHARM++ runtime system. Our approach differs from the one proposed by Sarood et al. as we employ DVFS as a way to decrease energy consumption after balancing the load, while the latter uses DVFS to regulate temperature and employs load balancing to correct subsequent imbalance.
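One way to picture how DVFS can exploit residual imbalance is sketched below in Python. The policy shown is an assumption made for illustration and is not taken from the talk: after load balancing, each core is given the lowest available frequency that still lets it finish no later than the most loaded core running at full speed. The function name, the loads, and the frequency list are hypothetical.

```python
def pick_frequencies(loads, available_freqs):
    """Choose a frequency per core without lengthening the critical path.

    `loads` are per-core work amounts and `available_freqs` the supported
    frequencies (integers, e.g. MHz). Assumes run time scales as load / f.
    """
    f_max = max(available_freqs)
    critical = max(loads)
    chosen = []
    for load in loads:
        # Keep f with load / f <= critical / f_max, compared in integers.
        feasible = [f for f in available_freqs if f * critical >= f_max * load]
        chosen.append(min(feasible))
    return chosen


frequencies = pick_frequencies([100, 80, 50], [1200, 1600, 2000, 2400])
```

The most loaded core stays at the highest frequency, while the others slow down just enough to remove their idle time.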
Grigori Fursin
Collective Mind: bringing reproducible research to the masses
When trying to make auto-tuning practical using common infrastructure, a public repository of knowledge, and machine learning (cTuning.org), we faced a major problem with reproducibility of experimental results collected from multiple users. This was largely due to a lack of information about all software and hardware dependencies as well as a large variation in measured characteristics.
I will present a possible collaborative approach to solving the above problems using a new Collective Mind knowledge management system. This modular infrastructure is intended to preserve and share over the Internet whole experimental setups with all related artifacts and their software and hardware dependencies, rather than just performance data. Researchers can take advantage of shared components and data with extensible meta-description at http://c-mind.org/repo to quickly prototype and validate research techniques, particularly on software and hardware optimization and co-design. At the same time, behavior anomalies or model mispredictions can be exposed in a reproducible way to an interdisciplinary community for further analysis and improvement. This approach supports our new open publication model in computer engineering where all results and artifacts are continuously shared and validated by the community (c-mind.org/events/trust2014).
Xin Zhao
Programming Runtime Support for Irregular Computations
Irregular computations have become increasingly important in recent years in many areas, such as bioinformatics and social network analysis. Traditional data movement approaches for scientific computation are not well suited for such applications. The Active Messages (AM) model is an alternative communication paradigm that is better suited for such applications by allowing computation to be dynamically moved closer to data. Given the wide usage of MPI in scientific computing, enabling an MPI-interoperable AM paradigm would allow traditional applications to incrementally start utilizing AMs in portions of their applications, thus eliminating the programming effort of rewriting entire applications.
In our previous work we proposed a new generalized framework for MPI-interoperable Active Messages that can provide rich semantics to accommodate a wide variety of application computational patterns. Together with a new API, we present a detailed design of the correctness semantics of the functionality, including memory semantics, interoperability, ordering, concurrency, etc. We also proposed techniques for data streaming, buffer management and asynchronous processing to guarantee the correct execution of irregular applications as well as to achieve high performance. In this talk, I will discuss irregular computations and the efforts we have made in the programming model and runtime to make these computations easier and faster.
Xin Zhao is a fourth-year Ph.D. student from the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC), advised by Prof. William Gropp. Her research interests focus on parallel programming models / runtime systems and irregular applications, with an emphasis on communication, resources management and dynamic execution.
Victor Lopez
DLB: Dynamic Load Balancing Library
Distributing equal amounts of work between tasks is not always trivial, and the resulting imbalance usually has a negative impact on application performance. DLB is a dynamic library designed to speed up hybrid applications by improving their load balance with little or no intervention from the user. The idea behind the library is to redistribute the computational resources of the second level of parallelism (OpenMP, OmpSs) to improve the load balance of the outer level of parallelism (MPI). The DLB library uses an interposition technique at run time, so it is not necessary to perform a prior analysis or modify the application, although finer control is also supported through an API.
We will also present a case study with CESM (Community Earth System Model), a global climate model that provides computer simulations of the Earth's climate states. The application already uses a hybrid parallel programming model (MPI+OpenMP), so with few modifications to the source code we have compiled it to use the OmpSs programming model, whose high malleability DLB can exploit.
Marc Snir
Runtime and OS research at DoE
Pieter Bellens
Quantifying the effect of rectangular blocks in the dense QR factorization
Blocked, dense QR factorization using Householder reflectors attains minimal communication bounds and creates a fine-grained parallel computation. We consider the effects of rectangular block dimensions for an implementation in OmpSs, thereby unifying the traditional algorithm, which uses panels or block columns, and the square-blocked variants. Communication, computation, potential parallelism and hence the performance are functions of the block dimension. We use hardware counters and the Task Dependence Graph to quantify these properties for different matrix dimensions. Our measurements indicate that, against the grain of traditional practice, performance in dynamically scheduled environments can be improved by resorting to blocks with rectangular dimensions.
Francois Tessier
Distributed communication-aware load balancing with TreeMatch in Charm++
Programming multicore or manycore architectures is a hard challenge, particularly if one wants to take full advantage of their computing power. Moreover, a hierarchical topology implies that communication performance is heterogeneous, and this characteristic should also be exploited. We developed a parallel and distributed hierarchical load balancer for Charm++ that takes both aspects into account. This work is based on our TreeMatch library, which computes process placement in order to reduce an application's communication cost based on the hardware topology. We show that the proposed load-balancing scheme manages to improve execution times while being computed quickly and in a scalable manner.
Brice Videau
Porting HPC applications to the Mont-Blanc prototype using BOAST
One of the goals of the Mont-Blanc project is to use real HPC applications to evaluate the feasibility of exascale architectures using off-the-shelf hardware commonly used in the embedded world. Porting those applications is thus of paramount importance for the project. While getting scientific applications to run on the target platform is not very difficult, obtaining good performance portability is challenging. Indeed, HPC codes are often hand-tuned for the most frequently encountered architectures, and those optimizations can prove harmful when applied to a very different architecture. One way to alleviate this problem is to use task-based runtimes to make applications adaptive from a load balancing and network point of view. Unfortunately, this solves only part of the problem. Individual tasks also have to be optimized and can be very sensitive to many parameters that are often not clearly exposed in the source code. We thus propose BOAST, a meta-programming tool aimed at generating parametrized source code. Several output languages are supported and an expressive DSL is defined to help express the optimizations. An integrated compilation and execution framework is also supplied, which allows the generated kernels to be tested directly inside BOAST. This talk will present BOAST and how we used it to port part of two HPC applications:
- the Daubechies wavelet kernels of BigDFT, a quantum physics software package that computes the electronic density around atoms and molecules,
- a port from CUDA to OpenCL of SPECFEM3D_GLOBE, a wave propagation software based on spectral finite element methods.
Performance results will also be presented.
Lokman Rahmani
Smart In Situ Visualization for Climate Simulations
The increasing gap between computational power and I/O performance in new supercomputers has started to drive a shift from an offline approach to data analysis to an inline approach, termed in situ visualization (ISV). While most visualization software now provides ISV, they typically visualize large dumps of unstructured data, by rendering everything at the highest possible resolution. This often negatively impacts the performance of simulations that support ISV, in particular when ISV is performed interactively, as in situ visualization requires synchronization with the simulation. In this work, we advocate for a smarter method of performing ISV. Our approach is data-driven: it aims to detect potentially interesting regions in the generated dataset in order to feed ISV frameworks with “the interesting” subset of the data produced by the simulation. While this method mitigates the load on ISV frameworks by making them more efficient and more interactive, it also helps scientists focus on the relevant part of their data. We investigate smart ISV in the context of a climate simulation, with a set of generic filters derived from information theory, statistics and image processing, and show the tradeoff between performance and quality of visualization.
Justin Wozniak
Case Studies in Big Data and HPC from X-ray Crystallography
Recent advancements in X-ray crystallography methods, including experimental techniques and detector technology, have produced a data explosion (tens of TB per week) that has outpaced increases in conventional computational and storage capacity, leading to a crisis in computational analysis and data management in X-ray sciences. Typical Big Data solutions do not accommodate the ad hoc nature of the scientific workflow, including opportunistic use of hardware and highly specialized analysis tools. From a computational perspective, existing analysis codes must be quickly scaled up to massively parallel resources. In this presentation, we will describe our recent work applying the Swift programming language to four applications in X-ray sciences, addressing problems in wide-area data movement and management as well as scaling existing applications on large clusters and the Blue Gene/Q.
Christine Morin
Contrail: Interoperability and Dependability in a Cloud Federation
The cloud computing market is expanding rapidly thanks to the opportunity to dynamically allocate large amounts of resources when needed and to pay only for their effective usage. However, many challenges in terms of interoperability, performance guarantees, and dependability still need to be addressed to make cloud computing the right solution for companies. The Contrail integrated project (IP), funded by the European Commission (http://www.contrail-project.eu), developed a comprehensive open-source cloud computing software stack to address these challenges.
In this talk we first discuss the main challenges faced in the open cloud market and then we present components developed in the framework of the Contrail European project to provide solutions to guarantee interoperability in a cloud federation and to deploy distributed applications over a federation of heterogeneous cloud providers. Our solutions allow QoS and QoP SLA terms to be negotiated for an application and then mapped onto the physical resources.
Martin Quinson
Formal verification of unmodified MPI applications with SimGrid
This talk will first recap the approach used in SimGrid to formally assess the correctness of MPI applications through model checking. It will then focus on our current status and future work. We are now able to verify safety properties, and also liveness properties (with some restrictions), on unmodified small to medium MPI applications (a few thousand lines of C, C++ or Fortran). I will conclude with the research leads that we are currently working on, and with the kind of collaboration that could occur within the Joint Lab with the potential users of such tools.
Yves Robert
Algorithms for coping with silent errors
Silent errors have become a major problem for large-scale distributed systems. Detection is hard, and correction is even harder. This talk presents generic algorithms to achieve both detection and correction of silent errors, by coupling verification mechanisms and checkpointing protocols.
Slim Bouguerra
Energy-Performance Tradeoffs in Multilevel Checkpoint Strategies
Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and energy models for multilevel checkpoint schemes and characterize when tradeoffs between expected runtime and energy usage exist. Using these models, we study FTI, a recently developed multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar. We also explore the effect of general system-level parameters on run-time and energy tradeoffs.
Tatiana V. Martsinkevich
Using dedicated resources to alleviate memory limitation for message logging protocols
There are different approaches to handling memory limitations in a message logging protocol. The simplest is to take a checkpoint once one of the processes runs out of memory, or to dump logs to stable storage to free the memory. However, this may increase the load on the I/O subsystem, which is not desirable, especially for large-scale runs. Another approach is to use the memory of additional dedicated nodes as log storage: when a process runs out of memory, it sends a portion of its log to the memory of a dedicated node. I will present a study of the feasibility of this approach and explore the overheads related to it.
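The second approach can be sketched as follows in Python. This is a toy, single-process model with hypothetical names (`MessageLog`, `spill_fraction`); the list `remote` merely stands in for the memory of a dedicated node, and the choice to move the oldest entries first is an assumption, not a detail given in the abstract.

```python
from collections import deque


class MessageLog:
    """Sender-side message log with a memory cap and a dedicated overflow store."""

    def __init__(self, capacity_bytes, spill_fraction=0.5):
        self.capacity = capacity_bytes
        self.spill_fraction = spill_fraction  # share of the local log moved per spill
        self.local = deque()                  # oldest payload first
        self.local_bytes = 0
        self.remote = []                      # stands in for a dedicated node's memory

    def append(self, payload):
        # Free local memory until the new payload fits (or the log is empty).
        while self.local and self.local_bytes + len(payload) > self.capacity:
            self._spill()
        self.local.append(payload)
        self.local_bytes += len(payload)

    def _spill(self):
        # Move the oldest entries to the dedicated node; always moves at least one.
        target = self.local_bytes * self.spill_fraction
        moved = 0
        while self.local:
            payload = self.local.popleft()
            self.remote.append(payload)
            moved += len(payload)
            if moved >= target:
                break
        self.local_bytes -= moved


log = MessageLog(capacity_bytes=10)
for payload in (b"aaaa", b"bbbb", b"cccc", b"dddd"):
    log.append(payload)
```

After the four appends, the two oldest messages sit in `remote` and the two newest remain in local memory. In a real protocol each spill would be a network transfer, which is the overhead the talk sets out to measure.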
Ana Gainaru
The road to failure prediction on Blue Waters: latest details and future directions
We analyze the characteristics of failures from the Blue Waters system and study their effect on the results given by the online failure prediction. We make a couple of key observations about the difference in behaviour between different failure types and propose specific optimizations for each. A detailed analysis of the prediction results is also given. We present future work direction together with preliminary results.
Leonardo Bautista Gomez
Fault Tolerance Interface new features and new developments