Under construction: the agenda below is not final.
This event is supported by INRIA, UIUC, NCSA, ANL, BSC, and PUF NEXTGEN.
Main Topics | Schedule | Speaker | Affiliation | Type of presentation | Title (tentative) | Download

Sunday June 8th
Dinner Before the Workshop | 7:30 PM | Only people registered for the dinner (included) | Mercure Hotel
Workshop Day 1 -- Monday June 9th
TITLES ARE TEMPORARY (except if in bold font)

Registration | 08:00 | At Inria Sophia Antipolis

Welcome and Introduction (Amphitheatre) | 08:30 | Franck Cappello + Marc Snir + Yves Robert + Bill Kramer + Jesus Labarta | INRIA&UIUC&ANL&BSC | Background | Welcome, Workshop objectives and organization
Plenary (Amphitheatre), Chair: Franck Cappello | 09:00 | Jesus Labarta | BSC | Background | Presentation of BSC activities

Mini Workshop Math app. (Room 1)
Chair: Paul Hovland
09:30 | Bill Gropp | UIUC
10:00 | Jed Brown | ANL
10:30 | Break
11:00 | Ian Masliah | Inria | Automatic generation of dense linear system solvers on CPU/GPU architectures
11:30 | Luke Olson | UIUC
12:00 | Lunch
Chair: Bill Gropp
13:30 | Vincent Baudoui | Inria
14:00 | Paul Hovland | ANL
14:30 | Stephane Lanteri | Inria | C2S@Exa: a multi-disciplinary initiative for high performance computing in computational sciences

Mini Workshop I/O and Big Data (Room 1)
Chair: Rob Ross
15:00 | Wolfgang Frings | JSC
15:30 | Break
16:00 | Jonathan Jenkins | ANL
16:30 | Matthieu Dorier | Inria | Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction
17:00 | Adjourn
18:30 | Bus for dinner (dinner included)
Mini Workshop Runtime (Room 2)
Chair: Sanjay Kale
09:30 | Pavan Balaji | ANL
10:00 | Augustin Degomme | Inria | Status Report on the Simulation of MPI Applications with SMPI/SimGrid
10:30 | Break
11:00 | Ronak Buch | UIUC
11:30 | Victor Lopez | BSC
12:00 | Lunch
Chair: Jesus Labarta
13:30 | Xin Zhao | ANL
14:00 | Brice Videau | Inria
14:30 | Pieter Bellens | BSC
15:00 | Martin Quinson | Inria
15:30 | Break
Chair: Martin Quinson
16:00 | Francois Tessier | Inria
16:30 | Jean-François Méhaut | Inria
17:00 | Adjourn
18:30 | Bus for dinner (dinner included)
Workshop Day 2 -- Tuesday June 10th

Formal opening (Amphitheatre), Chair: Bill Kramer
08:30 | Marc Snir + Franck Cappello | INRIA&UIUC&ANL | Background
08:40 | TBD | Inria | Background | Inria updates and vision of the collaboration | TBD
08:50 | Marc Snir | ANL | Background | ANL updates and vision of the collaboration | TBD
Plenary (Amphitheatre)
09:00 | Wolfgang Frings | JSC | Background | JSC activities in HPC | TBD

Mini Workshop I/O and Big Data (Room 1)
Chair: Gabriel Antoniu
09:30 | Rob Ross | ANL | Understanding and Reproducing I/O Workloads
10:00 | Guillaume Aupy | Inria | Scheduling the I/O of HPC applications under congestion
10:30 | Break
11:00 | Lokman Rahmani | Inria
11:30 | Anthony Simonet | Inria | Using Active Data to Provide Smart Data Surveillance to E-Science Users
12:00 | Lunch

Mini Workshop Runtime
Chair: Jean-François Méhaut
09:30 | Sanjay Kale | UIUC
10:00 | Florentino Sainz | BSC | DEEP Collective offload
10:30 | Break
11:00 | Arnaud Legrand | Inria | Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures
11:30 | Grigori Fursin | Inria
12:00 | Lunch

Formal encouragements (Amphitheatre), Chair: Franck Cappello
13:45 | Ed Seidel | UIUC | Background | NCSA updates and vision of the collaboration
Plenary (Amphitheatre), Chair: Wolfgang Frings
14:00 | Yves Robert | Inria
14:30 | Marc Snir | ANL
15:00 | Break

Mini Workshop Resilience
Chair: Franck Cappello
15:30 | Luc Jaulmes | BSC
16:00 | Ana Gainaru | UIUC
16:30 | Tatiana Martsinkevich | Inria
17:00 | Adjourn

Mini Workshop Cloud & Cyber-infrastructure (Room 2)
Chair: Kate Keahey
15:30 | Justin Wozniak | ANL
16:00 | Shaowen Wang | UIUC | CyberGIS @ Scale
16:30 | Christine Morin | Inria
17:00 | Adjourn

18:30 | Bus for dinner (dinner included)
Workshop Day 3 -- Wednesday June 11th

Plenary (Amphitheatre), Chair: Jesus Labarta
08:30 | Bill Kramer | NCSA

Mini Workshop Resilience
Chair: Yves Robert
09:00 | Leonardo Bautista Gomez | ANL
09:30 | Slim Bouguerra | Inria
10:00 | Break
10:30 | Sheng Di | ANL
11:00 | Franck Cappello | ANL | Five open questions on Resilience for the Exascale era

Mini Workshop Cloud & Cyber-infrastructure
Chair: Christine Morin
09:00 | Kate Keahey | ANL
09:30 | Radu Tudoran | Inria | JetStream: Enabling High Performance Event Streaming across Cloud Data-Centers
10:00 | Break
10:30 | Sri Hari Krishna Narayanan | ANL
11:00 | Timothy Armstrong | ANL

Plenary (Amphitheatre)
11:30 | Closing
12:00 | Lunch (included)
Abstracts
Matthieu Dorier
Title: Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction
The increasing gap between the computation performance of post-petascale machines and the performance of their I/O subsystem has motivated many I/O optimizations including prefetching, caching, and scheduling techniques. To further improve these techniques, modeling and predicting spatial and temporal I/O patterns of HPC applications as they run have become crucial.
This presentation introduces Omnisc'IO, an original approach that takes a step toward intelligent I/O management of HPC applications on next-generation post-petascale supercomputers. It builds a grammar-based model of the I/O behavior of any HPC application and uses that model to predict when future I/O operations will occur, as well as where and how much data will be accessed. Omnisc'IO is transparently integrated into the POSIX and MPI I/O stacks and does not require any modification to application sources or to high-level I/O libraries. It works without prior knowledge of the application and converges to accurate predictions within only a couple of iterations. Its implementation is efficient both in computation time and in memory footprint. Omnisc'IO was evaluated with four real HPC applications -- CM1, Nek5000, GTC, and LAMMPS -- using a variety of I/O backends ranging from simple POSIX to Parallel HDF5 on top of MPI I/O. Our experiments show that Omnisc'IO achieves from 79.5% to 100% accuracy in spatial prediction, with the average precision of temporal predictions ranging from 0.2 seconds to less than a millisecond.
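The grammar model itself is described in the paper; as a rough illustration of the general idea (observe the stream of I/O operations as symbols, then predict the next one), the sketch below substitutes a much simpler order-2 context model for Omnisc'IO's grammar. The trace, symbols, and size-class encoding are invented for the example.

```python
# Illustrative stand-in for Omnisc'IO-style prediction: the real system learns
# a grammar over the I/O symbol stream; this toy uses an order-2 context model
# to show the same "observe, then predict the next operation" loop.
from collections import Counter, defaultdict

def symbol(op, size):
    """Abstract an I/O call into a coarse symbol (operation, size class)."""
    return (op, 1 << (size - 1).bit_length())  # round size up to a power of two

class IOPredictor:
    def __init__(self, order=2):
        self.order = order
        self.model = defaultdict(Counter)  # context tuple -> next-symbol counts
        self.context = ()

    def observe(self, sym):
        if len(self.context) == self.order:
            self.model[self.context][sym] += 1
        self.context = (self.context + (sym,))[-self.order:]

    def predict(self):
        counts = self.model.get(self.context)
        return counts.most_common(1)[0][0] if counts else None

# A periodic trace, such as a timestep loop, quickly becomes predictable.
trace = [("write", 1 << 20), ("write", 4096), ("read", 4096)] * 5
p = IOPredictor()
hits = total = 0
for op, size in trace:
    s = symbol(op, size)
    guess = p.predict()
    if guess is not None:
        total += 1
        hits += (guess == s)
    p.observe(s)
print(f"prediction accuracy after warm-up: {hits}/{total}")
```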
Sheng Di
Optimization of Multi-level Checkpoint Model with Uncertain Execution Scales
Future extreme-scale systems will face failures of different types and scales, from transient uncorrectable memory errors in individual processes to massive system outages. In this work, we propose a multi-level checkpoint model that takes uncertain execution scales (different numbers of processes/cores) into account. The contribution is three-fold. (1) We provide an in-depth analysis of why it is difficult to derive the optimal checkpoint intervals for the different checkpoint levels while simultaneously optimizing the number of cores. (2) We devise a novel method that quickly obtains an optimized solution; to our knowledge, this is the first successful attempt at a multi-level checkpoint model with uncertain scales. (3) We perform both large-scale real experiments and extreme-scale numerical simulations to validate the effectiveness of our design. Experiments confirm that our optimized solution outperforms other state-of-the-art solutions by 4.3-88% in wall-clock time.
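The paper's joint optimization across levels and uncertain scales is its actual contribution and is not reproduced here. As background only, the sketch below applies the classical Young/Daly first-order approximation independently to each level, with invented checkpoint costs and MTBFs, to show why each level ends up with a very different optimal interval.

```python
# Background sketch, not the paper's method: per-level Young/Daly intervals.
# Checkpoint costs and MTBFs below are made-up illustrative values.
from math import sqrt

levels = [
    # (name, checkpoint cost in s, MTBF of the failures this level covers in s)
    ("local memory copy",     5.0,          3600.0),  # frequent, cheap failures
    ("partner/buddy copy",   30.0,     24 * 3600.0),
    ("parallel file system", 300.0, 7 * 24 * 3600.0), # rare, expensive outages
]

for name, cost, mtbf in levels:
    interval = sqrt(2.0 * cost * mtbf)                  # Young's approximation
    overhead = cost / interval + interval / (2.0 * mtbf)  # first-order waste
    print(f"{name:22s} C={cost:6.0f}s  MTBF={mtbf/3600:6.1f}h  "
          f"interval={interval/60:7.1f}min  overhead~{100*overhead:.1f}%")
```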
Augustin Degomme/Arnaud Legrand
Status Report on the Simulation of MPI Applications with SMPI/SimGrid
- Virtualization: the automatic approaches we previously used for application emulation relied on an alternative compilation chain (e.g., using GNU TLS), which was problematic because it could dramatically change code performance and was not sufficiently generic. We have investigated alternative approaches and recently designed a new one, based on the OS-like organization of SimGrid, that identifies the heaps and stacks of the virtual MPI processes and remaps them with mmap at every context switch (a toy illustration of this folding follows this list). This new approach makes it possible to *emulate unmodified MPI applications*, regardless of the language they are written in and of the compilation toolchain. Although this has not been evaluated yet, the approach should also allow classical profilers to be used at small scale to identify which variables should be aliased and which kernels should be modeled rather than actually executed in simulation.
- Trace replay and interoperability: there is an ongoing effort toward SMPI interoperability. Each simulation tool (BigSim, LogGOPSim, Dimemas, SimGrid, SST/Macro, ...) has its own strengths and weaknesses but is often strongly biased toward a given tracing format. Working toward interoperability would allow researchers to seamlessly move to another simulator whenever it is more appropriate, rather than trying to fix the one tied to their tracing tool or application. Replaying BigSim and ScalaTrace traces is now possible in SMPI/SimGrid, but the validation remains to be done. We plan to perform similar work with Dimemas and SST/Macro so as to ease the use of SimGrid's fluid models.
- Status report and current effort on network modeling (InfiniBand, fat-tree, and torus-like topologies).
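As promised above, here is a loose, high-level analogy of the virtualization mechanism described in the first item. The real implementation remaps the actual heap and stack of each virtual MPI process with mmap; in this toy, a plain dict swap stands in for that remapping, and the per-rank state and scheduling loop are invented for the example.

```python
# Loose analogy of folding many MPI "processes" into one simulation process:
# each virtual rank has private "global" state swapped in at context switch.
GLOBALS = {}                      # the single shared "address space"
rank_state = [dict(counter=0, rank=r) for r in range(3)]  # per-rank snapshots

def context_switch(rank):
    """'Map in' the private globals of one virtual rank (mmap in real SMPI)."""
    GLOBALS.clear()
    GLOBALS.update(rank_state[rank])

def save_context(rank):
    rank_state[rank] = dict(GLOBALS)

def step():                       # body of the (unmodified) application
    GLOBALS["counter"] += GLOBALS["rank"] + 1

# Round-robin co-operative scheduling of the virtual ranks.
for _ in range(4):
    for r in range(3):
        context_switch(r)
        step()
        save_context(r)

print([s["counter"] for s in rank_state])   # -> [4, 8, 12]
```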
Luka Stanisic/Arnaud Legrand
Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures
[Joint work between Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau and Jean-François Méhaut, accepted for publication at Europar'14]
Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance of such heterogeneous machines is challenging, as it requires carefully offloading computations and managing data movements between the different processing units. The most promising and successful approaches so far rely on task-based runtimes that abstract the machine and use opportunistic scheduling algorithms. As a consequence, the problem shifts to choosing the task granularity and task graph structure, and to optimizing the scheduling strategies. Trying different combinations of these alternatives is itself a challenge: getting accurate measurements requires reserving the target system for the whole duration of the experiments, and observations are limited to the few systems at hand and may be difficult to generalize. In this work, we show how we crafted a coarse-grain hybrid simulation/emulation of StarPU, a dynamic runtime for hybrid architectures, on top of SimGrid, a versatile simulator for distributed systems. This approach yields performance predictions accurate within a few percent on classical dense linear algebra kernels in a matter of seconds, which allows both runtime and application designers to quickly decide which optimization to enable or whether it is worth investing in higher-end GPUs.
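To give a flavor of what "coarse-grain simulation of a task-based runtime" means, the toy below accounts for modeled kernel durations instead of executing tasks, and a greedy scheduler picks the resource with the earliest finish time. It is not StarPU or SimGrid; the kernel timings, task graph, and resource mix are all invented.

```python
# Toy coarse-grain simulator of a task-based runtime: tasks are not executed,
# only their modeled durations are accounted for. Timings below are made up.

# Modeled duration of each kernel on each resource class (seconds).
timings = {"potrf": {"cpu": 0.9, "gpu": 0.5},
           "trsm":  {"cpu": 0.6, "gpu": 0.1},
           "gemm":  {"cpu": 1.0, "gpu": 0.05}}

resources = ["cpu", "cpu", "cpu", "gpu"]   # 3 CPU cores + 1 GPU
ready_at = [0.0] * len(resources)          # next free time per resource

# A small task graph: (kernel, index of the dependency task, or None).
tasks = [("potrf", None), ("trsm", 0), ("trsm", 0), ("gemm", 1), ("gemm", 2)]
finish = [0.0] * len(tasks)

for i, (kernel, dep) in enumerate(tasks):
    start_min = finish[dep] if dep is not None else 0.0
    # Opportunistic choice: the resource minimizing this task's finish time.
    best = min(range(len(resources)),
               key=lambda r: max(ready_at[r], start_min)
                             + timings[kernel][resources[r]])
    start = max(ready_at[best], start_min)
    finish[i] = start + timings[kernel][resources[best]]
    ready_at[best] = finish[i]

print(f"predicted makespan: {max(finish):.2f}s")
```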
Guillaume Aupy
Scheduling the I/O of HPC applications under congestion
A significant percentage of the computing capacity of large-scale platforms is wasted due to interference between multiple applications that access a shared parallel file system concurrently. One way to handle I/O bursts in large-scale HPC systems is to absorb them at an intermediate storage layer consisting of burst buffers. However, our analysis of Argonne's Mira system shows that burst buffers cannot prevent congestion at all times; as a consequence, I/O performance is dramatically degraded, showing in some cases a decrease in I/O throughput of 67%. In this paper, we analyze the effects of interference on application I/O bandwidth and propose several scheduling techniques to mitigate congestion. We show through extensive experiments that our global I/O scheduler is able to reduce the effects of congestion, even on systems where burst buffers are used, and can increase the overall system throughput by up to 56%. We also show that it outperforms the current Mira I/O schedulers.
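As a back-of-the-envelope illustration of why a global I/O scheduler can help (invented numbers, not Mira measurements, and not the paper's scheduler): when concurrent bursts fair-share the file-system bandwidth, every application stalls until late in the mix, whereas serializing the bursts shortest-first lowers the mean stall time. This is the classic shortest-remaining-work argument.

```python
# Compare fair-shared vs. serialized (shortest-first) I/O bursts. Toy numbers.
B = 100.0                      # aggregate file-system bandwidth, GB/s
bursts = [50.0, 200.0, 800.0]  # concurrent I/O bursts, GB, all issued at t=0

# Fair sharing: each app gets an equal slice while others are still writing.
t, remaining, fair_finish = 0.0, sorted(bursts), []
while remaining:
    share = B / len(remaining)                 # per-app bandwidth
    dt = remaining[0] / share                  # time until the smallest completes
    t += dt
    remaining = [v - share * dt for v in remaining[1:]]
    fair_finish.append(t)

# Exclusive access, shortest burst first.
t, srpt_finish = 0.0, []
for v in sorted(bursts):
    t += v / B
    srpt_finish.append(t)

print("fair-share finish times:", [round(x, 1) for x in fair_finish])
print("scheduled  finish times:", [round(x, 1) for x in srpt_finish])
print(f"mean stall: {sum(fair_finish)/3:.1f}s vs {sum(srpt_finish)/3:.1f}s")
```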
Florentino Sainz
DEEP Collective offload
Abstract: We present a new extension of the OmpSs programming model that allows users to dynamically offload C/C++ or Fortran code from one or many nodes to a group of remote nodes. Communication between remote nodes executing offloaded code is possible through MPI. The extension aims to improve the programmability of exascale and current supercomputers, which combine different types of processors and interconnection networks that must work together to obtain the best performance. A good example of such an architecture is the DEEP project, which has two separate clusters (CPUs and Xeon Phis). With our technology, which works on any architecture that fully supports MPI, users can easily offload work from the CPU cluster to the accelerator cluster without being forced to fall back to the CPU cluster in order to perform MPI communications.
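This is not OmpSs, but as a conceptual analogy of "offload work to a remote group of nodes with results flowing back over MPI", the following sketch uses mpi4py dynamic process management. It assumes mpi4py is installed and the script is launched once under `mpirun -n 1`; the worker count and workload are invented.

```python
# Conceptual analogy only: a "CPU cluster" process offloads a data-parallel
# job to a dynamically spawned group of "accelerator" workers via MPI.
import sys
from mpi4py import MPI

parent = MPI.Comm.Get_parent()

if parent == MPI.COMM_NULL:
    # Offloading side: spawn 4 workers running this same file.
    workers = MPI.COMM_SELF.Spawn(sys.executable, args=[__file__], maxprocs=4)
    data = list(range(100))
    workers.bcast(data, root=MPI.ROOT)              # ship the input
    partial = workers.gather(None, root=MPI.ROOT)   # collect partial sums
    print("offloaded sum:", sum(partial))           # -> 4950
    workers.Disconnect()
else:
    # Worker side: each spawned rank handles a strided slice of the data.
    data = parent.bcast(None, root=0)
    rank, size = parent.Get_rank(), parent.Get_size()
    parent.gather(sum(data[rank::size]), root=0)
    parent.Disconnect()
```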
Radu Tudoran
JetStream: Enabling High Performance Event Streaming across Cloud Data-Centers
The easily accessible computation power offered by cloud infrastructures, coupled with the Big Data revolution, is expanding the scale and speed at which data analysis is performed. In their quest for the Value among the 3 Vs of Big Data, applications process ever larger data sets, within and across clouds. Enabling fast data transfers across geographically distributed sites becomes particularly important for applications that manage continuous streams of events in real time. In this paper, we propose a set of strategies for efficiently transferring events between cloud data-centers. Our approach, called JetStream, self-adapts to the streaming conditions by modeling and monitoring a set of context parameters. It further aggregates the available bandwidth by enabling multi-route streaming across cloud sites. The prototype was validated on tens of nodes in US and Europe data-centers of the Microsoft Azure cloud, using synthetic benchmarks and application code from the context of the ALICE experiment at CERN. The results show an increase in transfer rate of 250 times over individual event streaming. Moreover, the adaptive transfer strategy brings an additional 25% gain, and the transfer rate can be further tripled thanks to multi-route streaming.
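As a simplified illustration of the kind of trade-off an adaptive streamer must navigate (JetStream's actual model and parameters differ; all numbers below are invented): batching events amortizes the fixed per-transfer cost between data-centers, but too-small batches cannot sustain the event rate and too-large ones add delay.

```python
# Scan batch sizes for the one minimizing average event latency. Toy model.
rate = 1000.0        # events/s produced at the source (invented)
event_size = 2e3     # bytes per event
overhead = 0.05      # fixed per-transfer cost between data-centers, s
bandwidth = 10e6     # bytes/s available on the route

def avg_event_latency(batch):
    period = batch / rate                             # time to fill one batch
    transfer = overhead + batch * event_size / bandwidth
    if transfer > period:                             # route cannot sustain the
        return float("inf")                           # rate: backlog grows
    return period / 2.0 + transfer                    # mean fill wait + shipping

best = min(range(1, 5001), key=avg_event_latency)
print(f"best batch size: {best} events "
      f"({1000 * avg_event_latency(best):.1f} ms average event latency)")
print(f"oversized batch of 2000: {1000 * avg_event_latency(2000):.1f} ms; "
      f"a batch of 10 is infeasible: {avg_event_latency(10)}")
```

In this model, multi-route streaming raises the effective bandwidth term, which lowers both the smallest feasible batch size and the resulting latency.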
Anthony Simonet
Using Active Data to Provide Smart Data Surveillance to E-Science Users
Modern scientific experiments often involve multiple storage and computing platforms, software tools, and analysis scripts. The resulting heterogeneous environments make data management operations challenging; the large number of events and the absence of data integration make it difficult to track data provenance, manage sophisticated analysis processes, and recover from unexpected situations. Current approaches often require costly human intervention and are inherently error prone. The difficulties inherent in managing and manipulating such large, highly distributed datasets also limit automated sharing and collaboration.
We study a real-world e-Science application involving terabytes of data that uses three different analysis and storage platforms and a number of applications and analysis processes. We demonstrate that with a specialized data life cycle and programming model---Active Data---we can easily implement global progress monitoring and sharing, recover from unexpected events, and automate a range of tasks.
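The actual Active Data model is considerably richer (distributed, spanning heterogeneous systems), but a minimal sketch of the underlying idea, a data life cycle whose transitions fire user handlers, looks like the following; states, transitions, and handlers are invented for the example.

```python
# Toy life-cycle state machine: user code runs automatically on transitions.
ALLOWED = {("created", "transferred"), ("transferred", "analyzed"),
           ("created", "lost"), ("transferred", "lost")}

class LifeCycle:
    def __init__(self, item):
        self.item, self.state, self.handlers = item, "created", {}

    def on(self, state, fn):
        self.handlers.setdefault(state, []).append(fn)

    def advance(self, new_state):
        if (self.state, new_state) not in ALLOWED:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        for fn in self.handlers.get(new_state, []):   # automation hook
            fn(self.item)

lc = LifeCycle("run42/output.h5")
lc.on("transferred", lambda f: print(f"{f}: start analysis job"))
lc.on("lost", lambda f: print(f"{f}: re-stage from archive"))   # recovery
lc.advance("transferred")
lc.advance("analyzed")   # no handler registered: nothing to do
```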
Ian Masliah
Automatic generation of dense linear system solvers on CPU/GPU architectures