The workshop will take place at Argonne National Laboratory.
This event is supported by INRIA, ANL, UIUC and NCSA, as well as by EDF.
Schedule under construction
Main Topics |
Schedule |
Speakers |
Types of presentation |
Topic |
Download |
|
Sunday Nov. 18th |
Dinner |
Giordano's |
http://www.giordanos.com/ |
|
Workshop Day 1 (Room 1416, TCS conference center) |
Monday Nov. 19th |
|
07:30-08:30 |
Transportation: Guest House to TCS (building 240) |
|
(Entrance of the conference center) |
|
|
08:00 |
Continental Breakfast and Registration |
|
Food available in Room 1407; lunch seating in Room 1416 (second half) |
|
Welcome and Introduction |
08:30 |
Franck Cappello, INRIA & UIUC, Marc Snir, ANL |
Opening |
Welcome, formal opening and workshop details |
|
|
08:40 |
Marc Snir |
Opening |
ANL presentation and vision of the collaboration |
|
|
08:50 |
Bill Gropp |
Opening |
UIUC/NCSA update and vision of the collaboration |
|
|
09:00 |
Frederic Desprez |
Opening |
INRIA update on HPC strategy and vision of the collaboration |
|
Big Apps, Big Data, Big I/O |
09:15 |
Robert Jacob |
Trends in HPC |
Climate simulation at extreme scale |
|
|
09:45 |
Rob Ross, ANL |
Trends in HPC |
Trends in HPC I/O and File systems |
|
|
10:15 |
Break |
|
|
|
|
10:45 |
Rob Pennington, NCSA |
Trends in HPC |
Big Data |
|
|
11:15 |
Andrew Chien, ANL |
Potential collaboration |
Big Data |
|
|
11:45 |
Matthieu Dorier, INRIA |
Joint Results |
Visualization |
|
|
12:15 |
Lunch |
|
|
|
Programming Models/Runtime (chair: Sanjay Kale) |
13:30 |
Wen-Mei Hwu, UIUC |
TBA |
Accelerators |
|
|
14:00 |
Pavan Balaji, ANL |
Potential collaboration |
MPI-3 and Unified Runtime |
|
|
14:30 |
Andra Hugo, Raymond Namyst, INRIA |
Potential collaboration |
Composing multiple StarPU applications over heterogeneous machines: a supervised approach |
|
|
15:00 |
Jean-François Mehaut, INRIA |
Potential collaboration |
Optimizations for modern NUMA |
|
|
15:30 |
Break |
|
|
|
Numerical algorithms and Methods |
16:00 |
TBA, ANL |
TBA |
TBA |
|
|
16:30 |
Laura Grigori, INRIA |
Results |
Communication avoiding |
|
|
17:00 |
Bill Gropp, UIUC |
Results |
Hybrid Scheduling |
|
|
17:30 |
Laurent Hascoet, INRIA |
Early Results |
TBA |
|
|
18:00 |
Adjourn |
|
|
|
|
19:00 |
Dinner |
Jameson's |
|
Workshop Day 2 (Main room) |
Tuesday Nov. 20th |
|
Big Systems |
08:30 |
Pete Beckman, ANL |
Trends |
New Directions in Extreme-Scale Operating Systems and Runtime Software |
|
|
09:00 |
Bill Kramer, UIUC/NCSA |
Trends |
Blue Waters update |
|
Cloud |
09:30 |
Ian Foster, ANL |
Potential collaboration |
TBA |
|
|
10:00 |
Christine Morin, INRIA |
Potential collaboration |
Contrail |
|
|
10:30 |
Break |
|
|
|
|
11:00 |
Frederic Desprez, INRIA |
Potential collaboration |
TBA |
|
Resilience |
11:30 |
Mohamed Slim Bouguerra, INRIA |
Early Results |
Performance modeling of checkpointing under failure prediction |
|
|
12:00 |
Rinku Gupta, ANL |
Potential collaboration |
Interlayer error notification, coordination and CIFTS |
|
|
12:30 |
Ana Gainaru, UIUC |
Early Results |
Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems. |
|
|
13:00 |
Lunch |
|
Food buffet in Room 1407; lunch seating in Room 1416 (second half) |
|
Parallel Sessions |
|
Mini workshop on Numerical libraries |
08:30 |
Stefan Wild, ANL |
Potential collaboration |
TBA |
|
|
09:00 |
Bill Gropp, UIUC |
Potential collaboration |
TBA |
|
|
09:30 |
Laura Grigori, INRIA |
Potential collaboration |
TBA |
|
|
10:00 |
Break |
|
TBA |
|
|
10:30 |
Anshu Dubey, ANL |
Potential collaboration |
TBA |
|
|
11:00 |
Discussion |
|
|
|
|
12:00 |
Adjourn |
|
|
|
|
13:00 |
Lunch |
|
Parallel Sessions |
|
Mini workshop on Performance Modeling and simulation |
14:30 |
Sanjay Kale, UIUC |
Early Results |
BigSim |
|
|
15:00 |
Arnaud Legrand, INRIA |
|
SimGrid |
|
|
15:30 |
Torsten Hoefler, ETH |
Early Results |
TBA |
|
|
16:00 |
Break |
|
|
|
|
16:30 |
Yves Robert, INRIA |
Early Results |
TBA |
|
|
17:00 |
Discussion |
|
|
|
|
18:00 |
Adjourn |
|
|
|
|
19:00 |
Dinner |
Maggiano's |
http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1 |
|
Mini workshop on Cloud |
14:30 |
Kate Keahey, ANL |
Potential collaboration |
TBA |
|
|
15:00 |
Narayan Desai, ANL |
Potential collaboration |
TBA |
|
|
15:30 |
Jonathan Rouzaud, INRIA |
Potential collaboration |
TBA |
|
|
16:00 |
Break |
|
|
|
|
16:30 |
Michael Wilde, ANL |
Potential collaboration |
Swift: simpler parallel programming for cloud and HPC domains http://www.ci.uchicago.edu/swift (Swift for clouds and clusters) |
|
|
17:00 |
Discussion |
|
|
|
|
18:00 |
Adjourn |
|
|
|
|
19:00 |
Dinner |
Maggiano's |
http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1 |
|
Workshop Day 3 (Main room) |
Wednesday Nov. 21st |
|
Parallel Sessions |
|
Mini workshop on Programming models/runtime |
08:30 |
Emmanuel Jeannot, INRIA |
Results |
TBA |
|
|
09:00 |
Sanjay Kale, UIUC |
|
Charm++ update |
|
|
09:30 |
Christian Perez, INRIA |
|
TBA |
|
|
10:00 |
Break |
|
|
|
|
10:30 |
Jim Dinan |
|
One-sided communication |
|
|
11:00 |
Sebastien Fourestier, INRIA |
Potential collaboration |
Parallel repartitioning and re-mapping in Scotch |
|
|
11:30 |
Discussion |
|
|
|
|
12:30 |
Closing |
|
|
|
|
13:00 |
Lunch |
|
Mini workshop on Resilience |
08:30 |
TBA |
TBA |
TBA |
|
|
09:00 |
Peter Brune, ANL |
TBA |
TBA |
|
|
09:30 |
Bogdan Nicolae, IBM |
Results |
Optimizing checkpoint image pages storage |
|
|
10:00 |
Break |
|
|
|
|
10:30 |
Tatiana Martsinkevich, INRIA |
Results |
Fully distributed recovery for send-determinism applications |
|
|
11:00 |
Amina Guermouche, INRIA |
Results |
TBA |
|
|
11:30 |
Discussion |
|
|
|
|
12:30 |
Closing |
|
|
|
|
13:00 |
Lunch |
|
Box Lunches |
|
Abstracts
Robert Ross, ANL
Trends in HPC I/O and File systems
All aspects of HPC systems are undergoing change as we move into petascale and towards exascale computing. The traditional "I/O software stack" is no exception: the layers, capabilities, and abstractions in the stack are all in flux as we consider how to best support future HPC applications. This talk will discuss these developmental trends, using ongoing work at Argonne as examples of some directions of study.
Andra Hugo, INRIA
Composing multiple StarPU applications over heterogeneous machines: a supervised approach
Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a single runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention.
In this talk, I will present an extension to the StarPU runtime system that enables multiple StarPU kernels to run simultaneously over the same CPU+GPU architecture. Next, I will present experimental results showing the efficiency improvements our solution brings to parallel applications that compose several parallel libraries (e.g., libraries in the domains of dense linear algebra or fluid mechanics). Finally, I will give some insights into the main challenges of the composability problem and present the main topics we intend to address in future work.
Pete Beckman, ANL
New Directions in Extreme-Scale Operating Systems and Runtime Software
For more than a decade, extreme-scale operating systems and runtime software have been evolving very slowly. Today's large-scale systems use slightly retooled "node" operating systems glued together with ad hoc local agents to handle I/O, job launch, and management. These extreme-scale systems are only slightly more tightly integrated than are generic Linux clusters with InfiniBand. As we look forward to a new era for large-scale HPC systems, we see that power and fault management will become key design issues. Software management of power and support for resilience must now be part of the whole-system design. Extreme-scale operating systems and runtime software will not be simply today's node code with a few control interfaces, but rather a tightly integrated "global OS" that spans the entire platform and works cooperatively across portions of the machine in order to manage power and provide resilience.
Sebastien Fourestier, INRIA
Parallel repartitioning and re-mapping in Scotch
Scotch is a software package for sequential and parallel graph partitioning, static mapping, sparse matrix block ordering, clustering, and sequential mesh and hypergraph ordering. As a research project, it is subject to continuous improvement, resulting from several ongoing research tasks. Our talk will address several new features we have recently added to Scotch. We will present threaded algorithms for shared-memory coarsening and refinement. We will also show early results for its parallel repartitioning and sequential remapping functionalities.
Michael Wilde, ANL
Swift: simpler parallel programming for cloud and HPC domains
Ana Gainaru, UIUC
Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems.
A large percentage of the computing capacity in today’s large high-performance computing systems is wasted due to failures and recoveries. One way to reduce the overhead induced by these recovery strategies is to combine them with failure avoidance methods. Failure avoidance is based on a prediction model that detects fault occurrences ahead of time and allows preventive measures to be taken, such as task migration or checkpointing the application. This talk presents a prototype implementation of proactive checkpointing based on the ELSA toolkit, coupled with periodic multi-level checkpointing based on FTI, together with its results. The proactive checkpointing is implemented as level zero (L0) in a four-level scheme, providing the fastest checkpoint, which is necessary to act quickly between the failure prediction and the moment of the failure. We evaluate the proposed approach on the TSUBAME system and show that, compared with a preventive checkpoint execution, the overhead is only 2% to 6%.