
The workshop will take place at Argonne National Laboratory.

This event is supported by INRIA, ANL, UIUC, NCSA, and the French Ministry of Foreign Affairs, as well as by EDF.

Schedule under construction

Main Topics

Schedule

Speakers

Types of presentation

Topic

Download


Sunday Nov. 18th
19:00

Dinner

Giordano's
641 Plainfield Rd., Willowbrook, IL 60521
(630) 325-6710

http://www.giordanos.com/
http://maps.google.com/maps?f=q&hl=en&q=641%20PLAINFIELD%20RD.,+WILLOWBROOK,+IL+60527+US&ie=UTF8&z=15&om=1&iwloc=A

 

Workshop Day 1 (Room 1416, TCS conference center)

Monday Nov. 19th

 


 

 

 

07:30-08:30

Transportation: Guest House to TCS (building 240)

 

(Entrance of the conference center)

 

 

08:00

Continental Breakfast and Registration

 

Food available in Room 1407, Lunch seating in room 1416 (second half)

 

Welcome and Introduction

08:30

Franck Cappello, INRIA & UIUC; Marc Snir, ANL

Opening

Welcome, formal opening and workshop details

 

 

08:40

Marc Snir

Opening

ANL presentation and vision of the collaboration

 

 

08:50

Bill Gropp

Opening

UIUC/NCSA update and vision of the collaboration

 

 

09:00

Frederic Desprez

Opening

INRIA update on HPC strategy and vision of the collaboration

 

Big Apps, Big Data, Big I/O
chair: Rajeev Thakur

09:15

Robert Jacob

Trends in HPC

Climate simulation at extreme scale


 

09:45

Rob Ross, ANL

Trends in HPC

Trends in HPC I/O and File systems

 

 

10:15

Break

 

 

 

 

10:45

Rob Pennington, NCSA

Trends in HPC

Big Data


 

11:15

Andrew Chien, ANL

Potential collaboration

Presto/Blockus: Towards a Scalable R Programming System


 

11:45

Matthieu Dorier, INRIA

Joint Results

I/O and in-situ visualization: recent results with the Damaris approach


 

12:15

Lunch

 

 

 

Programming Models/Runtime chair: Sanjay Kale

13:30

Wen-Mei Hwu, UIUC

TBA

Accelerators


 

14:00

Pavan Balaji, ANL

Potential collaboration

MPI3 and Unified Runtime


 

14:30

Andra Hugo, Raymond Namyst, INRIA

Potential collaboration

Composing multiple StarPU applications over heterogeneous machines: a supervised approach


 

15:00

Jean-François Mehaut, INRIA

Potential collaboration

Optimizations for modern NUMA

 

 

15:30

Break

 

 

 

Numerical algorithms and Methods
Chair: Paul Hovland

16:00

Barry Smith, ANL

Trend

Performance Issues in DOE PDE Simulations



16:30

Laura Grigori

Results

Communication avoiding


 

17:00

Bill Gropp, UIUC

Results

Hybrid Scheduling


 

17:30

Laurent Hascoet, INRIA

Early Results

The Data-Dependence graph of Adjoint Codes



18:00

Adjourn

 




19:00

Dinner

Jameson's Charhouse
1001 W. 75th Street, Woodridge, IL 60517
(630) 910-9700

http://www.jamesons-charhouse.com/index.html
MAP

 

 

 

 

 

 

 

Workshop Day 2 (Main room)

Tuesday Nov. 20th

 

 

 

 

 

 

 

 

 

 

Big Systems
Chair: Jean François Mehaut

08:30

Pete Beckman, ANL

Trends

New Directions in Extreme-Scale Operating Systems and Runtime Software

 

 

09:00

Bill Kramer, UIUC/NCSA

Trends

Blue Waters update

 

Cloud
Chair: Gabriel Antoniu

09:30

Ian Foster, ANL

Potential collaboration

TBA


 

10:00

Christine Morin, INRIA

Potential collaboration

Contrail


 

10:30

Break

 

 

 


11:00

Frederic Desprez, INRIA

Potential collaboration

TBA


Resilience
Chair: Christine Morin

11:30

Mohamed Slim Bouguerra, INRIA

Early Result

Performance modeling of checkpointing under failure prediction


 

12:00

Rinku Gupta, ANL

Potential collaboration

CIFTS: An infrastructure for coordinated and comprehensive system-wide fault tolerance.

 

 

12:30

Ana Gainaru, UIUC

Early Results

Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems.

 

 

13:00

Lunch

 

Food buffet in Room 1407, Lunch seating in room 1416 (second half)

 

 

 

 

 

Parallel Session

 

Mini workshop on Numerical libraries
Chair: Paul Hovland
(room 1406, TCS conference center)

8:30

Stefan Wild, ANL

Potential collaboration

Numerical optimization for "automatic" tuning of codes


 

09:00

Bill Gropp, UIUC

Potential collaboration

TBA


 

09:30

Laura Grigori, INRIA

Potential collaboration

TBA


 

10:00

Break




 

10:30

Anshu Dubey, ANL

Potential collaboration

Optimizing Scientific Codes While Retaining Portability

 

 

11:00

Discussion

 

 

 

 

12:00

Adjourn

 

 

 

 

13:00

Lunch

 

 

 

 

 

 

 

Parallel Sessions

 

Mini workshop on Performance Modeling and simulation
Chair: Marc Snir

14:30

Sanjay Kale, UIUC

Early Results

BigSim

 

 

15:00

Arnaud Legrand, INRIA

 

SimGrid for HPC

 

 

15:30

Torsten Hoefler, ETH

Early Results

TBA

 

 

16:00

Break

 

 

 

 

16:30

Yves Robert, INRIA

Early Results

TBA

 

 

17:00

Discussion

 

 

 

 

18:00

Adjourn

 

 

 

 

19:00

Dinner

Maggiano's
240 Oakbrook Center, Oak Brook, IL 60523

http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1

 

 

 

 

 

 

 

Mini workshop on Cloud
Chair: Kate Keahey

14:30

Kate Keahey, ANL

Potential collaboration

TBA

 

 

15:00

Narayan Desai, ANL

Potential collaboration

TBA

 

 

15:30

Jonathan Rouzaud-Cornabas, INRIA

Potential collaboration

Provisioning Virtual Machines in Federated Clouds

 

 

16:00

Break

 

 

 

 

16:30

Michael Wilde

Potential collaboration

Swift: simpler parallel programming for cloud and HPC domains http://www.ci.uchicago.edu/swift (Swift for clouds and clusters)
http://www.mcs.anl.gov/exm (Swift for extreme-scale domains)    

 

 

17:00

Discussion

 

 

 

 

18:00

Adjourn

 

 

 

 

19:00

Dinner

Maggiano's
240 Oakbrook Center, Oak Brook, IL 60523

http://www.maggianos.com/EN/Oak-Brook_Oak-Brook_IL/Pages/LocationLanding.aspx?AspxAutoDetectCookieSupport=1

 

 

 

 

 

 

 

Workshop Day 3 (Main room)

Wednesday Nov 21st

 

 

 

 

 

 

 

 

Parallel Sessions

 

Mini workshop on Programming models/runtime
Chair: Pavan Balaji

08:30

Emmanuel Jeannot, INRIA

Results

TBA

 


09:00

Sanjay Kale, UIUC


Charm++ update

 


09:30

Christian Perez, INRIA

 

TBA

 


10:00

Break

 


 


10:30

Jim Dinan

 

A One-Sided View of HPC: Global-View Models and Portable Runtime Systems

 


11:00

Sebastien Fourestier

Potential collaboration

Parallel repartitioning and re-mapping in Scotch

 

 

11:30

Discussion

 

 

 

 

12:30

Closing

 

 

 

 

13:00

Lunch

 

 

 

 

 

 

 

 

 

Mini workshop on Resilience
Chair: Franck Cappello

08:30

Mohamed Slim Bouguerra

TBA

TBA

 

 

09:00

Amina Guermouche, INRIA

Unified Model for Assessing Checkpointing Protocols at Extreme-Scale


 

 

09:30

Bogdan Nicolae, IBM

Results

I-Ckpt: Leveraging memory access patterns and inline collective deduplication to improve scalability of CR

 

 

10:00

Break

 

 

 

 

10:30

Tatiana Martsinkevich, INRIA

Results

Fully distributed recovery for send-determinism applications

 

 

11:00

Peter Brune, ANL

Trends

Multilevel Resiliency for PDE Simulations

 

 

11:30

Discussion


 

 

 

12:30

Closing

 

 

 

 

13:00

Lunch

 

Boxed lunches

 

Abstracts

Robert Ross, ANL

Trends in HPC I/O and File systems

All aspects of HPC systems are undergoing change as we move into petascale and towards exascale computing. The traditional "I/O software stack" is no exception: the layers, capabilities, and abstractions in the stack are all in flux as we consider how to best support future HPC applications. This talk will discuss these developmental trends, using ongoing work at Argonne as examples of some directions of study.

Andrew Chien

Presto/Blockus: Towards a Scalable R Programming System

We are studying simple extensions of the R programming system to allow R programmers to have simple, scalable access to multi-core and cluster scale-out parallelism, enabling access to larger memories and higher computation speeds.  Subsequent objectives include vertical scaling to secondary storage, which promises computing over "Big Data" on modest-size systems.  This effort is joint with HP and several other institutions.

Andra Hugo, INRIA

Composing multiple StarPU applications over heterogeneous machines: a supervised approach

Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a single runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention.
In this talk, I will present an extension to the StarPU runtime system that enables multiple StarPU kernels to run simultaneously over the same CPU+GPU architecture. I will then present some experimental results showing the improvements our solution brings to the efficiency of parallel applications composed of several parallel libraries (e.g., libraries in the domain of dense linear algebra or fluid mechanics). Finally, I will give some insights into the main challenges of the composability problem and present the topics we are interested in for future work.

Pete Beckman, ANL

New Directions in Extreme-Scale Operating Systems and Runtime Software

For more than a decade, extreme-scale operating systems and runtime software have been evolving very slowly.  Today's large-scale systems use slightly retooled "node" operating systems glued together with ad hoc local agents to handle I/O, job launch, and management. These extreme-scale systems are only slightly more tightly integrated than are generic Linux clusters with InfiniBand.  As we look forward to a new era for large-scale HPC systems, we see that power and fault management will become key design issues.  Software management of power and support for resilience must now be part of the whole-system design.  Extreme-scale operating systems and runtime software will not be simply today's node code with a few control interfaces, but rather a tightly integrated "global OS" that spans the entire platform and works cooperatively across portions of the machine in order to manage power and provide resilience.

Sebastien Fourestier, INRIA

Parallel repartitioning and re-mapping in Scotch

Scotch is a software package for sequential and parallel graph partitioning, static mapping, sparse matrix block ordering, clustering and sequential mesh and hypergraph ordering. As a research project, it is subject to continuous improvement, resulting from several on-going research tasks. Our talk will address several new features we have recently added to Scotch. We will present some threaded algorithms for shared-memory coarsening and refinement. We will also show early results regarding its parallel repartitioning and sequential remapping functionalities.

Anshu Dubey, ANL

Optimizing Scientific Codes While Retaining Portability

Optimization of large scientific codes for production is a balancing act between portability and performance. In the face of future hardware architecture challenges, retaining portability while obtaining acceptable performance is expected to be more challenging than ever. The first part of my presentation will be about experiences with pragmatic optimizations of FLASH, a multiphysics simulation code with a wide user base. The second part will discuss ideas for addressing the future challenges.

Michael Wilde, ANL

Swift: simpler parallel programming for cloud and HPC domains

Ana Gainaru, UIUC

Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems.

A large percentage of computing capacity in today's large high-performance computing systems is wasted due to failures and recoveries. A way of reducing the overhead induced by these strategies is to combine them with failure avoidance methods. Failure avoidance is based on a prediction model that detects fault occurrences ahead of time and allows preventive measures to be taken, such as task migration or checkpointing the application. This talk presents a prototype implementation of proactive checkpointing based on the ELSA toolkit, coupled with periodic multi-level checkpointing based on FTI. The proactive checkpoint is implemented as level zero (L0) in a four-level scheme, providing the fastest checkpoint, which is necessary to act quickly between the failure prediction and the moment of the failure. We evaluate the proposed approach on the TSUBAME system and show that the overhead, in comparison with a preventive-checkpoint-only execution, represents only 2% to 6%.
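The benefit described above can be illustrated with a toy expected-waste model (this is an illustrative sketch, not the ELSA/FTI implementation or the talk's actual evaluation): a predicted failure costs only a fast proactive checkpoint, while an unpredicted one loses, on average, half a checkpoint interval of work.

```python
# Toy model: expected work lost per failure, with and without prediction.
# All names and numbers here are hypothetical, for illustration only.
def expected_waste_per_failure(interval, recall, proactive_cost):
    """interval: periodic checkpoint interval (s);
    recall: fraction of failures predicted in time;
    proactive_cost: cost of one fast L0 proactive checkpoint (s)."""
    # Unpredicted failure: on average half an interval of work is lost
    # (restart cost is ignored for simplicity).
    lost_unpredicted = interval / 2.0
    # Predicted failure: a quick proactive checkpoint is taken just
    # before the failure, so almost no work is lost.
    lost_predicted = proactive_cost
    return recall * lost_predicted + (1 - recall) * lost_unpredicted
```

With a one-hour interval and a 5 s proactive checkpoint, even 50% recall halves the expected loss per failure, which is the intuition behind coupling prediction with multi-level checkpointing.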

Peter Brune

Multilevel Resiliency for PDE Simulations

Co-Authors: Mark Adams, Jed Brown, Peter Brune (speaking), Barry Smith

Multilevel methods for the solution of partial differential equations are the de facto fast algorithms for large-scale computations.  The utilization of these methods necessitates progressively smaller approximations of the solution to the problem, potentially on a smaller subset of the machine.  These algorithms present a tempting target for enabling efficient extreme-scale resiliency, as the multilevel structure may be used to efficiently compress the PDE solution and check for algorithmic correctness.  We discuss the components of multilevel methods and their use for resilient computation.  We speculate on possibilities for the integration of these methods into simulations.

Stefan Wild

Numerical optimization for "automatic" tuning of codes

Heterogeneity and rapid evolution of modern architectures increasingly demand that scientific codes be tuned in order to achieve high performance on different machines. Empirical performance tuning seeks high-performing code variants based on their measured performance on a target machine, but several obstacles remain in making this procedure "automatic." In this talk we provide an overview of the search problem in performance tuning, as formulated through a derivative-free, mixed-integer optimization problem. We explore modeling formulations for the problem, local and global algorithms, and potential trade-offs between competing objectives such as run time and energy consumption.

Rinku Gupta

CIFTS: An infrastructure for coordinated and comprehensive system-wide fault tolerance.

The need for leadership-class fault tolerance continues to increase as emerging high performance systems move towards offering exascale-level performance.  While most high-end systems do provide mechanisms for detection, notification, and perhaps handling of hardware- and software-related faults, the individual components present in the system perform these actions separately. Knowledge about occurring faults is seldom shared between different software components, and almost never on a system-wide basis.  A typical system contains numerous software components that could benefit from such knowledge, including applications, middleware libraries, job schedulers, file systems, math libraries, monitoring software, operating systems, and checkpointing software.

The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS) initiative provides the foundation necessary to enable systems to adapt to faults in a holistic manner. CIFTS achieves this through the Fault Tolerance Backplane (FTB), providing a unified management and communication framework, which can be used by any system software to publish fault-related information. In this talk, I will present some of the work done by the CIFTS group towards the development of FTB and FTB-enabled components; and discuss the potential and challenges of such system-wide inter-layer fault tolerance frameworks.

Bogdan Nicolae

I-Ckpt: Leveraging memory access patterns and inline collective deduplication to improve scalability of CR

With the increasing scale and complexity of supercomputing and cloud computing architectures, faults are becoming a frequent occurrence. For a large class of applications that run for a long time and are tightly coupled, Checkpoint-Restart (CR) is the only feasible method to survive failures. However, exploding checkpoint sizes that need to be dumped to storage pose a major scalability challenge. To tackle this challenge, this talk focuses on two techniques: (1) leveraging knowledge of memory access patterns to minimize the overhead of asynchronous checkpointing; (2) an inline collective memory-contents deduplication scheme that attempts to identify and eliminate duplicate memory pages across all processes before they are saved to storage. Several extensions and future work directions are also discussed.
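The deduplication idea can be sketched with content hashing: store one copy per distinct page and a per-page manifest for restore. This is a simplified single-process illustration; the actual I-Ckpt scheme performs this inline and collectively across processes.

```python
import hashlib

PAGE = 4096  # typical memory page size, for illustration

def deduplicate(pages):
    """Hash each page; keep one stored copy per distinct content,
    plus a manifest recording which hash restores each page."""
    store = {}      # content hash -> page bytes (one copy per content)
    manifest = []   # for each original page, the hash to restore from
    for p in pages:
        h = hashlib.sha256(p).hexdigest()
        store.setdefault(h, p)
        manifest.append(h)
    return store, manifest

def restore(store, manifest):
    # Rebuild the original page sequence from the deduplicated store.
    return [store[h] for h in manifest]
```

For checkpoints with many identical pages across ranks (zero pages, replicated read-only data), only the distinct contents reach storage, which is the source of the scalability gain.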

Jonathan Rouzaud-Cornabas

Provisioning Virtual Machines in Federated Clouds

With the increasing number of cloud offers and their heterogeneity, it becomes harder and harder for cloud users to select the proper cloud(s) and resources. Moreover, the selection process is strongly related to the application itself and to the users' requirements (deadline, cost, etc.). In this talk, we will present our early work on selecting and provisioning virtual machines in federated clouds. Our current work focuses on running bags of tasks. We will show our cloud broker simulator, based on SimGrid, and how it can be used to help select resources for a given application based on a set of requirements. Finally, we will conclude by presenting future challenges, such as taking into account a larger set of scientific computing applications, including workflows.

Laurent Hascoet

The Data-Dependence graph of Adjoint Codes

Automatic Differentiation (AD) is the primary means of obtaining analytic derivatives from a numerical model given as a computer program. Therefore, it is an essential productivity tool in numerous computational science and engineering domains. Computing gradients with the adjoint mode of AD via source transformation is a particularly beneficial but also challenging use of AD. From another viewpoint, Data-Dependence Graphs are one of the key tools to study and improve the performance of programs, particularly in view of their parallel execution. Basic parallelizability properties are classically expressed as properties of the Data-Dependence Graph of a code. We explore the relation between the Data-Dependence graphs of a program and of its adjoint, thus explaining why many parallel properties of a code also apply to its adjoint.

Jim Dinan

A One-Sided View of HPC: Global-View Models and Portable Runtime Systems

Global-view and one-sided parallel programming models provide a promising alternative to conventional approaches by enabling programmers to aggregate the memory of multiple nodes and allowing them to access any data, regardless of its physical location.  This model for asynchronous data movement also decouples synchronization from communication, enabling a greater degree of asynchrony.  These properties are of critical importance to scientific computing applications, which must cope with rapidly evolving system architectures, and where new simulation and analysis techniques have exposed greater sparsity and computational imbalance.

In this talk, I will present recent and ongoing work on portable one-sided communication interfaces and global-view parallel programming systems.  This work focuses on the evolution of the MPI-2 remote memory access (RMA) communication interface into the new MPI-3 RMA interface, and on the utilization of these interfaces to support higher-level parallel programming interfaces.  I will describe work in which we have used the MPI RMA interface to provide the first portable, one-sided implementation of Global Arrays, and its impact on the NWChem computational chemistry suite.  In addition, I will describe current and ongoing work on the deployment, implementation, and performance tuning of MPI-3 RMA.

Arnaud Legrand

SimGrid for HPC

In this talk, I will briefly present the history and goals of the SimGrid simulation toolkit. Although SimGrid was primarily designed in 1999 to perform scheduling studies on heterogeneous systems such as grids, recent developments have made it a very effective alternative for conducting simulation studies for P2P and HPC, compared to many ad hoc (but often short-lived) simulators. I will thus present the current status of research and development in SimGrid, as well as the future directions we intend to address.

Matthieu Dorier

I/O and in-situ visualization: recent results with the Damaris approach

As dumping large amounts of data to parallel file systems starts to severely impact the performance of HPC simulations, as well as the practicability of subsequent analysis tasks, new approaches to I/O and data analysis must be found. Damaris proposes to relocate I/O and analysis tasks to dedicated cores interacting with the simulation through shared memory.
In this talk, we will briefly recall the Damaris approach to scalable, efficient, jitter-free I/O, along with past results. We will then move to more recent work and results using Damaris for in-situ visualization with the CM1 atmospheric model and the Nek5000 CFD simulation. The presentation will include a demo of Damaris providing in-situ visualization for a sample simulation through the VisIt visualization software.

Amina Guermouche

Unified Model for Assessing Checkpointing Protocols at Extreme-Scale

In this talk, we present a unified model for several well-known checkpoint/restart protocols. The proposed model is generic enough to encompass both extremes of the checkpoint/restart space, from coordinated approaches to a variety of uncoordinated checkpoint strategies (with message logging). We identify a set of crucial parameters, instantiate them and compare the expected efficiency of the fault tolerant protocols, for a given application/platform pair. We then propose a detailed analysis of several scenarios, including some of the most powerful currently available HPC platforms, as well as anticipated Exascale designs. The results of this analytical comparison are corroborated by a comprehensive set of simulations. Altogether, they outline comparative behaviors of checkpoint strategies at very large scale, thereby providing insight that is hardly accessible to direct experimentation.
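A standard baseline for this kind of model comparison is the classical Young/Daly first-order formula for the optimal periodic checkpoint interval (shown here as general background, not as the talk's unified model itself):

```python
import math

def young_daly_period(ckpt_cost, mtbf):
    """First-order optimal compute time between checkpoints:
    sqrt(2 * C * MTBF), with C the checkpoint cost."""
    return math.sqrt(2.0 * ckpt_cost * mtbf)

def waste_fraction(period, ckpt_cost, mtbf):
    # First-order expected fraction of time lost to writing checkpoints
    # (C / period) plus re-executing lost work after a failure
    # (period / (2 * MTBF)).
    return ckpt_cost / period + period / (2.0 * mtbf)
```

For a 60 s checkpoint and a one-day MTBF this gives a period of roughly 54 minutes; richer models like the one in the talk add protocol-specific parameters (message logging, coordination cost) on top of this kind of trade-off.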
