
UNDER CONSTRUCTION

This event is supported by INRIA, UIUC, NCSA, and ANL, as well as by EDF.

Main Topics

Schedule

Each entry below reads: Time | Speaker | Affiliation | Type of presentation | Title (tentative)

Workshop Day 1: Wednesday, June 13th

TITLES ARE TEMPORARY (except if in bold font)

Registration
08:00 | Registration

Welcome and Introduction
08:30 | Marc Snir + Franck Cappello | INRIA & UIUC | Background | Welcome, Workshop objectives and organization
08:45 | Bertrand Braunschweig | INRIA | Background | Welcome to INRIA Rennes
09:00 | Thierry Priol | INRIA | Background | HPC @ INRIA (update)

Sustained Petascale (Chair: Gabriel Antoniu)
09:15 | Bill Kramer | NCSA | Background | Blue Waters UPDATE and performance metrics
09:45 | Torsten Hoefler | NCSA | Background | Blue Waters Applications and Scalability/Performance Challenges and performance modeling
10:15 | Break
10:45 | Romain Dolbeau | INRIA | Background | Programming Heterogeneous Many-cores Using Directives
11:15 | Marc Snir | ANL | Background | BlueGene Q: First impression
11:45 | Robert Ross | ANL | Background | BIG DATA
12:15 | Lunch

Mini Workshop 1: Fault tolerance (Chair: Franck Cappello)
13:30 | Sanjay Kale and Marc Snir | UIUC, ANL | Background | Fault tolerance needs at NCSA and ANL
14:00 | Ana Gainaru | NCSA | Joint Result | A detailed analysis of fault prediction results and impact for HPC systems
14:30 | Amina Guermouche | INRIA | Joint Result | TBD
15:00 | Break
15:30 | Mehdi Diouri | INRIA | Joint Results | Fault tolerance and energy consumption
16:00 | Tatiana Martsinkevich | INRIA | In progress | TBD
16:30 | Sanjay Kale | UIUC | Result | TBD
17:00 | Discussions: How to address Petascale fault tolerance needs
18:00 | Adjourn

Mini Workshop 2: I/O and BigData (Chair: Rob Ross)
13:30 | Bill Kramer and Rob Ross | UIUC, ANL | Background | I/O and BIGDATA needs at NCSA and ANL
14:00 | Gabriel Antoniu | INRIA | Joint Result | TBD
14:30 | Matthieu Dorier | INRIA | Joint Result | In-Situ Interactive Visualization of HPC Simulations with Damaris
15:00 | Break
15:30 | Dries Kimpe | ANL | Background | Fault tolerance and energy consumption
16:00 | --- | --- | --- | TBD
16:30 | --- | --- | ---
17:00 | Discussions: How to address Petascale I/O and Big Data needs
18:00 | Adjourn

Workshop Day 2: Thursday, June 14th

Math for HPC (Chair: Marc Snir)
08:30 | Frederic Vivien | INRIA | Joint Result | A Unified...
09:00 | Paul Hovland | ANL | Background | TBD
09:30 | Laurent Hascoet | INRIA | Joint Results | Gradient of MPI-parallel codes
10:00 | Break

Programming languages (Chair: François Bodin)
10:30 | Rajeev Thakur | ANL | Background | TBD
11:00 | Sanjay Kale | UIUC | Background | TBD
11:30 | --- | --- | --- | ---
12:00 | Lunch

Mini Workshop 3: Numerical libraries (Chair: Paul Hovland)
13:30 | Paul Hovland and Bill Gropp | UIUC, ANL | Background | Numerical libraries needs at NCSA and ANL
14:00 | Laura Grigori | INRIA | Joint Result | TBD
14:30 | François Pellegrini | INRIA | Joint Result | Introducing PaMPA
15:00 | Break
15:30 | Jocelyne Erhel | INRIA | Background | TBD
16:00 | Daisuke Takahashi | U. Tsukuba | Joint Result | TBD
16:30 | --- | --- | ---
17:00 | Discussions: How to address Petascale Numerical Libraries needs
18:00 | Adjourn

Mini Workshop 4: Programming Models (Chair: Sanjay Kale)
13:30 | Rajeev Thakur and Sanjay Kale | UIUC, ANL | Background | Programming model needs at NCSA and ANL
14:00 | Jean-François Mehaut | INRIA | Joint Result | Load Balancing for Parallel Multi-core Machines with Non-Uniform Communication Costs
14:30 | Brice Goglin | INRIA | Background | Bringing hardware affinity information into MPI communication strategies
15:00 | Break
15:30 | Thomas Ropars | EPFL | Background | Towards efficient collective operations on the Intel SCC
16:00 | Alexandre Duchateau | INRIA | Joint Result | Global operation optimizations on Multicore
16:30 | --- | --- | --- | TBD
17:00 | Discussions: How to address Petascale programming model needs
18:00 | Adjourn

19:00 | Banquet @ Saint-Malo

Workshop Day 3: Friday, June 15th

Mini Workshop 5: Mapping and Scheduling (Chair: Torsten Hoefler)
08:30 | Bill Kramer and Marc Snir | UIUC, ANL | Background | Mapping and Scheduling needs at NCSA and ANL
09:00 | François Teyssier | INRIA | Joint Result | TBD
09:30 | François Pellegrini | INRIA | Background | TBD
10:00 | Torsten Hoefler | NCSA --> ETH | Background | On-node and off-node Topology Mapping for Petascale Computers
10:30 | Joseph Emeras (Olivier Richard) | INRIA | Background | TBD
11:00 | Discussions: How to address Petascale Mapping and Scheduling needs

Mini Workshop 6: HPC/Cloud (Chair: Frederic Desprez)
08:30 | Kate Keahey (main speaker) and Franck Cappello | ANL, INRIA | Background | HPC Cloud
09:00 | Gabriel Antoniu | INRIA | Joint Result | TBD
09:30 | Frederic Desprez | INRIA | Background | TBD
10:00 | Bogdan Nicolae | INRIA | Joint Results | TBD
10:30 | Derrick Kondo | INRIA | Result | Characterization and Prediction of Host Load in a Google Data Center
11:00 | Discussions: How to address HPC Cloud needs

12:00 | Discussion and Closing (Franck Cappello and Marc Snir)
12:30 | Lunch

Abstracts

Romain Dolbeau: Programming Heterogeneous Many-cores Using Directives

Pushed by the pace of innovation in GPUs and, more generally, many-core technology, the processor landscape is moving at high speed. This fast evolution makes software development more complex. Furthermore, the impact of the programming style on the future performance and portability of an application is difficult to forecast. The use of directives to annotate serial languages (e.g. C/C++/Fortran) looks very promising: they abstract low-level parallelism implementation details while protecting code assets from the evolution of processor architectures. In this presentation, we describe how to use HMPP (Heterogeneous Many-core Parallel Programming) as well as OpenACC directives to program heterogeneous compute nodes. In particular, we provide insights on how GPUs and CPUs can be exploited in a unified manner and how code tuning issues can be minimized. We extend the discussion to the use of libraries, which is currently one of the key elements when addressing GPUs and many-cores.
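As a minimal sketch of the directive-based style described above (a generic OpenACC SAXPY of our own, not code from the talk; HMPP directives would look different), a serial loop can be annotated for offload like this:

```c
/* Minimal OpenACC sketch: offload a SAXPY loop to an accelerator.
 * Build with an OpenACC-capable compiler; without one, the pragma is
 * ignored and the same source still runs as an ordinary serial program. */
#include <stdio.h>

void saxpy(int n, float a, const float *x, float *y)
{
    /* The data clauses describe movement to/from the device; the loop
     * iterations are distributed over the accelerator by the compiler. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1 << 20 };
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(N, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]);   /* expected: 5.0 */
    return 0;
}
```

The point of the approach is precisely that deleting or ignoring the pragma leaves a valid serial C program, so the code asset survives changes in the underlying accelerator.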

Laurent Hascoet: Gradient of MPI-parallel codes

Automatic Differentiation (AD) is the primary means of obtaining analytic derivatives from a numerical model given as a computer program. It is therefore an essential productivity tool in numerous computational science and engineering domains. Computing gradients with the adjoint mode of AD via source transformation is a particularly beneficial but also challenging use of AD. To date, only ad hoc solutions for the adjoint differentiation of MPI programs have been available, forcing AD users to reason about parallel communication dataflow and dependencies and to develop adjoint communication code by hand. In this collaboration between Argonne, RWTH Aachen, and INRIA, we characterize the principal problems of adjoining the most frequently used communication idioms. We propose solutions to cover these idioms and consider the consequences for the MPI implementation, the MPI user, and MPI-aware program analysis.
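As a hedged illustration of what "adjoining" a communication idiom means (this toy example is ours, not the speakers'): in the reverse sweep the dataflow is reversed, so the adjoint of a send is a receive of the corresponding adjoint variable, which is then accumulated.

```c
/* Sketch (our own, simplified): adjoint of a point-to-point exchange.
 * Forward sweep: rank 0 sends x to rank 1, which stores it as y.
 * Reverse sweep: the adjoint ybar flows back from rank 1 to rank 0 and is
 * accumulated into xbar, i.e. send and receive swap roles. Run with >= 2 ranks. */
#include <mpi.h>
#include <stdio.h>

void forward(double x, double *y, int rank)
{
    if (rank == 0)
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(y, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

void adjoint(double *xbar, double ybar, int rank)
{
    double tmp = 0.0;
    if (rank == 1)              /* adjoint of the Recv is a Send */
        MPI_Send(&ybar, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    else if (rank == 0) {       /* adjoint of the Send is a Recv */
        MPI_Recv(&tmp, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        *xbar += tmp;           /* adjoints accumulate */
    }
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double x = 3.0, y = 0.0, xbar = 0.0, ybar = 1.0;
    forward(x, &y, rank);          /* primal exchange */
    adjoint(&xbar, ybar, rank);    /* reversed exchange of adjoints */

    if (rank == 0)
        printf("xbar = %f\n", xbar);   /* expected: 1.0, since y = x */
    MPI_Finalize();
    return 0;
}
```

The difficulty the talk addresses is that real codes use richer idioms (non-blocking calls, wildcards, collectives), where the reversal is far less mechanical than in this two-rank example.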

Ana Gainaru: A detailed analysis of fault prediction results and impact for HPC systems

A large percentage of the computing capacity in today's large high-performance computing systems is wasted on failures and recoveries. As a consequence, current research focuses on fault tolerance strategies that aim to minimize the effects of faults on applications. By far the most popular technique in this field is the checkpoint-restart strategy. A complement to this classical way of handling errors that cause application crashes in large-scale clusters is failure avoidance, in which the occurrence of a fault is predicted and preventive measures are taken. For this, monitoring systems require a reliable predictor that indicates which faults the system will generate and at what location. Thus far, research in this field has relied on an ideal predictor that has no implementation on real HPC systems. In this talk, we present a new method for predicting faults that merges signal-analysis concepts with data-mining techniques. A large part of the talk focuses on a detailed analysis of the prediction method, applying it to two large-scale systems and investigating the characteristics and bottlenecks of each step of the prediction process. Furthermore, we analyze the impact of the predictor's precision and recall on current checkpointing strategies.
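For reference, the precision and recall mentioned at the end are the standard quantities (the definitions below are textbook ones, not specific to this work):

```latex
\[
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{recall} = \frac{TP}{TP + FN}
\]
```

where TP counts correctly predicted failures, FP counts predicted failures that did not occur, and FN counts failures that occurred without being predicted. High precision limits wasted proactive actions (such as unnecessary checkpoints), while high recall limits the failures that still have to be absorbed by plain checkpoint-restart.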

L. Pilla (UFRGS/INRIA Grenoble), J-F. Méhaut (UJF/CEA): Load Balancing for Parallel Multi-core Machines with Non-Uniform Communication Costs

Multi-core machines with a NUMA design are now common in the assembly of HPC machines. On these machines, in addition to the load imbalance coming from the dynamic application, the asymmetry of memory access and network communication costs plays a major role in obtaining high efficiency. Taking both of these criteria into account is a key step to achieving performance portability on current HPC platforms. In this talk we will explain our portable approach to increasing thread/data affinity while reducing core idleness on both shared-memory and distributed parallel platforms. We will also present how we implemented it as a Charm++ load balancer that relies on a generic view of the machine topology decorated with benchmarked communication costs. We will end the presentation by showing some of its performance improvements over other state-of-the-art load balancing algorithms on different parallel machines.
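To make the cost model concrete, here is a toy greedy placement of our own (not the Charm++ load balancer presented in the talk, and the cost function is an assumption) that combines per-core load with a benchmarked communication-cost matrix:

```c
/* Toy sketch (not the actual Charm++ balancer): greedily place each task on
 * the core that minimizes current core load plus the communication cost to
 * the core of the task's already-placed neighbor. comm_cost[][] stands for a
 * benchmarked NUMA/network cost matrix. */
#include <float.h>
#include <stdio.h>

#define NCORES 8
#define NTASKS 32

void greedy_map(const double load[NTASKS],
                double comm_cost[NCORES][NCORES],
                const int neighbor[NTASKS],   /* one neighbor per task, -1 if none */
                int placement[NTASKS])
{
    double core_load[NCORES] = {0};

    for (int t = 0; t < NTASKS; t++) {
        int best = 0;
        double best_cost = DBL_MAX;
        for (int c = 0; c < NCORES; c++) {
            double cost = core_load[c] + load[t];
            int nb = neighbor[t];
            if (nb >= 0 && nb < t)            /* neighbor already placed */
                cost += comm_cost[c][placement[nb]];
            if (cost < best_cost) { best_cost = cost; best = c; }
        }
        placement[t] = best;
        core_load[best] += load[t];
    }
}

int main(void)
{
    double load[NTASKS], comm[NCORES][NCORES];
    int neighbor[NTASKS], placement[NTASKS];

    for (int t = 0; t < NTASKS; t++) { load[t] = 1.0; neighbor[t] = t - 1; }
    for (int a = 0; a < NCORES; a++)
        for (int b = 0; b < NCORES; b++)
            comm[a][b] = (a / 4 == b / 4) ? 1.0 : 10.0;  /* two NUMA domains */

    greedy_map(load, comm, neighbor, placement);
    printf("task 5 placed on core %d\n", placement[5]);
    return 0;
}
```

A real balancer additionally has to weigh migration costs and keep the decision itself cheap, which is where the topology tree described in the talk comes in.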

Matthieu Dorier: In-Situ Interactive Visualization of HPC Simulations with Damaris

The I/O bottlenecks already present on current petascale systems force us to consider new approaches to getting insights from running simulations. Bypassing storage or drastically reducing the amount of data generated will be of utmost importance for the scales to come and, in particular, for Blue Waters.

This presentation will focus on the specific case of in-situ data analysis collocated with the simulation code and running on the same resources. We will first present some common visualization and analysis tools and show the limitations of their in-situ capabilities. We will then present how we enriched the Damaris I/O middleware to support analysis and visualization operations. We show that the use of Damaris on top of existing visualization packages allows us to (1) reduce code instrumentation to a minimum in existing simulations, (2) gather the capabilities of several visualization tools to offer adaptability under a unified data management interface, (3) use dedicated cores to hide the runtime impact of in-situ visualization, and (4) use memory efficiently through an allocation-based communication model.
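The "dedicated cores" idea in point (3) can be sketched with plain MPI (a generic illustration of the pattern only; it does not use or reflect Damaris's actual API):

```c
/* Generic sketch of the dedicated-core pattern (not the Damaris API):
 * one core per node is split off to handle analysis/visualization while
 * the remaining cores run the simulation and ship their data to it. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, node_rank;
    MPI_Comm node_comm, work_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group ranks that share a node. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);

    /* Node-local rank 0 becomes the dedicated analysis core. */
    int is_dedicated = (node_rank == 0);
    MPI_Comm_split(MPI_COMM_WORLD, is_dedicated, world_rank, &work_comm);

    if (is_dedicated)
        printf("rank %d: receive data and run in-situ analysis/visualization\n", world_rank);
    else
        printf("rank %d: run the simulation and push data to the dedicated core\n", world_rank);

    MPI_Comm_free(&work_comm);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

The simulation ranks then compute on `work_comm` while the dedicated ranks absorb the visualization work, which is how the runtime impact mentioned above is hidden.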

François Pellegrini, Cédric Lachat: Introducing PaMPA

PaMPA ("Parallel Mesh Partitioning and Adaptation") is a middleware for the parallel remeshing and redistribution of distributed unstructured meshes. PaMPA is meant to serve as a basis for the development of numerical solvers implementing compact schemes. PaMPA represents meshes as a set of interconnected entities (elements, faces, edges, nodes, etc.). Since the underlying structure is a graph, elements can be of any kind, and several types of elements can be used within the same mesh. Typed values (scalars, vectors, structured types) can be associated with entities. Value exchange routines allow users to copy values across neighboring processors and to specify the width of the overlap across processors. Accessors and iterators allow developers of numerical solvers to write their numerical schemes without having to take mesh and value distributions into account. Parallel mesh partitioning and redistribution is now available, partly based on PT-Scotch. Parallel remeshing will soon be available; it will be handled by calling a user-provided sequential remesher in parallel on non-overlapping pieces of the mesh. A full-featured tetrahedral example will be provided before the end of this year, based on the MMG3D sequential remeshing software, also developed at Inria.

Torsten Hoefler: On-node and off-node Topology Mapping for Petascale Computers

Network topology, and the efficient mapping of tasks to physical processing elements, is a growing problem as we march towards larger systems. The latest generation of petascale-class systems, which comes right before a major switch to optical interconnects, is particularly affected because of its large, low-bisection torus networks. We will explore opportunities to improve communication performance by avoiding network congestion through automated task remapping. We discuss how we combine different approaches (various well-known heuristics, a heuristic based on RCM, INRIA's SCOTCH, and INRIA's tree-map algorithms) to achieve the highest mapping performance for on-node as well as off-node mappings. We also investigate the theoretical possibility of reducing energy usage by minimizing the dilation of the mapping. This work is done in the context of MPI and can readily be adapted to real production applications.
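For readers unfamiliar with the term, the (weighted) dilation being minimized can be written down directly; the sketch below (our own illustration, not the authors' code) computes it for a mapping onto a 3D torus:

```c
/* Sketch (our own): weighted dilation of a task-to-node mapping on a 3D torus.
 * The dilation of a communication edge is the hop distance between the nodes
 * its two tasks are mapped to; minimizing the weighted sum reduces the traffic
 * each message injects into the network. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { int x, y, z; } coord_t;

static int torus_dist_1d(int a, int b, int size)
{
    int d = abs(a - b);
    return d < size - d ? d : size - d;   /* wrap-around links */
}

static int hops(coord_t a, coord_t b, coord_t dims)
{
    return torus_dist_1d(a.x, b.x, dims.x)
         + torus_dist_1d(a.y, b.y, dims.y)
         + torus_dist_1d(a.z, b.z, dims.z);
}

/* edges[i] holds a communicating task pair, weight[i] its traffic volume,
 * map[t] the torus coordinates the mapping assigns to task t. */
long weighted_dilation(int nedges, const int (*edges)[2], const long *weight,
                       const coord_t *map, coord_t dims)
{
    long total = 0;
    for (int i = 0; i < nedges; i++)
        total += weight[i] * hops(map[edges[i][0]], map[edges[i][1]], dims);
    return total;
}

int main(void)
{
    coord_t dims = {8, 8, 8};
    coord_t map[4] = {{0,0,0}, {7,0,0}, {0,4,0}, {3,3,3}};
    const int edges[3][2] = {{0,1}, {1,2}, {2,3}};
    const long weight[3]  = {10, 5, 1};
    printf("weighted dilation: %ld\n",
           weighted_dilation(3, edges, weight, map, dims));
    return 0;
}
```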

Derrick Kondo: Characterization and Prediction of Host Load in a Google Data Center
[Joint work with Sheng Di, Walfredo Cirne]

Characterization and prediction of system load is essential for optimizing its performance. We characterize and predict host load in a real-world production Google data center, using a detailed trace of over 25 million tasks across over 12,500 hosts. In our characterization, we present valuable statistics and distributions of machines' maximum load, queue state, and relative usage levels. Compared to traditional load from Grids and other HPC systems, we find that Google host load exhibits higher variance due to the high frequency of small tasks. Based on this characterization, we develop and evaluate different methods of machine load prediction using techniques such as autoregression, moving averages, probabilistic methods, and noise filters. We find that a linear weighted moving average produces accurate predictions with an 80%-95% success rate, outperforming other methods by 5%-20%. Surprisingly, this method outperforms more sophisticated hybrid prediction methods, which are effective for traditional Grid loads but not for data center loads because of their more frequent and severe fluctuations.
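As a toy illustration of the linearly weighted moving average found to work best (the window length and weights below are arbitrary assumptions, not the tuned values from the study):

```c
/* Toy sketch: predict the next host-load sample as a linearly weighted
 * moving average of the last W samples, with the most recent sample
 * weighted most heavily. W and the linear weights are illustrative
 * assumptions, not the parameters used in the paper. */
#include <stdio.h>

#define W 8   /* window length (assumption) */

double lwma_predict(const double *history, int n)
{
    int win = n < W ? n : W;          /* use the last W samples, or fewer */
    double num = 0.0, den = 0.0;
    for (int i = 0; i < win; i++) {
        double w = (double)(i + 1);   /* weights 1..win, newest gets win */
        num += w * history[n - win + i];
        den += w;
    }
    return den > 0.0 ? num / den : 0.0;
}

int main(void)
{
    double load[] = {0.30, 0.32, 0.35, 0.31, 0.40, 0.42, 0.39, 0.45, 0.47};
    int n = sizeof load / sizeof load[0];
    printf("predicted next load: %.3f\n", lwma_predict(load, n));
    return 0;
}
```

In the study, a prediction would then count as a success if it falls within a chosen error bound of the observed load, which is how the 80%-95% success rates above are measured.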

Brice Goglin: Bringing hardware affinity information into MPI communication strategies

Understanding the hardware topology and adapting the software accordingly is increasingly difficult. Resource numbering is not portable across machines or operating systems, there are many levels of memory hierarchy, and access to I/O and memory resources is no longer flat. We will summarize the work we put into the Hardware Locality (hwloc) software to provide applications with a portable and easy-to-use abstraction of hardware details. This deep knowledge of hardware affinities lets us optimize MPI communication strategies within and between nodes, for both point-to-point and collective operations. We are now looking at adding quantitative information to the existing qualitative hierarchy description to improve our locality-based criteria.
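A minimal sketch of the kind of topology discovery hwloc provides (standard hwloc calls, but the program itself is our own illustration, not from the talk):

```c
/* Minimal sketch using the Hardware Locality (hwloc) library: discover the
 * machine topology and report how many NUMA nodes, cores, and hardware
 * threads it exposes. Build with something like: cc demo.c -lhwloc */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    int nnuma  = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE);
    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    int npus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);

    printf("NUMA nodes: %d, cores: %d, hardware threads: %d\n",
           nnuma, ncores, npus);

    hwloc_topology_destroy(topo);
    return 0;
}
```

An MPI library can walk the same tree to decide, for instance, which ranks share a cache or a NUMA node and pick communication strategies accordingly.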

Thomas Ropars: Towards efficient collective operations on the Intel SCC

Many-core chips with more than 1000 cores are expected by the end of the decade. To overcome scalability issues related to cache coherence at such a scale, one of the main research directions is to leverage the message-passing programming model. A many-core chip such as the Intel Single-Chip Cloud Computer (SCC) integrates a large number of cores connected by a powerful network-on-chip. The SCC offers the ability to move data between on-chip Message Passing Buffers using Remote Memory Access (RMA) operations. In this talk, we study how to provide efficient collective operations on the SCC, focusing on the broadcast primitive. We show how RMA operations can be leveraged to dramatically improve communication performance compared to a solution based on a higher-level send/receive interface.
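To fix ideas, a broadcast tree of the kind the talk optimizes can be written generically; the sketch below (our own, expressed with ordinary MPI send/receive rather than the SCC's on-chip RMA) shows the binomial-tree structure whose data movement an RMA-based approach would accelerate:

```c
/* Sketch (our own): a binomial-tree broadcast written with plain MPI
 * send/receive. On the SCC, the per-link transfers underneath such a tree
 * can be replaced by on-chip RMA into Message Passing Buffers; this sketch
 * only shows the tree structure itself. */
#include <mpi.h>
#include <stdio.h>

void binomial_bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int vrank = (rank - root + size) % size;   /* renumber so the root is 0 */

    /* Receive from the parent: clear the lowest set bit of vrank. */
    int mask = 1;
    while (mask < size) {
        if (vrank & mask) {
            int parent = ((vrank & ~mask) + root) % size;
            MPI_Recv(buf, count, type, parent, 0, comm, MPI_STATUS_IGNORE);
            break;
        }
        mask <<= 1;
    }
    /* Forward to children at the remaining levels of the tree. */
    mask >>= 1;
    while (mask > 0) {
        if (vrank + mask < size) {
            int child = (vrank + mask + root) % size;
            MPI_Send(buf, count, type, child, 0, comm);
        }
        mask >>= 1;
    }
}

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) value = 42;
    binomial_bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d got %d\n", rank, value);
    MPI_Finalize();
    return 0;
}
```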

Kate Keahey: HPC Cloud

Infrastructure clouds have created ideal conditions for users to outsource their infrastructure needs beyond the boundaries of their institution. A typical infrastructure cloud offers (1) on-demand, short-term access, which allows users to flexibly manage peaks in demand; (2) a pay-as-you-go model, which helps save costs for bursty usage patterns (i.e., helps manage "valleys" in demand); (3) access via virtualization technologies, which provides a safe and cost-effective way for users to manage and customize their own environments; and (4) sheer convenience, as users and institutions no longer need specialized IT departments and can focus on their core mission instead. The flexibility of this approach also allows users to outsource as much or as little of their infrastructure procurement as their needs justify: they can keep a resident amount of infrastructure in-house while outsourcing only at times of increased demand, and they can outsource to a variety of providers, choosing the best service levels for the price the market has to offer.

The availability of cloud computing has given rise to an interesting debate on its relationship to high-performance computing (HPC). Two significant questions have emerged in this context: (1) Can supercomputing workloads be run on a cloud? and (2) Can a supercomputer operate as a cloud? Much investigation has been done on the first issue, most notably and conclusively as part of the Magellan project. The second question, which could provide an interesting solution to the challenges raised by the first, has not been investigated nearly as much. This talk will present a state-of-the-art summary of work in this space, discuss the current open challenges, propose relevant solutions in the area of resource management, and outline potential future directions and collaborations.
