
Main Topics

Schedule

Speakers

Types of presentation

Titles (tentative)

Download

Dinner

Sunday Nov. 21st
19:00

Radio Maria

 

 

  

http://www.radiomariarestaurant.com/

 

Workshop Day 1 (Auditorium)

Monday Nov. 22nd

Welcome and Introduction

08:30

Franck Cappello, INRIA & UIUC, France, and Thom Dunning, NCSA, USA

Background

Workshop details

 

Post-Petascale and Exascale Systems, chair: Franck Cappello

08:45

Mitsuhisa Sato, U. Tsukuba, Japan

Trends in HPC

Challenges on Programming Models and Languages for Post-Petascale Computing -- from Japanese NGS project "The K computer" to Exascale computing --

INRIA-UIUC-WS4-msato.pdf

 

09:15

Marc Snir, UIUC, USA

Trends in HPC

Toward Exascale

INRIA-UIUC-WS4-msnir.pdf

 

09:45

Wen-mei Hwu, UIUC, USA

Trends in HPC

Extreme-Scale Heterogeneous Computing

INRIA-UIUC-WS4-Hwu.pdf

 

10:15

Arun Rodrigues, Sandia, USA

Trends in HPC

The UHPC X-Caliber Project

INRIA-UIUC-WS4-arodrigues.pdf

 


10:45

Break

 

 

 

Post-Petascale Applications and System Software, chair: Marc Snir

11:15

Pete Beckman, ANL, USA

Trends in HPC

Exascale Software Center

INRIA-UIUC-WS4-pbeckman.pdf

 

11:45

Michael Norman, SDSC, USA

Trends in HPC

Extreme Scale AMR for Hydrodynamic Cosmology

INRIA-UIUC-WS4-mnorman.pptx

 

12:15

Eric Bohm, UIUC, USA

Trends in HPC

Scaling NAMD into the Petascale and Beyond

INRIA-NCSA_WS4_ebohm.pdf

 

12:45

Lunch

BLUE WATERS, chair: Bill Gropp

14:00

Bill Kramer, NCSA, USA

Overview

Blue Waters: A Super-System to Explore the Expanse and Depth of 21st Century Science

INRIA-UIUC-WS4-bkramer2.pdf

Collaborations on System Software

14:30

Ana Gainaru, NCSA, USA

Early Results

Framework for Event Log Analysis in HPC

INRIA-UIUC-WS4-againaru.pdf

 

15:00

Esteban Meneses, UIUC, USA

Early Results

Clustering Message Passing Applications to Enhance Fault Tolerance Protocols

INRIA-UIUC-WS4-emenese.pdf

 

15:30

Thomas Ropars, INRIA, France

Results

Latest Progresses on Rollback-Recovery Protocols for Send-Deterministic Applications

INRIA-UIUC-WS4-tropars.pdf

 

16:00

Break

 

 

 

Collaborations on System Software, chair: Bill Kramer

16:30

Leonardo Bautista, Titech, Japan

Results/International collaboration with Japan

Transparent low-overhead checkpoint for GPU-accelerated clusters

INRIA-UIUC-WS4-lbautista.pdf

 

17:00

Gabriel Antoniu, INRIA/IRISA, France

Results

Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores

INRIA-UIUC-WS4-gantoniu.pdf

 

 

17:30

Mathias Jacquelin, INRIA/ENS Lyon

Results

Comparing archival policies for BlueWaters

INRIA-UIUC-WS4-mjacquelin.pdf

 

18:00

Olivier Richard, Joseph Emeras, INRIA/U. Grenoble, France

Early Results

Studying the RJMS, applications and File System triptych: a first step toward an experimental approach

INRIA-NCSA-WS4-jemeras.pdf

Dinner

19:30

Gould's

 

http://www.jimgoulddining.com/

Workshop Day 2 (Auditorium)

Tuesday Nov. 23rd

Collaborations on System Software, chair: Raymond Namyst

08:30

Torsten Hoefler, NCSA, USA

Potential collaboration

Application Performance Modeling on Petascale and Beyond

INRIA-UIUC-WS4-thoefler.pdf

 

09:00

Frédéric Vivien, INRIA/ENS Lyon, France

Potential collaboration

On Scheduling Checkpoints of Exascale Application

INRIA-UIUC-WS4-fvivien.pdf

Collaborations on Programming Models

09:30

Thierry Gautier, INRIA, France

Potential collaboration

On the cost of managing data flow dependencies for parallel programming

INRIA-UIUC-WS4-tgautier.pdf

 

10:00

Jean-François Méhaut and Laercio Pilla, INRIA/U. Grenoble, France

Early Results

Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing

INRIA-UIUC-WS4-llpilla.pdf

 

10:30

Break

 

 

 

chair: Sanjay Kale

11:00

Raymond Namyst, INRIA/U. Bordeaux, France

Potential collaboration

Bridging the gap between runtime systems and programming languages on heterogeneous GPU clusters

INRIA-UIUC-WS4-rnamyst.pdf

 

11:30

Brian Amedro, INRIA/U. Nice, France

Potential collaboration

Improving asynchrony in an Active Object model

INRIA-UIUC-WS4-bamedro.pdf

 

 

12:00

Christian Perez, INRIA/ENS Lyon, France

Early Results

High Performance Component with Charm++ and OpenAtom

INRIA-UIUC-W54-cperez.pdf

 

12:30

Lunch

 

 

 

Collaborations on Numerical Algorithms and Libraries, chair: Mitsuhisa Sato

14:00

Luke Olson, Bill Gropp, UIUC, USA

Early Results

On the status of algebraic (multigrid) preconditioners

INRIA-UIUC-WS4-lolson.pdf

 

14:30

Simplice Donfack, INRIA/U. Paris Sud, France

Early Results

Improving data locality in communication avoiding LU and QR factorizations

INRIA-UIUC-SW-sdonfack.pdf

 

15:00

Désiré Nuentsa, INRIA/IRISA, France

Early Results

Parallel Implementation of deflated GMRES in the PETSc package

INRIA-UIUC-WS4-dnuentsa.pdf

 

15:30

Sébastien Fourestier, INRIA/U. Bordeaux, France

Early Results

Graph repartitioning with Scotch and other ongoing work

INRIA-UIUC_WS4-fourestier.pdf

 

16:00

Break

 

 

 

chair: Luke Olson

16:15

Marc Baboulin, INRIA, U. Paris Sud, France

Early Results

Accelerating linear algebra computations with hybrid GPU-multicore systems

INRIA-UIUC-WS4-mbaboulin.pdf

 

16:45

Daisuke Takahashi, U. Tsukuba, Japan

Results/International collaboration with Japan

Optimization of a Parallel 3-D FFT with 2-D Decomposition

INRIA-NCSA-WS4-dtakahashi.pdf

 

17:15

Alex Yee, UIUC, USA

Early Results

A Single-Transpose implementation of the Distributed out-of-order 3D-FFT

INRIA-UIUC-WS4-ayee.pdf

 

17:35

Jeongnim Kim, NCSA, USA

Early Results

Toward petaflop 3D FFT on clusters of SMP

INRIA-NCSA-WS4jkim.pdf

Dinner

19:30

Escobar's  

  

http://www.escobarsrestaurant.com/

Workshop Day 3 (Auditorium)

Wednesday Nov. 24th

Break out sessions introduction

8:30

Cappello, Snir

Overview

Objectives of Break-out, expected results
Collaboration mechanisms (internships, visits, etc.)

 

Topics

 

Participants

Other NCSA participants

 

 

Break out session 1

9:00-10:15

 

 

 

 

Routing, topology mapping, scheduling, perf. modeling

 

Snir, Hoefler, Vivien, Gautier, Jeannot, Kale, Namyst, Méhaut, Bohm, Pilla, Amedro, Perez, Baboulin

 

Room 1030

Break-out-report-snir.pdf

Resilience and 3D-FFT

 

Kramer, Cappello, Takahashi, Yee, Jeongnim, Gainaru, Ropars, Meneses, Bautista, Antoniu, Richard, Fourestier, Jacquelin

 

Room 1040

Break-out-report-kramer.pdf

Libraries

 

Gropp, Baboulin, Olson, Désiré, Simplice, Sébastien Fourestier

 

Room 1104

 

 

 

 

 

 

10:15

Break

 

 

 

Break out session 2

10:30-11:45

 

 

 

Resilience

 

Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista

  Room

Programming models / GPU

 

Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot, Bohm, Pilla, Baboulin, Fourestier, Gautier

 

Room 1030

 

I/O

 

Snir, Vivien, Jacquelin, Antoniu, Richard, Kramer, Gainaru, Ropars

  

Room 1040

Break-out-report-snir.pdf

3D-FFT

 

Cappello, Takahashi, Yee, Jeongnim, Hoefler

 

Room 1104

Break-out-3D-FFT-cappello.pdf

Break out session report

12:00

Speakers: Snir, Cappello, Gropp, Kramer, Kale

 

Auditorium

Closing

12:30

Cappello, Snir

 

Auditorium

 

 

13:00

Lunch

 

 

 

Dinner

19:00

Buttitta's

 

http://buttittascu.com/

 

Abstracts


...

Cosmological simulations present well-known difficulties scaling to large core counts because of the large spatial inhomogeneities and vast range of length scales induced by gravitational instability. These difficulties are compounded when baryonic physics is included, which introduces its own multiscale challenges. In this talk I review efforts to scale the Enzo adaptive mesh refinement hydrodynamic cosmology code to O(100,000) cores, and I also discuss Cello, an extremely scalable AMR infrastructure under development at UCSD for the next generation of computer architectures, which will underpin petascale Enzo.


...


Eric Bohm, NCSA

Scaling NAMD into the Petascale and Beyond

Many challenges arise when employing ever larger supercomputers for the simulation of biological molecules in the context of a mature molecular dynamics code. Issues stemming from the scaling up of problem size, such as input and output, require both parallelization and revisions to legacy file formats. Order-of-magnitude increases in the number of processor cores evoke problems with O(P) structures, load balancing, and performance analysis. New architectures present code optimization opportunities (VSX SIMD) which must be carefully applied to provide the desired performance improvements without dire costs in implementation time and code quality. Looking beyond these imminent concerns for sustained petaflop performance on Blue Waters, we will also consider scalability concerns for future exascale machines.


Bill Kramer, NCSA

Blue Waters: A Super-System to Explore the Expanse and Depth of 21st Century Science

While many people think that Blue Waters means a single Power7 IH supercomputer, in reality the Blue Waters Project is deploying an entire system architecture that includes an eco-system surrounding the Power7 IH system to make it highly effective for ultra-scale science and engineering. This is what we term the Blue Waters "Super System", which we will describe in detail in this talk along with its corresponding service architecture.


Ana Gainaru, UIUC/NCSA

Framework for Event Log Analysis in HPC

...

In a High Performance Computing infrastructure, it is particularly difficult to master the architecture as a whole. Between the physical infrastructure, the platform management software and the users' applications, understanding the global behavior and diagnosing problems is quite challenging. This is even more true in a petascale context, with thousands of compute nodes to manage and a high occupation rate of the resources. A global study of the platform will thus consider the Resource and Job Management System (RJMS), the File System and the Applications triptych as a whole. Studying their behavior is complicated because it means having some knowledge of the applications' requirements in terms of physical resources and access to the File System. In this presentation, we propose a first step toward an experimental approach that mixes job workload patterns and File System access patterns which, once combined, give a full set of job behaviors. These synthetic jobs will then be used to test and benchmark the infrastructure, considering both the RJMS and the File System.
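As a purely illustrative sketch of how such synthetic jobs could be built (none of the names, patterns or numbers below come from the talk; they are assumptions chosen only to show the composition), a job workload pattern and a file-system access pattern can be combined like this:

import random

# Hypothetical job workload pattern: job shapes with relative frequencies.
WORKLOAD_PATTERN = [
    {"nodes": 1,   "runtime_s": 600,   "weight": 0.5},
    {"nodes": 64,  "runtime_s": 3600,  "weight": 0.4},
    {"nodes": 512, "runtime_s": 14400, "weight": 0.1},
]

# Hypothetical file-system access patterns (per-node I/O behavior).
FS_PATTERNS = [
    {"name": "checkpoint-heavy", "write_mb_per_node": 2048, "read_mb_per_node": 128},
    {"name": "read-mostly",      "write_mb_per_node": 64,   "read_mb_per_node": 4096},
]

def synthetic_jobs(count, seed=0):
    """Combine a workload pattern with an I/O pattern to produce synthetic jobs."""
    rng = random.Random(seed)
    weights = [shape["weight"] for shape in WORKLOAD_PATTERN]
    jobs = []
    for job_id in range(count):
        shape = rng.choices(WORKLOAD_PATTERN, weights=weights)[0]
        io = rng.choice(FS_PATTERNS)
        jobs.append({
            "id": job_id,
            "nodes": shape["nodes"],
            "runtime_s": shape["runtime_s"],
            "io_pattern": io["name"],
            "total_write_mb": io["write_mb_per_node"] * shape["nodes"],
            "total_read_mb": io["read_mb_per_node"] * shape["nodes"],
        })
    return jobs

if __name__ == "__main__":
    for job in synthetic_jobs(5):
        print(job)

In the approach described in the talk, the patterns would be derived from real RJMS workload logs and File System traces rather than hard-coded, and the resulting synthetic jobs would then be replayed against the target infrastructure.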


Torsten Hoefler, NCSA

Application Performance Modeling on Petascale and Beyond

...

Cache-coherent Non-Uniform Memory Access (ccNUMA) platforms based on multi-core chips are now a common resource in High Performance Computing. To overcome scalability issues in such platforms, the shared memory is physically distributed among several memory banks. Its memory access costs may vary depending on the distance between processing units and data. The main challenge of a ccNUMA platform is to efficiently manage threads, data distribution and communication over all the machine nodes. Charm++ is a parallel programming system that provides a portable programming model for platforms based on shared and distributed memory. In this work, we revisit some of the implementation decisions currently featured in Charm++ in the context of ccNUMA platforms. First, we studied the impact of the new -- shared-memory based -- inter-object communication scheme utilized by Charm++. We show how this shared-memory approach can impact the performance of Charm++ on ccNUMA machines. Second, we conduct a performance evaluation of the CPU and memory affinity mechanisms provided by Charm++ on ccNUMA platforms. Results show that SMP optimizations and affinity support can improve the overall performance of our benchmarks by up to 75%. Finally, in light of these studies, we have designed and implemented a NUMA-aware load balancing algorithm that addresses the issues found. The performance evaluation of our prototype showed results as good as the ones obtained by GreedyLB and significant improvements when compared to GreedyCommLB.
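As a rough, generic illustration of the idea behind NUMA-aware load balancing (this is not the Charm++ algorithm; the fixed remote-access penalty and all names below are assumptions made only for the sketch), a greedy balancer can weigh a core's current load against the cost of placing an object away from the NUMA node that holds its data:

# Greedy NUMA-aware placement sketch: each object has a load and a "home"
# NUMA node where most of its data lives; remote placement pays a penalty.

REMOTE_PENALTY = 1.5  # assumed relative cost of accessing remote memory

def numa_aware_balance(objects, cores):
    """objects: list of (obj_id, load, home_numa_node)
    cores: list of (core_id, numa_node)
    Returns a mapping obj_id -> core_id."""
    core_load = {core_id: 0.0 for core_id, _ in cores}
    core_node = dict(cores)
    placement = {}
    # Place the heaviest objects first, as greedy balancers usually do.
    for obj_id, load, home in sorted(objects, key=lambda obj: -obj[1]):
        def effective_load(core_id):
            penalty = 1.0 if core_node[core_id] == home else REMOTE_PENALTY
            return core_load[core_id] + load * penalty
        best = min(core_load, key=effective_load)
        core_load[best] = effective_load(best)
        placement[obj_id] = best
    return placement

if __name__ == "__main__":
    cores = [(0, 0), (1, 0), (2, 1), (3, 1)]                 # (core_id, numa_node)
    objects = [(i, 1.0 + (i % 3), i % 2) for i in range(8)]  # (obj_id, load, home)
    print(numa_aware_balance(objects, cores))

A real balancer would also take inter-object communication into account, as GreedyCommLB does, and would use measured loads and the machine's actual NUMA topology instead of the fixed values assumed here.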


Thierry Gautier, INRIA

On the cost of managing data flow dependencies for parallel programming.

Several parallel programming languages and libraries (TBB, Cilk+, OpenMP) allow spawning independent tasks at runtime. In this talk, I will give an overview of the work on the Kaapi runtime system and its management of dependencies between tasks scheduled by a work-stealing algorithm. I will show that, at a lower cost than TBB or Cilk+, it is possible to program with data flow dependencies.
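As a toy illustration of the programming model (this does not use the Kaapi API; every class and function name below is invented for the example), tasks can declare the data they read and write, and a runtime can derive the dependencies and a valid execution order from those declarations alone:

class DataFlowRuntime:
    """Toy runtime: dependencies are inferred from declared read/write sets."""

    def __init__(self):
        self.tasks = []        # list of (callable, set of task indices it depends on)
        self.last_writer = {}  # data name -> index of the last task writing it

    def spawn(self, func, reads=(), writes=()):
        deps = set()
        for name in reads:                  # read-after-write dependencies
            if name in self.last_writer:
                deps.add(self.last_writer[name])
        for name in writes:                 # write-after-write dependencies
            if name in self.last_writer:
                deps.add(self.last_writer[name])
        index = len(self.tasks)
        self.tasks.append((func, deps))
        for name in writes:
            self.last_writer[name] = index
        return index

    def run(self):
        # Sequential reference execution in dependency order; a real runtime
        # would instead hand ready tasks to a work-stealing scheduler.
        done = set()
        while len(done) < len(self.tasks):
            for index, (func, deps) in enumerate(self.tasks):
                if index not in done and deps <= done:
                    func()
                    done.add(index)

if __name__ == "__main__":
    rt = DataFlowRuntime()
    rt.spawn(lambda: print("produce A"), writes=["A"])
    rt.spawn(lambda: print("produce B"), writes=["B"])
    rt.spawn(lambda: print("consume A and B"), reads=["A", "B"])
    rt.run()

The point of Kaapi, as described in the abstract, is that this kind of dependency management can be done at lower cost than in TBB or Cilk+ while feeding ready tasks to a work-stealing scheduler; the sketch above only shows the programming-model idea, not the implementation.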


Raymond Namyst, INRIA/Univ. Bordeaux

...