Main Topics

Schedule

            Speaker

Affiliation

Type of presentation

Title (tentative)

Download

 

 

 

 

 

 

 

Sunday Nov. 24th
Dinner Before the Workshop

7:00 PM

(Departure from Hampton Inn at 6:45 PM by minibus)

Only people registered for the dinner

 

 

 

 

 

 

 

 

 

 

 

Workshop Day 1

Monday Nov. 25th

 

 

 

 

 

 

 

 

 

 

TITLES ARE TEMPORARY (except if in bold font)

 

Registration

08:00

 

 

 

 

 

Welcome and Introduction

Auditorium 1122

Chair: Franck

08:30

Marc Snir + Franck Cappello

Co-directors of the joint-lab

 

Background

Welcome, Workshop objectives and organization

 

 

08:45

Ed Seidel

Incoming NCSA director

UIUC

Background

NCSA update and vision of the collaboration

(This address has been swapped with the next one due to schedule constraints)

 
 09:00

Peter Schiffer

UIUC Vice Chancellor for Research

UIUC

Background

Welcome from UIUC Vice Chancellor for Research

 

09:15

Michel Cosnard

Inria CEO and President

Inria

Background

INRIA updates and vision of the collaboration

 


09:30

Marc Snir

Director of Argonne/MCS and co-director of the joint-lab

ANL

Background

Argonne updates and vision of the collaboration

 
 09:45

Marc Daumas

Attaché for Science and Technology

Embassy of France

Background

France-USA collaboration program updates

 

09:55

Franck Cappello

Co-director of the Joint-lab

ANL

Background

Joint-Lab, PUF, New Joint-Lab, organization

 

 

10:15

Break

 

 

 

 

Extreme Scale Systems and infrastructures

Auditorium 1122

Chair: Marc Snir

10:45

Pete Beckman

ANL

 

Extreme Scale Computing & Co-design Challenges

 

 

11:15

John Towns

UIUC

 

Applications Challenges in the XSEDE Environment

 
11:45

Gabriel Antoniu

Inria

A-Brain and Z-CloudFlow: Scalable Data Processing on Azure Clouds - Lessons Learned in Three Years and Future Directions

 

12:15

Lunch

 

 

 

 


13:45

Bill Kramer

UIUC

Blue Waters

Is Petascale Completely Done?  What Should We Do Now?
 

 


14:15

Marc Snir

UIUC

 

G8 ECS and international collaboration toward extreme scale climate simulation

 

 

14:45

Rob Ross

ANL

 

Thinking Past POSIX: Persistent Storage in Extreme Scale Systems

 
15:15

François Pellegrini

Inria

Plenary talk

Parallel repartitioning and remeshing: results and prospects

15:45

Break

16:15

Pavan Balaji

ANL

Message Passing in Massively Multithreaded Environments

16:45

Wen-Mei Hwu

UIUC

 A New, Portable Algorithm Framework for Parallel Linear Recurrence Problems

 
17:15

Adjourn

 

18:45

Bus for Dinner

 

 

 

 

 

 

 

 

 

 

 

Workshop Day 2


Tuesday Nov. 26

 

 

 

 

 

Applications, I/O, Visualization, Big data

Auditorium 1122

Chair: Rob Ross

08:30

Greg Bauer

UIUC

Applications and their challenges on Blue Waters

 

 

09:00

Matthieu Dorier

Inria

Joint-result, submitted

CALCioM: Mitigating I/O Interferences in HPC Systems through Cross-Application Coordination

 
 

09:30

Dries Kimpe

ANL

 

Mercury: Enabling Remote Procedure Call for High-Performance Computing

 

 

10:00

Venkat Vishwanath

ANL

 

Plenary talk

 

 

10:30

Break

 

 

 

 

 

11:00

Babak Behzad

UIUC

ACM/IEEE SC13

Taming Parallel I/O Complexity with Auto-Tuning

 

 

11:30

Kenton Guadron McHenry

UIUC

 

NSF CIF21 DIBBs: Brown Dog

 

 

12:00

Lunch

 

 


 

 

 

 

 

 

 

 

Mini Workshop 1

Resilience

Room 1030

Chair: Yves Robert

 

 

 

 

 

 

 

13:30

Leonardo

ANL

Joint-result

Detecting Silent Data Corruption through Data Dynamic Monitoring for Scientific Applications

 

 

14:00

Tatiana Martsinkevich

Inria

Joint-result

On the feasibility of message logging in hybrid hierarchical FT protocols

 

 

14:30

Mohamed Slim Bouguerra

Inria

Joint-result, submitted

Failure prediction: what to do with unpredicted failures?

 

 

15:00

Ana Gainaru

UIUC

Joint-result, submitted

Topology and behaviour aware failure prediction for Blue Waters.

 

 

15:30

Break

 

 

 

 

 

16:00

Sheng Di

Inria

Joint-result, submitted

 Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications

 

 

16:30

Yves Robert

Inria

 

Assessing the impact of ABFT & Checkpoint composite strategies

 

 

17:00

Wesley Bland

ANL

 

Fault Tolerant Runtime Research at ANL

 

 

17:30

Adjourn

 

 

 

 

 

19:00

Bus for Dinner

 

 

 

 

       

Mini Workshop 2

Numerical Algorithms

Room 1040

Chair: Bill Gropp

 

 

 

 

 

 

 

13:30

Luke Olson

UIUC

 

  
14:00

Prasanna Balaprakash

ANL

Active-Learning-based Surrogate Models for Empirical Performance Tuning

 

14:30

Yushan Wang

Inria

 

Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems.

 

 

15:00

Jed Brown

ANL

 

 Fast solvers for implicit Runge-Kutta systems

 

 

15:30

Break

 

 

 

 

 

16:00

Pierre Jolivet

Inria

Best Paper nominee, ACM/IEEE SC13

Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems

 
16:30

Vincent Baudoui

Total & ANL

Round-off error propagation and non-determinism in parallel applications

17:00

TBD

TBD

 

17:30

Adjourn

 

 

 

 

       

 

19:00

Bus for dinner

 

 

 

 

 

 

 

 

 

 

 

Workshop Day 3


Wednesday Nov. 27

 

 

 

 

 

 

 

 

 

 

 

 

Mini Workshop 3


 

 

 

 

 

 

 Programming models, compilation and runtime.

Room 1030

Chair: Marc Snir

08:30

Grigori Fursin

Inria

 

Collective Mind: making auto-tuning practical using crowdsourcing and predictive modeling

 

 

09:00

Maria Garzaran

UIUC

 

Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication

 


09:30

Jean-François Mehaut

Inria

 

From Multicores to Manycores Processors: Challenging Programming Issues with the MPPA/KALRAY

 
10:00

Break

 

10:30

Frederic Vivien

Inria

 

Scheduling tree-shaped task graphs to minimize memory and makespan 

 

 

11:00

Rafael Tesser

Inria

Joint result PDP 2013

Using AMPI to improve the performance of the Ondes3D seismic wave simulator through dynamic load balancing

 

 

11:30

Emmanuel Jeannot

Inria

Joint-result, IEEE Cluster2013

Communication and Topology-aware Load Balancing in Charm++ with TreeMatch

 

 

12:00

Closing

 

 

 

 

 

12:30

Lunch

 

 

 

 

       

 

18:00

Bus for dinner

 

 

 

 

Mini Workshop 4

Large scale systems and their simulators

Room 1040

Chair: Bill Kramer

 

 

 

 

 

 


08:30

Eric Bohm

UIUC

 

A Multi-resolution Emulation + Simulation Methodology for Exascale

 

 

09:00

Arnaud Legrand

Inria

 

SMPI: Toward Better Simulation of MPI Applications

 


09:30

Torsten Hoefler

ETH Zurich

 


 

 

10:00

Break

 

 

 

 


10:30

Kate Keahey

ANL

 

Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds

 

 

11:00

Jeremy Enos

UIUC

 

 Application Runtime Consistency and Performance Challenges on a shared 3D torus.

 

 

11:30

TBD

 

 

 

 

Auditorium 1122

12:00

Closing

 

 

 

 

 

12:30

Lunch

 

 

 

 

       
18:00

Bus for dinner

...

Software and hardware optimization and co-design of computer systems has become intolerably complex, ad hoc, time-consuming and error-prone due to the enormous number of available design and optimization choices, complex interactions between all software and hardware components, and ever-changing tools and applications. We present our novel long-term, holistic and practical solution to these problems, based on the new plugin-based Collective Mind infrastructure and repository. For the first time, it can preserve the whole experimental setup and all associated artifacts in order to distribute program analysis and multi-objective optimization among many participants, utilizing any available smartphone, tablet, laptop, cluster or data center, while continuously observing, classifying and modeling their realistic behavior. Any unexpected behavior is analyzed using shared data mining and predictive modeling plugins, or exposed to the community at the public portal cTuning.org and the repository c-mind.org/repo for collaborative explanation. Gradually increasing public optimization knowledge helps to continuously improve the optimization heuristics of any compiler, predict optimizations for new programs, or suggest efficient run-time adaptation strategies depending on end-user requirements. We successfully validated this approach and framework in several academic and industrial projects, releasing hundreds of codelets, numerical applications, data sets, models, universal experimental pipelines, and unified tools. The aim is to start community-driven, systematic and reproducible R&D to build adaptive, self-tuning computer systems, and to initiate a new publication model in which experiments and techniques are continuously validated and improved by the community.

Wen-Mei Hwu

A New, Portable Algorithm Framework for Parallel Linear Recurrence Problems

Linear recurrence solvers are common constructs in a class of important scientific applications. Many parallel algorithms have been proposed to achieve high performance for different problems that are linear recurrences in nature. Through a detailed investigation of the existing parallel implementations, we identify a general, hierarchical parallel linear recurrence algorithm that has the potential to fully utilize a wide variety of hardware. However, this algorithm is complex and requires enormous programming effort to achieve high performance across different architectures. To achieve single-source performance portability, we create a code generator that uses auto-tuning to optimize high-performance, parallel linear recurrence solvers that are retargetable to specific platforms. The framework is composed of two major components. The first component is an auto-tuned tiling procedure, which generates a tiling by searching a unified tiling space (UTS). The UTS combines on-chip memory resources to simplify tiling decisions. Based on the tiling decision, the second component selects the best communication implementation to minimize the communication overhead. By heuristically reducing the search space, our auto-tuning technique generates optimized programs in a reasonable time. We evaluate our framework using several benchmarks, including prefix sum, IIR filter, bidiagonal solver and tridiagonal solver, on GPU architectures. The resulting linear recurrence solvers significantly outperform the previous state-of-the-art, specialized GPU implementations.
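
As an illustration of the tiled, hierarchical structure mentioned in the abstract, the sketch below solves a first-order linear recurrence x[i] = a[i] * x[i-1] + b[i] in three phases: a per-tile reduction, a short scan over the tile summaries, and a per-tile replay. It is a minimal textbook-style sketch, not the authors' framework or code generator. The function names (`combine`, `solve_sequential`, `solve_tiled`) and the `tile` parameter are hypothetical, and the loops over tiles run sequentially here; they only mark the work that could run in parallel.

```python
from functools import reduce


def combine(f, g):
    """Compose two affine maps x -> a*x + b; f is applied first, then g."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def solve_sequential(a, b, x0):
    """Reference solution of x[i] = a[i] * x[i-1] + b[i], starting from x0."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def solve_tiled(a, b, x0, tile=4):
    """Two-level (tiled) solution; work on different tiles is independent."""
    pairs = list(zip(a, b))
    tiles = [pairs[i:i + tile] for i in range(0, len(pairs), tile)]

    # Phase 1: reduce each tile to one affine map (independent across tiles).
    summaries = [reduce(combine, t) for t in tiles]

    # Phase 2: short sequential scan over the summaries yields each tile's input.
    carries, x = [], x0
    for s_a, s_b in summaries:
        carries.append(x)
        x = s_a * x + s_b

    # Phase 3: replay each tile from its carry-in (independent across tiles).
    out = []
    for t, carry in zip(tiles, carries):
        x = carry
        for ai, bi in t:
            x = ai * x + bi
            out.append(x)
    return out
```

With all coefficients `a[i]` equal to 1 and `x0 = 0`, the recurrence reduces to a prefix sum, one of the benchmarks named in the abstract. The tile size here is a plain parameter; choosing it automatically is the kind of decision the abstract assigns to auto-tuning.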

 

François Pellegrini
Parallel repartitioning and remeshing: results and prospects
This talk presents the current state and prospects of research and implementation for two software tools that we develop for HPC: PT-Scotch and PaMPA. PT-Scotch is a parallel partitioning and mapping tool that has recently been extended to provide dynamic remapping features. While its algorithms were developed with scalability in mind, several algorithmic bottlenecks appear, which require us to rethink the way we perform repartitioning. PaMPA is a library for parallel (re)meshing of distributed, unstructured meshes that delegates (re)partitioning to PT-Scotch. After developing basic mesh handling features, we focused on parallel remeshing itself, which allowed us to produce distributed tetrahedral meshes comprising several hundred million elements.