
Main Topics	Schedule	Speakers	Types of presentation	Titles (tentative)

Workshop Day 1 (Auditorium)	Monday Nov. 22nd
Welcome and Introduction	08:30	Franck Cappello, INRIA & UIUC, France and Thom Dunning, NCSA, USA	Background	Workshop details
Post-Petascale and Exascale Systems	08:45	Mitsuhisa Sato, U. Tsukuba, Japan	Trends in HPC	Next Gen and Exascale initiative in Japan
	09:15	Marc Snir, UIUC, USA	Trends in HPC	Exascale Challenges
	09:45	Wen-mei Hwu, UIUC, USA	Trends in HPC	Exascale and Accelerators
	10:15	Arun Rodrigues, Sandia, USA	Trends in HPC	X-Caliber (DARPA UHPC)
	10:45	Break
Post-Petascale Applications and System Software	11:15	Pete Beckman, ANL, USA	Trends in HPC	Exascale Software Center
	11:45	Michael Norman, SDSC, USA	Trends in HPC	ENZO
	12:15	Eric Bohm, UIUC, USA	Trends in HPC	NAMD
	12:30	Lunch


BLUE WATERS	14:00	Bill Kramer, NCSA, USA	Overview	Update on Blue Waters
Collaborations on System Software	14:30	Ana Gainaru, NCSA, USA	Early Results	A Framework for System Event Analysis
	15:00	Thomas Ropars, INRIA, France	Results	Uncoordinated checkpointing without domino effect for send-deterministic applications
	15:30	Esteban Meneses, UIUC, USA	Results/International collaboration with China	Clustering for Performance and Fault Tolerance
	16:00	Break
Collaborations on System Software	16:30	Leonardo Bautista, Titech, Japan	Results/International collaboration with Japan	Transparent low-overhead checkpoint for GPU-accelerated clusters
	17:00	Gabriel Antoniu, INRIA/IRISA, France	Results	Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores
	17:30	Mathias Jacquelin, INRIA/ENS Lyon, France	Results	Vertical vs Horizontal parity for tape archives
	18:00	Olivier Richard, INRIA/U. Grenoble, France	Early Results	I/O aware Resource Management Software
	18:30	Torsten Hoefler, NCSA, USA	Potential collaboration	TBA

Workshop Day 2 (Auditorium)	Tuesday Nov. 23rd

Collaborations on System Software	08:30	Frédéric Vivien, INRIA/ENS Lyon, France	Potential collaboration	On Scheduling Checkpoints of Exascale Applications
Collaborations on Programming Models	09:00	Thierry Gautier	Early Results	TBA
	09:30	Jean-François Méhaut, INRIA/U. Grenoble, France	Early Results	TBA
	10:00	Emmanuel Jeannot, INRIA/U. Bordeaux, France	Early Results	TBA
	10:30	Break
	11:00	Raymond Namyst, INRIA/U. Bordeaux, France	Early Results	TBA
	11:30	Brian Amedro, INRIA/U. Nice, France	Potential collaboration	TBA
	12:00	Christian Perez, INRIA/ENS Lyon, France	Early Results	TBA
	12:30	Lunch
Collaborations on Numerical Algorithms and Libraries	14:00	Bill Gropp, UIUC, USA	Early Results	TBA
	14:30	Simplice Donfack, INRIA/U. Paris Sud, France	Early Results	TBA
	15:00	Désiré Nuentsa, INRIA/IRISA, France	Early Results	TBA
	15:30	Sébastien Fourestier, INRIA/U. Bordeaux, France	Early Results	TBA
	16:00	Break
	16:30	Marc Baboulin, INRIA/U. Paris Sud, France	Early Results	Accelerating linear algebra computations with hybrid GPU-multicore systems
	17:00	Daisuke Takahashi, U. Tsukuba, Japan	Results/International collaboration with Japan	Optimization of a Parallel 3-D FFT with 2-D Decomposition
	17:30	Alex Yee, UIUC, USA	Early Results	A Single-Transpose implementation of the Distributed out-of-order 3D-FFT
	17:50	Jeongnim Kim, NCSA, USA	Early Results	Toward petaflop 3D FFT on clusters of SMP


Workshop Day 3 (Auditorium)	Wednesday Nov. 24th

Break out sessions introduction	08:30	Cappello, Snir	Overview	Objectives of the break-out sessions, expected results, collaboration mechanisms (internships, visits, etc.)
Topics		Participants	Other NCSA participants	Room
Break out session 1	09:00-10:30
Routing, topology mapping, scheduling, perf. modeling		Snir, Hoefler, Vivien, Jeannot, Kale		Room
3D-FFT		Cappello, Takahashi, Yee, Kim		Room
Libraries		Gropp, Baboulin, Nuentsa, Donfack, Fourestier		Room

	10:15	Break
Break out session 2	10:30-12:00
Resilience		Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista		Room
Programming models / GPU		Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot		Room
I/O		Snir, Vivien, Jacquelin, Antoniu, Richard
Break out session report	12:00	Speakers: Snir, Cappello, Gropp, Kramer, Kale		Auditorium
Closing	12:30	Cappello, Snir		Auditorium
	13:00	Lunch

...

We describe how hybrid multicore+GPU systems can be used to enhance the performance of linear algebra libraries in high-performance computing.
We illustrate this approach with the solution of general linear systems based on a hybrid LU factorization, in which we split the computation between a multicore processor and a graphics processor and use specific statistical techniques to reduce the amount of pivoting and the communication between the hybrid components. We also show how mixed-precision algorithms can be used to accelerate performance.
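
Mixed-precision solvers of this kind are commonly built on iterative refinement: the O(n^3) LU factorization is done in single precision (typically much faster on GPUs), and double-precision accuracy is then recovered with cheap O(n^2) correction steps driven by double-precision residuals. The sketch below is a generic, CPU-only illustration under that assumption, not the authors' hybrid code: it emulates float32 by rounding through Python's `struct` module, it omits both the CPU/GPU split and the statistical pivot-reduction technique, and all function names are hypothetical.

```python
import struct

def to_f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(a):
    """LU with partial pivoting (PA = LU); every stored entry rounded to float32.

    Returns (lu, piv): L (unit diagonal) below the diagonal, U on and above it,
    and piv[i] = index of the original row now stored in row i.
    """
    n = len(a)
    lu = [[to_f32(v) for v in row] for row in a]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular in float32")
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] = to_f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_f32(lu[i][j] - lu[i][k] * lu[k][j])
    return lu, piv

def lu_solve_f32(lu, piv, b):
    """Solve A x = b with the float32 factors (forward, then back substitution)."""
    n = len(lu)
    y = [to_f32(b[piv[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = to_f32(y[i] - lu[i][j] * y[j])
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] = to_f32(x[i] - lu[i][j] * x[j])
        x[i] = to_f32(x[i] / lu[i][i])
    return x

def mixed_precision_solve(a, b, tol=1e-12, max_iter=30):
    """Factor once in float32, then refine the solution with float64 residuals."""
    n = len(a)
    lu, piv = lu_factor_f32(a)          # the O(n^3) work, done in low precision
    x = lu_solve_f32(lu, piv, b)        # initial, float32-accurate solution
    scale = max(abs(v) for v in b) or 1.0
    for _ in range(max_iter):
        # Residual r = b - A x, computed in float64 (Python's native float).
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * scale:
            break
        d = lu_solve_f32(lu, piv, r)    # O(n^2) correction reusing the factors
        x = [x[i] + d[i] for i in range(n)]
    return x
```

The factors are computed once and reused for every correction, so for well-conditioned matrices nearly all of the arithmetic stays in the fast low-precision format.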

Jeongnim Kim, NCSA, UIUC

Toward petaflop 3D FFT on clusters of SMP

A wide range of scientific applications employ 3D FFTs. Sustained petaflop performance of the 3D FFT is necessary to meet the NSF Direct Numerical Simulation (DNS) turbulence benchmark on Blue Waters, which represents the current generation of HPC platforms: clusters of multi/many-core SMPs. I present an analysis of 3D FFT implementations and optimization strategies on Blue Waters. Also discussed is the design of a parallel 3D FFT library that can meet the diverse requirements of applications using 3D FFTs.
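
The 3D FFT is separable: it can be computed as three passes of 1D FFTs, one along each axis. The sketch below shows this on a single node in pure Python (radix-2 lengths only, helper names hypothetical); it is an illustration of the standard axis-by-axis decomposition, not the library design discussed in the talk. In a distributed implementation each pass needs its axis to be local to a process, which is why data must be redistributed (transposed) between passes.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft3d(a):
    """3D FFT of a nested list a[i][j][k] computed as three passes of 1D FFTs."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    # Pass 1: along k (the contiguous axis); builds a new array.
    b = [[fft(a[i][j]) for j in range(n1)] for i in range(n0)]
    # Pass 2: along j.
    for i in range(n0):
        for k in range(n2):
            col = fft([b[i][j][k] for j in range(n1)])
            for j in range(n1):
                b[i][j][k] = col[j]
    # Pass 3: along i.
    for j in range(n1):
        for k in range(n2):
            col = fft([b[i][j][k] for i in range(n0)])
            for i in range(n0):
                b[i][j][k] = col[i]
    return b

def dft3d(a):
    """Direct (slow) 3D DFT, used only as a reference for checking fft3d."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    return [[[sum(a[i][j][k] * cmath.exp(-2j * cmath.pi *
                                         (p * i / n0 + q * j / n1 + r * k / n2))
                  for i in range(n0) for j in range(n1) for k in range(n2))
              for r in range(n2)] for q in range(n1)] for p in range(n0)]
```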

Daisuke Takahashi, U. Tsukuba

...

Optimization of a Parallel 3-D FFT with 2-D Decomposition
In this talk, an optimization method for the parallel 3-D fast Fourier transform (FFT) with 2-D decomposition is presented. The 2-D decomposition improves performance by reducing the communication time when the number of MPI processes is large. Another way to reduce the communication overhead is to overlap communication with computation; an overlapping method for the parallel 3-D FFT is also presented. Performance results of parallel 3-D FFTs on clusters of multi-core processors are reported.
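
One common way to see why a 2-D decomposition helps at large process counts is to compare the two layouts for an N×N×N grid on P processes. With a 1-D (slab) split, P can be at most N and the transpose is one all-to-all among all P processes. With a 2-D (pencil) split over a P1×P2 process grid, P can reach N², and the data exchange becomes two all-to-alls within subgroups of P1 and P2 processes, so each process sends far fewer (but larger) messages, at the cost of a second transpose. The sketch below only does that bookkeeping; it assumes the grid dimensions divide evenly, the names are hypothetical, and it does not model the communication/computation overlap described in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decomposition:
    name: str
    local_shape: tuple   # (x, y, z) extent of the block held by each process
    max_procs: int       # largest process count the scheme supports
    group_sizes: tuple   # communicator size of each all-to-all transpose

    def messages_per_process(self) -> int:
        """Messages each process sends, summed over all of its all-to-alls."""
        return sum(g - 1 for g in self.group_sizes)

def slab(n: int, p: int) -> Decomposition:
    """1-D (slab) split of an n^3 grid: one all-to-all over all p processes."""
    if p > n:
        raise ValueError("slab decomposition needs p <= n")
    if n % p:
        raise ValueError("this sketch assumes p divides n")
    return Decomposition("slab", (n // p, n, n), n, (p,))

def pencil(n: int, p1: int, p2: int) -> Decomposition:
    """2-D (pencil) split on a p1 x p2 process grid: two all-to-alls,
    one among p1 processes and one among p2 processes."""
    if n % p1 or n % p2:
        raise ValueError("this sketch assumes p1 and p2 divide n")
    return Decomposition("pencil", (n // p1, n // p2, n), n * n, (p1, p2))
```

For example, with n = 1024 on 1024 processes, `slab(1024, 1024)` and `pencil(1024, 32, 32)` hold the same amount of data per process, but the slab layout needs 1023 messages per process per transpose while the pencil layout needs 62 in total; and at 4096 processes only the pencil layout is possible.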

Alex Yee, UIUC

A Single-Transpose implementation of the Distributed out-of-order 3D-FFT

The classic approach to computing the distributed in-order 3D-FFT requires up to 3 expensive all-to-all communication (transpose) steps. Because the FFT is memory-bound, these transposes dominate the total run time. Here we present a new approach that reduces the number of transposes to 2 for the in-order transform and to 1 for the out-of-order transform.
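
The in-order/out-of-order distinction has a familiar single-node analogue: a radix-2 decimation-in-frequency (DIF) FFT naturally leaves its output in bit-reversed order, and an extra permutation pass is needed only if the caller wants natural order. The sketch below illustrates that analogue in pure Python (function names hypothetical); it is not the distributed single-transpose algorithm from the talk, where the step saved is an all-to-all transpose rather than a local permutation.

```python
import cmath

def dif_fft_out_of_order(x):
    """Iterative radix-2 decimation-in-frequency FFT.

    Returns the spectrum in bit-reversed order: result[m] == X[bit_reverse(m)].
    """
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # twiddle factor
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v
                a[start + j + span] = (u - v) * w
        span //= 2
    return a

def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def to_natural_order(a):
    """The extra reordering pass that an in-order transform must pay for."""
    bits = len(a).bit_length() - 1
    return [a[bit_reverse(k, bits)] for k in range(len(a))]

def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Computations that do not depend on the order of the frequencies, such as the pointwise product in an FFT-based convolution, can in principle work directly on the scrambled output and skip the reordering, provided the inverse transform accepts that order.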
