
Main Topics

Schedule

Speakers

Types of presentation

Titles (tentative)

Download

Dinner

Sunday Nov. 21st
19:00

Radio Maria

 

 

  

http://www.radiomariarestaurant.com/

 

Workshop Day 1 (Auditorium)

Monday Nov. 22nd

Welcome and Introduction

08:30

Franck Cappello, INRIA & UIUC, France, and Thom Dunning, NCSA, USA

Background

Workshop details

 

Post-Petascale and Exascale Systems, chair: Franck Cappello

08:45

Mitsuhisa Sato, U. Tsukuba, Japan

Trends in HPC

Challenges on Programming Models and Languages for Post-Petascale Computing -- from Japanese NGS project "The K computer" to Exascale computing --

INRIA-UIUC-WS4-msato.pdf

 

09:15

Marc Snir, UIUC, USA

Trends in HPC

Toward Exascale

INRIA-UIUC-WS4-msnir.pdf

 

09:45

Wen-mei Hwu, UIUC, USA

Trends in HPC

Extreme-Scale Heterogeneous Computing

INRIA-UIUC-WS4-Hwu.pdf

 

10:15

Arun Rodrigues, Sandia, USA

Trends in HPC

The UHPC X-Caliber Project

INRIA-UIUC-WS4-arodrigues.pdf

 


10:45

Break

 

 

 

Post-Petascale Applications and System Software, chair: Marc Snir

11:15

Pete Beckman, ANL, USA

Trends in HPC

Exascale Software Center

INRIA-UIUC-WS4-pbeckman.pdf

 

11:45

Michael Norman, SDSC, USA

Trends in HPC

Extreme Scale AMR for Hydrodynamic Cosmology

INRIA-UIUC-WS4-mnorman.pptx

 

12:15

Eric Bohm, UIUC, USA

Trends in HPC

Scaling NAMD into the Petascale and Beyond

INRIA-NCSA_WS4_ebohm.pdf

 

12:45

Lunch

BLUE WATERS, chair: Bill Gropp

14:00

Bill Kramer, NCSA, USA

Overview

Blue Waters: A Super-System to Explore the Expanse and Depth of 21st Century Science

INRIA-UIUC-WS4-bkramer2.pdf

Collaborations on System Software

14:30

Ana Gainaru, NCSA, USA

Early Results

Framework for Event Log Analysis in HPC

INRIA-UIUC-WS4-againaru.pdf

 

15:00

Esteban Meneses, UIUC, USA

Early Results

Clustering Message Passing Applications to Enhance Fault Tolerance Protocols

INRIA-UIUC-WS4-emenese.pdf

 

15:30

Thomas Ropars, INRIA, France

Results

Latest Progresses on Rollback-Recovery Protocols for Send-Deterministic Applications

INRIA-UIUC-WS4-tropars.pdf

 

16:00

Break

 

 

 

Collaborations on System Software, chair: Bill Kramer

16:30

Leonardo Bautista, Titech, Japan

Results/International collaboration with Japan

Transparent low-overhead checkpoint for GPU-accelerated clusters

INRIA-UIUC-WS4-lbautista.pdf

 

17:00

Gabriel Antoniu, INRIA/IRISA, France

Results

Concurrency-optimized I/O for visualizing HPC simulations: An Approach Using Dedicated I/O cores

INRIA-UIUC-WS4-gantoniu.pdf

 

 

17:30

Mathias Jacquelin, INRIA/ENS Lyon

Results

Comparing archival policies for BlueWaters

INRIA-UIUC-WS4-mjacquelin.pdf

 

18:00

Olivier Richard, Joseph Emeras, INRIA/U. Grenoble, France

Early Results

Studying the RJMS, applications and File System triptych: a first step toward an experimental approach

INRIA-NCSA-WS4-jemeras.pdf

Dinner

19:30

Gould's

 

http://www.jimgoulddining.com/

Workshop Day 2 (Auditorium)

Tuesday Nov. 23rd

Collaborations on System Software, chair: Raymond Namyst

08:30

Torsten Hoefler, NCSA, USA

Potential collaboration

Application Performance Modeling on Petascale and Beyond

INRIA-UIUC-WS4-thoefler.pdf

 

09:00

Frédéric Vivien, INRIA/ENS Lyon, France

Potential collaboration

On Scheduling Checkpoints of Exascale Application

INRIA-UIUC-WS4-fvivien.pdf

Collaborations on Programming Models

09:30

Thierry Gautier, INRIA, France

Potential collaboration

On the cost of managing data flow dependencies for parallel programming

INRIA-UIUC-WS4-tgautier.pdf

 

10:00

Jean-François Méhaut and Laercio Pilla, INRIA/U. Grenoble, France

Early Results

Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing

INRIA-UIUC-WS4-llpilla.pdf

 

10:30

Break

 

 

 

chair: Sanjay Kale

11:00

Raymond Namyst, INRIA/U. Bordeaux, France

Potential collaboration

Bridging the gap between runtime systems and programming languages on heterogeneous GPU clusters

INRIA-UIUC-WS4-rnamyst.pdf

 

11:30

Brian Amedro, INRIA/U. Nice, France

Potential collaboration

Improving asynchrony in an Active Object model

INRIA-UIUC-WS4-bamedro.pdf

 

 

12:00

Christian Perez, INRIA/ENS Lyon, France

Early Results

High Performance Component with Charm++ and OpenAtom

INRIA-UIUC-W54-cperez.pdf

 

12:30

Lunch

 

 

 

Collaborations on Numerical Algorithms and Libraries, chair: Mitsuhisa Sato

14:00

Luke Olson, Bill Gropp, UIUC, USA

Early Results

On the status of algebraic (multigrid) preconditioners

INRIA-UIUC-WS4-lolson.pdf

 

14:30

Simplice Donfack, INRIA/U. Paris Sud, France

Early Results

Improving data locality in communication avoiding LU and QR factorizations

INRIA-UIUC-SW-sdonfack.pdf

 

15:00

Désiré Nuentsa, INRIA/IRISA, France

Early Results

Parallel Implementation of deflated GMRES in the PETSc package

INRIA-UIUC-WS4-dnuentsa.pdf

 

15:30

Sébastien Fourestier, INRIA/U. Bordeaux, France

Early Results

Graph repartitioning with Scotch and other ongoing work

INRIA-UIUC_WS4-fourestier.pdf

 

16:00

Break

 

 

 

chair: Luke Olson

16:15

Marc Baboulin, INRIA, U. Paris Sud, France

Early Results

Accelerating linear algebra computations with hybrid GPU-multicore systems

INRIA-UIUC-WS4-mbaboulin.pdf

 

16:45

Daisuke Takahashi, U. Tsukuba, Japan

Results/International collaboration with Japan

Optimization of a Parallel 3-D FFT with 2-D Decomposition

INRIA-NCSA-WS4-dtakahashi.pdf

 

17:15

Alex Yee, UIUC, USA

Early Results

A Single-Transpose implementation of the Distributed out-of-order 3D-FFT

INRIA-UIUC-WS4-ayee.pdf

 

17:35

Jeongnim Kim, NCSA, USA

Early Results

Toward petaflop 3D FFT on clusters of SMP

INRIA-NCSA-WS4jkim.pdf

Dinner

19:30

Escobar's  

  

http://www.escobarsrestaurant.com/

Workshop Day 3 (Auditorium)

Wednesday Nov. 24th

Break out sessions introduction

8:30

Cappello, Snir

Overview

Objectives of Break-out, expected results
Collaboration mechanisms (internships, visits, etc.)

 

Topics

 

Participants

Other NCSA participants

 

 

Break out session 1

9:00-10:15

 

 

 

 

Routing, topology mapping, scheduling, perf. modeling

 

Snir, Hoefler, Vivien, Gautier, Jeannot, Kale, Namyst, Méhaut, Bohm, Pilla, Amedro, Perez, Baboulin

 

Room 1030

Break-out-report-snir.pdf

Resilience and 3D-FFT

 

Kramer, Cappello, Takahashi, Yee, Jeongnim, Gainaru, Ropars, Meneses, Bautista, Antoniu, Richard, Fourestier, Jacquelin

 

Room 1040

Break-out-report-kramer.pdf

Libraries

 

Gropp, Baboulin, Olson, Désiré, Simplice, Sébastien Fourestier

 

Room 1104

 

 

 

 

 

 

10:15

Break

 

 

 

Break out session 2

10:30-11:45

 

 

 

Resilience

 

Kramer, Cappello, Gainaru, Ropars, Meneses, Bautista

  Room

Programming models / GPU

 

Kale, Méhaut, Namyst, Hwu, Amedro, Perez, Hoefler, Jeannot, Bohm, Pilla, Baboulin, Fourestier, Gautier

 

Room 1030

 

I/O

 

Snir, Vivien, Jacquelin, Antoniu, Richard, Kramer, Gainaru, Ropars

  

Room 1040

Break-out-report-snir.pdf

3D-FFT

 

Cappello, Takahashi, Yee, Jeongnim, Hoefler

 

Room 1104

Break-out-3D-FFT-cappello.pdf

Break out session report

12:00

Speakers: Snir, Cappello, Gropp, Kramer, Kale

 

Auditorium

Closing

12:30

Cappello, Snir

 

Auditorium

 

 

13:00

Lunch

 

 

 

Dinner

19:00

Buttitta's

 

http://buttittascu.com/

 

Abstracts


...

Cosmological simulations present well-known difficulties scaling to large core counts because of the large spatial inhomogeneities and vast range of length scales induced by gravitational instability. These difficulties are compounded when baryonic physics is included, which introduces its own multiscale challenges. In this talk I review efforts to scale the Enzo adaptive mesh refinement hydrodynamic cosmology code to O(100,000) cores, and I also discuss Cello, an extremely scalable AMR infrastructure under development at UCSD for the next generation of computer architectures, which will underpin petascale Enzo.


...


Eric Bohm, NCSA

Scaling NAMD into the Petascale and Beyond

Many challenges arise when employing ever larger supercomputers for the simulation of biological molecules in the context of a mature molecular dynamics code. Issues stemming from the scaling up of problem size, such as input and output, require both parallelization and revisions to legacy file formats. Order-of-magnitude increases in the number of processor cores evoke problems with O(P) structures, load balancing, and performance analysis. New architectures present code optimization opportunities (VSX SIMD) which must be carefully applied to provide the desired performance improvements without dire costs in implementation time and code quality. Looking beyond these imminent concerns for sustained petaflop performance on Blue Waters, we will also consider scalability concerns for future exascale machines.


Bill Kramer, NCSA

Blue Waters: A Super-System to Explore the Expanse and Depth of 21st Century Science

While many people think that Blue Waters means a single Power7 IH supercomputer, in reality the Blue Waters Project is deploying an entire system architecture that includes an eco-system surrounding the Power7 IH system to make it highly effective for ultra-scale science and engineering. This is what we term the Blue Waters "Super System", which we will describe in detail in this talk along with its corresponding service architecture.


Ana Gainaru, UIUC/NCSA

Framework for Event Log Analysis in HPC

...

In a High Performance Computing infrastructure, it is particularly difficult to master the architecture as a whole. Between the physical infrastructure, the platform management software and the users' applications, understanding the global behavior and diagnosing problems is quite challenging. This is even more true in a petascale context, with thousands of compute nodes to manage and a high occupation rate of the resources. A global study of the platform will thus consider the Resource and Job Management System (RJMS), the File System and the Applications triptych as a whole. Studying their behavior is complicated because it means having some knowledge of the applications' requirements in terms of physical resources and access to the File System. In this presentation, we propose a first step toward an experimental approach that mixes job workload patterns and File System access patterns which, once combined, give a full set of job behaviors. These synthetic jobs will then be used to test and benchmark the infrastructure, considering both the RJMS and the File System.
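As a purely illustrative sketch of how such synthetic jobs could be built (none of the names, patterns or numbers below come from the talk; they are assumptions chosen only to show the composition), a job workload pattern and a file-system access pattern can be combined like this:

import random

# Hypothetical job workload pattern: job shapes with relative frequencies.
WORKLOAD_PATTERN = [
    {"nodes": 1,   "runtime_s": 600,   "weight": 0.5},
    {"nodes": 64,  "runtime_s": 3600,  "weight": 0.4},
    {"nodes": 512, "runtime_s": 14400, "weight": 0.1},
]

# Hypothetical file-system access patterns (per-node I/O behavior).
FS_PATTERNS = [
    {"name": "checkpoint-heavy", "write_mb_per_node": 2048, "read_mb_per_node": 128},
    {"name": "read-mostly",      "write_mb_per_node": 64,   "read_mb_per_node": 4096},
]

def synthetic_jobs(count, seed=0):
    """Combine a workload pattern with an I/O pattern to produce synthetic jobs."""
    rng = random.Random(seed)
    weights = [shape["weight"] for shape in WORKLOAD_PATTERN]
    jobs = []
    for job_id in range(count):
        shape = rng.choices(WORKLOAD_PATTERN, weights=weights)[0]
        io = rng.choice(FS_PATTERNS)
        jobs.append({
            "id": job_id,
            "nodes": shape["nodes"],
            "runtime_s": shape["runtime_s"],
            "io_pattern": io["name"],
            "total_write_mb": io["write_mb_per_node"] * shape["nodes"],
            "total_read_mb": io["read_mb_per_node"] * shape["nodes"],
        })
    return jobs

if __name__ == "__main__":
    for job in synthetic_jobs(5):
        print(job)

In the approach described in the talk, the patterns would be derived from real RJMS workload logs and File System traces rather than hard-coded, and the resulting synthetic jobs would then be replayed against the target infrastructure.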


Torsten Hoefler, NCSA

Application Performance Modeling on Petascale and Beyond

...

Cache-coherent Non-Uniform Memory Access (ccNUMA) platforms based on multi-core chips are now a common resource in High Performance Computing. To overcome scalability issues in such platforms, the shared memory is physically distributed among several memory banks. Its memory access costs may vary depending on the distance between processing units and data. The main challenge of a ccNUMA platform is to efficiently manage threads, data distribution and communication over all the machine nodes. Charm++ is a parallel programming system that provides a portable programming model for platforms based on shared and distributed memory. In this work, we revisit some of the implementation decisions currently featured in Charm++ in the context of ccNUMA platforms. First, we studied the impact of the new -- shared-memory based -- inter-object communication scheme utilized by Charm++. We show how this shared-memory approach can impact the performance of Charm++ on ccNUMA machines. Second, we conduct a performance evaluation of the CPU and memory affinity mechanisms provided by Charm++ on ccNUMA platforms. Results show that SMP optimizations and affinity support can improve the overall performance of our benchmarks by up to 75%. Finally, in light of these studies, we have designed and implemented a NUMA-aware load balancing algorithm that addresses the issues found. The performance evaluation of our prototype showed results as good as the ones obtained by GreedyLB and significant improvements when compared to GreedyCommLB.
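As a rough, generic illustration of the idea behind NUMA-aware load balancing (this is not the Charm++ algorithm; the fixed remote-access penalty and all names below are assumptions made only for the sketch), a greedy balancer can weigh a core's current load against the cost of placing an object away from the NUMA node that holds its data:

# Greedy NUMA-aware placement sketch: each object has a load and a "home"
# NUMA node where most of its data lives; remote placement pays a penalty.

REMOTE_PENALTY = 1.5  # assumed relative cost of accessing remote memory

def numa_aware_balance(objects, cores):
    """objects: list of (obj_id, load, home_numa_node)
    cores: list of (core_id, numa_node)
    Returns a mapping obj_id -> core_id."""
    core_load = {core_id: 0.0 for core_id, _ in cores}
    core_node = dict(cores)
    placement = {}
    # Place the heaviest objects first, as greedy balancers usually do.
    for obj_id, load, home in sorted(objects, key=lambda obj: -obj[1]):
        def effective_load(core_id):
            penalty = 1.0 if core_node[core_id] == home else REMOTE_PENALTY
            return core_load[core_id] + load * penalty
        best = min(core_load, key=effective_load)
        core_load[best] = effective_load(best)
        placement[obj_id] = best
    return placement

if __name__ == "__main__":
    cores = [(0, 0), (1, 0), (2, 1), (3, 1)]                 # (core_id, numa_node)
    objects = [(i, 1.0 + (i % 3), i % 2) for i in range(8)]  # (obj_id, load, home)
    print(numa_aware_balance(objects, cores))

A real balancer would also take inter-object communication into account, as GreedyCommLB does, and would use measured loads and the machine's actual NUMA topology instead of the fixed values assumed here.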


Thierry Gautier, INRIA

On the cost of managing data flow dependencies for parallel programming.

Several parallel programming languages and libraries (TBB, Cilk+, OpenMP) allow spawning independent tasks at runtime. In this talk, I will give an overview of the work on the Kaapi runtime system and its management of dependencies between tasks scheduled by a work-stealing algorithm. I will show that, at a lower cost than TBB or Cilk+, it is possible to program with data flow dependencies.
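As a toy illustration of the programming model (this does not use the Kaapi API; every class and function name below is invented for the example), tasks can declare the data they read and write, and a runtime can derive the dependencies and a valid execution order from those declarations alone:

class DataFlowRuntime:
    """Toy runtime: dependencies are inferred from declared read/write sets."""

    def __init__(self):
        self.tasks = []        # list of (callable, set of task indices it depends on)
        self.last_writer = {}  # data name -> index of the last task writing it

    def spawn(self, func, reads=(), writes=()):
        deps = set()
        for name in reads:                  # read-after-write dependencies
            if name in self.last_writer:
                deps.add(self.last_writer[name])
        for name in writes:                 # write-after-write dependencies
            if name in self.last_writer:
                deps.add(self.last_writer[name])
        index = len(self.tasks)
        self.tasks.append((func, deps))
        for name in writes:
            self.last_writer[name] = index
        return index

    def run(self):
        # Sequential reference execution in dependency order; a real runtime
        # would instead hand ready tasks to a work-stealing scheduler.
        done = set()
        while len(done) < len(self.tasks):
            for index, (func, deps) in enumerate(self.tasks):
                if index not in done and deps <= done:
                    func()
                    done.add(index)

if __name__ == "__main__":
    rt = DataFlowRuntime()
    rt.spawn(lambda: print("produce A"), writes=["A"])
    rt.spawn(lambda: print("produce B"), writes=["B"])
    rt.spawn(lambda: print("consume A and B"), reads=["A", "B"])
    rt.run()

The point of Kaapi, as described in the abstract, is that this kind of dependency management can be done at lower cost than in TBB or Cilk+ while feeding ready tasks to a work-stealing scheduler; the sketch above only shows the programming-model idea, not the implementation.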


Raymond Namyst, INRIA/Univ. Bordeaux

...