...

Main Topics	Schedule	Speaker	Affiliation	Type of presentation	Title (tentative)	Download

Sunday Nov. 24th Dinner Before the Workshop	7:00 PM (departure from Hampton Inn at 6:45 PM by minibus)	Only people registered for the dinner

Workshop Day 1	Monday Nov. 25th
					TITLES ARE TEMPORARY (except if in bold font)
Registration	08:00
Welcome and Introduction Auditorium 1122 Chair: Franck	08:30	Marc Snir + Franck Cappello Co-directors of the joint-lab		Background	Welcome, Workshop objectives and organization
	08:45	Ed Seidel Incoming NCSA director	UIUC	Background	NCSA update and vision of the collaboration (this address has been swapped with the next one due to schedule constraints)
	09:00	Peter Schiffer UIUC Vice Chancellor for Research	UIUC	Background	Welcome from UIUC Vice Chancellor for Research
	09:15	Michel Cosnard Inria CEO and President	Inria	Background	Inria updates and vision of the collaboration
	09:30	Marc Snir Director of Argonne/ MCS and co-director of the joint-lab	ANL	Background	Argonne updates and vision of the collaboration
	09:45	Marc Daumas Attaché for Science and Technology	Embassy of France	Background	France-USA collaboration program updates
	09:55	Franck Cappello Co-director of the Joint-lab	ANL	Background	Joint-Lab, PUF, New Joint-Lab, organization
	10:15	Break
Extreme Scale Systems and infrastructures Auditorium 1122 Chair: Marc Snir	10:45	Pete Beckman	ANL		Extreme Scale Computing & Co-design Challenges
	11:15	John Towns	UIUC		Applications Challenges in the XSEDE Environment
	11:45	Gabriel Antoniu	Inria		A-Brain and Z-CloudFlow: Scalable Data Processing on Azure Clouds - Lessons Learned in Three Years and Future Directions
	12:15	Lunch
	13:45	Bill Kramer	UIUC	Blue Waters	Is Petascale Completely Done? What Should We Do Now?
	14:15	Marc Snir	UIUC		G8 ECS and international collaboration toward extreme scale climate simulation
	14:45	Rob Ross	ANL		Thinking Past POSIX: Persistent Storage in Extreme Scale Systems
	15:15	François Pellegrini	Inria	Plenary talk	Parallel repartitioning and remeshing: results and prospects
	15:45	Break
	16:15	Pavan Balaji	ANL		Message Passing in Massively Multithreaded Environments
	16:45	Wen-Mei Hwu	UIUC		A New, Portable Algorithm Framework for Parallel Linear Recurrence Problems
	17:15	Adjourn
	18:45	Bus for Dinner

Workshop Day 2	Tuesday Nov. 26
Applications, I/O, Visualization, Big data Auditorium 1122 Chair: Rob Ross	08:30	Greg Bauer	UIUC		Applications and their challenges on Blue Waters
	09:00	Matthieu Dorier	Inria	Joint-result, submitted	CALCioM: Mitigating I/O Interferences in HPC Systems through Cross-Application Coordination
	09:30	Dries Kimpe	ANL		Mercury: Enabling Remote Procedure Call for High-Performance Computing
	10:00	Venkat Vishwanath	ANL		Plenary talk
	10:30	Break
	11:00	Babak Behzad	UIUC	ACM/IEEE SC13	Taming Parallel I/O Complexity with Auto-Tuning
	11:30	Kenton Guadron McHenry	UIUC		NSF CIF21 DIBBs: Brown Dog
	12:00	Lunch

Mini Workshop1 Resilience Room 1030 Chair: Yves Robert
	13:30	Leonardo	ANL	Joint-result	Detecting Silent Data Corruption through Data Dynamic Monitoring for Scientific Applications
	14:00	Tatiana Martsinkevich	Inria	Joint-result	On the feasibility of message logging in hybrid hierarchical FT protocols
	14:30	Mohamed Slim Bouguerra	Inria	Joint-result, submitted	Failure prediction: what to do with unpredicted failures?
	15:00	Ana Gainaru	UIUC	Joint-result, submitted	Topology and behaviour aware failure prediction for Blue Waters.
	15:30	Break
	16:00	Sheng Di	Inria	Joint-result, submitted	Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications
	16:30	Yves Robert	Inria		Assessing the impact of ABFT & Checkpoint composite strategies
	17:00	Wesley Bland	ANL		Fault Tolerant Runtime Research at ANL
	17:30	Adjourn
	19:00	Bus for Dinner

Mini Workshop2 Numerical Algorithms Room 1040 Chair: Bill Gropp
	13:30	Luke Olson	UIUC
	14:00	Prasanna Balaprakash	ANL		Active-Learning-based Surrogate Models for Empirical Performance Tuning
	14:30	Yushan Wang	Inria		Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems.
	15:00	Jed Brown	ANL		Fast solvers for implicit Runge-Kutta systems
	15:30	Break
	16:00	Pierre Jolivet	Inria	Best Paper nominee, ACM/IEEE SC13	Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems
	16:30	Vincent Baudoui	Total&ANL		Round-off error propagation and non-determinism in parallel applications
	17:00	TBD			TBD
	17:30	Adjourn

	19:00	Bus for dinner

Workshop Day 3	Wednesday Nov. 27

Mini Workshop3
Programming models, compilation and runtime. Room 1030 Chair: Marc Snir	08:30	Grigori Fursin	Inria		Collective Mind: making auto-tuning practical using crowdsourcing and predictive modeling
	09:00	Maria Garzaran	UIUC		Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication
	09:30	Jean-François Mehaut	Inria		From Multicores to Manycores Processors: Challenging Programming Issues with the MPPA/KALRAY
	10:00	Break
	10:30	Frederic Vivien	Inria		Scheduling tree-shaped task graphs to minimize memory and makespan
	11:00	Rafael Tesser	Inria	Joint result PDP 2013	Using AMPI to improve the performance of the Ondes3D seismic wave simulator through dynamic load balancing
	11:30	Emmanuel Jeannot	Inria	Joint-result, IEEE Cluster2013	Communication and Topology-aware Load Balancing in Charm++ with TreeMatch
	12:00	Closing
	12:30	Lunch

	18:00	Bus for dinner
Mini Workshop4 Large scale systems and their simulators Room 1040 Chair: Bill Kramer
	08:30	Eric Bohm	UIUC		A Multi-resolution Emulation + Simulation Methodology for Exascale
	09:00	Arnaud Legrand	Inria		SMPI: Toward Better Simulation of MPI Applications
	09:30	Torsten Hoefler	EPFL
	10:00	Break
	10:30	Kate Keahey	ANL		Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds
	11:00	Jeremy Enos	UIUC		Application Runtime Consistency and Performance Challenges on a shared 3D torus.
	11:30	TBD
Auditorium 1122	12:00	Closing
	12:30	Lunch

	18:00	Bus for dinner

...

Software and hardware optimization and co-design of computer systems have become intolerably complex, ad hoc, time-consuming and error-prone due to the enormous number of available design and optimization choices, complex interactions between all software and hardware components, and ever-changing tools and applications. We present our novel, long-term, holistic and practical solution to these problems, based on the new plugin-based Collective Mind infrastructure and repository. For the first time, it can preserve the whole experimental setup and all associated artifacts in order to distribute program analysis and multi-objective optimization among many participants, utilizing any available smartphone, tablet, laptop, cluster or data center while continuously observing, classifying and modeling their realistic behavior. Any unexpected behavior is analyzed using shared data mining and predictive modeling plugins, or exposed to the community at the public portal cTuning.org and the repository c-mind.org/repo for collaborative explanation. Gradually increasing public optimization knowledge helps to continuously improve the optimization heuristics of any compiler, predict optimizations for new programs, or suggest efficient run-time adaptation strategies depending on end-user requirements. We have successfully validated this approach and framework in several academic and industrial projects, releasing hundreds of codelets, numerical applications, data sets, models, universal experimental pipelines, and unified tools to start community-driven, systematic and reproducible R&D on adaptive, self-tuning computer systems, and to initiate a new publication model in which experiments and techniques are continuously validated and improved by the community.

Wen-Mei Hwu

A New, Portable Algorithm Framework for Parallel Linear Recurrence Problems

Linear recurrence solvers are common constructs in a class of important scientific applications. Many parallel algorithms have been proposed to achieve high performance for different problems that are linear recurrences in nature. Through a detailed investigation of the existing parallel implementations, we identify a general, hierarchical parallel linear recurrence algorithm that has the potential to fully utilize a wide variety of hardware. However, this algorithm is complex and requires enormous programming effort to achieve high performance across different architectures. To achieve single-source performance portability, we create a code generator that uses auto-tuning to optimize high-performance, parallel linear recurrence solvers that are retargetable to specific platforms. The framework is composed of two major components. The first component is an auto-tuned tiling procedure that generates a tiling by searching a unified tiling space (UTS). The UTS combines on-chip memory resources to reduce the complexity of tiling decisions. Based on the tiling decision, the second component selects the best communication implementation to minimize communication overhead. By heuristically reducing the search space, our auto-tuning technique generates optimized programs in a reasonable time. We evaluate our framework using several benchmarks, including prefix sum, IIR filter, bidiagonal solver and tridiagonal solver, on GPU architectures. The resulting linear recurrence solvers significantly outperform the previous state-of-the-art, specialized GPU implementations.
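
As background for the abstract above, the following minimal Python sketch shows why a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i] lends itself to parallel evaluation: each step is an affine map, and composing affine maps is associative, so all prefixes can be obtained with a scan. This is not the framework described in the talk; the function names (combine, recurrence_sequential, recurrence_scan) are hypothetical, and the scan is run sequentially here with itertools.accumulate purely to illustrate the idea. Prefix sum is the special case where every a[i] equals 1.

```python
from itertools import accumulate


def combine(f, g):
    """Compose two affine maps x -> a*x + b, applying f first and then g.

    Composition of affine maps is associative, which is the property a
    parallel scan relies on.
    """
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def recurrence_sequential(a, b, x0):
    """Reference evaluation of x[i] = a[i] * x[i-1] + b[i], starting from x0."""
    xs = []
    x = x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def recurrence_scan(a, b, x0):
    """Same recurrence, expressed as an inclusive scan over affine maps.

    accumulate() evaluates the scan left to right; a parallel implementation
    could regroup the combine() calls freely because combine is associative.
    """
    prefixes = accumulate(zip(a, b), combine)
    return [A * x0 + B for A, B in prefixes]


if __name__ == "__main__":
    # Prefix sum as a linear recurrence: all coefficients a[i] are 1.
    print(recurrence_scan([1, 1, 1, 1], [3, 1, 4, 1], 0))
    # A general first-order recurrence, checked against the sequential loop.
    print(recurrence_scan([2, 3, 1], [1, 0, 5], 1))
    print(recurrence_sequential([2, 3, 1], [1, 0, 5], 1))
```

Bidiagonal and tridiagonal solves, which the abstract also mentions, can be phrased in a similar way with matrix-valued steps, but that extension is not shown in this sketch.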

François Pellegrini
Parallel repartitioning and remeshing: results and prospects
The purpose of this talk is to present the current state and the prospects of research and implementation for two software tools that we develop for HPC: PT-Scotch and PaMPA. PT-Scotch is a parallel partitioning and mapping tool that has recently been extended to provide dynamic remapping features. While its algorithms have been developed with scalability in mind, several algorithmic bottlenecks appear, which require us to rethink the way we perform repartitioning. PaMPA is a library for parallel (re)meshing of distributed, unstructured meshes that delegates (re)partitioning to PT-Scotch. After basic mesh handling features were developed, we focused on parallel remeshing itself, allowing us to produce distributed tetrahedral meshes comprising several hundred million elements.
