M. el Mehdi Diouri, O. Glück, L. Lefèvre, F. Cappello, Energy Considerations in Checkpointing and Fault Tolerance Protocols, Proceedings of IEEE/IFIP DSN/FTXS 2012.

S. Donfack, L. Grigori, B. Gropp, V. Kale, Hybrid static/dynamic scheduling for already optimized dense matrix factorization, Proceedings of IEEE IPDPS 2012.

A. Guermouche, T. Ropars, M. Snir, F. Cappello, HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications, Proceedings of IEEE IPDPS 2012.

A. Gainaru, F. Cappello, B. Kramer, Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems, Proceedings of IEEE IPDPS 2012.

A. Gainaru, F. Cappello, J. Fullop, S. Trausan-Matu, B. Kramer, Adaptive Event Prediction Strategy with Dynamic Time Window for Large-Scale HPC Systems, Proceedings of SLAML 2011 (Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques).

L. Bautista Gomez, D. Komatitsch, N. Maruyama, S. Tsuboi, F. Cappello, S. Matsuoka, T. Nakamura, FTI: High Performance Fault Tolerance Interface for Hybrid Systems, Proceedings of IEEE/ACM SC11.

E. M. Heien, D. Kondo, A. Gainaru, D. Lapine, B. Kramer, F. Cappello, Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems, Proceedings of IEEE/ACM SC11.

M. Dorier, G. Antoniu, F. Cappello, M. Snir, L. Orf, Damaris: Leveraging Multicore Parallelism to Mask I/O Jitter, Technical report TR-JLPC-11-07.

B. Nicolae, F. Cappello, BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots, Proceedings of IEEE/ACM SC11.

T. Ropars, A. Guermouche, M. Snir, F. Cappello, HydEE: An Energy and Memory Efficient Cluster-Based Hybrid Checkpointing Protocol for MPI Applications, Technical report TR-JLPC-11-05.

M. Bougeret, H. Casanova, M. Rabie, Y. Robert, F. Vivien, Checkpointing strategies for parallel jobs, Proceedings of IEEE/ACM SC11.

F. Cappello, M. Jacquelin, L. Marchal, Y. Robert, M. Snir, Comparing archival policies for Blue Waters, Proceedings of HiPC 2011.

L. Pilla, C. Pousa, D. Cordeiro, A. Bhatele, P. Navaux, J-F. Méhaut, L. Kale, Improving Parallel System Performance with a NUMA-aware Load-Balancer, Technical report TR-JLPC-11-02.

T. Ropars, A. Guermouche, B. Ucar, E. Meneses, L. V. Kale, F. Cappello, On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications, Proceedings of Euro-Par 2011.

B. Nicolae, F. Cappello, G. Antoniu, Optimizing multi-deployment on clouds by means of self-adaptive prefetching, Proceedings of Euro-Par 2011.

A. Gainaru, F. Cappello, B. Kramer, Event log mining tool for large scale HPC systems, Proceedings of Euro-Par 2011.

J. Dongarra, F. Cappello, T. H. Dunning, B. Gropp, S. Kale, B. Kramer, M. Snir, et al., The International Exascale Software Project roadmap, IJHPCA 25(1): 3-60 (2011).

A. Guermouche, T. Ropars, E. Brunet, M. Snir, F. Cappello, Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, Proceedings of IEEE IPDPS 2011.

F. Cappello, H. Casanova, Y. Robert, Preventive Migration vs. Preventive Checkpointing for Extreme Scale Supercomputers, Parallel Processing Letters 21(2): 111-132 (2011).

F. Cappello, A. Guermouche, M. Snir, On Communication Determinism in Parallel HPC Applications, Proceedings of IEEE ICCCN 2010.

F. Cappello, H. Casanova, Y. Robert, Checkpointing vs. Migration for Post-Petascale Supercomputers, Proceedings of ICPP 2010.

L. Bautista Gomez, N. Maruyama, F. Cappello, S. Matsuoka, Distributed Diskless Checkpoint for Large Scale Systems, Proceedings of IEEE CCGRID 2010.

A. Gainaru, F. Cappello, S. Trausan-Matu, W. Kramer, Hierarchical Event Log Organizer, Technical report TR-JLPC-10-02, INRIA-Illinois Joint Laboratory on Petascale Computing.

A. Gainaru, F. Cappello, S. Trausan-Matu, State of the art on event analysis for large scale computers, Technical report TR-JLPC-10-01, INRIA-Illinois Joint Laboratory on Petascale Computing.

F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, M. Snir, Toward Exascale Resilience, IJHPCA 23(4): 374-388 (2009).

F. Cappello, Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities, IJHPCA 23(3): 212-226 (2009).

J. Dongarra, P. Beckman, P. Aerts, F. Cappello, T. Lippert, S. Matsuoka, P. Messina, T. Moore, R. Stevens, A. E. Trefethen, M. Valero, The International Exascale Software Project: a Call to Cooperative Action by the Global High-Performance Community, IJHPCA 23(4): 309-322 (2009).

F. Cappello, A. Guermouche, T. Herault, M. Snir, Revisiting Fault Tolerant Protocols for HPC Applications, Technical report TR-JLPC-09-02, INRIA-Illinois Joint Laboratory on Petascale Computing (submitted).

F. Cappello, A. Geist, B. Gropp, S. Kale, B. Kramer, M. Snir, Toward Exascale Resilience, Technical report TR-JLPC-09-01, INRIA-Illinois Joint Laboratory on Petascale Computing.