IBM POWER9 (P9) + NVIDIA Volta GPUs + HDR InfiniBand (IB)

Schedule

Year 1: August 2017 - August 2018
  • A single-node prototype based on existing technology will be constructed first, including the supporting interconnect and storage infrastructure. During the first year, this prototype will be used to investigate application requirements and to study technology choices such as NVM and FPGA options, DRAM size, and off-node bandwidth needs per GPU.
  • Development of system-level software for tightly integrating GPUs, NVM, and the IB interconnect will begin, driven by user case studies.
  • A limited number of domain scientists will be given access to the instrument to identify future instrument requirements and to gather data on usage patterns.
  • A development team will be hired and an advisory board will be established.
 
Year 2: August 2018 - August 2019
  • The majority of the instrument will be assembled, including GPUs and NVM, and selection of an FPGA for future expansion will be completed. The storage system will be fully populated.
  • Development of the software stack for seamless data movement across the storage hierarchy will continue, with the first prototype supporting a limited set of DL frameworks.
  • Resource allocation, user management, and system monitoring infrastructure will be fully deployed, allowing multiple simultaneous users on the system.
  • The system will be open to users who can work with it via remote access over Secure Shell (SSH).
  • Software development to enable easy user access via a web portal and from environments such as Jupyter Notebook will begin.
  • Widely used frameworks, such as Caffe and TensorFlow, will be made available to users, and their optimization for the instrument will begin.
  • Initial outreach efforts will focus on user training and documentation development.
 
Year 3: August 2019 - August 2020
  • System monitoring results will be used to optimize the system.
  • A set of DL frameworks identified by the user community will be optimized to make use of the entire instrument, enabling DL at scale.
  • Policies for instrument access and allocation will be developed and implemented with the help of the advisory board.
  • FPGA-based hardware acceleration will be enabled for key computationally intensive tasks.
  • The web portal for easy user access and the Jupyter Notebook interface will be completed.
  • Work will begin on other interfaces, such as R and Mathematica, driven by user community needs.
  • Software improvements and updates will continue throughout the year.
  • System blueprints and performance results will be published in a paper.
  • Access for educational projects and courses will become available to a select set of users.
  • By the end of year 3, instrument construction will be completed and the instrument will be transitioned for operation by the ISL at NCSA.
  • Best practices for using the instrument and developing software for it will be developed and documented.
  • The instrument will be open to the broader community of users, including off-campus researchers.