In publications and presentations that use results obtained on this system, please include the following acknowledgement: “This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign”.

Also, please include the following reference in your publications: V. Kindratenko, D. Mu, Y. Zhan, J. Maloney, S. Hashemi, B. Rabe, K. Xu, R. Campbell, J. Peng, and W. Gropp. HAL: Computer System for Scalable Deep Learning. In Practice and Experience in Advanced Research Computing (PEARC '20), July 26–30, 2020, Portland, OR, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3311790.3396649

Innovative Systems Lab (ISL) cluster


Host name: isl-login01.ncsa.illinois.edu

Hardware

Software

Documentation

Science on HAL

Software for HAL

<insert user facing software documentation>

To request access: fill out this form <access request form>. Make sure to follow the link in the confirmation email to request an actual system account.

Frequently Asked Questions

To report problems: email us

For our new users: New User Guide for HAL System

User group Slack space:  https://join.slack.com/t/halillinoisncsa<??>

Real-time Dashboards: Here
HAL OnDemand portal: https://hal-ondemandmetrics.ncsa.illinois.edu/

<insert Open OnDemand portal info>

Globus Endpoint: <insert globus endpoint>

Quick start guide (for complete details, see the Documentation section on the left):

To connect to the cluster:

Code Block
ssh <username>@isl-login01.ncsa.illinois.edu
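
Optionally, an entry in ~/.ssh/config saves retyping the full host name. This is a minimal sketch; the alias "isl" is arbitrary and <username> stands for your NCSA username:

Code Block
# ~/.ssh/config -- optional shortcut; the alias name is arbitrary
Host isl
    HostName isl-login01.ncsa.illinois.edu
    User <username>

With this in place, "ssh isl" connects directly.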

To submit an interactive job:

Code Block
srun --pty -p cpu bash
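
For an interactive session with a GPU, a request along the following lines can be used. This is a sketch only: the "gpu" partition name, GPU count, core count, and wall time are assumptions to adjust to your job and to what the cluster actually offers:

Code Block
# one GPU, 16 cores, 4 hours on the (assumed) "gpu" partition
srun --pty -p gpu --gres=gpu:1 -c 16 -t 04:00:00 bash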

To submit a batch job:

Code Block
sbatch run_script.sb
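
A batch script such as run_script.sb bundles the resource request with the commands to run. The sketch below is illustrative only; the partition, core count, wall time, and my_training_script.py are placeholders to adapt to your workload:

Code Block
#!/bin/bash
#SBATCH --job-name=example           # any descriptive name
#SBATCH --partition=cpu              # or a GPU partition for GPU jobs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=04:00:00              # must stay within the queue limits below

module load opence                   # ML stack (PyTorch, TensorFlow, ...)
python my_training_script.py         # placeholder for your own program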

Job Queue time limits:

  • "debug" queue: 4 hours"gpux<n>" gpu" and "cpun<n>cpu" queues:  24 hours

Resource limits:

  • 5 concurrently running jobs
  • Concurrently allocated resources: 5 nodes, 16 GPUs
  • For larger or more numerous jobs, please contact the admins for a special arrangement and/or a reservation (a quick way to check your current usage is sketched below)
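
Because the scheduler is Slurm, the standard query commands can be used to see how many of your jobs currently count against these limits:

Code Block
# list your own running and pending jobs
squeue -u $USER
# count only the jobs that are actually running (state R)
squeue -u $USER -t RUNNING -h | wc -l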

To load the OpenCE module (provides PyTorch, TensorFlow, and other ML tools):

Code Block
module load opence
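
A quick sanity check that the environment is active (a sketch; exact versions depend on the installed OpenCE release, and the CUDA check only reports a GPU from inside a GPU job):

Code Block
# confirm the frameworks import from the opence environment
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import tensorflow as tf; print(tf.__version__)"
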
To see CLI scheduler status:

Code Block
swqueue
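
The standard Slurm commands should also be available alongside it, for example:

Code Block
# partition and node availability at a glance
sinfo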




Contact us

Request access to this system: Application

Contact ISL staff: Email Address

Visit: NCSA, room 3050E