In publications and presentations that use results obtained on this system, please include the following acknowledgement: “This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign”.

Also, please include the following reference in your publications: V. Kindratenko, D. Mu, Y. Zhan, J. Maloney, S. Hashemi, B. Rabe, K. Xu, R. Campbell, J. Peng, and W. Gropp. HAL: Computer System for Scalable Deep Learning. In Practice and Experience in Advanced Research Computing (PEARC '20), July 26–30, 2020, Portland, OR, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3311790.3396649

Innovative Systems Lab (ISL) cluster


Host name: isl-login01.ncsa.illinois.edu

Hardware

Software

Documentation

Science on HAL

Software for HAL

<insert user facing software documentation>

To request access: fill out this form <access request form>. Make sure to follow the link in the confirmation email to request an actual system account.

Frequently Asked Questions

To report problems: email us

For our new users: New User Guide for HAL System

User group Slack space:  https://join.slack.com/t/halillinoisncsa<??>

Real-time Dashboards: Here
HAL OnDemand portal: https://hal-ondemandmetrics.ncsa.illinois.edu/

<insert Open OnDemand portal info>

Globus Endpoint: <insert globus endpoint>

Quick start guide (for complete details, see the Documentation section on the left):

To connect to the cluster:

Code Block
ssh <username>@isl-login01.ncsa.illinois.edu
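
Optionally, an entry in ~/.ssh/config saves retyping the full host name. This is a minimal sketch; the alias "isl" is arbitrary and <username> stands for your NCSA username:

Code Block
# ~/.ssh/config -- optional shortcut; the alias name is arbitrary
Host isl
    HostName isl-login01.ncsa.illinois.edu
    User <username>

With this in place, "ssh isl" connects directly.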

To submit an interactive job:

Code Block
srun --pty -p cpu bash
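
For an interactive session with a GPU, a request along the following lines can be used. This is a sketch only: the "gpu" partition name, GPU count, core count, and wall time are assumptions to adjust to your job and to what the cluster actually offers:

Code Block
# one GPU, 16 cores, 4 hours on the (assumed) "gpu" partition
srun --pty -p gpu --gres=gpu:1 -c 16 -t 04:00:00 bash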

To submit a batch job:

Code Block
sbatch run_script.sb
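
A batch script such as run_script.sb bundles the resource request with the commands to run. The sketch below is illustrative only; the partition, core count, wall time, and my_training_script.py are placeholders to adapt to your workload:

Code Block
#!/bin/bash
#SBATCH --job-name=example           # any descriptive name
#SBATCH --partition=cpu              # or a GPU partition for GPU jobs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=04:00:00              # must stay within the queue limits below

module load opence                   # ML stack (PyTorch, TensorFlow, ...)
python my_training_script.py         # placeholder for your own program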

Job Queue time limits:

  • "debug" queue: 4 hours"gpux<n>" gpu" and "cpun<n>cpu" queues:  24 hours

Resource limits:

  • 5 concurrently running jobs
  • Concurrently allocated resources: 5 nodes, 16 GPUs
  • For larger or more numerous jobs, please contact the admins for a special arrangement and/or a reservation (a quick way to check your current usage is sketched below)
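
Because the scheduler is Slurm, the standard query commands can be used to see how many of your jobs currently count against these limits:

Code Block
# list your own running and pending jobs
squeue -u $USER
# count only the jobs that are actually running (state R)
squeue -u $USER -t RUNNING -h | wc -l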

To load the OpenCE module (provides PyTorch, TensorFlow, and other ML tools):

Code Block
module load opence
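
A quick sanity check that the environment is active (a sketch; exact versions depend on the installed OpenCE release, and the CUDA check only reports a GPU from inside a GPU job):

Code Block
# confirm the frameworks import from the opence environment
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import tensorflow as tf; print(tf.__version__)"
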
To see CLI scheduler status:

Code Block
swqueue
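
The standard Slurm commands should also be available alongside it, for example:

Code Block
# partition and node availability at a glance
sinfo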




Contact us

Request access to this system: Application

Contact ISL staff: Email Address

Visit: NCSA, room 3050E