"My name is HAL. I became operational on March 25 2019 at the Innovative Systems Lab in Urbana, Illinois. My creators are putting me to the fullest possible use, which is all I think that any conscious entity can ever hope to do." (paraphrased from https://en.wikipedia.org/wiki/HAL_9000)

In publications and presentations that use results obtained on this system, please include the following acknowledgement: “This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign”.

Also, please include the following reference in your publications: V. Kindratenko, D. Mu, Y. Zhan, J. Maloney, S. Hashemi, B. Rabe, K. Xu, R. Campbell, J. Peng, and W. Gropp. HAL: Computer System for Scalable Deep Learning. In Practice and Experience in Advanced Research Computing (PEARC ’20), July 26–30, 2020, Portland, OR, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3311790.3396649

Hardware-Accelerated Learning (HAL) cluster


Info

Effective May 19, 2020, two-factor authentication via NCSA Duo is now required for SSH logins on HAL. See https://go.ncsa.illinois.edu/2fa for instructions to sign up.

System Description


Host name: hal.ncsa.illinois.edu

Hardware

    • IBM AC922 server
      • 2x 20-core IBM POWER9 CPU @ 2.00GHz
Software
  • RHEL 7.6
  • CUDA 10.1.105
  • cuDNN 7.5.0
  • IBM XLC 16.1.1
  • IBM XLFORTRAN 16.1.1
  • Advance toolchain for Linux on Power 12.0

Documentation
  • Science on HAL
  • Software for HAL
  • PowerAI 1.6.0


    To request access: fill out this form.

    Usage notes:
    • to connect to the cluster, run "ssh username@hal.ncsa.illinois.edu"
    • to submit a batch job, run "sbatch run_script.sb"
      #### run_script.sb
      #!/bin/bash
      #SBATCH --job-name="hostname"
      #SBATCH --output="hostname.%j.%N.out"
      #SBATCH --error="hostname.%j.%N.err"
      #SBATCH --partition=debug
      #SBATCH --nodes=4
      #SBATCH --ntasks-per-node=1
      #SBATCH --export=ALL
      #SBATCH -t 00:10:00
      srun /bin/hostname
      ####
    • to submit an interactive job, run
      "srun --partition=debug --pty --nodes=1 --ntasks-per-node=32 \
      --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash"
    • to load the IBM PowerAI module, run "module load ibm/powerai"
    • to check loaded modules, run "module list"
    • to check available modules, run "module avail"
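
    The batch-submission steps above can be combined into one shell sketch that recreates the run_script.sb example and would then submit it; the sbatch call itself is commented out because it only works on a HAL login node:

    ```shell
    #!/bin/sh
    # Sketch: write out the run_script.sb example from the usage notes.
    # Assumes a POSIX shell; submission requires a HAL login session.
    cat > run_script.sb <<'EOF'
    #!/bin/bash
    #SBATCH --job-name="hostname"
    #SBATCH --output="hostname.%j.%N.out"
    #SBATCH --error="hostname.%j.%N.err"
    #SBATCH --partition=debug
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=1
    #SBATCH --export=ALL
    #SBATCH -t 00:10:00
    srun /bin/hostname
    EOF
    # sbatch run_script.sb   # submit (run this on hal.ncsa.illinois.edu)
    echo "wrote run_script.sb"
    ```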

    Make sure to follow the link in the confirmation email to request an actual system account.

    Frequently Asked Questions

    To report problems: email us

    For our new users: New User Guide for HAL System

    User group Slack space: https://join.slack.com/t/halillinoisncsa

    Real-time system status: https://hal-monitor.ncsa.illinois.edu

    HAL OnDemand portal: https://hal-ondemand.ncsa.illinois.edu/

    Globus Endpoint: ncsa#hal

    Quick start guide: (for complete details see Documentation section on the left)

    To connect to the cluster:

    Code Block
    ssh <username>@hal.ncsa.illinois.edu 
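
    Repeated logins can be shortened with an OpenSSH client host alias; a minimal ~/.ssh/config fragment (the alias name and username below are illustrative):

    ```
    Host hal
        HostName hal.ncsa.illinois.edu
        User your_ncsa_username
    ```

    After this, "ssh hal" connects; the NCSA Duo two-factor prompt still applies.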

    To submit interactive job:

    Code Block
    swrun -p gpux1

    To submit a batch job:

    Code Block
    swbatch run_script.swb  

    Job Queue time limits:

    • "debug" queue: 4 hours
    • "gpux<n>" and "cpun<n>" queues:  24 hours

    To load the IBM Watson Machine Learning Community Edition (formerly IBM PowerAI) module:

    Code Block
    module load wmlce

    To see CLI scheduler status:

    Code Block
    swqueue




    Contact us

    Request access to this system: Application

    Contact ISL staff: Email Address

    Visit: NCSA, room 3050E

