First, please fill out this application form. The form asks why you need an account and what resources you will be using.
Next, you will receive an email with a link for requesting the actual user account on HAL. You must follow this link to create an NCSA user ID (if you do not have one) and to request membership in the corresponding LDAP group. Check your email and follow the instructions.
As of May 2020, two-factor authentication via Duo is required for access to HAL. You are receiving this error message because you are not enrolled in NCSA Duo. Go to https://go.ncsa.illinois.edu/2fa and follow the instructions. Note that the campus Duo is a separate system and you will not be able to use it to access NCSA systems.
This indicates that you have Duo set up but are not currently authorized to use HAL. If you have never used HAL, please follow the instructions in the "How do I apply for an account?" section of this page to request access.
However, if you are reading this, it is more likely that you had access at some point in the past, but it was later revoked due to inactivity or for some other reason. To restore access, contact us at help+isl@ncsa.illinois.edu and we will look at your case.
First, please check whether the application you want supports the ppc64le architecture. HAL uses IBM's POWER9 architecture in order to achieve improved multi-GPU performance, but this comes at the cost that common x86 software may not work on HAL. Even if an application states that it supports ppc64le, it still may not work on HAL, because the older POWER8 architecture uses the same ppc64le identifier but is not 100% compatible. We are happy to help you test the application if this is the case.
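As a quick sanity check, the architecture identifier of a machine can be compared against ppc64le. The following is a minimal sketch; the function name `is_ppc64le` is made up for illustration. Note that it only compares the identifier reported by `uname -m`, so it cannot tell POWER8 from POWER9.

```shell
# Succeed if the given architecture string is ppc64le.
# With no argument, the architecture of the current machine (uname -m) is used.
# Hypothetical helper: this checks the identifier only, not POWER8 vs POWER9.
is_ppc64le() {
    [ "${1:-$(uname -m)}" = "ppc64le" ]
}

if is_ppc64le; then
    echo "This machine reports ppc64le."
else
    echo "This machine reports $(uname -m); x86 builds made here may not run on HAL."
fi
```

Run on a HAL login node, the check should take the first branch; run on a typical laptop or x86 server, it takes the second.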
Once you identify a version of the application that supports POWER9, see the following guidelines for installation:
Contact us through Slack or help+isl@ncsa.illinois.edu and provide a list of people you want to share with (preferably with their usernames on HAL) and a name for the shared folder. We will create a folder under /projects for your use.
Data should not be kept on HAL for extended periods of time. HAL is not backed up in any way and is supported on an as-available basis. Each user is limited to 2TB and 20,000,000 files/directories (whichever limit is reached first).
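To get a rough idea of how close a directory is to these limits, standard tools can count entries and report disk usage. This is only a sketch: the helper names are hypothetical, and how HAL itself accounts for usage is not described here, so the numbers may differ from the official ones. Walking a very large directory tree can also take a long time.

```shell
# File/directory limit quoted above.
MAX_ENTRIES=20000000

# Count files and directories below a directory (the directory itself excluded).
count_entries() {
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

# Print a one-line usage report for a directory (hypothetical helper).
usage_report() {
    dir="$1"
    entries=$(count_entries "$dir")
    size=$(du -sh "$dir" | cut -f1)
    echo "$dir: $entries of $MAX_ENTRIES entries, $size on disk"
}

# Example: usage_report "$HOME"
```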
Use the following command to get a list of your jobs (replace user_name with your username):
squeue -u user_name
The right-most column will contain a reason for each of the pending jobs. Refer to the list below for detailed explanations.
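If you have many pending jobs, it can help to list only those and tally the reasons. The sketch below uses standard `squeue` options (`-t` to filter by state, `-h` to drop the header, `-o` to choose columns); the function names `pending_jobs` and `count_reasons` are made up for illustration.

```shell
# List pending jobs for a user, one per line, with the reason last.
# %i = job ID, %j = job name, %T = state, %r = reason.
pending_jobs() {
    squeue -u "${1:-$USER}" -t PENDING -h -o "%i %j %T %r"
}

# Read the output above on stdin and count how many jobs share each reason.
count_reasons() {
    awk '{ n[$NF]++ } END { for (r in n) print n[r], r }' | sort -k2
}

# Example: pending_jobs | count_reasons
```

Each output line is a count followed by a reason, which can then be looked up in the explanations that follow.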
There is at least one pending job with a higher priority than this job. The priority for a job depends on several factors, the biggest of which is recent usage. Most likely you are seeing this reason after running some combination of a large number of jobs, jobs using a large amount of resources, or jobs that run for a long time. The recent usage factor decays slowly over a two-week period, which means any usage more than two weeks before the job was submitted will not impact the priority of the job. You can check your recent usage here: https://go.illinois.edu/halfairshare
Jobs that are pending for this reason may remain pending for a long time if the recent usage factor has reduced your priority below that of most active users. If someone's recent usage is sufficiently lower than yours that the difference in the recent usage factor exceeds the waiting time factor, their job may receive a higher priority and therefore run before yours, even if it was submitted after yours.
Some of the nodes specifically requested by the job are not available, which can mean a node is running jobs with a higher priority, is held by a reservation, has been manually drained by an administrator for maintenance, or is unavailable due to some issue. This job will run when all the requested nodes become available.
This job is at the front of the queue, but there are not enough resources for it to start running. This job will start running as soon as enough resources become available. The priority calculation favors large jobs, so when resources gradually become available, smaller jobs with a similar recent usage factor won't run before this job and take away the available resources. Note that if someone has much lower recent usage than you do, their jobs can still run before your job, because the bonus from their recent usage factor can exceed the bonus from your job size factor.
This means you have reached the limit of resources that can be allocated to one user at any given time. There are three limits in place: a maximum of 5 running jobs; a maximum of 5 nodes running jobs; and a maximum of 16 GPUs running jobs. This job will run as soon as some of your running jobs finish and free up the resources.
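The three limits can be expressed as a small check. This is an illustrative sketch only: `limits_reached` is a hypothetical helper that takes your current counts of running jobs, nodes, and GPUs as arguments, and gathering those counts is left to you (for example, `squeue -u user_name -t RUNNING -h | wc -l` gives the number of running jobs).

```shell
# Per-user limits quoted above.
MAX_JOBS=5
MAX_NODES=5
MAX_GPUS=16

# Given counts of running jobs, nodes, and GPUs, print each limit that is reached.
# Prints nothing when all three counts are below their limits.
limits_reached() {
    njobs=$1; nnodes=$2; ngpus=$3
    [ "$njobs" -ge "$MAX_JOBS" ] && echo "jobs: $njobs/$MAX_JOBS"
    [ "$nnodes" -ge "$MAX_NODES" ] && echo "nodes: $nnodes/$MAX_NODES"
    [ "$ngpus" -ge "$MAX_GPUS" ] && echo "GPUs: $ngpus/$MAX_GPUS"
    return 0
}

# Example: limits_reached 5 2 8
```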
This job was submitted to an inactive reservation. If the reservation is in the future, the job will run when the reservation starts. If the reservation has ended, the job will stay in the queue until it is deleted.
STOP HERE. HAL uses the specialized IBM POWER9 architecture, which means it sacrifices compatibility with traditional x86_64 software. Many things need to be re-compiled for it, and this is usually a very tedious process. We have common machine learning frameworks already installed and ready to use; refer to Getting started with WMLCE (former PowerAI) for more details. If you need a newer version than we currently provide, we will usually wait until the next version of WMLCE is released. We can help you compile a newer version if there is a very specific need for it. In that case, contact us via Slack or send a ticket to help+isl@ncsa.illinois.edu and someone will respond to your request.
You need to access the HAL system via "ssh hal.ncsa.illinois.edu" first to initialize your home folder. After your home folder is created, you can access HAL-OnDemand without any problem.
Your account has not been properly initialized. Try logging in and out of hal-login2.ncsa.illinois.edu via SSH a few times. If that does not work, contact an admin on Slack or send a ticket to help+isl@ncsa.illinois.edu.