
User Guide 

Funded through a Major Research Instrumentation grant from the National Science Foundation, ROGER (Resourcing Open Geo-spatial Education and Research), named after Roger Tomlinson, the "Father of GIS", is a cutting-edge national facility established by the CyberGIS Center for Advanced Digital and Spatial Studies to foster geospatial discovery and innovation.

If you run into any issues using ROGER or have questions, please create a service request by sending email to help+ROGER@ncsa.illinois.edu.
This guide primarily covers the traditional HPC portion of ROGER and a survey of the Hadoop services. Information for the OpenStack portion of ROGER is in preparation.

Connecting

The ROGER cluster can be accessed via Secure Shell (SSH) to the login nodes using your NCSA login and password.

ssh username@roger-login.ncsa.illinois.edu
Network Details for ROGER:

The ROGER cluster is housed in the National Petascale Computing Facility (NPCF) and interconnected via NCSA's core network to the UIUC campus and multiple external national research and education networks. Inbound and outbound communication between UIUC and ROGER should work without issue.

How to request an account

Please refer to ROGER Allocations Request.

Managing your Account

Default Shell

When your account is first activated, the default shell is set to bash. If you wish to change your shell, please send a request to help+ROGER@ncsa.illinois.edu indicating your desired shell, or perform the following workaround for tcsh:

To change your shell to tcsh, add the following line to the end of the file named .bash_profile, located in your home ($HOME) directory:

exec -l /bin/tcsh

To begin using the new shell, either log out and then log back in, or execute exec -l /bin/tcsh on your command line.
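As a concrete sketch (this assumes bash is still your login shell and simply appends the line shown above):

# Append the tcsh workaround to your bash startup file
echo 'exec -l /bin/tcsh' >> $HOME/.bash_profile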

Password

You can reset your password at NCSA's password change page.

Storage

Small Block File System (/gpfs/smallblockFS)

Home Directory Space:

Your home directory is the default directory you are placed in when you log on. You should use this space for storing files you want to keep long term such as source code, scripts, etc. The soft limit is 10 GB, but no hard limit is currently being enforced. This is kept small because this is your personal space, not your project space. The PI who has requested accounts has full access to the home directories and the project directory. The shortcut for your home folder is ~.

NOTE: Computation should not be done in this space as few disks are allocated to it, with the intent that very limited data will reside here.  While it is possible to run jobs on data at this mount point, performance will not be very good.

Large Block File System (/gpfs/largeblockFS)

Scratch Space: 

The scratch filesystem is shared storage space available to all users. It is intended for short-term use and should be considered volatile. You can access scratch by creating a directory under /gpfs_scratch/. Jobs should be run in either this space or project space, as most of the disk is allocated to this file system, which allows it to be very performant.
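For example, a common convention (assumed here, not required; any directory under /gpfs_scratch/ works) is to use your login name:

# Create and switch to a personal scratch directory
mkdir -p /gpfs_scratch/$USER
cd /gpfs_scratch/$USER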

Note: During the execution of a batch job there is also a node-local scratch directory at /scratch/$PBS_JOBID. That directory is created at the start of each job and removed at the end of the job.
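A minimal sketch of using the node-local scratch inside a job script (the program and file names are placeholders):

# Stage input to fast node-local scratch, run there, then copy results back
cp $PBS_O_WORKDIR/input.dat /scratch/$PBS_JOBID/
cd /scratch/$PBS_JOBID
$PBS_O_WORKDIR/my_program input.dat > output.dat
cp output.dat $PBS_O_WORKDIR/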

Project Space:

Your project directory is where the project stores files that should be kept long term and shared across the project. Each project has a soft limit of 10 TB. If more space is needed for a project, it can be requested by sending a help ticket to help+ROGER@ncsa.illinois.edu. The PI who has requested accounts has full access to the home directories and the project directory.

Hadoop File System (/gpfs/hadoopFS)

Mounted only on the Hadoop nodes, this file system serves as an alternative backend store for Ambari on GPFS. It currently has few disks allocated to it, giving it performance similar to the nodes' internal SSDs.

Data Protection

No off-site backups for disaster recovery are provided for any storage. Please make sure to do your own backups of any important data from ROGER to permanent storage as often as necessary.

Data Transfer

Data transfers can be initiated via Globus Online's GridFTP data transfer utility as well as the SSH-based tools scp (Secure Copy) and sftp (Secure FTP).

GridFTP

The CyberGIS project recommends using Globus Online for large data transfers. Globus Online manages the data transfer operation for the user: monitoring performance, retrying failures, auto-tuning and recovering from faults automatically where possible, and reporting status. Email is sent when the transfer is complete, and live transfer status can be viewed on the Globus Online website.

Globus Online implements data transfer between machines through a web interface using the GridFTP protocol. There is a predefined GridFTP endpoint for the ROGER cluster to allow data movement between the ROGER cluster and other resources registered with Globus Online. To transfer data between ROGER and a non-registered resource, Globus Online provides a software package called Globus Connect that allows for the creation of a personal GridFTP endpoint on virtually any local resource.

In order to use Globus Online on ROGER, please request an NCSA RSA token for use with our Two-Factor Authentication service. You will need this in order to activate ROGER's endpoint, ncsa#roger.

Steps to use Globus Online (GO) for ROGER data transfers

Data transfer to an existing GO endpoint:
  • Type in or select one of your target endpoints from the 1st pull-down selection box.
  • Activate the endpoint.

Create a new GO endpoint for data transfers:
  • Download and install the Globus Connect software for your OS.
    Note: The Globus Connect software should be installed on the machine that you want to set up as an endpoint.
  • Type the endpoint name that you created during the Globus Connect installation into the 1st endpoint selection box.

In either case:
  • Type in or select "ncsa#roger" for your other endpoint in the 2nd pull-down selection box.
  • Activate the ncsa#roger endpoint by authenticating with your official NCSA login and password.
  • Highlight the data to be transferred and click the appropriate transfer arrow between the two endpoint selection boxes.

SSH

For initiating data transfers from ROGER, the SSH-based tools sftp (Secure FTP) or scp (Secure Copy) can be used.

A variety of SSH-based clients are available for initiating transfers from your local system. There are two types of SSH clients: clients that support both remote login access and data transfers, and clients that support data transfers only.

SSH Client | Remote Login | Data Transfer | Installs On
MobaXterm is an enhanced terminal with an X server and a set of Unix commands (GNU/Cygwin) packaged in one application. | Yes | Yes | Windows
SSH Secure Shell allows you to securely log in to remote host computers, execute commands safely on a remote computer, and provide secure encrypted and authenticated communications between two hosts in an untrusted network. | Yes | Yes | Windows
Tunnelier is a flexible SSH client which includes terminal emulation, graphical as well as command-line SFTP support, an FTP-to-SFTP bridge, and additional tunneling features including dynamic port forwarding through an integrated proxy. | Yes | Yes | Windows
PuTTY is an open source terminal emulator application which can act as a client for the SSH, Telnet, rlogin, and raw TCP computing protocols and as a serial console client. | Yes | Yes* | Windows, Linux, Mac OS
FileZilla is a fast and reliable cross-platform FTP, FTPS and SFTP client with lots of useful features and an intuitive graphical user interface. | No | Yes | Windows, Linux, Mac OS
WinSCP is an open source free SFTP client, SCP client, FTPS client and FTP client for Windows. Its main function is file transfer between a local and a remote computer. Beyond this, WinSCP offers scripting and basic file manager functionality. | No | Yes | Windows
FireFTP is a free, secure, cross-platform FTP/SFTP client for Mozilla Firefox which provides easy and intuitive access to FTP/SFTP servers. | No | Yes | Firefox (Add-On)
*PuTTY's scp and sftp data transfer functionality is implemented via Command Line Interface (CLI) by default.
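As a sketch, two command-line examples (file and directory names are placeholders):

# Copy a local file to your ROGER home directory
scp mydata.tif username@roger-login.ncsa.illinois.edu:~/
# Copy a results directory from ROGER back to the current local directory
scp -r username@roger-login.ncsa.illinois.edu:/gpfs_scratch/username/results .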

Managing Your Environment Using the Modules Command

The module command is a user interface to the Modules package. The Modules package provides for the dynamic modification of the user's environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user's shell, so both tcsh and bash users can use the same commands to change the environment.

Useful Module Commands:

Command | Description
module avail | lists all available modules
module list | lists currently loaded modules
module help modulefile | help on module modulefile
module display modulefile | display information about modulefile
module load modulefile | load modulefile into the current shell environment
module unload modulefile | remove modulefile from the current shell environment
module swap modulefile1 modulefile2 | unload modulefile1 and load modulefile2

 

To include particular software in the environment for all new shells, edit your shell configuration file ($HOME/.bashrc for bash users and $HOME/.cshrc for tcsh users) by adding the module commands to load the software that you want to be a part of your environment. After saving your changes, you can source your shell configuration file or log out and then log back in for the changes to take effect.
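For example, a bash user might append lines like the following to $HOME/.bashrc (the module names are illustrative; run module avail to see what is actually installed):

# Load frequently used software at every login
module load gdal
module load mpich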

Note: Order is important. With each module load, the changes are prepended to your current environment paths.

For additional information on Modules, see the module and modulefile man pages or visit the Modules SourceForge page.

Currently Installed Software

Numerous geospatial and other software packages are installed on the HPC portion of ROGER. To see them, type module avail.

When a module name is followed by (default), that is the version that will be loaded if you don't specify a version. For instance, module load gdal will load gdal version 1.11.3.

Programming Environment

Compilers

The GNU compilers (GCC) version 4.4.7 are in the default user environment.

Compiler Commands

Serial: GNU Compilers

To build (compile and link) a serial program in Fortran, C, or C++ enter:

Code | Build Command
Fortran | gfortran myprog.f
C | gcc myprog.c
C++ | g++ myprog.cc
MPI

MPI Implementation | modulefile for MPI/Compiler | Build Commands
MPICH | mpich/3.1.4 (default), MPICH/3.1.4-GCC-4.9.2-binutils-2.25 | Fortran 77: mpif77 myprog.f; Fortran 90: mpif90 myprog.f90; C: mpicc myprog.c; C++: mpicxx myprog.cc
OpenMPI | OpenMPI/1.8.5-GCC-4.9.2-binutils-2.25-no-OFED | Fortran 77: mpif77 myprog.f; Fortran 90: mpif90 myprog.f90; C: mpicc myprog.c; C++: mpicxx myprog.cc
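For example, to build an MPI program in C with the default MPICH (a minimal sketch; loading the module without a version picks up the default listed above):

module load mpich
mpicc myprog.c -o myprog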
OpenMP

To build an OpenMP program, use the -openmp / -fopenmp option:

GCC:
gfortran -fopenmp myprog.f
gcc -fopenmp myprog.c
g++ -fopenmp myprog.cc
Hybrid MPI/OpenMP

To build an MPI/OpenMP hybrid program, use the -openmp / -fopenmp option with the MPI compiling commands:

GCC:
mpif77 -fopenmp myprog.f
mpif90 -fopenmp myprog.f90
mpicc -fopenmp myprog.c
mpicxx -fopenmp myprog.cc

CUDA

NVIDIA K40 GPUs are available in a subset of the ROGER nodes. CUDA is a parallel computing platform and programming model from NVIDIA for use on their GPUs. These GPUs support CUDA compute capability 3.5.

Load the CUDA Toolkit into your environment using the following module:

module load cuda/7.0
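A minimal compile sketch once the module is loaded (the source file name is a placeholder, and -arch=sm_35 assumes the K40's Kepler architecture):

# Compile a CUDA source file for the Tesla K40
nvcc -arch=sm_35 mykernel.cu -o mykernel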

Running Jobs

User access to the compute nodes for running jobs is only available via a batch job. The ROGER Cluster uses the Torque Resource Manager for running batch jobs. Torque is based on OpenPBS, so the commands are the same as PBS commands. See the qsub section under Batch Commands below for details on batch job submission.

An interactive batch job provides a way to get interactive access to a compute node via a batch job. See the qsub -I section for information on how to run an interactive job on the compute nodes. 

To ensure the health of the batch system and scheduler, users should refrain from having more than 500 of their batch jobs in the queues at any one time.

Running Programs

On successful building (compilation and linking) of your program, an executable is created that is used to run the program. The table below describes how to run different types of programs.

Program Type: Serial
How to run: To run serial code, specify the name of the executable.
Example Command: ./a.out

Program Type: MPI
How to run: MPI programs are run with the mpiexec command followed by the name of the executable. Note: The total number of MPI processes is the {number of nodes} x {cores/node} set in the batch job resource specification.
Example Command: mpiexec ./a.out

Program Type: OpenMP
How to run: The OMP_NUM_THREADS environment variable can be set to specify the number of threads used by OpenMP programs. If this variable is not set, the number of threads defaults to one under the Intel compiler; under GCC, the default behavior is to use one thread for each core available on the node. To run OpenMP programs, specify the name of the executable.
Example Command:
In bash: export OMP_NUM_THREADS=12
In tcsh: setenv OMP_NUM_THREADS 12
./a.out

Program Type: MPI/OpenMP
How to run: As with OpenMP programs, the OMP_NUM_THREADS environment variable can be set to specify the number of threads used by the OpenMP portion of the mixed MPI/OpenMP program. The same default behavior applies with respect to the number of threads used. Use the mpiexec command followed by the name of the executable to run mixed MPI/OpenMP programs. Note: The number of MPI processes per node is set in the batch job resource specification for the number of cores/node.
Example Command:
In bash: export OMP_NUM_THREADS=4
In tcsh: setenv OMP_NUM_THREADS 4
mpiexec ./a.out

Primary Queue

Each CyberGIS group has unrestricted access to the dedicated batch queue, with concurrent access to the number and type of nodes that the group is allowed.

Interactive Queue

This queue allows faster access to the nodes for short interactive jobs, Monday through Friday between 0800 and 2100. Only interactive jobs are allowed to use this queue.

Batch Commands

Below are brief descriptions of the primary batch commands. For more detailed information, refer to the individual man pages.

  • qsub

    Batch jobs are submitted through a job script using the qsub command. Job scripts generally start with a series of PBS directives that describe requirements of the job such as number of nodes, wall time required, etc. to the batch system/scheduler (PBS directives can also be specified as options on the qsub command line; command line options take precedence over those in the script). The rest of the batch script consists of user commands.

    A sample batch script for submitting a job is outlined below 

    Sample Batch Script (sample.qsub)
    # declare a name for this job to be sample_job
    #PBS -N sample_job
    # request a total of 40 processors for this job 
    #   (2 nodes and 20 processors(core) per node)
    #PBS -l nodes=2:ppn=20
    # request 4 hours of wall clock time
    #PBS -l walltime=04:00:00
    # combine PBS standard output and error files
    #PBS -j oe 
    # mail is sent to you when the job starts and when it terminates or aborts
    #PBS -m bea
    # specify your email address 
    #PBS -M name@email.com
    
    #change to the directory where you submitted the job 
    cd $PBS_O_WORKDIR
    #include the full path to the name of your MPI program
    mpirun -np $PBS_NP /path_to_executable/program_name
    exit 0

    The syntax for qsub is:

           qsub [list of qsub options] script_name 
    
    The main qsub options are listed below. Also see the qsub man page for other options.

     

    • -l resource-list: specifies resource limits. (Note: For clarity, the resource-list flag is "dash small L".) The resource_list argument is of the form:
                 resource_name[=[value]][,resource_name[=[value]],...]:resource 
      

      The common resource_names are:
      walltime=time
      time=maximum wall clock time (hh:mm:ss) [default: 30 mins]


      nodes=n:ppn=p
      n=number of 20-core nodes [default: 1 node]
      p=how many cores per node to use (1 through 20) [default: ppn=1]

      Examples:

      -l walltime=00:30:00,nodes=2:ppn=12

      [For users porting from other systems, note that the -l ncpus syntax may not work as expected. So do not use it - please only use ppn to specify cores per node.]

      Specifying nodes with GPUs: To run jobs on nodes with an NVIDIA Tesla K40M GPU, add the resource specification "gpu". Optionally you can specify "nogpu" to obtain nodes without a GPU. Since the GPU and non-GPU nodes are identical other than the presence of a GPU you should normally not need this flag.

      Example: -l walltime=00:30:00,nodes=4:ppn=20:gpu  will assign four nodes, each with a GPU and 20 cores for 30 minutes maximum wall time.


    • -q queue_name: specifies the queue name [batch | devel].

    • -N jobname: specifies the job name.

    • -W depend=dependency_list: defines the dependency between current and other jobs. See example jobscript.

    • -t array_request: Specifies the task ids of a job array. The array_request argument is an integer id or a range of integers. Multiple ids or id ranges can be combined in a comma-delimited list. See example jobscript.

    • -o out_file: store the standard output of the job to file out_file. After the job is done, this file will be found in the directory from which the qsub command was issued. [default :<jobname>.o<JobID>]

    • -e err_file: store the standard error of the job to file err_file. After the job is done, this file will be found in the directory from which the qsub command was issued. [default :<jobname>.e<JobID>]

    • -j oe: merge standard output and standard error into standard output file.

    • -V: export all your environment variables to the batch job.

    • -m be: send mail at the beginning and end of a job.

    • -M myemail@myuniv.edu : send any email to given email address.

    • -X: enables X11 forwarding.

    Useful PBS Environment Variables

    Job ID | $PBS_JOBID | the job identifier assigned to the job
    Job Submission Directory | $PBS_O_WORKDIR | By default, jobs start in the user's home directory. To go to the directory from which the job was submitted, use the following line in the batch script: cd $PBS_O_WORKDIR
    Node List | $PBS_NODEFILE | the name of the file containing the hostnames of the nodes assigned to the job, one per line. Note: hostnames are listed more than once if multiple cores (ppn > 1) have been requested.
    Total Cores Requested | $PBS_NP | the total number of cores requested (number of nodes x number of cores per node)
    Array Job ID | $PBS_ARRAYID | each member of a job array is assigned a unique identifier (see the Job Arrays section)

    See the qsub man page for additional environment variables available.

     

  • qsub -I (Running interactive jobs on devel queue)

    The -I option tells qsub you want to run an interactive job on the compute nodes. (Note: For clarity, the interactive flag is "dash capital I".) For example, the following command:

           [ cg-gpu01 ~]$ qsub -I -l walltime=00:30:00,nodes=1:ppn=12 -q devel

    will run an interactive job with a wall clock limit of 30 minutes, using one node and 12 cores on that node in the devel queue. You can also use other qsub options such as those documented above.

    After you enter the command, you will have to wait for Torque to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for a smaller amount of time, the wait should be shorter because your job will backfill among larger jobs. Note that using the devel queue during the hours of 0800-2100, Monday through Friday, will get you faster access, but you will be limited to 1-hour job lengths.

     You will see something like this after running the qsub command:

           qsub: waiting for job 145.cg-gpu01 to start 

    Once the job starts, you will see:

           qsub: job 145.cg-gpu01 ready 

    and will be presented with an interactive shell prompt on the launch node. At this point, you can use the appropriate command to start your program.

    When you are done with your runs, you can use the exit command to end the job.

  • qstat

    The qstat command displays the status of batch jobs.
    • qstat -a gives the status of all jobs on the system.
    • qstat -u $USER gives the status of your jobs.
    • qstat -n JobID lists nodes allocated to a running job in addition to basic information.
    • qstat -f JobID gives detailed information on a particular job.
    • qstat -q provides summary information on all the queues.
    • qstat -t JobID[] gives the status of all the jobs within a job array. Use JobID[<index>] to display the status of a specific job within a job array.
    • Note: You only need to use the numeric part of the Job ID when specifying JobID.

    See the man page for other options available.

  • qdel

    The qdel command deletes a queued job or kills a running job. To delete/kill jobs within a job array the square brackets "[]" must be specified with the JobID.

    • qdel JobID deletes/kills a job.
    • qdel JobID[] deletes/kills the entire job array.
    • qdel JobID[<index>] deletes/kills a specific job of the job array.
    • Note: You only need to use the numeric part of the Job ID when specifying JobID.

  • js

    The js command outputs the jobscript file for a running or previous job.
    • js JobID
    • Note: You only need to use the numeric part of the Job ID when specifying JobID.

Job Dependencies

PBS job dependencies allow users to set the order in which their queued jobs run. Job dependencies are set by using the -W option with the syntax -W depend=<dependency type>:<JobID>. PBS places dependent jobs in the Hold state until they are eligible to run.

The following are examples of how to specify job dependencies using the afterany dependency type, which indicates to PBS that the dependent job should become eligible to start only after the specified job has completed.

On the command line:

   [cg-gpu01 ~]$ qsub -W depend=afterany:<JobID> jobscript.pbs 

In a job script:

#!/bin/bash
#PBS -l walltime=00:30:00
#PBS -l nodes=1:ppn=12
#PBS -N myjob
#PBS -j oe
#PBS -W depend=afterany:<JobID>        

In a shell script that submits batch jobs:

 #!/bin/bash
 JOB_01=`qsub jobscript1.pbs`
 JOB_02=`qsub -W depend=afterany:$JOB_01 jobscript2.pbs`
 JOB_03=`qsub -W depend=afterany:$JOB_02 jobscript3.pbs`

Note: Generally the recommended dependency types to use are before, beforeany, after, and afterany. While there are additional dependency types, those types that work based on batch job error codes may not behave as expected because of the difference between a batch job error and application errors. See the dependency section of the qsub manual page for additional information (man qsub).

Using GNU Parallel Within Batch Jobs

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel. Additional details can be found at using GNU Parallel within batch jobs.
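As a sketch of the idea (the input file pattern and the process_file script are hypothetical, and a module load may be needed if parallel is not in your default environment):

#PBS -l nodes=1:ppn=20
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
# Run one task per input file, with at most 20 tasks running concurrently on this node
ls inputs/*.tif | parallel -j 20 ./process_file {}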

HPC & Other Tutorials

CyberGIS Training and Tutorials

The NSF funded XSEDE program offers online training on various HPC topics - see XSEDE Online Training for links to the available courses.

Introduction to Linux offered by the LINUX Foundation (classes start 3rd Quarter 2014).

HPC Training from CSE

OpenStack on ROGER

OpenStack is an open source cloud computing environment designed for controlling large pools of computing, storage, and network resources. With OpenStack, users may create instances (virtual servers) by allocating a portion of the available resources using an online dashboard. An instance may be semi-permanent, to support web servers and other internet services, or temporary, to meet specific computing demands.

OpenStack version Juno is currently available on ROGER, while the newer version Kilo is expected to enter friendly-user mode in the first quarter of 2016. Update: the Kilo version is now in beta testing. Contact CyberGIS staff for additional information or to request access.

Hadoop on ROGER

Users may request access to the Hadoop services on ROGER from the CyberGIS staff by sending an email to help+roger@ncsa.illinois.edu.

Users granted access to the Hadoop services can log in directly to roger-login from any location and then ssh to cg-hm08. On node cg-hm08, users will have access to the HDFS filesystem and the YARN scheduler for MapReduce jobs. Other services we run include HBase, HCat, and Storm.

The HDFS filesystem is built on GPFS, with 1.9 PB of storage available. The Hadoop login node has both your GPFS home and project directories available, for easy transfer to/from HDFS.
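As a minimal sketch of moving data into HDFS from your GPFS home (the file name is a placeholder, and /user/$USER is assumed to be your HDFS home directory):

# Copy a file from GPFS home into HDFS, then list it
hdfs dfs -put $HOME/mydata.csv /user/$USER/
hdfs dfs -ls /user/$USER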

We are currently running Ambari 2.2. Please contact the CyberGIS staff for help in using the Hadoop services.

For details see here.

Citation and Acknowledgement

To acknowledge ROGER use in your publications, please use the following instructions.
