Delta supports container-based workflows using Apptainer, an HPC-aware container engine. The Delta documentation describes how to use existing Docker images from DockerHub. Containers can be used both on the login nodes and on compute nodes in jobs, and facilities are in place to use containers with MPI and to access the host's GPUs.

It does, however, offer little guidance on how to create or modify images. One way of creating images is to write a Dockerfile, build the image on your laptop (since root access is usually required), push it to DockerHub, and finally pull it to Delta. Alternatively, one can write Apptainer definition files on Delta and build the images there. Both approaches are, however, somewhat cumbersome when one wants to interactively install software and explore how to use it.
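As an illustration of the second route, a minimal Apptainer definition file looks like this (a sketch only; the file name and package list are illustrative):

Bootstrap: docker
From: ubuntu:22.04

%post
    apt-get update
    apt-get -y install wget python3-pip

Saved as, say, minimal.def, it can be built on Delta with apptainer build --fakeroot minimal.sif minimal.def.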

One way to work around this is to use a writable overlay on top of an existing container image, which is what this HOWTO is about.

Writable images

Typically one would like to start from an existing container image of a common distribution (e.g., Ubuntu, CentOS) downloaded from DockerHub using:

apptainer pull docker://ubuntu:latest

A basic command to run a program available in a container image is:

apptainer run ubuntu_latest.sif cat /etc/lsb-release

which will report the OS used in the container, e.g.,

INFO:    underlay of /etc/localtime required more than 50 (69) bind mounts
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

but the image is read-only, except for the /u/$USER directory, which apptainer makes available from the host:

apptainer run ubuntu_latest.sif touch /etc/lsb-release
/usr/bin/touch: cannot touch '/etc/lsb-release': Permission denied

This can be worked around by using an overlay: an extra, writable image that is mounted on top of the read-only container image.

First one creates an overlay image file of a given size (30GB in this example):

apptainer overlay create --fakeroot --size 30000 overlay.img

where the size is given in MiB and --fakeroot prepares the overlay image so that one can use Ubuntu's apt-get command. With this we can modify files, and the modifications are stored in the overlay image:

apptainer run --overlay overlay.img --fakeroot ubuntu_latest.sif touch /etc/lsb-release
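To convince yourself that changes actually persist between runs, write a file in one invocation and read it back in the next (the file name is arbitrary):

apptainer run --overlay overlay.img --fakeroot ubuntu_latest.sif sh -c 'echo hello > /opt/test.txt'
apptainer run --overlay overlay.img --fakeroot ubuntu_latest.sif cat /opt/test.txt

The second command should print hello, because /opt/test.txt was stored in the overlay rather than in the read-only base image.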

Interactive usage

So far we ran each command individually using apptainer run but did not "enter" the container. Sometimes it is more convenient to obtain an interactive bash shell in the container:

apptainer run --overlay overlay.img --fakeroot ubuntu_latest.sif bash -i

which, you will notice, reports your user name as root due to the --fakeroot option:

[rhaas@dt-login03 /projects/bbka/rhaas]$ apptainer run --overlay overlay.img --fakeroot ubuntu_latest.sif bash -i
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    Using fakeroot command combined with root-mapped namespace
root@dt-login03:~#

Once in the container we can install software with apt-get and pip (if so desired), following the available online docs for CUDA and jax:

apt-get update
apt-get install wget

# install CUDA
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update
apt-get install cuda # this will take a very long time
apt-get clean

# install jax
apt-get install python3-pip
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Honestly, though, you are likely better off starting from a base container from NVIDIA's NGC, e.g. the pre-installed sw/external/NGC/tensorflow:22.06-tf2-py3, which has to pull far fewer files (and does not produce a 30 GB overlay) and starts from a "known good" image.

Eventually the installation will finish and you can verify it using:

python3 -c 'import jax;print(jax.__version__)'
0.4.5

Use without fakeroot

Eventually you will want to use the container without fakeroot so that you appear as your usual user account inside the container. This requires using the overlay in read-only mode, which also conveniently prevents you from accidentally modifying the overlay.

apptainer run --overlay overlay.img:ro  ubuntu_latest.sif bash -i

where the :ro suffix instructs apptainer to use the overlay in read-only mode, which is required when --fakeroot is not used. Note that you now have your regular user name but are inside the container:

rhaas@dt-login03:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

Using GPUs

There are no GPUs on the login nodes, so we cannot do much else there. To get onto a compute node we use SLURM; an interactive job can be requested, outside of the container, using srun as documented in the Delta docs:

srun -A bbka-delta-gpu --time=00:30:00 --nodes=1 --gpus=1 --pty /bin/bash -i

Note the important --pty flag, which instructs srun to provide a pseudo-terminal in the same way that ssh would. Without it, the interactive shell will not be very useful.

Now, inside the job, we instruct apptainer (again following the docs) to make the GPUs visible in the container using the --nv option:

apptainer run --nv --overlay overlay.img:ro  ubuntu_latest.sif bash -i

and in the container:

rhaas@gpua047:~$ nvidia-smi
Thu Mar  9 15:58:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   28C    P0    54W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
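With the GPU visible we can also check that the jax installed earlier actually sees it (run inside the job; the exact device list printed depends on the allocation):

apptainer run --nv --overlay overlay.img:ro ubuntu_latest.sif python3 -c 'import jax; print(jax.devices())'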

Writing to the host file system

So far, by default, the only host directory accessible from within the container is your $HOME directory. For actual simulations you will want access to the /projects and /scratch file systems as well. This is most easily achieved using a bind mount:

apptainer run --bind /projects --bind /scratch --overlay overlay.img:ro  ubuntu_latest.sif bash -i 
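Putting the pieces together, a non-interactive batch job might look like this (a sketch only; the account name, file names, and the command to run are the ones used throughout this HOWTO):

#!/bin/bash
#SBATCH --account=bbka-delta-gpu
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --gpus=1

apptainer run --nv --bind /projects --bind /scratch --overlay overlay.img:ro ubuntu_latest.sif python3 -c 'import jax; print(jax.devices())'

which is submitted with sbatch as usual.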

More advanced stuff

Read the Apptainer docs.
