...

  1. Use the image_classification benchmark from the official MLCommons training benchmark repository
  2. Use Nvidia's implementation of the benchmark found here

*Note: while prebuilt images exist for the inference benchmark, I have not found any prebuilt images for the training benchmarks, which are what we are trying to use.

MLCommons Implementation

MLCommons' instructions for running each benchmark are as follows:

...

What has been attempted: The Nvidia implementation comes with a Dockerfile, so I converted it to an Apptainer def file with spython. Building a SIF image from that def file with Apptainer, however, fails with "ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'". This doesn't make sense to me: requirements.txt is in the directory, and both the Dockerfile and the def file copy it over before the install command runs. If somebody with Docker/Apptainer experience could look at this, that would be useful.

Blocking Problem: Building the Apptainer def file converted from Nvidia's Dockerfile fails.
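One thing that may be worth checking for the requirements.txt error above (an assumption, not a confirmed diagnosis): in an Apptainer def file, the %files section is copied into the image before %post runs, and a `cd` in %post (which is typically how a Dockerfile WORKDIR gets carried over) has no effect on where %files places things. A converted entry such as `requirements.txt .` may therefore land somewhere other than the directory %post is in when pip runs. The helper below, `list_relative_file_dests`, is hypothetical (not part of Apptainer or spython); it just prints the %files entries whose destination is relative or omitted, so they can be compared against the `cd` lines in %post.

```shell
# Hypothetical helper (not part of Apptainer or spython): print the %files
# entries of a def file whose destination is relative or omitted.
# Assumption: such entries may not end up where %post expects them.
list_relative_file_dests() {
  awk '
    # Any line starting with "%" opens a new section; remember if it is %files.
    /^%/ { in_files = ($1 == "%files"); next }
    # Inside %files, skip blank lines and comment lines.
    in_files && NF > 0 && $1 !~ /^#/ {
      # With no destination given, the source path is reused as the destination.
      dest = (NF >= 2) ? $2 : $1
      if (substr(dest, 1, 1) != "/") print
    }
  ' "$1"
}
```

Usage: `list_relative_file_dests path/to/converted.def`. Any line it prints is a candidate to rewrite with an absolute destination that matches the directory %post changes into before installing requirements.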

...

Jim suggested using kaniko, a tool that builds container images from a Dockerfile inside a container, to build the Nvidia Dockerfile into a Docker image that we could then run directly with Apptainer. This would avoid any issues with the conversion between a Dockerfile and an Apptainer def file.
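A minimal sketch of the two-step workflow, assuming kaniko's `--tar-path` output is a `docker save`-style tarball that Apptainer can read through its `docker-archive://` source, and that the build directory is `$PWD` (Apptainer binds the current directory into the container by default). The `run` and `build_with_kaniko` function names, the `DRY_RUN` switch, and the output file names are illustrative, not part of either tool. Step 1 is essentially the hello-world kaniko command with an explicit `--context` and absolute paths, so on its own it does not address the /bin/sh error.

```shell
# Illustrative sketch: build a Dockerfile with kaniko (run under apptainer),
# then convert the resulting tarball into a SIF image.
# Assumptions: kaniko's --tar-path output is readable via docker-archive://,
# and $PWD (bound into the container by default) holds the Dockerfile.
run() {
  # Print the command instead of executing it when DRY_RUN is set.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_with_kaniko() {
  dockerfile=$1   # e.g. Dockerfile.build
  name=$2         # hypothetical base name for the .tar and .sif outputs
  ctx=$PWD
  # Step 1: kaniko builds the image and writes it to a tarball (no push).
  run apptainer run --fakeroot docker://gcr.io/kaniko-project/executor:latest \
    --context="$ctx" --dockerfile="$ctx/$dockerfile" \
    --no-push --tar-path="$ctx/$name.tar" || return 1
  # Step 2: Apptainer converts the tarball into a SIF image.
  run apptainer build "$name.sif" "docker-archive://$ctx/$name.tar"
}
```

Running `DRY_RUN=1 build_with_kaniko Dockerfile.build mybuild` prints both commands without executing them, which makes it easy to compare the exact arguments against what is being tried by hand.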

What has been attempted: To test whether this could work, I first tried building the Docker hello-world example with kaniko. This is the latest command I was trying:

apptainer run --fakeroot docker://gcr.io/kaniko-project/executor:latest --no-push --tar-path=mybuild.tar --dockerfile=Dockerfile.build 

This fails with:

FATAL:   failed to open /bin/sh for inspection: failed to open elf binary /bin/sh: open /bin/sh: no such file or directory

I tried using --bind to fix this, but there appears to be some issue with using the --fakeroot and --bind options at the same time. Please verify that I'm not just doing something wrong, though.

Blocking Problem: I can't seem to run the kaniko image with Apptainer properly, or I am doing something wrong (probably the latter).