diff --git a/doc.zih.tu-dresden.de/docs/software/containers.md b/doc.zih.tu-dresden.de/docs/software/containers.md index a67a4a986881ffe09a16582adfeda719e6f90ccd..93c2762667be1e5addecff38c3cf38d08ac60d7e 100644 --- a/doc.zih.tu-dresden.de/docs/software/containers.md +++ b/doc.zih.tu-dresden.de/docs/software/containers.md @@ -2,94 +2,112 @@ [Containerization](https://www.ibm.com/cloud/learn/containerization) encapsulating or packaging up software code and all its dependencies to run uniformly and consistently on any infrastructure. On -Taurus [Singularity](https://sylabs.io/) used as a standard container solution. Singularity enables -users to have full control of their environment. This means that you don’t have to ask an HPC -support to install anything for you - you can put it in a Singularity container and run! As opposed -to Docker (the most famous container solution), Singularity is much more suited to being used in an -HPC environment and more efficient in many cases. Docker containers can easily be used in -Singularity. Information about the use of Singularity on Taurus can be found [here]**todo link**. - -In some cases using Singularity requires a Linux machine with root privileges (e.g. using the ml -partition), the same architecture and a compatible kernel. For many reasons, users on Taurus cannot -be granted root permissions. A solution is a Virtual Machine (VM) on the ml partition which allows -users to gain root permissions in an isolated environment. There are two main options on how to work -with VM on Taurus: - -1. [VM tools]**todo link**. Automative algorithms for using virtual machines; -1. [Manual method]**todo link**. It required more operations but gives you more flexibility and reliability. +ZIH systems [Singularity](https://sylabs.io/) is used as a standard container solution. Singularity +enables users to have full control of their environment. This means that you don’t have to ask the +HPC support to install anything for you - you can put it in a Singularity container and run! As +opposed to Docker (the most famous container solution), Singularity is much more suited to being +used in an HPC environment and more efficient in many cases. Docker containers can easily be used in +Singularity. Information about the use of Singularity on ZIH systems can be found on this page. + +In some cases using Singularity requires a Linux machine with root privileges (e.g. using the +partition `ml`), the same architecture and a compatible kernel. For many reasons, users on ZIH +systems cannot be granted root permissions. A solution is a Virtual Machine (VM) on the partition +`ml` which allows users to gain root permissions in an isolated environment. There are two main +options on how to work with Virtual Machines on ZIH systems: + +1. [VM tools](virtual_machines_tools.md): Automative algorithms for using virtual machines; +1. [Manual method](virtual_machines.md): It requires more operations but gives you more flexibility + and reliability. ## Singularity -If you wish to containerize your workflow/applications, you can use Singularity containers on -Taurus. As opposed to Docker, this solution is much more suited to being used in an HPC environment. -Existing Docker containers can easily be converted. +If you wish to containerize your workflow and/or applications, you can use Singularity containers on +ZIH systems. As opposed to Docker, this solution is much more suited to being used in an HPC +environment. -ZIH wiki sites: +!!! 
note -- [Example Definitions](singularity_example_definitions.md) -- [Building Singularity images on Taurus](vm_tools.md) -- [Hints on Advanced usage](singularity_recipe_hints.md) + It is not possible for users to generate new custom containers on ZIH systems directly, because + creating a new container requires root privileges. -It is available on Taurus without loading any module. +However, new containers can be created on your local workstation and moved to ZIH systems for +execution. Follow the instructions for [locally install Singularity](#local-installation) and +[container creation](#container-creation). Moreover, existing Docker container can easily be +converted, which is documented [here](#importing-a-docker-container). -### Local installation +If you are already familar with Singularity, you might be more intressted in our [singularity +recipes and hints](singularity_recipe_hints.md). -One advantage of containers is that you can create one on a local machine (e.g. your laptop) and -move it to the HPC system to execute it there. This requires a local installation of singularity. -The easiest way to do so is: +### Local Installation -1. Check if go is installed by executing `go version`. If it is **not**: +The local installation of Singularity comprises two steps: Make `go` available and then follow the +instructions from the official documentation to install Singularity. -```Bash -wget <https://storage.googleapis.com/golang/getgo/installer_linux> && chmod +x -installer_linux && ./installer_linux && source $HOME/.bash_profile -``` +1. Check if `go` is installed by executing `go version`. If it is **not**: -1. Follow the instructions to [install Singularity](https://github.com/sylabs/singularity/blob/master/INSTALL.md#clone-the-repo) + ```console + marie@local$ wget <https://storage.googleapis.com/golang/getgo/installer_linux> && chmod +x + installer_linux && ./installer_linux && source $HOME/.bash_profile + ``` -clone the repo +1. Instructions to + [install Singularity](https://github.com/sylabs/singularity/blob/master/INSTALL.md#clone-the-repo) + from the official documentation: -```Bash -mkdir -p ${GOPATH}/src/github.com/sylabs && cd ${GOPATH}/src/github.com/sylabs && git clone <https://github.com/sylabs/singularity.git> && cd -singularity -``` + Clone the repository -Checkout the version you want (see the [Github releases page](https://github.com/sylabs/singularity/releases) -for available releases), e.g. + ```console + marie@local$ mkdir -p ${GOPATH}/src/github.com/sylabs + marie@local$ cd ${GOPATH}/src/github.com/sylabs + marie@local$ git clone https://github.com/sylabs/singularity.git + marie@local$ cd singularity + ``` -```Bash -git checkout v3.2.1\ -``` + Checkout the version you want (see the [GitHub releases page](https://github.com/sylabs/singularity/releases) + for available releases), e.g. -Build and install + ```console + marie@local$ git checkout v3.2.1 + ``` -```Bash -cd ${GOPATH}/src/github.com/sylabs/singularity && ./mconfig && cd ./builddir && make && sudo -make install -``` + Build and install -### Container creation + ```console + marie@local$ cd ${GOPATH}/src/github.com/sylabs/singularity + marie@local$ ./mconfig && cd ./builddir && make + marie@local$ sudo make install + ``` -Since creating a new container requires access to system-level tools and thus root privileges, it is -not possible for users to generate new custom containers on Taurus directly. You can, however, -import an existing container from, e.g., Docker. 
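+
+Once the build has finished, it is worth a quick sanity check that the freshly installed binary is
+found on your `PATH` before you move on to container creation. A minimal check might look like this
+(the reported version corresponds to whatever release you checked out above):
+
+```console
+marie@local$ singularity --version
+singularity version 3.2.1
+```
+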
+### Container Creation -In case you wish to create a new container, you can do so on your own local machine where you have -the necessary privileges and then simply copy your container file to Taurus and use it there. +!!! note -This does not work on our **ml** partition, as it uses Power9 as its architecture which is -different to the x86 architecture in common computers/laptops. For that you can use the -[VM Tools](vm_tools.md). + It is not possible for users to generate new custom containers on ZIH systems directly, because + creating a new container requires root privileges. -#### Creating a container +There are two possibilities: -Creating a container is done by writing a definition file and passing it to +1. Create a new container on your local workstation (where you have the necessary privileges), and + then copy the container file to ZIH systems for execution. +1. You can, however, import an existing container from, e.g., Docker. -```Bash -singularity build myContainer.sif myDefinition.def -``` +Both methods are outlined in the following. + +#### New Custom Container + +You can create a new custom container on your workstation, if you have root rights. + +!!! attention "Respect the micro-architectures" -NOTE: This must be done on a machine (or [VM](virtual_machines.md) with root rights. + You cannot create containers for the partition `ml`, as it bases on Power9 micro-architecture + which is different to the x86 architecture in common computers/laptops. For that you can use + the [VM Tools](virtual_machines_tools.md). + +Creating a container is done by writing a **definition file** and passing it to + +```console +marie@local$ singularity build myContainer.sif <myDefinition.def> +``` A definition file contains a bootstrap [header](https://sylabs.io/guides/3.2/user-guide/definition_files.html#header) @@ -99,20 +117,26 @@ where you install your software. The most common approach is to start from an existing docker image from DockerHub. For example, to start from an [Ubuntu image](https://hub.docker.com/_/ubuntu) copy the following into a new file -called ubuntu.def (or any other filename of your choosing) +called `ubuntu.def` (or any other filename of your choice) -```Bash -Bootstrap: docker<br />From: ubuntu:trusty<br /><br />%runscript<br /> echo "This is what happens when you run the container..."<br /><br />%post<br /> apt-get install g++ +```bash +Bootstrap: docker +From: ubuntu:trusty + +%runscript + echo "This is what happens when you run the container..." + +%post + apt-get install g++ ``` -Then you can call: +Then you can call -```Bash -singularity build ubuntu.sif ubuntu.def +```console +marie@local$ singularity build ubuntu.sif ubuntu.def ``` And it will install Ubuntu with g++ inside your container, according to your definition file. - More bootstrap options are available. The following example, for instance, bootstraps a basic CentOS 7 image. @@ -131,23 +155,25 @@ Include: yum ``` More examples of definition files can be found at -https://github.com/singularityware/singularity/tree/master/examples +https://github.com/singularityware/singularity/tree/master/examples. + +#### Import a Docker Container + +!!! hint -#### Importing a docker container + As opposed to bootstrapping a container, importing from Docker does **not require root + privileges** and therefore works on ZIH systems directly. 
You can import an image directly from the Docker repository (Docker Hub): -```Bash -singularity build my-container.sif docker://ubuntu:latest +```console +marie@local$ singularity build my-container.sif docker://ubuntu:latest ``` -As opposed to bootstrapping a container, importing from Docker does **not require root privileges** -and therefore works on Taurus directly. - -Creating a singularity container directly from a local docker image is possible but not recommended. -Steps: +Creating a singularity container directly from a local docker image is possible but not +recommended. The steps are: -```Bash +```console # Start a docker registry $ docker run -d -p 5000:5000 --restart=always --name registry registry:2 @@ -165,109 +191,122 @@ From: alpine $ singularity build --nohttps alpine.sif example.def ``` -#### Starting from a Dockerfile +#### Start from a Dockerfile -As singularity definition files and Dockerfiles are very similar you can start creating a definition +As Singularity definition files and Dockerfiles are very similar you can start creating a definition file from an existing Dockerfile by "translating" each section. -There are tools to automate this. One of them is \<a -href="<https://github.com/singularityhub/singularity-cli>" -target="\_blank">spython\</a> which can be installed with \`pip\` (add -\`--user\` if you don't want to install it system-wide): +There are tools to automate this. One of them is +[spython](https://github.com/singularityhub/singularity-cli) which can be installed with `pip` +(add `--user` if you don't want to install it system-wide): -`pip3 install -U spython` +```console +marie@local$ pip3 install -U spython +``` + +With this you can simply issue the following command to convert a Dockerfile in the current folder +into a singularity definition file: + +```console +marie@local$ spython recipe Dockerfile myDefinition.def +``` -With this you can simply issue the following command to convert a -Dockerfile in the current folder into a singularity definition file: +Please **verify** your generated definition and adjust where required! -`spython recipe Dockerfile myDefinition.def<br />` +There are some notable changes between Singularity definitions and Dockerfiles: -Now please **verify** your generated definition and adjust where -required! +1. Command chains in Dockerfiles (`apt-get update && apt-get install foo`) must be split into + separate commands (`apt-get update; apt-get install foo`). Otherwise a failing command before the + ampersand is considered "checked" and does not fail the build. +1. The environment variables section in Singularity is only set on execution of the final image, not + during the build as with Docker. So `*ENV*` sections from Docker must be translated to an entry + in the `%environment` section and **additionally** set in the `%runscript` section if the + variable is used there. +1. `*VOLUME*` sections from Docker cannot be represented in Singularity containers. Use the runtime + option \`-B\` to bind folders manually. +1. `CMD` and `ENTRYPOINT` from Docker do not have a direct representation in Singularity. 
+ The closest is to check if any arguments are given in the `%runscript` section and call the + command from `ENTRYPOINT` with those, if none are given call `ENTRYPOINT` with the + arguments of `CMD`: -There are some notable changes between singularity definitions and -Dockerfiles: 1 Command chains in Dockerfiles (\`apt-get update && -apt-get install foo\`) must be split into separate commands (\`apt-get -update; apt-get install foo). Otherwise a failing command before the -ampersand is considered "checked" and does not fail the build. 1 The -environment variables section in Singularity is only set on execution of -the final image, not during the build as with Docker. So \`*ENV*\` -sections from Docker must be translated to an entry in the -*%environment* section and **additionally** set in the *%runscript* -section if the variable is used there. 1 \`*VOLUME*\` sections from -Docker cannot be represented in Singularity containers. Use the runtime -option \`-B\` to bind folders manually. 1 *\`CMD\`* and *\`ENTRYPOINT\`* -from Docker do not have a direct representation in Singularity. The -closest is to check if any arguments are given in the *%runscript* -section and call the command from \`*ENTRYPOINT*\` with those, if none -are given call \`*ENTRYPOINT*\` with the arguments of \`*CMD*\`: -\<verbatim>if \[ $# -gt 0 \]; then \<ENTRYPOINT> "$@" else \<ENTRYPOINT> -\<CMD> fi\</verbatim> + ```bash + if [ $# -gt 0 ]; then + <ENTRYPOINT> "$@" + else + <ENTRYPOINT> <CMD> + fi + ``` -### Using the containers +### Use the Containers -#### Entering a shell in your container +#### Enter a Shell in Your Container A read-only shell can be entered as follows: -```Bash -singularity shell my-container.sif +```console +marie@login$ singularity shell my-container.sif ``` -**IMPORTANT:** In contrast to, for instance, Docker, this will mount various folders from the host -system including $HOME. This may lead to problems with, e.g., Python that stores local packages in -the home folder, which may not work inside the container. It also makes reproducibility harder. It -is therefore recommended to use `--contain/-c` to not bind $HOME (and others like `/tmp`) -automatically and instead set up your binds manually via `-B` parameter. Example: +!!! note -```Bash -singularity shell --contain -B /scratch,/my/folder-on-host:/folder-in-container my-container.sif -``` + In contrast to, for instance, Docker, this will mount various folders from the host system + including $HOME. This may lead to problems with, e.g., Python that stores local packages in the + home folder, which may not work inside the container. It also makes reproducibility harder. It + is therefore recommended to use `--contain/-c` to not bind `$HOME` (and others like `/tmp`) + automatically and instead set up your binds manually via `-B` parameter. Example: + + ```console + marie@login$ singularity shell --contain -B /scratch,/my/folder-on-host:/folder-in-container my-container.sif + ``` You can write into those folders by default. If this is not desired, add an `:ro` for read-only to the bind specification (e.g. `-B /scratch:/scratch:ro\`). Note that we already defined bind paths for `/scratch`, `/projects` and `/sw` in our global `singularity.conf`, so you needn't use the `-B` parameter for those. -If you wish, for instance, to install additional packages, you have to use the `-w` parameter to -enter your container with it being writable. 
This, again, must be done on a system where you have +If you wish to install additional packages, you have to use the `-w` parameter to +enter your container with it being writable. This, again, must be done on a system where you have the necessary privileges, otherwise you can only edit files that your user has the permissions for. E.g: -```Bash -singularity shell -w my-container.sif +```console +marie@local$ singularity shell -w my-container.sif Singularity.my-container.sif> yum install htop ``` The `-w` parameter should only be used to make permanent changes to your container, not for your -productive runs (it can only be used writeable by one user at the same time). You should write your -output to the usual Taurus file systems like `/scratch`. Launching applications in your container +productive runs (it can only be used writable by one user at the same time). You should write your +output to the usual ZIH filesystems like `/scratch`. Launching applications in your container -#### Running a command inside the container +#### Run a Command Inside the Container -While the "shell" command can be useful for tests and setup, you can also launch your applications +While the `shell` command can be useful for tests and setup, you can also launch your applications inside the container directly using "exec": -```Bash -singularity exec my-container.img /opt/myapplication/bin/run_myapp +```console +marie@login$ singularity exec my-container.img /opt/myapplication/bin/run_myapp ``` This can be useful if you wish to create a wrapper script that transparently calls a containerized application for you. E.g.: -```Bash +```bash #!/bin/bash X=`which singularity 2>/dev/null` if [ "z$X" = "z" ] ; then - echo "Singularity not found. Is the module loaded?" - exit 1 + echo "Singularity not found. Is the module loaded?" + exit 1 fi singularity exec /scratch/p_myproject/my-container.sif /opt/myapplication/run_myapp "$@" -The better approach for that however is to use `singularity run` for that, which executes whatever was set in the _%runscript_ section of the definition file with the arguments you pass to it. -Example: -Build the following definition file into an image: +``` + +The better approach is to use `singularity run`, which executes whatever was set in the `%runscript` +section of the definition file with the arguments you pass to it. Example: Build the following +definition file into an image: + +```bash Bootstrap: docker From: ubuntu:trusty @@ -285,33 +324,32 @@ singularity build my-container.sif example.def Then you can run your application via -```Bash +```console singularity run my-container.sif first_arg 2nd_arg ``` -Alternatively you can execute the container directly which is -equivalent: +Alternatively you can execute the container directly which is equivalent: -```Bash +```console ./my-container.sif first_arg 2nd_arg ``` With this you can even masquerade an application with a singularity container as if it was an actual program by naming the container just like the binary: -```Bash +```console mv my-container.sif myCoolAp ``` -### Use-cases +### Use-Cases -One common use-case for containers is that you need an operating system with a newer GLIBC version -than what is available on Taurus. E.g., the bullx Linux on Taurus used to be based on RHEL6 having a -rather dated GLIBC version 2.12, some binary-distributed applications didn't work on that anymore. -You can use one of our pre-made CentOS 7 container images (`/scratch/singularity/centos7.img`) to -circumvent this problem. 
Example: +One common use-case for containers is that you need an operating system with a newer +[glibc](https://www.gnu.org/software/libc/) version than what is available on ZIH systems. E.g., the +bullx Linux on ZIH systems used to be based on RHEL 6 having a rather dated glibc version 2.12, some +binary-distributed applications didn't work on that anymore. You can use one of our pre-made CentOS +7 container images (`/scratch/singularity/centos7.img`) to circumvent this problem. Example: -```Bash -$ singularity exec /scratch/singularity/centos7.img ldd --version +```console +marie@login$ singularity exec /scratch/singularity/centos7.img ldd --version ldd (GNU libc) 2.17 ``` diff --git a/doc.zih.tu-dresden.de/docs/software/singularity_example_definitions.md b/doc.zih.tu-dresden.de/docs/software/singularity_example_definitions.md deleted file mode 100644 index 28fe94a9d510e577148d7d0c2f526136e813d4ba..0000000000000000000000000000000000000000 --- a/doc.zih.tu-dresden.de/docs/software/singularity_example_definitions.md +++ /dev/null @@ -1,110 +0,0 @@ -# Singularity Example Definitions - -## Basic example - -A usual workflow to create Singularity Definition consists of the -following steps: - -- Start from base image -- Install dependencies - - Package manager - - Other sources -- Build & Install own binaries -- Provide entrypoints & metadata - -An example doing all this: - -```Bash -Bootstrap: docker -From: alpine - -%post - . /.singularity.d/env/10-docker*.sh - - apk add g++ gcc make wget cmake - - wget https://github.com/fmtlib/fmt/archive/5.3.0.tar.gz - tar -xf 5.3.0.tar.gz - mkdir build && cd build - cmake ../fmt-5.3.0 -DFMT_TEST=OFF - make -j$(nproc) install - cd .. - rm -r fmt-5.3.0* - - cat hello.cpp -#include <fmt/format.h> - -int main(int argc, char** argv){ - if(argc == 1) fmt::print("No arguments passed!\n"); - else fmt::print("Hello {}!\n", argv[1]); -} -EOF - - g++ hello.cpp -o hello -lfmt - mv hello /usr/bin/hello - -%runscript - hello "$@" - -%labels - Author Alexander Grund - Version 1.0.0 - -%help - Display a greeting using the fmt library - - Usage: - ./hello -``` - -## CUDA + CuDNN + OpenMPI - -- Chosen CUDA version depends on installed driver of host -- OpenMPI needs PMI for SLURM integration -- OpenMPI needs CUDA for GPU copy-support -- OpenMPI needs ibverbs libs for Infiniband -- openmpi-mca-params.conf required to avoid warnings on fork (OK on - taurus) -- Environment variables SLURM_VERSION, OPENMPI_VERSION can be set to - choose different version when building the container - -``` -Bootstrap: docker -From: nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04 - -%labels - Author ZIH - Requires CUDA driver 418.39+. - -%post - . /.singularity.d/env/10-docker*.sh - - apt-get update - apt-get install -y cuda-compat-10.1 - apt-get install -y libibverbs-dev ibverbs-utils - # Install basic development tools - apt-get install -y gcc g++ make wget python - apt-get autoremove; apt-get clean - - cd /tmp - - : ${SLURM_VERSION:=17-02-11-1} - wget https://github.com/SchedMD/slurm/archive/slurm-${SLURM_VERSION}.tar.gz - tar -xf slurm-${SLURM_VERSION}.tar.gz - cd slurm-slurm-${SLURM_VERSION} - ./configure --prefix=/usr/ --sysconfdir=/etc/slurm --localstatedir=/var --disable-debug - make -C contribs/pmi2 -j$(nproc) install - cd .. 
- rm -rf slurm-* - - : ${OPENMPI_VERSION:=3.1.4} - wget https://download.open-mpi.org/release/open-mpi/v${OPENMPI_VERSION%.*}/openmpi-${OPENMPI_VERSION}.tar.gz - tar -xf openmpi-${OPENMPI_VERSION}.tar.gz - cd openmpi-${OPENMPI_VERSION}/ - ./configure --prefix=/usr/ --with-pmi --with-verbs --with-cuda - make -j$(nproc) install - echo "mpi_warn_on_fork = 0" >> /usr/etc/openmpi-mca-params.conf - echo "btl_openib_warn_default_gid_prefix = 0" >> /usr/etc/openmpi-mca-params.conf - cd .. - rm -rf openmpi-* -``` diff --git a/doc.zih.tu-dresden.de/docs/software/singularity_recipe_hints.md b/doc.zih.tu-dresden.de/docs/software/singularity_recipe_hints.md index 5e4388fcf95ed06370d7d633544ee685113df1a7..b8304b57de0f1ae5da98341c92f6d9067b838ecd 100644 --- a/doc.zih.tu-dresden.de/docs/software/singularity_recipe_hints.md +++ b/doc.zih.tu-dresden.de/docs/software/singularity_recipe_hints.md @@ -1,6 +1,117 @@ -# Singularity Recipe Hints +# Singularity Recipes and Hints -## GUI (X11) applications +## Example Definitions + +### Basic Example + +A usual workflow to create Singularity Definition consists of the following steps: + +* Start from base image +* Install dependencies + * Package manager + * Other sources +* Build and install own binaries +* Provide entry points and metadata + +An example doing all this: + +```bash +Bootstrap: docker +From: alpine + +%post + . /.singularity.d/env/10-docker*.sh + + apk add g++ gcc make wget cmake + + wget https://github.com/fmtlib/fmt/archive/5.3.0.tar.gz + tar -xf 5.3.0.tar.gz + mkdir build && cd build + cmake ../fmt-5.3.0 -DFMT_TEST=OFF + make -j$(nproc) install + cd .. + rm -r fmt-5.3.0* + + cat hello.cpp +#include <fmt/format.h> + +int main(int argc, char** argv){ + if(argc == 1) fmt::print("No arguments passed!\n"); + else fmt::print("Hello {}!\n", argv[1]); +} +EOF + + g++ hello.cpp -o hello -lfmt + mv hello /usr/bin/hello + +%runscript + hello "$@" + +%labels + Author Alexander Grund + Version 1.0.0 + +%help + Display a greeting using the fmt library + + Usage: + ./hello +``` + +### CUDA + CuDNN + OpenMPI + +* Chosen CUDA version depends on installed driver of host +* OpenMPI needs PMI for Slurm integration +* OpenMPI needs CUDA for GPU copy-support +* OpenMPI needs `ibverbs` library for Infiniband +* `openmpi-mca-params.conf` required to avoid warnings on fork (OK on ZIH systems) +* Environment variables `SLURM_VERSION` and `OPENMPI_VERSION` can be set to choose different + version when building the container + +```bash +Bootstrap: docker +From: nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04 + +%labels + Author ZIH + Requires CUDA driver 418.39+. + +%post + . /.singularity.d/env/10-docker*.sh + + apt-get update + apt-get install -y cuda-compat-10.1 + apt-get install -y libibverbs-dev ibverbs-utils + # Install basic development tools + apt-get install -y gcc g++ make wget python + apt-get autoremove; apt-get clean + + cd /tmp + + : ${SLURM_VERSION:=17-02-11-1} + wget https://github.com/SchedMD/slurm/archive/slurm-${SLURM_VERSION}.tar.gz + tar -xf slurm-${SLURM_VERSION}.tar.gz + cd slurm-slurm-${SLURM_VERSION} + ./configure --prefix=/usr/ --sysconfdir=/etc/slurm --localstatedir=/var --disable-debug + make -C contribs/pmi2 -j$(nproc) install + cd .. 
+ rm -rf slurm-* + + : ${OPENMPI_VERSION:=3.1.4} + wget https://download.open-mpi.org/release/open-mpi/v${OPENMPI_VERSION%.*}/openmpi-${OPENMPI_VERSION}.tar.gz + tar -xf openmpi-${OPENMPI_VERSION}.tar.gz + cd openmpi-${OPENMPI_VERSION}/ + ./configure --prefix=/usr/ --with-pmi --with-verbs --with-cuda + make -j$(nproc) install + echo "mpi_warn_on_fork = 0" >> /usr/etc/openmpi-mca-params.conf + echo "btl_openib_warn_default_gid_prefix = 0" >> /usr/etc/openmpi-mca-params.conf + cd .. + rm -rf openmpi-* +``` + +## Hints + +### GUI (X11) Applications Running GUI applications inside a singularity container is possible out of the box. Check the following definition: @@ -15,25 +126,25 @@ yum install -y xeyes This image may be run with -```Bash +```console singularity exec xeyes.sif xeyes. ``` -This works because all the magic is done by singularity already like setting $DISPLAY to the outside -display and mounting $HOME so $HOME/.Xauthority (X11 authentication cookie) is found. When you are -using \`--contain\` or \`--no-home\` you have to set that cookie yourself or mount/copy it inside -the container. Similar for \`--cleanenv\` you have to set $DISPLAY e.g. via +This works because all the magic is done by Singularity already like setting `$DISPLAY` to the outside +display and mounting `$HOME` so `$HOME/.Xauthority` (X11 authentication cookie) is found. When you are +using `--contain` or `--no-home` you have to set that cookie yourself or mount/copy it inside +the container. Similar for `--cleanenv` you have to set `$DISPLAY`, e.g., via -```Bash +```console export SINGULARITY_DISPLAY=$DISPLAY ``` -When you run a container as root (via \`sudo\`) you may need to allow root for your local display +When you run a container as root (via `sudo`) you may need to allow root for your local display port: `xhost +local:root\` -### Hardware acceleration +### Hardware Acceleration -If you want hardware acceleration you **may** need [VirtualGL](https://virtualgl.org). An example +If you want hardware acceleration, you **may** need [VirtualGL](https://virtualgl.org). An example definition file is as follows: ```Bash @@ -55,25 +166,28 @@ rm VirtualGL-*.rpm yum install -y mesa-dri-drivers # for e.g. intel integrated GPU drivers. Replace by your driver ``` -You can now run the application with vglrun: +You can now run the application with `vglrun`: -```Bash +```console singularity exec vgl.sif vglrun glxgears ``` -**Attention:**Using VirtualGL may not be required at all and could even decrease the performance. To -check install e.g. glxgears as above and your graphics driver (or use the VirtualGL image from -above) and disable vsync: +!!! warning -``` + Using VirtualGL may not be required at all and could even decrease the performance. + +To check install, e.g., `glxgears` as above and your graphics driver (or use the VirtualGL image +from above) and disable `vsync`: + +```console vblank_mode=0 singularity exec vgl.sif glxgears ``` -Compare the FPS output with the glxgears prefixed by vglrun (see above) to see which produces more +Compare the FPS output with the `glxgears` prefixed by `vglrun` (see above) to see which produces more FPS (or runs at all). 
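+
+To make the comparison concrete, a small sketch running both variants back to back, each with
+vsync disabled (the absolute FPS numbers are of course system- and driver-dependent):
+
+```console
+vblank_mode=0 singularity exec vgl.sif glxgears          # plain OpenGL
+vblank_mode=0 singularity exec vgl.sif vglrun glxgears   # routed through VirtualGL
+```
+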
-**NVIDIA GPUs** need the `--nv` parameter for the singularity command: +**NVIDIA GPUs** need the `--nv` parameter for the Singularity command: -``Bash +``console singularity exec --nv vgl.sif glxgears ``` diff --git a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md index 5104c7b35587aaeaca86d64419ffd8965d2fa27b..9fd64d01dddbfde3119b74fa0e8f9decfe5b49f0 100644 --- a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md +++ b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md @@ -1,88 +1,89 @@ -# Virtual machine on Taurus +# Virtual Machines -The following instructions are primarily aimed at users who want to build their -[Singularity](containers.md) containers on Taurus. +The following instructions are primarily aimed at users who want to build their own +[Singularity](containers.md) containers on ZIH systems. The Singularity container setup requires a Linux machine with root privileges, the same architecture and a compatible kernel. If some of these requirements can not be fulfilled, then there is -also the option of using the provided virtual machines on Taurus. +also the option of using the provided virtual machines (VM) on ZIH systems. -Currently, starting VMs is only possible on ML and HPDLF nodes. The VMs on the ML nodes are used to -build singularity containers for the Power9 architecture and the HPDLF nodes to build singularity -containers for the x86 architecture. +Currently, starting VMs is only possible on partitions `ml` and HPDLF. The VMs on the ML nodes are +used to build singularity containers for the Power9 architecture and the HPDLF nodes to build +Singularity containers for the x86 architecture. -## Create a virtual machine +## Create a Virtual Machine -The `--cloud=kvm` SLURM parameter specifies that a virtual machine should be started. +The `--cloud=kvm` Slurm parameter specifies that a virtual machine should be started. -### On Power9 architecture +### On Power9 Architecture -```Bash -rotscher@tauruslogin3:~> srun -p ml -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash +```console +marie@login$ srun -p ml -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash srun: job 6969616 queued and waiting for resources srun: job 6969616 has been allocated resources bash-4.2$ ``` -### On x86 architecture +### On x86 Architecture -```Bash -rotscher@tauruslogin3:~> srun -p hpdlf -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash +```console +marie@login$ srun -p hpdlf -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash srun: job 2969732 queued and waiting for resources srun: job 2969732 has been allocated resources bash-4.2$ ``` -## Access virtual machine +## Access a Virtual Machine -Since the security issue on Taurus, we restricted the file system permissions. Now you have to wait -until the file /tmp/${SLURM_JOB_USER}\_${SLURM_JOB_ID}/activate is created, then you can try to ssh -into the virtual machine (VM), but it could be that the VM needs some more seconds to boot and start -the SSH daemon. So you may need to try the `ssh` command multiple times till it succeeds. +Since the a security issue on ZIH systems, we restricted the filesystem permissions. Now you have to +wait until the file `/tmp/${SLURM_JOB_USER}\_${SLURM_JOB_ID}/activate` is created, then you can try +to connect via `ssh` into the virtual machine, but it could be that the virtual machine needs some +more seconds to boot and start the SSH daemon. 
So you may need to try the `ssh` command multiple +times till it succeeds. -```Bash -bash-4.2$ cat /tmp/rotscher_2759627/activate +```console +bash-4.2$ cat /tmp/marie_2759627/activate #!/bin/bash if ! grep -q -- "Key for the VM on the ml partition" "/home/rotscher/.ssh/authorized_keys" >& /dev/null; then - cat "/tmp/rotscher_2759627/kvm.pub" >> "/home/rotscher/.ssh/authorized_keys" + cat "/tmp/marie_2759627/kvm.pub" >> "/home/marie/.ssh/authorized_keys" else - sed -i "s|.*Key for the VM on the ml partition.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the ml partition|" "/home/rotscher/.ssh/authorized_keys" + sed -i "s|.*Key for the VM on the ml partition.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the ml partition|" "/home/marie/.ssh/authorized_keys" fi -ssh -i /tmp/rotscher_2759627/kvm root@192.168.0.6 -bash-4.2$ source /tmp/rotscher_2759627/activate +ssh -i /tmp/marie_2759627/kvm root@192.168.0.6 +bash-4.2$ source /tmp/marie_2759627/activate Last login: Fri Jul 24 13:53:48 2020 from gateway -[root@rotscher_2759627 ~]# +[root@marie_2759627 ~]# ``` -## Example usage +## Example Usage ## Automation -We provide [Tools](vm_tools.md) to automate these steps. You may just type `startInVM --arch=power9` -on a tauruslogin node and you will be inside the VM with everything mounted. +We provide [tools](virtual_machines_tools.md) to automate these steps. You may just type `startInVM +--arch=power9` on a login node and you will be inside the VM with everything mounted. ## Known Issues ### Temporary Memory -The available space inside the VM can be queried with `df -h`. Currently the whole VM has 8G and -with the installed operating system, 6.6GB of available space. +The available space inside the VM can be queried with `df -h`. Currently the whole VM has 8 GB and +with the installed operating system, 6.6 GB of available space. Sometimes the Singularity build might fail because of a disk out-of-memory error. In this case it might be enough to delete leftover temporary files from Singularity: -```Bash +```console rm -rf /tmp/sbuild-* ``` If that does not help, e.g., because one build alone needs more than the available disk memory, then it will be necessary to use the tmp folder on scratch. In order to ensure that the files in the -temporary folder will be owned by root, it is necessary to set up an image inside /scratch/tmp -instead of using it directly. E.g., to create a 25GB of temporary memory image: +temporary folder will be owned by root, it is necessary to set up an image inside `/scratch/tmp` +instead of using it directly. 
E.g., to create a 25 GB of temporary memory image: -```Bash +```console tmpDir="$( mktemp -d --tmpdir=/host_data/tmp )" && tmpImg="$tmpDir/singularity-build-temp-dir" export LANG_BACKUP=$LANG unset LANG @@ -90,13 +91,17 @@ truncate -s 25G "$tmpImg.ext4" && echo yes | mkfs.ext4 "$tmpImg.ext4" export LANG=$LANG_BACKUP ``` -The image can now be mounted and with the **SINGULARITY_TMPDIR** environment variable can be +The image can now be mounted and with the `SINGULARITY_TMPDIR` environment variable can be specified as the temporary directory for Singularity builds. Unfortunately, because of an open Singularity [bug](https://github.com/sylabs/singularity/issues/32) it is should be avoided to mount -the image using **/dev/loop0**. +the image using `/dev/loop0`. -```Bash -mkdir -p "$tmpImg" && i=1 && while test -e "/dev/loop$i"; do (( ++i )); done && mknod -m 0660 "/dev/loop$i" b 7 "$i"<br />mount -o loop="/dev/loop$i" "$tmpImg"{.ext4,}<br /><br />export SINGULARITY_TMPDIR="$tmpImg"<br /><br />singularity build my-container.{sif,def} +```console +mkdir -p "$tmpImg" && i=1 && while test -e "/dev/loop$i"; do (( ++i )); done && mknod -m 0660 "/dev/loop$i" b 7 "$i" +mount -o loop="/dev/loop$i" "$tmpImg"{.ext4,} + +export SINGULARITY_TMPDIR="$tmpImg" +singularity build my-container.{sif,def} ``` The architecture of the base image is automatically chosen when you use an image from DockerHub. @@ -106,4 +111,4 @@ Bootstraps **shub** and **library** should be avoided. ### Transport Endpoint is not Connected This happens when the SSHFS mount gets unmounted because it is not very stable. It is sufficient to -run `\~/mount_host_data.sh` again or just the sshfs command inside that script. +run `\~/mount_host_data.sh` again or just the SSHFS command inside that script. diff --git a/doc.zih.tu-dresden.de/docs/software/vm_tools.md b/doc.zih.tu-dresden.de/docs/software/virtual_machines_tools.md similarity index 50% rename from doc.zih.tu-dresden.de/docs/software/vm_tools.md rename to doc.zih.tu-dresden.de/docs/software/virtual_machines_tools.md index 5a4d58a7e2ac7a1532d5029312e3ff3b479d7939..0b03ddf927aeed68d8726797ed04db373d24b9b3 100644 --- a/doc.zih.tu-dresden.de/docs/software/vm_tools.md +++ b/doc.zih.tu-dresden.de/docs/software/virtual_machines_tools.md @@ -1,71 +1,70 @@ -# Singularity on Power9 / ml partition +# Singularity on Partition `ml` -Building Singularity containers from a recipe on Taurus is normally not possible due to the -requirement of root (administrator) rights, see [Containers](containers.md). For obvious reasons -users on Taurus cannot be granted root permissions. +!!! note "Root privileges" -The solution is to build your container on your local Linux machine by executing something like + Building Singularity containers from a recipe on ZIH system is normally not possible due to the + requirement of root (administrator) rights, see [Containers](containers.md). For obvious reasons + users cannot be granted root permissions. -```Bash -sudo singularity build myContainer.sif myDefinition.def -``` - -Then you can copy the resulting myContainer.sif to Taurus and execute it there. +The solution is to build your container on your local Linux workstation using Singularity and copy +it to ZIH systems for execution. -This does **not** work on the ml partition as it uses the Power9 architecture which your laptop -likely doesn't. +**This does not work on the partition `ml`** as it uses the Power9 architecture which your +workstation likely doesn't. 
-For this we provide a Virtual Machine (VM) on the ml partition which allows users to gain root +For this we provide a Virtual Machine (VM) on the partition `ml` which allows users to gain root permissions in an isolated environment. The workflow to use this manually is described at -[another page](virtual_machines.md) but is quite cumbersome. +[this page](virtual_machines.md) but is quite cumbersome. To make this easier two programs are provided: `buildSingularityImage` and `startInVM` which do what they say. The latter is for more advanced use cases so you should be fine using -*buildSingularityImage*, see the following section. +`buildSingularityImage`, see the following section. -**IMPORTANT:** You need to have your default SSH key without a password for the scripts to work as -entering a password through the scripts is not supported. +!!! note "SSH key without password" + + You need to have your default SSH key without a password for the scripts to work as + entering a password through the scripts is not supported. **The recommended workflow** is to create and test a definition file locally. You usually start from a base Docker container. Those typically exist for different architectures but with a common name -(e.g. 'ubuntu:18.04'). Singularity automatically uses the correct Docker container for your current +(e.g. `ubuntu:18.04`). Singularity automatically uses the correct Docker container for your current architecture when building. So in most cases you can write your definition file, build it and test -it locally, then move it to Taurus and build it on Power9 without any further changes. However, -sometimes Docker containers for different architectures have different suffixes, in which case you'd -need to change that when moving to Taurus. +it locally, then move it to ZIH systems and build it on Power9 (partition `ml`) without any further +changes. However, sometimes Docker containers for different architectures have different suffixes, +in which case you'd need to change that when moving to ZIH systems. -## Building a Singularity container in a job +## Build a Singularity Container in a Job -To build a singularity container on Taurus simply run: +To build a Singularity container on ZIH systems simply run: -```Bash -buildSingularityImage --arch=power9 myContainer.sif myDefinition.def +```console +marie@login$ buildSingularityImage --arch=power9 myContainer.sif myDefinition.def ``` -This command will submit a batch job and immediately return. Note that while "power9" is currently +This command will submit a batch job and immediately return. Note that while Power9 is currently the only supported architecture, the parameter is still required. If you want it to block while the -image is built and see live output, use the parameter `--interactive`: +image is built and see live output, add the option `--interactive`: -```Bash -buildSingularityImage --arch=power9 --interactive myContainer.sif myDefinition.def +```console +marie@login$ buildSingularityImage --arch=power9 --interactive myContainer.sif myDefinition.def ``` There are more options available which can be shown by running `buildSingularityImage --help`. All have reasonable defaults.The most important ones are: -- `--time <time>`: Set a higher job time if the default time is not - enough to build your image and your job is cancelled before completing. The format is the same - as for SLURM. 
-- `--tmp-size=<size in GB>`: Set a size used for the temporary +* `--time <time>`: Set a higher job time if the default time is not + enough to build your image and your job is canceled before completing. The format is the same as + for Slurm. +* `--tmp-size=<size in GB>`: Set a size used for the temporary location of the Singularity container. Basically the size of the extracted container. -- `--output=<file>`: Path to a file used for (log) output generated +* `--output=<file>`: Path to a file used for (log) output generated while building your container. -- Various singularity options are passed through. E.g. +* Various Singularity options are passed through. E.g. `--notest, --force, --update`. See, e.g., `singularity --help` for details. For **advanced users** it is also possible to manually request a job with a VM (`srun -p ml --cloud=kvm ...`) and then use this script to build a Singularity container from within the job. In -this case the `--arch` and other SLURM related parameters are not required. The advantage of using +this case the `--arch` and other Slurm related parameters are not required. The advantage of using this script is that it automates the waiting for the VM and mounting of host directories into it (can also be done with `startInVM`) and creates a temporary directory usable with Singularity inside the VM controlled by the `--tmp-size` parameter. @@ -78,31 +77,31 @@ As the build starts in a VM you may not have access to all your files. It is us to refer to local files from inside a definition file anyway as this reduces reproducibility. However common directories are available by default. For others, care must be taken. In short: -- `/home/$USER`, `/scratch/$USER` are available and should be used `/scratch/\<group>` also works for -- all groups the users is in `/projects/\<group>` similar, but is read-only! So don't use this to +* `/home/$USER`, `/scratch/$USER` are available and should be used `/scratch/\<group>` also works for +* all groups the users is in `/projects/\<group>` similar, but is read-only! So don't use this to store your generated container directly, but rather move it here afterwards -- /tmp is the VM local temporary directory. All files put here will be lost! +* /tmp is the VM local temporary directory. All files put here will be lost! If the current directory is inside (or equal to) one of the above (except `/tmp`), then relative paths for container and definition work as the script changes to the VM equivalent of the current directory. Otherwise you need to use absolute paths. Using `~` in place of `$HOME` does work too. -Under the hood, the filesystem of Taurus is mounted via SSHFS at `/host_data`, so if you need any +Under the hood, the filesystem of ZIH systems is mounted via SSHFS at `/host_data`, so if you need any other files they can be found there. -There is also a new SSH key named "kvm" which is created by the scripts and authorized inside the VM -to allow for password-less access to SSHFS. This is stored at `~/.ssh/kvm` and regenerated if it +There is also a new SSH key named `kvm` which is created by the scripts and authorized inside the VM +to allow for password-less access to SSHFS. This is stored at `~/.ssh/kvm` and regenerated if it does not exist. It is also added to `~/.ssh/authorized_keys`. Note that removing the key file does not remove it from `authorized_keys`, so remove it manually if you need to. It can be easily -identified by the comment on the key. 
However, removing this key is **NOT** recommended, as it
+identified by the comment on the key. However, removing this key is **NOT** recommended, as it
needs to be re-generated on every script run.

-## Starting a Job in a VM
+## Start a Job in a VM

Especially when developing a Singularity definition file it might be useful to get a shell directly
on a VM. To do so simply run:

-```Bash
+```console
startInVM --arch=power9
```

@@ -114,10 +113,11 @@ build` commands.

As usual more options can be shown by running `startInVM --help`, the most important one being
`--time`.

-There are 2 special use cases for this script: 1 Execute an arbitrary command inside the VM instead
-of getting a bash by appending the command to the script. Example: \<pre>startInVM --arch=power9
-singularity build \~/myContainer.sif \~/myDefinition.def\</pre> 1 Use the script in a job manually
-allocated via srun/sbatch. This will work the same as when running outside a job but will **not**
-start a new job. This is useful for using it inside batch scripts, when you already have an
-allocation or need special arguments for the job system. Again you can run an arbitrary command by
-passing it to the script.
+There are two special use cases for this script:
+
+1. Execute an arbitrary command inside the VM instead of getting a shell, by appending the command
+   to the script. Example: `startInVM --arch=power9 singularity build ~/myContainer.sif ~/myDefinition.def`
+1. Use the script in a job manually allocated via `srun`/`sbatch`. This will work the same as when
+   running outside a job but will **not** start a new job. This is useful for using it inside batch
+   scripts, when you already have an allocation or need special arguments for the job system. Again
+   you can run an arbitrary command by passing it to the script, as shown in the sketch below.
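+
+For the second use case, a rough sketch of how this could look in a manually allocated job follows.
+The `srun` line matches the VM allocation described in [Virtual Machines](virtual_machines.md); the
+container and definition file names are placeholders:
+
+```console
+marie@login$ srun -p ml -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash
+bash-4.2$ startInVM --arch=power9 singularity build ~/myContainer.sif ~/myDefinition.def
+```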
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml index be62f1da54203739a867ecea94833d8eb05b15f2..26c1381d36dfe624b99357c220566838ac0f727f 100644 --- a/doc.zih.tu-dresden.de/mkdocs.yml +++ b/doc.zih.tu-dresden.de/mkdocs.yml @@ -30,9 +30,9 @@ nav: - Python Virtual Environments: software/python_virtual_environments.md - Containers: - Singularity: software/containers.md - - Singularity Recicpe Hints: software/singularity_recipe_hints.md - - Singularity Example Definitions: software/singularity_example_definitions.md - - VM tools: software/vm_tools.md + - Singularity Recipes and Hints: software/singularity_recipe_hints.md + - Virtual Machines Tools: software/virtual_machines_tools.md + - Virtual Machines: software/virtual_machines.md - Applications: - Licenses: software/licenses.md - Computational Fluid Dynamics (CFD): software/cfd.md @@ -55,7 +55,6 @@ nav: - Hyperparameter Optimization (OmniOpt): software/hyperparameter_optimization.md - PowerAI: software/power_ai.md - SCS5 Migration Hints: software/scs5_software.md - - Virtual Machines: software/virtual_machines.md - Virtual Desktops: software/virtual_desktops.md - Software Development and Tools: - Overview: software/software_development_overview.md diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell index 4d2cf2456bb8596323c48a16ebc2b6f04f6f3c88..abd66b4afd11aad032c4dc5fe5e168e1c6c77628 100644 --- a/doc.zih.tu-dresden.de/wordlist.aspell +++ b/doc.zih.tu-dresden.de/wordlist.aspell @@ -9,8 +9,10 @@ benchmarking BLAS broadwell bsub +bullx ccNUMA centauri +CentOS Chemnitz citable conda @@ -32,6 +34,7 @@ DFG DistributedDataParallel DockerHub Dockerfile +Dockerfiles dockerized EasyBuild ecryptfs @@ -54,6 +57,8 @@ GiB gifferent GitLab GitLab's +GitHub +glibc gnuplot GPU GPUs @@ -145,6 +150,7 @@ PiB Pika pipelining png +PMI PowerAI ppc PSOCK @@ -154,6 +160,8 @@ queue randint reachability README +reproducibility +RHEL Rmpi rome romeo @@ -186,6 +194,7 @@ SMT squeue srun ssd +SSHFS stderr stdout SUSE @@ -209,7 +218,9 @@ VampirTrace's vectorization venv virtualenv +VirtualGL VPN +VMs WebVNC WinSCP Workdir