# GPU-accelerated Containers for Deep Learning (NGC Containers)
A [container](containers.md) is an executable and portable unit of software.
[Containerization](https://www.ibm.com/cloud/learn/containerization) means
encapsulating or packaging up software code and all its dependencies
to run uniformly and consistently on any infrastructure. In other words,
it is agnostic to the host-specific environment, such as the operating system.
The entire software environment, from the deep learning framework itself,
down to the math and communication libraries that are necessary for performance,
is packaged into a single bundle.
On ZIH systems, [Singularity](https://sylabs.io/) is used as a standard container solution.
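To see this host-independence in practice, the same command can be run on the host and inside a container image; the reported operating systems may differ. A minimal sketch, assuming an already pulled image file (here `pytorch:21.06-py3`, as used in the examples below):

```console
marie@login$ cat /etc/os-release                                        # reports the host's operating system
marie@login$ singularity exec pytorch:21.06-py3 cat /etc/os-release     # reports the container's operating system
```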
## NGC Containers in General
[NGC](https://developer.nvidia.com/ai-hpc-containers),
a registry of highly GPU-optimized software,
has been enabling scientists and researchers by providing regularly updated
and validated containers of HPC and AI applications.
Singularity supports NGC containers, which are optimized for HPC applications.
NGC containers are **GPU-optimized** containers
for deep learning, machine learning, and visualization:
- Built-in libraries and dependencies;
- Faster training with Automatic Mixed Precision (AMP);
- Opportunity to scale up from single-node to multi-node systems;
- Performance optimized.
!!! note "Advantages of NGC containers"

    - NGC containers are highly optimized for cluster usage.
      The performance provided by NGC containers is comparable to the performance
      provided by the modules on the ZIH system (which is potentially the most performant way).
      NGC containers are a quick and efficient way to apply the best models
      on your dataset on a ZIH system.
    - NGC containers allow using an exact version of the software
      without manually installing it with all prerequisites.
      Manual installation can result in poor performance (e.g., using conda to install software).
## Run NGC Containers on the ZIH System
### Preparation
The first step is to choose the necessary software (container) to run.
The [NVIDIA NGC catalog](https://ngc.nvidia.com/catalog)
contains a host of GPU-optimized containers for deep learning,
machine learning, visualization, and high-performance computing (HPC) applications.
To find a container that fits the requirements of your task, please check
the [official examples page](https://github.com/NVIDIA/DeepLearningExamples)
with a list of the main containers and their features and peculiarities.
### Run NGC Container on a Single GPU
!!! note

    Almost all NGC containers can work with a single GPU.
Create a container from an image in the NGC catalog
(for this example, the `alpha` partition is used):
```console
marie@login$ srun --partition=alpha --nodes=1 --ntasks-per-node=1 --ntasks=1 --gres=gpu:1 --time=08:00:00 --pty --mem=50000 bash
marie@compute$ cd /scratch/ws/<name_of_your_workspace>/containers   # please create a workspace first
```
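The image then needs to be downloaded into the workspace. A minimal sketch of pulling the PyTorch image with Singularity, assuming the image tag used in the examples on this page (the local file name is illustrative):

```console
marie@compute$ singularity pull pytorch:21.06-py3 docker://nvcr.io/nvidia/pytorch:21.06-py3
```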
It is recommended to run the container with a single command.
However, for educational purposes, the separate commands are presented below:
```console
marie@login$ srun --partition=alpha --nodes=1 --ntasks-per-node=1 --ntasks=1 --gres=gpu:1 --time=08:00:00 --pty --mem=50000 bash
```
Run a shell within a container with the `singularity shell` command:
```console
marie@compute$ singularity shell --nv -B /scratch/imagenet:/data/imagenet pytorch:21.06-py3
```
The flag `--nv` in the command above enables Nvidia support for GPU usage,
and the flag `-B` specifies a user-bind path.
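Before starting a long training run, it can be worth verifying that the GPU is actually visible inside the container. A quick check, assuming the PyTorch container from above:

```console
Singularity> nvidia-smi                                                    # should list the allocated GPU
Singularity> python -c "import torch; print(torch.cuda.is_available())"   # should print: True
```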
Run the training inside the container:
```console
marie@container$ python /workspace/examples/resnet50v1.5/multiproc.py --nnodes=1 --nproc_per_node=1 \
--node_rank=0 /workspace/examples/resnet50v1.5/main.py --data-backend dali-cpu \
--raport-file raport.json -j16 -p 100 --lr 2.048 --optimizer-batch-size 2048 --warmup 8 \
--arch resnet50 -c fanin --label-smoothing 0.1 --lr-schedule cosine --mom 0.875 \
--wd 3.0517578125e-05 -b 256 --epochs 90 /data/imagenet
```
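The feature list above advertises faster training with Automatic Mixed Precision (AMP). For the ResNet50 example script, AMP is expected to be enabled via an `--amp` flag; this is an assumption based on the conventions of the NVIDIA DeepLearningExamples repository, so please verify the exact flag name first:

```console
marie@container$ python /workspace/examples/resnet50v1.5/main.py --help   # check the available flags, e.g. --amp
marie@container$ python /workspace/examples/resnet50v1.5/main.py --amp --data-backend dali-cpu \
-b 256 --epochs 90 /data/imagenet
```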
!!! warning
As an example, please find the full command to run the ResNet50 model
on the ImageNet dataset inside the PyTorch container:
```console
marie@login$ srun --partition=alpha --nodes=1 --ntasks-per-node=1 --ntasks=1 --gres=gpu:1 --time=08:00:00 --pty --mem=50000 \
singularity exec --nv -B /scratch/ws/0/marie-ImgNet/imagenet:/data/imagenet pytorch:21.06-py3 \
python /workspace/examples/resnet50v1.5/multiproc.py --nnodes=1 --nproc_per_node 1 \
--node_rank=0 /workspace/examples/resnet50v1.5/main.py --data-backend dali-cpu --raport-file raport.json \
-j16 -p 100 --lr 2.048 --optimizer-batch-size 2048 --warmup 8 --arch resnet50 -c fanin --label-smoothing 0.1 \
--lr-schedule cosine --mom 0.875 --wd 3.0517578125e-05 -b 256 --epochs 90 /data/imagenet
```
### Multi-GPU Usage
The majority of the NGC containers allow you to use multiple GPUs from one node
to run the model inside the container.
An example of using the PyTorch container for the training of the ResNet50 model
on the classification task on the ImageNet dataset is presented below:
```console
marie@login$ srun --partition=alpha --nodes=1 --ntasks-per-node=8 --ntasks=8 --gres=gpu:8 --time=08:00:00 --pty --mem=700G bash
```
```console
marie@alpha$ singularity exec --nv -B /scratch/ws/0/marie-ImgNet/imagenet:/data/imagenet pytorch:21.06-py3 \
python /workspace/examples/resnet50v1.5/multiproc.py --nnodes=1 --nproc_per_node 8 \
--node_rank=0 /workspace/examples/resnet50v1.5/main.py --data-backend dali-cpu \
--raport-file raport.json -j16 -p 100 --lr 2.048 --optimizer-batch-size 2048 --warmup 8 \
--arch resnet50 -c fanin --label-smoothing 0.1 --lr-schedule cosine --mom 0.875 \
--wd 3.0517578125e-05 -b 256 --epochs 90 /data/imagenet
```
Please pay attention to the parameter `--nproc_per_node`.
The value is equal to 8 because 8 GPUs per node were allocated with `--gres=gpu:8`.
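To keep the command in sync with the allocation, the GPU count can also be queried at runtime instead of being hard-coded; the result can then be passed to `--nproc_per_node`. A minimal sketch, assuming the allocation above:

```console
marie@alpha$ NGPUS=$(nvidia-smi -L | wc -l)   # counts the GPUs visible on the node
marie@alpha$ echo ${NGPUS}                    # 8 for the allocation above
```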
### Multi-node Usage
There are only a few NGC containers with multi-node support
[available](https://github.com/NVIDIA/DeepLearningExamples).
Moreover, the implementation of multi-node usage depends on the authors
of the particular container.
Thus, it is currently not possible to run NGC containers with multi-node support
on the ZIH system without changing the source code inside the container.