From ce8c86f19db25de589257ddf31a04ebb9f37b481 Mon Sep 17 00:00:00 2001
From: Alexander Grund <alexander.grund@tu-dresden.de>
Date: Tue, 5 Mar 2024 17:16:18 +0100
Subject: [PATCH] Fix commands and update expected output in changed code

---
 .../docs/software/distributed_training.md | 2 +-
 .../docs/software/hyperparameter_optimization.md | 5 ++---
 doc.zih.tu-dresden.de/docs/software/modules.md | 15 ++++++++++-----
 doc.zih.tu-dresden.de/docs/software/pytorch.md | 4 ++--
 4 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/software/distributed_training.md b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
index 445c326f3..df7652493 100644
--- a/doc.zih.tu-dresden.de/docs/software/distributed_training.md
+++ b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
@@ -259,7 +259,7 @@ Or if you want to use Horovod on the cluster `alpha`, you can load it with the d
 ```console
 marie@alpha$ module spider Horovod #Check available modules
-marie@alpha$ module load release/23.04 GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 Horovod/0.21.1-TensorFlow-2.4.1
+marie@alpha$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4 Horovod/0.28.1-CUDA-11.7.0-TensorFlow-2.11.0
 ```
 #### Horovod Installation
diff --git a/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md b/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md
index 1af996cb5..4288eb08a 100644
--- a/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md
+++ b/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md
@@ -202,7 +202,7 @@ There are the following script preparation steps for OmniOpt:
  [workspace](../data_lifecycle/workspaces.md).
  ```console
- marie@login$ module load release/23.04 GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 PyTorch/1.9.0
+ marie@login$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4 PyTorch/1.12.1
  marie@login$ mkdir </path/to/workspace/python-environments> #create folder
  marie@login$ virtualenv --system-site-packages </path/to/workspace/python-environments/torchvision_env>
  marie@login$ source </path/to/workspace/python-environments/torchvision_env>/bin/activate #activate virtual environment
@@ -212,11 +212,10 @@ There are the following script preparation steps for OmniOpt:
  ```console
  # Job submission on alpha nodes with 1 GPU on 1 node with 800 MB per CPU
  marie@login$ srun --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash
- marie@alpha$ module load release/23.04 GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 PyTorch/1.9.0
+ marie@alpha$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4 PyTorch/1.12.1
  # Activate virtual environment
  marie@alpha$ source </path/to/workspace/python-environments/torchvision_env>/bin/activate
- Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5, PyTorch/1.9.0 and 54 dependencies loaded.
  marie@alpha$ python </path/to/your/script/mnistFashion.py> --out-layer1=200 --batchsize=10 --epochs=3
  [...]
  Epoch 3
diff --git a/doc.zih.tu-dresden.de/docs/software/modules.md b/doc.zih.tu-dresden.de/docs/software/modules.md
index fc16562f1..88379b8cd 100644
--- a/doc.zih.tu-dresden.de/docs/software/modules.md
+++ b/doc.zih.tu-dresden.de/docs/software/modules.md
@@ -257,13 +257,18 @@ In some cases a desired software is available as an extension of a module.
  Finaly, you can load the dependencies and `tensorboard/2.4.1` and check the version.
  ```console
- marie@login$ module load release/23.04 GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5
+ marie@login$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4
+
+ Modules GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5 and 15 dependencies loaded.
+ marie@login$ module load TensorFlow/2.11.0-CUDA-11.7.0
+
+ Activating Modules:
+ 1) CUDA/11.7.0 2) GDRCopy/2.3
+
+ Module TensorFlow/2.11.0-CUDA-11.7.0 and 39 dependencies loaded.
- Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5 and 15 dependencies loaded.
- marie@login$ module load TensorFlow/2.4.1
- Module TensorFlow/2.4.1 and 34 dependencies loaded.
  marie@login$ tensorboard --version
- 2.4.1
+ 2.11.1
  ```
 ## Toolchains
diff --git a/doc.zih.tu-dresden.de/docs/software/pytorch.md b/doc.zih.tu-dresden.de/docs/software/pytorch.md
index 4f6d0d78b..efbce5ac4 100644
--- a/doc.zih.tu-dresden.de/docs/software/pytorch.md
+++ b/doc.zih.tu-dresden.de/docs/software/pytorch.md
@@ -65,8 +65,8 @@ marie@login.power$ module spider pytorch
 we know that we can load PyTorch (including torchvision) with
 ```console
-marie@power$ module load torchvision/0.7.0-fossCUDA-2019b-Python-3.7.4-PyTorch-1.6.0
-Module torchvision/0.7.0-fossCUDA-2019b-Python-3.7.4-PyTorch-1.6.0 and 55 dependencies loaded.
+marie@power$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4 torchvision/0.13.1
+Modules GCC/11.3.0, OpenMPI/4.1.4, torchvision/0.13.1 and 62 dependencies loaded.
 ```
 Now, we check that we can access PyTorch:
--
GitLab