diff --git a/doc.zih.tu-dresden.de/docs/modules/.gitkeep b/doc.zih.tu-dresden.de/docs/modules/.gitkeep deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/doc.zih.tu-dresden.de/docs/software/Cloud.md b/doc.zih.tu-dresden.de/docs/software/Cloud.md index 9d9e550808df2e0c18f22bea3ff5bb3838bc4180..3819bdc56ceab064bfe46a722b3e1168324e6659 100644 --- a/doc.zih.tu-dresden.de/docs/software/Cloud.md +++ b/doc.zih.tu-dresden.de/docs/software/Cloud.md @@ -1,7 +1,7 @@ # Virtual machine on Taurus The following instructions are primarily aimed at users who want to build their -[Singularity](containers.md) containers on Taurus. +[Singularity](Containers.md) containers on Taurus. The Singularity container setup requires a Linux machine with root privileges, the same architecture and a compatible kernel. If some of these requirements can not be fulfilled, then there is diff --git a/doc.zih.tu-dresden.de/docs/software/containers.md b/doc.zih.tu-dresden.de/docs/software/Containers.md similarity index 100% rename from doc.zih.tu-dresden.de/docs/software/containers.md rename to doc.zih.tu-dresden.de/docs/software/Containers.md diff --git a/doc.zih.tu-dresden.de/docs/software/DeepLearning.md b/doc.zih.tu-dresden.de/docs/software/DeepLearning.md index e9f5854c43c32d6e6cfcf303edd999cc9b2dd17f..5eee674f447668a9d1f7f4119505af3941a2beb0 100644 --- a/doc.zih.tu-dresden.de/docs/software/DeepLearning.md +++ b/doc.zih.tu-dresden.de/docs/software/DeepLearning.md @@ -14,7 +14,7 @@ both the ml environment and the scs5 environment of the Taurus system. for dataflow and differentiable programming across a range of tasks. TensorFlow is available in both main partitions -[ml environment and scs5 environment](modules.md#module-environments) +[ml environment and scs5 environment](Modules.md#module-environments) under the module name "TensorFlow". 
However, for purposes of machine learning and deep learning, we recommend using Ml partition [HPC-DA](../jobs/HPCDA.md). For example: @@ -24,7 +24,7 @@ module load TensorFlow There are numerous different possibilities on how to work with [TensorFlow](TensorFlow.md) on Taurus. On this page, for all examples default, scs5 partition is used. Generally, the easiest way -is using the [modules system](modules.md) +is using the [modules system](Modules.md) and Python virtual environment (test case). However, in some cases, you may need directly installed Tensorflow stable or night releases. For this purpose use the [EasyBuild](CustomEasyBuildEnvironment.md), [Containers](TensorFlowContainerOnHPCDA.md) and see @@ -38,12 +38,12 @@ versions. [Keras](https://keras.io/) is a high-level neural network API, written in Python and capable of running on top of [TensorFlow](https://github.com/tensorflow/tensorflow) Keras is available in both -environments [ml environment and scs5 environment](modules.md#module-environments) under the module +environments [ml environment and scs5 environment](Modules.md#module-environments) under the module name "Keras". On this page for all examples default scs5 partition used. There are numerous different possibilities on how to work with [TensorFlow](TensorFlow.md) and Keras -on Taurus. Generally, the easiest way is using the [module system](modules.md) and Python +on Taurus. Generally, the easiest way is using the [module system](Modules.md) and Python virtual environment (test case) to see Tensorflow part above. For examples of using Keras for ml partition with the module system see the [Keras page for HPC-DA](Keras.md). 
diff --git a/doc.zih.tu-dresden.de/docs/software/GetStartedWithHPCDA.md b/doc.zih.tu-dresden.de/docs/software/GetStartedWithHPCDA.md index 4ac517d046a85700a5ce232da711f30f7b9402b5..1c14c5d346050d50992270cecbe7eb3ea9dab582 100644 --- a/doc.zih.tu-dresden.de/docs/software/GetStartedWithHPCDA.md +++ b/doc.zih.tu-dresden.de/docs/software/GetStartedWithHPCDA.md @@ -198,7 +198,7 @@ There are three main options on how to work with Tensorflow and PyTorch: ### Modules -The easiest way is using the [modules system](modules.md) and Python virtual environment. Modules +The easiest way is using the [modules system](Modules.md) and Python virtual environment. Modules are a way to use frameworks, compilers, loader, libraries, and utilities. The module is a user interface that provides utilities for the dynamic modification of a user's environment without manual modifications. You could use them for srun , bath jobs (sbatch) and the Jupyterhub. @@ -327,7 +327,7 @@ page of the container. To use not a pure Tensorflow, PyTorch but also with some Python packages you have to use the definition file to create the container -(bootstrapping). For details please see the [Container](containers.md) page +(bootstrapping). For details please see the [Container](Containers.md) page from our wiki. Bootstrapping **has required root privileges** and Virtual Machine (VM) should be used! 
There are two main options on how to work with VM on Taurus: [VM tools](VMTools.md) - automotive algorithms diff --git a/doc.zih.tu-dresden.de/docs/software/modules.md b/doc.zih.tu-dresden.de/docs/software/Modules.md similarity index 100% rename from doc.zih.tu-dresden.de/docs/software/modules.md rename to doc.zih.tu-dresden.de/docs/software/Modules.md diff --git a/doc.zih.tu-dresden.de/docs/software/overview.md b/doc.zih.tu-dresden.de/docs/software/Overview.md similarity index 96% rename from doc.zih.tu-dresden.de/docs/software/overview.md rename to doc.zih.tu-dresden.de/docs/software/Overview.md index dbb75ca856f493db73454fb45bbedc4955c6d7be..f856f706e2126c5cbf40939020dece5967e00212 100644 --- a/doc.zih.tu-dresden.de/docs/software/overview.md +++ b/doc.zih.tu-dresden.de/docs/software/Overview.md @@ -14,7 +14,7 @@ There are a lot of different possibilities to work with software on Taurus: ## Modules Usage of software on HPC systems is managed by a **modules system**. Thus, it is crucial to -be familiar with the [modules concept and commands](modules.md). Modules are a way to use +be familiar with the [modules concept and commands](Modules.md). Modules are a way to use frameworks, compilers, loader, libraries, and utilities. A module is a user interface that provides utilities for the dynamic modification of a user's environment without manual modifications. You could use them for `srun`, batch jobs (`sbatch`) and the Jupyterhub. 
diff --git a/doc.zih.tu-dresden.de/docs/software/pika.md b/doc.zih.tu-dresden.de/docs/software/PIKA.md similarity index 100% rename from doc.zih.tu-dresden.de/docs/software/pika.md rename to doc.zih.tu-dresden.de/docs/software/PIKA.md diff --git a/doc.zih.tu-dresden.de/docs/software/PyTorch.md b/doc.zih.tu-dresden.de/docs/software/PyTorch.md index e73b224881fae055861c8d52c50eb61760ea2d6b..90018e4efd20fef71e7637c516d713fe2b69a608 100644 --- a/doc.zih.tu-dresden.de/docs/software/PyTorch.md +++ b/doc.zih.tu-dresden.de/docs/software/PyTorch.md @@ -24,8 +24,8 @@ and users who are just starting their work with Taurus. 2\. The second way is using the Modules system and Python or conda virtual environment. See [the Python page](Python.md) for the HPC-DA system. -Note: The information on working with the PyTorch using Containers could -be found [here](containers.md). +Note: The information on working with the PyTorch using Containers could be found +[here](Containers.md). ## Get started with PyTorch diff --git a/doc.zih.tu-dresden.de/docs/software/Python.md b/doc.zih.tu-dresden.de/docs/software/Python.md index d345e749df59efa54833908a621c98cfbc0472f7..92d7070a7e5d42ed74e0613ec2dabba9321085c7 100644 --- a/doc.zih.tu-dresden.de/docs/software/Python.md +++ b/doc.zih.tu-dresden.de/docs/software/Python.md @@ -14,8 +14,8 @@ Taurus system and basic knowledge about Python, Numpy and SLURM system. There are three main options on how to work with Keras and Tensorflow on the HPC-DA: 1. Modules; 2. [JupyterNotebook](JupyterHub.md); -3.[Containers](containers.md). The main way is using the [Modules -system](modules.md) and Python virtual environment. +3.[Containers](Containers.md). The main way is using the +[Modules system](Modules.md) and Python virtual environment. Note: You could work with simple examples in your home directory but according to [HPCStorageConcept2019](../data_management/HPCStorageConcept2019.md) please use **workspaces** @@ -170,20 +170,22 @@ module. 
Moreover, it is possible to install mpi4py in your local conda environment: - srun -p ml --time=04:00:00 -n 1 --pty --mem-per-cpu=8000 bash #allocate recources - module load modenv/ml - module load PythonAnaconda/3.6 #load module to use conda - conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create conda virtual environment +```Bash +srun -p ml --time=04:00:00 -n 1 --pty --mem-per-cpu=8000 bash #allocate resources +module load modenv/ml +module load PythonAnaconda/3.6 #load module to use conda +conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create conda virtual environment - conda activate <location_for_your_environment> #activate your virtual environment +conda activate <location_for_your_environment> #activate your virtual environment - conda install -c conda-forge mpi4py #install mpi4py +conda install -c conda-forge mpi4py #install mpi4py - python #start python +python #start python - from mpi4py import MPI #verify your mpi4py - comm = MPI.COMM_WORLD - print("%d of %d" % (comm.Get_rank(), comm.Get_size())) +from mpi4py import MPI #verify your mpi4py +comm = MPI.COMM_WORLD +print("%d of %d" % (comm.Get_rank(), comm.Get_size())) +``` ### Horovod @@ -203,13 +205,14 @@ in some cases better results than pure TensorFlow and PyTorch. #### Horovod as a module -Horovod is available as a module with **TensorFlow** or **PyTorch**for -**all** module environments. Please check the [software module -list](modules.md) for the current version of the software. +Horovod is available as a module with **TensorFlow** or **PyTorch** for **all** module environments. +Please check the [software module list](Modules.md) for the current version of the software. 
Horovod can be loaded like other software on the Taurus: - ml av Horovod #Check available modules with Python - module load Horovod #Loading of the module +```Bash +ml av Horovod #Check available modules with Python +module load Horovod #Loading of the module +``` #### Horovod installation @@ -224,36 +227,42 @@ for your study and work projects** (see the Storage concept). Setup: - srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes) - module load modenv/ml #Load dependencies by using modules - module load OpenMPI/3.1.4-gcccuda-2018b - module load Python/3.6.6-fosscuda-2018b - module load cuDNN/7.1.4.18-fosscuda-2018b - module load CMake/3.11.4-GCCcore-7.3.0 - virtualenv --system-site-packages <location_for_your_environment> #create virtual environment - source <location_for_your_environment>/bin/activate #activate virtual environment +```Bash +srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes) +module load modenv/ml #Load dependencies by using modules +module load OpenMPI/3.1.4-gcccuda-2018b +module load Python/3.6.6-fosscuda-2018b +module load cuDNN/7.1.4.18-fosscuda-2018b +module load CMake/3.11.4-GCCcore-7.3.0 +virtualenv --system-site-packages <location_for_your_environment> #create virtual environment +source <location_for_your_environment>/bin/activate #activate virtual environment +``` Or when you need to use conda: - srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes) - module load modenv/ml #Load dependencies by using modules - module load OpenMPI/3.1.4-gcccuda-2018b - module load PythonAnaconda/3.6 - module load cuDNN/7.1.4.18-fosscuda-2018b - module load CMake/3.11.4-GCCcore-7.3.0 - - conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create virtual environment - - conda activate 
<location_for_your_environment> #activate virtual environment +```Bash +srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes) +module load modenv/ml #Load dependencies by using modules +module load OpenMPI/3.1.4-gcccuda-2018b +module load PythonAnaconda/3.6 +module load cuDNN/7.1.4.18-fosscuda-2018b +module load CMake/3.11.4-GCCcore-7.3.0 + +conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create virtual environment + +conda activate <location_for_your_environment> #activate virtual environment +``` Install Pytorch (not recommended) - cd /tmp - git clone https://github.com/pytorch/pytorch #clone Pytorch from the source - cd pytorch #go to folder - git checkout v1.7.1 #Checkout version (example: 1.7.1) - git submodule update --init #Update dependencies - python setup.py install #install it with python +```Bash +cd /tmp +git clone https://github.com/pytorch/pytorch #clone Pytorch from the source +cd pytorch #go to folder +git checkout v1.7.1 #Checkout version (example: 1.7.1) +git submodule update --init #Update dependencies +python setup.py install #install it with python +``` ##### Install Horovod for Pytorch with python and pip @@ -261,22 +270,28 @@ In the example presented installation for the Pytorch without TensorFlow. Adapt as required and refer to the horovod documentation for details. 
- HOROVOD_GPU_ALLREDUCE=MPI HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod +```Bash +HOROVOD_GPU_ALLREDUCE=MPI HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod +``` ##### Verify that Horovod works - python #start python - import torch #import pytorch - import horovod.torch as hvd #import horovod - hvd.init() #initialize horovod - hvd.size() - hvd.rank() - print('Hello from:', hvd.rank()) +```Bash +python #start python +import torch #import pytorch +import horovod.torch as hvd #import horovod +hvd.init() #initialize horovod +hvd.size() +hvd.rank() +print('Hello from:', hvd.rank()) +``` ##### Horovod with NCCL If you want to use NCCL instead of MPI you can specify that in the install command after loading the NCCL module: - module load NCCL/2.3.7-fosscuda-2018b - HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod +```Bash +module load NCCL/2.3.7-fosscuda-2018b +HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod +``` diff --git a/doc.zih.tu-dresden.de/docs/software/VMTools.md b/doc.zih.tu-dresden.de/docs/software/VMTools.md index 556270290641d83c618d29709cfc6b88d7c2c193..884926b697f0cf72920874047ef112c0373f4c80 100644 --- a/doc.zih.tu-dresden.de/docs/software/VMTools.md +++ b/doc.zih.tu-dresden.de/docs/software/VMTools.md @@ -1,7 +1,7 @@ # Singularity on Power9 / ml partition Building Singularity containers from a recipe on Taurus is normally not possible due to the -requirement of root (administrator) rights, see [Containers](containers.md). For obvious reasons +requirement of root (administrator) rights, see [Containers](Containers.md). For obvious reasons users on Taurus cannot be granted root permissions. 
The solution is to build your container on your local Linux machine by executing something like diff --git a/doc.zih.tu-dresden.de/docs/software/Visualization.md b/doc.zih.tu-dresden.de/docs/software/Visualization.md index 2d7b0787d7e1652504c9d8f0d8baafa91f940b4d..79b3bdef27121a47cd6110277f8643aec0237d20 100644 --- a/doc.zih.tu-dresden.de/docs/software/Visualization.md +++ b/doc.zih.tu-dresden.de/docs/software/Visualization.md @@ -4,7 +4,7 @@ [ParaView](https://paraview.org) is an open-source, multi-platform data analysis and visualization application. It is available on Taurus under -the `ParaView` [modules](modules.md#modules-environment) +the `ParaView` [modules](Modules.md#modules-environment) ```Bash taurus$ module avail ParaView diff --git a/doc.zih.tu-dresden.de/docs/use_of_hardware/AlphaCentauri.md b/doc.zih.tu-dresden.de/docs/use_of_hardware/AlphaCentauri.md index 6c13dcee7742a53f12ff0f09c1cfa5eb15a22666..e7a1368f44b5c6ee48f359548a7216ac9427dedb 100644 --- a/doc.zih.tu-dresden.de/docs/use_of_hardware/AlphaCentauri.md +++ b/doc.zih.tu-dresden.de/docs/use_of_hardware/AlphaCentauri.md @@ -29,11 +29,11 @@ cluster: 1. **Modules** 1 **Virtual Environments (manual software installation)** 1. [JupyterHub](https://taurus.hrsk.tu-dresden.de/) -1. [Containers](../software/containers.md) +1. [Containers](../software/Containers.md) ### Modules -The easiest way is using the [module system](../software/modules.md) and Python virtual environment. +The easiest way is using the [module system](../software/Modules.md) and Python virtual environment. Modules are a way to use frameworks, compilers, loader, libraries, and utilities. The software environment for the **alpha** partition is available under the name **hiera**: @@ -100,7 +100,7 @@ conda deactivate #Leave the virtual environment New software for data analytics is emerging faster than we can install it. 
If you urgently need a certain version we advise you to manually install it (the machine learning frameworks and required -packages) in your virtual environment (or use a [container](../software/containers.md). +packages) in your virtual environment (or use a [container](../software/Containers.md). The **Virtualenv** example: @@ -183,7 +183,7 @@ parameter). On Taurus [Singularity](https://sylabs.io/) is used as a standard container solution. It can be run on the `alpha` partition as well. Singularity enables users to have full control of their environment. Detailed information about containers can be found -[here](../software/containers.md). +[here](../software/Containers.md). Nvidia [NGC](https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/) diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml index f94b619153bf0d925d1dea27b2afd3f10ccc0afa..21004dc1b0d13d306eff9651265f42eed7429303 100644 --- a/doc.zih.tu-dresden.de/mkdocs.yml +++ b/doc.zih.tu-dresden.de/mkdocs.yml @@ -13,17 +13,24 @@ nav: - Login: access/Login.md - Security Restrictions: access/SecurityRestrictions.md - SSH with Putty: access/SSHMitPutty.md - - Available Software and Usage: - - Overview: software/overview.md - - Modules: software/modules.md - - JupyterHub: software/JupyterHub.md - - JupyterHub for Teaching: software/JupyterHubForTeaching.md + - Environment and Software: + - Overview: software/Overview.md + - Environment: + - Modules: software/Modules.md + - Custom EasyBuild Modules: software/CustomEasyBuildEnvironment.md + - JupyterHub: + - Overview: software/JupyterHub.md + - JupyterHub for Teaching: software/JupyterHubForTeaching.md - Containers: - - Singularity: software/containers.md + - Singularity: software/Containers.md - Singularity Recicpe Hints: software/SingularityRecipeHints.md - Singularity Example Definitions: software/SingularityExampleDefinitions.md - - Custom Easy Build Modules: 
software/CustomEasyBuildEnvironment.md - - Mathematics: software/Mathematics.md + - VM tools: software/VMTools.md + - Applications: + - Bio Informatics: software/Bioinformatics.md + - Computational Fluid Dynamics (CFD): software/CFD.md + - NanoscaleSimulations: software/NanoscaleSimulations.md + - FEMSoftware: software/FEMSoftware.md - Visualization: software/Visualization.md - HPC-DA: - Get started with HPC-DA: software/GetStartedWithHPCDA.md @@ -39,14 +46,8 @@ nav: - Dask: software/Dask.md - Power AI: software/PowerAI.md - PyTorch: software/PyTorch.md - - Computational Fluid Dynamics (CFD): software/CFD.md - - FAQs: software/modules-faq.md - - Bio Informatics: software/Bioinformatics.md - SCS5 Migration Hints: software/SCS5Software.md - - NanoscaleSimulations: software/NanoscaleSimulations.md - - FEMSoftware: software/FEMSoftware.md - Cloud: software/Cloud.md - - VM tools: software/VMTools.md - Virtual Desktops: software/VirtualDesktops.md - Software Development and Tools: - Overview: software/SoftwareDevelopment.md @@ -58,8 +59,9 @@ nav: - Score-P: software/ScoreP.md - PAPI Library: software/PapiLibrary.md - Perf Tools: software/PerfTools.md - - PIKA: software/pika.md + - PIKA: software/PIKA.md - Vampir: software/Vampir.md + - Mathematics: software/Mathematics.md - Data Management: - Overview: data_management/DataManagement.md - Announcement of Quotas: data_management/AnnouncementOfQuotas.md