Commit 99fde98b authored by Martin Schroschk

Merge branch 'software_usage' into 'preview'

Structure software section

See merge request zih/hpc-compendium/hpc-compendium!163
parents 7e432ea5 9eacc495
3 merge requests: !322 Merge preview into main, !319 Merge preview into main, !163 Structure software section
Showing with 99 additions and 82 deletions
# Virtual machine on Taurus
The following instructions are primarily aimed at users who want to build their
[Singularity](Containers.md) containers on Taurus.
The Singularity container setup requires a Linux machine with root privileges, the same architecture
and a compatible kernel. If some of these requirements cannot be fulfilled, then there is
......
......@@ -14,7 +14,7 @@ both the ml environment and the scs5 environment of the Taurus system.
for dataflow and differentiable programming across a range of tasks.
TensorFlow is available in both main module environments,
[ml environment and scs5 environment](Modules.md#module-environments),
under the module name "TensorFlow". However, for machine learning and deep learning purposes, we
recommend using the ml partition [HPC-DA](../jobs/HPCDA.md). For example:
......@@ -24,7 +24,7 @@ module load TensorFlow
There are numerous ways to work with [TensorFlow](TensorFlow.md) on
Taurus. All examples on this page use the default scs5 partition. Generally, the easiest way
is to use the [modules system](Modules.md)
and a Python virtual environment (test case). However, in some cases, you may need a directly
installed TensorFlow stable or nightly release. For this purpose, use
[EasyBuild](CustomEasyBuildEnvironment.md), [Containers](TensorFlowContainerOnHPCDA.md) and see
......@@ -38,12 +38,12 @@ versions.
[Keras](https://keras.io/) is a high-level neural network API, written in Python and capable of
running on top of [TensorFlow](https://github.com/tensorflow/tensorflow). Keras is available in both
environments, [ml environment and scs5 environment](Modules.md#module-environments), under the module
name "Keras".
All examples on this page use the default scs5 partition. There are numerous ways
to work with [TensorFlow](TensorFlow.md) and Keras
on Taurus. Generally, the easiest way is to use the [module system](Modules.md) and a Python
virtual environment (test case); see the TensorFlow part above.
For examples of using Keras on the ml partition with the module system, see the
[Keras page for HPC-DA](Keras.md).
......
......@@ -198,7 +198,7 @@ There are three main options on how to work with Tensorflow and PyTorch:
### Modules
The easiest way is to use the [modules system](Modules.md) and a Python virtual environment. Modules
are a way to use frameworks, compilers, loaders, libraries, and utilities. A module is a user
interface that provides utilities for the dynamic modification of a user's environment without
manual modifications. You can use them with `srun`, batch jobs (`sbatch`) and JupyterHub.
......@@ -327,7 +327,7 @@ page of the container.
To use not only plain TensorFlow or PyTorch but also additional Python packages,
you have to create the container from a definition file
(bootstrapping). For details, please see the [Container](Containers.md) page
of our wiki. Bootstrapping **requires root privileges**, so a
virtual machine (VM) should be used. There are two main options for
working with a VM on Taurus: [VM tools](VMTools.md) - automotive algorithms
......
......@@ -14,7 +14,7 @@ There are a lot of different possibilities to work with software on Taurus:
## Modules
Usage of software on HPC systems is managed by a **modules system**. Thus, it is crucial to
be familiar with the [modules concept and commands](Modules.md). Modules are a way to use
frameworks, compilers, loaders, libraries, and utilities. A module is a user interface that provides
utilities for the dynamic modification of a user's environment without manual modifications. You
can use them with `srun`, batch jobs (`sbatch`) and JupyterHub.
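Conceptually, loading a module mostly prepends the package's directories to environment variables such as `PATH` and `LD_LIBRARY_PATH`, and unloading it removes them again. The Python sketch below models only that mechanism so the idea is concrete; the module name `ExampleTool/1.0` and its install prefix are hypothetical, and real module tools such as Lmod additionally handle dependencies, conflicts and module environments.

```python
import os

# Hypothetical module definition: name -> {variable: [directories to prepend]}.
# The prefix is made up for illustration; it is not a real Taurus path.
MODULES = {
    "ExampleTool/1.0": {
        "PATH": ["/sw/example/1.0/bin"],
        "LD_LIBRARY_PATH": ["/sw/example/1.0/lib"],
    },
}


def load(env: dict, name: str) -> dict:
    """Return a copy of env with the module's directories prepended."""
    new_env = dict(env)
    for var, dirs in MODULES[name].items():
        existing = [p for p in new_env.get(var, "").split(os.pathsep) if p]
        new_env[var] = os.pathsep.join(dirs + existing)
    return new_env


def unload(env: dict, name: str) -> dict:
    """Return a copy of env with the module's directories removed again.

    Note: this also drops the directories if they were already present before load().
    """
    new_env = dict(env)
    for var, dirs in MODULES[name].items():
        remaining = [p for p in new_env.get(var, "").split(os.pathsep)
                     if p and p not in dirs]
        if remaining:
            new_env[var] = os.pathsep.join(remaining)
        else:
            # Variable was set only by the module, so remove it entirely.
            new_env.pop(var, None)
    return new_env


if __name__ == "__main__":
    base = {"PATH": "/usr/bin:/bin"}
    loaded = load(base, "ExampleTool/1.0")
    print(loaded["PATH"])
    print(unload(loaded, "ExampleTool/1.0") == base)
```

Running it prints the modified `PATH` with the hypothetical `bin` directory in front, followed by `True`, because unloading restores the original environment.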
......
......@@ -24,8 +24,8 @@ and users who are just starting their work with Taurus.
2\. The second way is to use the module system and a Python or conda virtual environment.
See [the Python page](Python.md) for the HPC-DA system.
Note: Information on working with PyTorch using containers can be found
[here](Containers.md).
## Get started with PyTorch
......
......@@ -14,8 +14,8 @@ Taurus system and basic knowledge about Python, Numpy and SLURM system.
There are three main options for working with Keras and TensorFlow on HPC-DA: 1. Modules; 2.
[JupyterNotebook](JupyterHub.md); 3. [Containers](Containers.md). The main way is to use the
[Modules system](Modules.md) and a Python virtual environment.
Note: You can work with simple examples in your home directory, but according to
[HPCStorageConcept2019](../data_management/HPCStorageConcept2019.md) please use **workspaces**
......@@ -170,20 +170,22 @@ module.
Moreover, it is possible to install mpi4py in your local conda
environment:
```Bash
srun -p ml --time=04:00:00 -n 1 --pty --mem-per-cpu=8000 bash   #allocate resources
module load modenv/ml
module load PythonAnaconda/3.6                                   #load module to use conda
conda create --prefix=<location_for_your_environment> python=3.6 anaconda   #create conda virtual environment
conda activate <location_for_your_environment>                   #activate your virtual environment
conda install -c conda-forge mpi4py                              #install mpi4py
python                                                           #start python
from mpi4py import MPI                                           #verify your mpi4py
comm = MPI.COMM_WORLD
print("%d of %d" % (comm.Get_rank(), comm.Get_size()))
```
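The rank and size printed above are what MPI programs typically use to split work between processes. A minimal sketch, assuming a contiguous block decomposition of `n` work items (one common choice, not something mpi4py prescribes); the function `block_range` is hypothetical and does not need MPI, so it can be tried without a job allocation.

```python
def block_range(n: int, rank: int, size: int) -> range:
    """Contiguous slice of range(n) owned by `rank` out of `size` processes.

    The first n % size ranks get one extra item, so block sizes differ by at most one.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need 0 <= rank < size")
    base, extra = divmod(n, size)
    # Every lower rank contributes `base` items, plus one if it is among the first `extra`.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Without MPI, simulate 4 ranks splitting 10 items.
    for r in range(4):
        print(r, list(block_range(10, r, 4)))
```

Inside an MPI job, one would call `block_range(n, comm.Get_rank(), comm.Get_size())` with `comm = MPI.COMM_WORLD` as in the check above, and start the script with `srun`.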
### Horovod
......@@ -203,13 +205,14 @@ in some cases better results than pure TensorFlow and PyTorch.
#### Horovod as a module
Horovod is available as a module with **TensorFlow** or **PyTorch** for **all** module environments.
Please check the [software module list](Modules.md) for the current version of the software.
Horovod can be loaded like other software on Taurus:
```Bash
ml av Horovod          #check available Horovod modules
module load Horovod    #load the module
```
#### Horovod installation
......@@ -224,36 +227,42 @@ for your study and work projects** (see the Storage concept).
Setup:
```Bash
srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes)
module load modenv/ml #Load dependencies by using modules
module load OpenMPI/3.1.4-gcccuda-2018b
module load Python/3.6.6-fosscuda-2018b
module load cuDNN/7.1.4.18-fosscuda-2018b
module load CMake/3.11.4-GCCcore-7.3.0
virtualenv --system-site-packages <location_for_your_environment> #create virtual environment
source <location_for_your_environment>/bin/activate #activate virtual environment
```
Or when you need to use conda:
```Bash
srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes)
module load modenv/ml #Load dependencies by using modules
module load OpenMPI/3.1.4-gcccuda-2018b
module load PythonAnaconda/3.6
module load cuDNN/7.1.4.18-fosscuda-2018b
module load CMake/3.11.4-GCCcore-7.3.0
conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create virtual environment
conda activate <location_for_your_environment> #activate virtual environment
```
Install PyTorch from source (not recommended):
```Bash
cd /tmp
git clone https://github.com/pytorch/pytorch #clone Pytorch from the source
cd pytorch #go to folder
git checkout v1.7.1 #Checkout version (example: 1.7.1)
git submodule update --init #Update dependencies
python setup.py install #install it with python
```
##### Install Horovod for PyTorch with Python and pip
......@@ -261,22 +270,28 @@ In the example presented installation for the Pytorch without
TensorFlow. Adapt as required and refer to the Horovod documentation for
details.
```Bash
HOROVOD_GPU_ALLREDUCE=MPI HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod
```
##### Verify that Horovod works
```Bash
python #start python
import torch #import pytorch
import horovod.torch as hvd #import horovod
hvd.init() #initialize horovod
hvd.size()
hvd.rank()
print('Hello from:', hvd.rank())
```
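In a training script, `hvd.rank()` and `hvd.size()` usually decide which part of the dataset each worker processes. With PyTorch this is normally done with `torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())`. The pure-Python sketch below only illustrates the assignment such a sampler makes when shuffling is disabled and `drop_last=False`: the index list is padded by wrapping around to a multiple of the number of workers, and each worker takes every `size`-th index. The function `shard_indices` is hypothetical.

```python
import math


def shard_indices(n: int, rank: int, size: int) -> list:
    """Indices of a length-n dataset assigned to worker `rank` of `size`.

    Sketch of unshuffled round-robin sharding with wrap-around padding,
    so that every worker receives the same number of samples.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need 0 <= rank < size")
    indices = list(range(n))
    total = math.ceil(n / size) * size
    # Pad by repeating indices from the start until the length is a multiple of size.
    while len(indices) < total:
        indices += indices[: total - len(indices)]
    # Each worker takes every size-th index, starting at its own rank.
    return indices[rank:total:size]


if __name__ == "__main__":
    # Simulate 4 Horovod workers sharing 10 samples.
    for r in range(4):
        print(r, shard_indices(10, r, 4))
```

For 10 samples and 4 workers, every worker gets 3 indices, and samples 0 and 1 are processed twice because of the padding.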
##### Horovod with NCCL
If you want to use NCCL instead of MPI, you can specify that in the
install command after loading the NCCL module:
```Bash
module load NCCL/2.3.7-fosscuda-2018b
HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod
```
# Singularity on Power9 / ml partition
Building Singularity containers from a recipe on Taurus is normally not possible due to the
requirement of root (administrator) rights, see [Containers](Containers.md). For obvious reasons,
users on Taurus cannot be granted root permissions.
The solution is to build your container on your local Linux machine by executing something like
......
......@@ -4,7 +4,7 @@
[ParaView](https://paraview.org) is an open-source, multi-platform data
analysis and visualization application. It is available on Taurus under
the `ParaView` [modules](Modules.md#modules-environment):
```Bash
taurus$ module avail ParaView
......
......@@ -29,11 +29,11 @@ cluster:
1. **Modules**
1. **Virtual Environments (manual software installation)**
1. [JupyterHub](https://taurus.hrsk.tu-dresden.de/)
1. [Containers](../software/Containers.md)
### Modules
The easiest way is to use the [module system](../software/Modules.md) and a Python virtual environment.
Modules are a way to use frameworks, compilers, loaders, libraries, and utilities. The software
environment for the **alpha** partition is available under the name **hiera**:
......@@ -100,7 +100,7 @@ conda deactivate #Leave the virtual environment
New software for data analytics is emerging faster than we can install it. If you urgently need a
certain version, we advise you to manually install it (the machine learning frameworks and required
packages) in your virtual environment (or use a [container](../software/Containers.md)).
The **Virtualenv** example:
......@@ -183,7 +183,7 @@ parameter).
On Taurus [Singularity](https://sylabs.io/) is used as a standard container
solution. It can be run on the `alpha` partition as well. Singularity enables users to have full
control of their environment. Detailed information about containers can be found
[here](../software/Containers.md).
Nvidia
[NGC](https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/)
......
......@@ -13,17 +13,24 @@ nav:
- Login: access/Login.md
- Security Restrictions: access/SecurityRestrictions.md
- SSH with Putty: access/SSHMitPutty.md
- Environment and Software:
- Overview: software/Overview.md
- Environment:
- Modules: software/Modules.md
- Custom EasyBuild Modules: software/CustomEasyBuildEnvironment.md
- JupyterHub:
- Overview: software/JupyterHub.md
- JupyterHub for Teaching: software/JupyterHubForTeaching.md
- Containers:
- Singularity: software/Containers.md
- Singularity Recipe Hints: software/SingularityRecipeHints.md
- Singularity Example Definitions: software/SingularityExampleDefinitions.md
- VM tools: software/VMTools.md
- Applications:
- Bio Informatics: software/Bioinformatics.md
- Computational Fluid Dynamics (CFD): software/CFD.md
- NanoscaleSimulations: software/NanoscaleSimulations.md
- FEMSoftware: software/FEMSoftware.md
- Visualization: software/Visualization.md
- HPC-DA:
- Get started with HPC-DA: software/GetStartedWithHPCDA.md
......@@ -39,14 +46,8 @@ nav:
- Dask: software/Dask.md
- Power AI: software/PowerAI.md
- PyTorch: software/PyTorch.md
- FAQs: software/modules-faq.md
- SCS5 Migration Hints: software/SCS5Software.md
- Cloud: software/Cloud.md
- Virtual Desktops: software/VirtualDesktops.md
- Software Development and Tools:
- Overview: software/SoftwareDevelopment.md
......@@ -58,8 +59,9 @@ nav:
- Score-P: software/ScoreP.md
- PAPI Library: software/PapiLibrary.md
- Perf Tools: software/PerfTools.md
- PIKA: software/PIKA.md
- Vampir: software/Vampir.md
- Mathematics: software/Mathematics.md
- Data Management:
- Overview: data_management/DataManagement.md
- Announcement of Quotas: data_management/AnnouncementOfQuotas.md
......