diff --git a/doc.zih.tu-dresden.de/docs/software/machine_learning.md b/doc.zih.tu-dresden.de/docs/software/machine_learning.md
index dc280d7c66ff5be947bcddbfc2a954c614d10c83..14d53d3c18165a4aa48fcd0b5696c909e5d4e72b 100644
--- a/doc.zih.tu-dresden.de/docs/software/machine_learning.md
+++ b/doc.zih.tu-dresden.de/docs/software/machine_learning.md
@@ -23,8 +23,8 @@ specially made for the ml partition (from `modenv/ml`).
 On the **ML** partition load the module environment:

 ```console
-marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission in ml nodes with 1 gpu on 1 node with 8000 Mb per CPU
-marie@ml$ module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
+marie@ml$ module load modenv/ml
+The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
 ```

 ## Alpha partition
@@ -38,8 +38,8 @@ space (/tmp) on an NVMe device. You can find more details of the partition [here
 On the **Alpha** partition load the module environment:

 ```console
-marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission on alpha nodes with 1 gpu on 1 node with 8000 Mb per CPU
-marie@romeo$ module load modenv/scs5
+marie@alpha$ module load modenv/scs5
+The following have been reloaded with a version change: 1) modenv/ml => modenv/scs5
 ```

 ## Machine Learning via Console
@@ -96,9 +96,10 @@ In the following example, we build a Singularity container with TensorFlow from
 start it:

 ```console
-marie@login$ srun -p ml -N 1 --gres=gpu:1 --time=02:00:00 --pty --mem-per-cpu=8000 bash #allocating resourses from ml nodes to start the job to create a container.
 marie@ml$ singularity build my-ML-container.sif docker://ibmcom/tensorflow-ppc64le #create a container from the DockerHub with the last TensorFlow version
+[...]
 marie@ml$ singularity run --nv my-ML-container.sif #run my-ML-container.sif container supporting the Nvidia's GPU. You can also work with your container by: singularity shell, singularity exec
+[...]
 ```

 ## Additional Libraries for Machine Learning
diff --git a/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md b/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md
index 8bef0e59b6c54d354eb7c30ab79c7af4be6a5ad8..1d34317cb37f3da53a48f4fea03bb7b264b9a4bb 100644
--- a/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md
+++ b/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md
@@ -22,30 +22,35 @@ conda manager is included in all versions of Anaconda and Miniconda.
 with the virtual environments previously created with conda tool and vice
 versa! Prefer virtualenv whenever possible.

-## Python virtual environment
+## Python Virtual Environment

 This example shows how to start working with **virtualenv** and Python virtual environment (using
-the module system). At first we use an interactive job and create a directory for the virtual
-environment:
+the module system).

-```console
-marie@login$ srun -p alpha -N 1 -n 1 -c 7 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash #Job submission in ml nodes with 1 gpu on 1 node.
-marie@alpha$ mkdir python-environments #Optional: Create folder. Please use Workspaces!
-```
+??? hint
+    We recommend using [workspaces](../data_lifecycle/workspaces.md) for your virtual environments.

-Now we check available Python modules and load the preferred version:
+First, we check the available Python modules and load the preferred version:

 ```console
-marie@alpha$ module avail Python #Check the available modules with Python
-marie@alpha$ module load Python #Load default Python. Example output: Module Python/3.7 4-GCCcore-8.3.0 with 7 dependencies loaded
-marie@alpha$ which python #Check which python are you using
+marie@compute$ module avail Python #Check the available modules with Python
+[...]
+marie@compute$ module load Python #Load default Python
+Module Python/3.7.2-GCCcore-8.2.0 with 10 dependencies loaded
+marie@compute$ which python #Check which Python you are using
+/sw/installed/Python/3.7.2-GCCcore-8.2.0/bin/python
 ```

-Then create the virtual environment and activate it:
+Then create the virtual environment and activate it.

 ```console
-marie@alpha$ virtualenv --system-site-packages python-environments/envtest #Create virtual environment
-marie@alpha$ source python-environments/envtest/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$
+marie@compute$ ws_allocate -F scratch python_virtual_environment 1
+Info: creating workspace.
+/scratch/ws/1/python_virtual_environment
+[...]
+marie@compute$ virtualenv --system-site-packages /scratch/ws/1/python_virtual_environment/env #Create virtual environment
+[...]
+marie@compute$ source /scratch/ws/1/python_virtual_environment/env/bin/activate #Activate virtual environment. Example output: (env) bash-4.2$
 ```

 Now you can work in this isolated environment, without interfering with other tasks running on the
@@ -53,26 +58,28 @@ system.
 Note that the inscription (env) at the beginning of each line represents the virtual environment. You can deactivate the environment as follows:

 ```console
-(envtest) marie@alpha$ deactivate #Leave the virtual environment
+(env) marie@compute$ deactivate #Leave the virtual environment
 ```

-## Conda virtual environment
+## Conda Virtual Environment

 This example shows how to start working with **conda** and virtual environment (with using module
 system). At first we use an interactive job and create a directory for the conda virtual
 environment:

 ```console
-marie@login$ srun -p ml -N 1 -n 1 -c 7 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash #Job submission in ml nodes with 1 gpu on 1 node.
-marie@alpha$ mkdir conda-virtual-environments #create a folder
+marie@compute$ ws_allocate -F scratch conda_virtual_environment 1
+Info: creating workspace.
+/scratch/ws/1/conda_virtual_environment
+[...]
 ```

 Then we load Anaconda, create an environment in our directory and activate the environment:

 ```console
-marie@alpha$ module load Anaconda3 #load Anaconda module
-marie@alpha$ conda create --prefix conda-virtual-environments/conda-testenv python=3.6 #create virtual environment with Python version 3.6
-marie@alpha$ conda activate conda-virtual-environments/conda-testenv #activate conda-testenv virtual environment
+marie@compute$ module load Anaconda3 #load Anaconda module
+marie@compute$ conda create --prefix /scratch/ws/1/conda_virtual_environment/conda-testenv python=3.6 #create virtual environment with Python version 3.6
+marie@compute$ conda activate /scratch/ws/1/conda_virtual_environment/conda-testenv #activate conda-testenv virtual environment
 ```

 Now you can work in this isolated environment, without interfering with other tasks running on the
@@ -80,7 +87,7 @@ system.
 Note that the inscription (env) at the beginning of each line represents the virtual environment. You can deactivate the conda environment as follows:

 ```console
-(conda-testenv) marie@alpha$ conda deactivate #Leave the virtual environment
+(conda-testenv) marie@compute$ conda deactivate #Leave the virtual environment
 ```

 TODO: Link to this page from other DA/ML topics. insert link in alpha centauri
diff --git a/doc.zih.tu-dresden.de/docs/software/tensorboard.md b/doc.zih.tu-dresden.de/docs/software/tensorboard.md
index f934223a8146b65661e1bdd9070eebe487baeab4..a2da2b7c626839f59d3e4eda8d7a7a167b7f8ec4 100644
--- a/doc.zih.tu-dresden.de/docs/software/tensorboard.md
+++ b/doc.zih.tu-dresden.de/docs/software/tensorboard.md
@@ -8,6 +8,9 @@ whether a specific TensorFlow module provides TensorBoard, use the following com

 ```console
 marie@compute$ module spider TensorFlow/2.3.1
+[...]
+Included extensions
+[...]
 ```

 If TensorBoard occurs in the `Included extensions` section of the output, TensorBoard is available.
@@ -18,16 +21,18 @@ To use TensorBoard, you have to connect via ssh to the ZIH system as usual, sche
 job and load a TensorFlow module:

 ```console
-marie@login$ srun -p alpha -n 1 -c 1 --pty --mem-per-cpu=8000 bash #Job submission on alpha node
-marie@alpha$ module load TensorFlow/2.3.1
-marie@alpha$ tensorboard --logdir /scratch/gpfs/<YourNetID>/myproj/log --bind_all
+marie@compute$ module load TensorFlow/2.3.1
+Module TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4 and 47 dependencies loaded.
 ```

 Then create a workspace for the event data, that should be visualized in TensorBoard. If you
 already have an event data directory, you can skip that step.

 ```console
-marie@alpha$ ws_allocate -F scratch tensorboard_logdata 1
+marie@compute$ ws_allocate -F scratch tensorboard_logdata 1
+Info: creating workspace.
+/scratch/ws/1/marie-tensorboard_logdata
+[...]
 ```

 Now you can run your TensorFlow application. Note that you might have to adapt your code to make it
@@ -35,7 +40,10 @@ accessible for TensorBoard. Please find further information on the official [Ten
 Then you can start TensorBoard and pass the directory of the event data:

 ```console
-marie@alpha$ tensorboard --logdir /scratch/ws/1/marie-tensorboard_logdata --bind_all
+marie@compute$ tensorboard --logdir /scratch/ws/1/marie-tensorboard_logdata --bind_all
+[...]
+TensorBoard 2.3.0 at http://taurusi8034.taurus.hrsk.tu-dresden.de:6006/
+[...]
 ```

 TensorBoard will then return a server address on Taurus, e.g. `taurusi8034.taurus.hrsk.tu-dresden.de:6006`
diff --git a/doc.zih.tu-dresden.de/docs/software/tensorflow.md b/doc.zih.tu-dresden.de/docs/software/tensorflow.md
index 97d5c73a7ae3165d3176aeb8ec9632205708858e..574b9920db52ab175bb24500fd3462a46140ad44 100644
--- a/doc.zih.tu-dresden.de/docs/software/tensorflow.md
+++ b/doc.zih.tu-dresden.de/docs/software/tensorflow.md
@@ -9,6 +9,7 @@ Please check the software modules list via

 ```console
 marie@compute$ module spider TensorFlow
+[...]
 ```

 to find out, which TensorFlow modules are available on your partition.
@@ -25,42 +26,40 @@ and the TensorFlow library.
 You can find detailed hardware specification

 On the **Alpha** partition load the module environment:

 ```console
-marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission on alpha nodes with 1 gpu on 1 node with 8000 Mb per CPU
 marie@alpha$ module load modenv/scs5
 ```

 On the **ML** partition load the module environment:

 ```console
-marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission in ml nodes with 1 gpu on 1 node with 8000 Mb per CPU
-marie@ml$ module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
+marie@ml$ module load modenv/ml
+The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
 ```

 This example shows how to install and start working with TensorFlow (with using modules system)

 ```console
 marie@ml$ module load TensorFlow
-Module TensorFlow/1.10.0-PythonAnaconda-3.6 and 1 dependency loaded.
+Module TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4 and 47 dependencies loaded.
 ```

-Now we check that we can access TensorFlow. One example is tensorflow-test:
-
-```console
-marie@ml$ tensorflow-test
-Basic test of tensorflow - A Hello World!!!...
-```
-
-??? example
-    Following example shows how to create python virtual environment and import TensorFlow.
+Now we can use TensorFlow. In the following example, we create a Python virtual environment and
+import TensorFlow:
+
+!!! example
     ```console
-    marie@ml$ mkdir python-environments #create folder
+    marie@ml$ ws_allocate -F scratch python_virtual_environment 1
+    Info: creating workspace.
+    /scratch/ws/1/python_virtual_environment
+    [...]
     marie@ml$ which python #check which python are you using
-    /sw/installed/Python/3.7.4-GCCcore-8.3.0/bin/python
-    marie@ml$ virtualenv --system-site-packages python-environments/env #create virtual environment "env" which inheriting with global site packages
+    /sw/installed/Python/3.7.2-GCCcore-8.2.0/bin/python
+    marie@ml$ virtualenv --system-site-packages /scratch/ws/1/python_virtual_environment/env #create virtual environment "env" which inherits the global site packages
     [...]
-    marie@ml$ source python-environments/env/bin/activate #activate virtual environment "env". Example output: (env) bash-4.2$
+    marie@ml$ source /scratch/ws/1/python_virtual_environment/env/bin/activate #activate virtual environment "env". Example output: (env) bash-4.2$
     marie@ml$ python -c "import tensorflow as tf; print(tf.__version__)"
+    [...]
+    2.3.1
     ```

 ## TensorFlow in JupyterHub
@@ -84,14 +83,14 @@ Another option to use TensorFlow are containers. In the HPC domain, the
 [Singularity](https://singularity.hpcng.org/) container system is a widely used tool. In the
 following example, we use the tensorflow-test in a Singularity container:

-```console
-marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash
+```console
 marie@ml$ singularity shell --nv /scratch/singularity/powerai-1.5.3-all-ubuntu16.04-py3.img
-marie@ml$ export PATH=/opt/anaconda3/bin:$PATH
-marie@ml$ source activate /opt/anaconda3 #activate conda environment
-marie@ml$ . /opt/DL/tensorflow/bin/tensorflow-activate
-marie@ml$ tensorflow-test
+Singularity>$ export PATH=/opt/anaconda3/bin:$PATH
+Singularity>$ source activate /opt/anaconda3 #activate conda environment
+(base) Singularity>$ . /opt/DL/tensorflow/bin/tensorflow-activate
+(base) Singularity>$ tensorflow-test
 Basic test of tensorflow - A Hello World!!!...
+[...]
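+(base) Singularity>$ python -c "import tensorflow as tf; print(tf.__version__)" #optionally repeat the import check from the example above inside the container (illustrative; exact version output omitted)
+[...]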
 ```

 ## TensorFlow with Python or R
@@ -128,6 +127,7 @@ top of TensorFlow. Please check the software modules list via

 ```console
 marie@compute$ module spider Keras
+[...]
 ```

 to find out, which Keras modules are available on your partition. TensorFlow should be automatically