Fix line length; minor issues

24ec0536 · Martin Schroschk · 64099f1e · 24ec0536
Commit 24ec0536 authored 3 years ago by Martin Schroschk
--- a/doc.zih.tu-dresden.de/docs/software/pytorch.md
+++ b/doc.zih.tu-dresden.de/docs/software/pytorch.md
 # PyTorch

-[PyTorch](https://pytorch.org/){:target="_blank"} is an open-source machine learning framework.
+[PyTorch](https://pytorch.org/) is an open-source machine learning framework.
 It is an optimized tensor library for deep learning using GPUs and CPUs.
 PyTorch is a machine learning tool developed by Facebook's AI division to process large-scale
 object detection, segmentation, classification, etc.
@@ -13,9 +13,9 @@ Please check the software modules list via
 marie@login$ module spider pytorch
 ```

-to find out, which PyTorch modules are available on your partition.
+to find out which PyTorch modules are available.

-We recommend using partitions alpha and/or ml when working with machine learning workflows
+We recommend using partitions `alpha` and/or `ml` when working with machine learning workflows
 and the PyTorch library.
 You can find detailed hardware specification in our
 [hardware documentation](../jobs_and_resources/hardware_overview.md).
@@ -25,7 +25,8 @@ You can find detailed hardware specification in our
 On the partition `alpha`, load the module environment:

 ```console
-marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash #Job submission on alpha nodes with 1 gpu on 1 node with 800 Mb per CPU
+# Job submission on alpha nodes with 1 GPU on 1 node with 800 MB per CPU
+marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash
 marie@alpha$ module load modenv/hiera  GCC/10.2.0  CUDA/11.1.1 OpenMPI/4.0.5 PyTorch/1.9.0
 The following modules were reloaded in a different version:
  1) modenv/scs5 => modenv/hiera
@@ -34,6 +35,7 @@ Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5, PyTorch/1.9.0 and 54 dependencies
 ```

 ??? hint "Torchvision on partition `alpha`"
+
    On the partition `alpha`, the module torchvision is not yet available within the module
    system. (19.08.2021)
    Torchvision can be made available by using a virtual environment:
@@ -50,7 +52,8 @@ Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5, PyTorch/1.9.0 and 54 dependencies
 On the partition `ml`:

 ```console
-marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash    #Job submission in ml nodes with 1 gpu on 1 node with 800 Mb per CPU
+# Job submission on ml nodes with 1 GPU on 1 node with 800 MB per CPU
+marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash
 ```

 After calling
@@ -75,19 +78,24 @@ marie@{ml,alpha}$ python -c "import torch; print(torch.__version__)"
 The following example shows how to create a python virtual environment and import PyTorch.

 ```console
-marie@ml$ mkdir python-environments    #create folder
-marie@ml$ which python    #check which python are you using
+# Create folder
+marie@ml$ mkdir python-environments
+# Check which Python you are using
+marie@ml$ which python
 /sw/installed/Python/3.7.4-GCCcore-8.3.0/bin/python
-marie@ml$ virtualenv --system-site-packages python-environments/env    #create virtual environment "env" which inheriting with global site packages
+# Create virtual environment "env" which inherits the global site packages
+marie@ml$ virtualenv --system-site-packages python-environments/env
 [...]
-marie@ml$ source python-environments/env/bin/activate    #activate virtual environment "env". Example output: (env) bash-4.2$
+# Activate virtual environment "env". Example output: (env) bash-4.2$
+marie@ml$ source python-environments/env/bin/activate
 marie@ml$ python -c "import torch; print(torch.__version__)"
 ```

 ## PyTorch in JupyterHub

-In addition to using interactive and batch jobs, it is possible to work with PyTorch using JupyterHub.
-The production and test environments of JupyterHub contain Python kernels, that come with a PyTorch support.
+In addition to using interactive and batch jobs, it is possible to work with PyTorch using
+JupyterHub. The production and test environments of JupyterHub contain Python kernels that come
+with PyTorch support.

 ![PyTorch module in JupyterHub](misc/Pytorch_jupyter_module.png)
 {: align="center"}
@@ -99,8 +107,8 @@ For details on how to run PyTorch with multiple GPUs and/or multiple nodes, see

 ## Migrate PyTorch-script from CPU to GPU

-It is recommended to use GPUs when using large training data sets. While TensorFlow automatically uses GPUs if they are available, in
-PyTorch you have to move your tensors manually.
+It is recommended to use GPUs when using large training data sets. While TensorFlow automatically
+uses GPUs if they are available, in PyTorch you have to move your tensors manually.

 First, you need to import `torch.cuda`:

@@ -108,7 +116,8 @@ First, you need to import `torch.cuda`:
 import torch.cuda
 ```

-Then you define a `device`-variable, which is set to 'cuda' automatically when a GPU is available with this code:
+Then you define a `device` variable, which the following code automatically sets to 'cuda' when a
+GPU is available:

 ```python3
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
@@ -121,15 +130,16 @@ x_train = torch.FloatTensor(x_train).to(device)
 y_train = torch.FloatTensor(y_train).to(device)
 ```

-Remember that this does not break backward compatibility when you port the script back to a computer without GPU, because without GPU,
-`device` is set to 'cpu'.
+Remember that this does not break backward compatibility when you port the script back to a computer
+without a GPU, because without a GPU, `device` is set to 'cpu'.

 ### Caveats

 #### Moving Data Back to the CPU-Memory

-The CPU cannot directly access variables stored on the GPU. If you want to use the variables, e.g. in a `print`-statement or
-when editing with NumPy or anything that is not PyTorch, you have to move them back to the CPU-memory again. This then may look like this:
+The CPU cannot directly access variables stored on the GPU. If you want to use the variables, e.g.,
+in a `print` statement or when editing with NumPy or anything that is not PyTorch, you have to move
+them back to the CPU memory. This may look like this:

 ```python3
 cpu_x_train = x_train.cpu()
@@ -138,8 +148,8 @@ print(cpu_x_train)
 error_train = np.sqrt(metrics.mean_squared_error(y_train[:,1].cpu(), y_prediction_train[:,1]))
 ```

-Remember that, without `.detach()` before the CPU, if you change `cpu_x_train`, `x_train` will also be changed.
-If you want to treat them independently, use
+Remember that, without `.detach()` before `.cpu()`, if you change `cpu_x_train`, `x_train` will also
+be changed. If you want to treat them independently, use

 ```python3
 cpu_x_train = x_train.detach().cpu()
@@ -149,7 +159,7 @@ Now you can change `cpu_x_train` without `x_train` being affected.

 #### Speed Improvements and Batch Size

-When you have a lot of very small data points, the speed may actually decrease when you try to train them on the GPU.
-This is because moving data from the CPU-memory to the GPU-memory takes time. If this occurs, please try using
-a very large batch size. This way, copying back and forth only takes places a few times and the bottleneck may
-be reduced.
+When you have a lot of very small data points, the speed may actually decrease when you try to train
+them on the GPU. This is because moving data from the CPU memory to the GPU memory takes time. If
+this occurs, please try using a very large batch size. This way, copying back and forth only takes
+place a few times and the bottleneck may be reduced.
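
The batch size advice in the last hunk can be illustrated with a rough count. This is a minimal sketch, assuming every batch is moved to the GPU in its own copy, so the number of copies per epoch is the number of samples divided by the batch size, rounded up. The helper `num_transfers` and the sample count of 100,000 are hypothetical and are not part of the documentation being changed. The sketch only counts copies and does not measure actual transfer time.

```python
import math


def num_transfers(num_samples: int, batch_size: int) -> int:
    """Count host-to-device copies per epoch, assuming one copy per batch."""
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size must be > 0")
    return math.ceil(num_samples / batch_size)


# Hypothetical data set size, chosen only for illustration.
NUM_SAMPLES = 100_000

for batch_size in (8, 256, 8192):
    copies = num_transfers(NUM_SAMPLES, batch_size)
    print(f"batch_size={batch_size:5d} -> {copies} copies per epoch")
```

A larger batch size lowers the number of copies, which is why it may reduce the bottleneck. The memory available on the GPU still limits how large a batch can be.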