Commit 4afbc1e0 authored 3 years ago by Taras Lazariv
Move content to install_jupyter.md and delete deep_learning.md
Parent: c2658acf
5 merge requests:

- !333 Draft: update NGC containers
- !322 Merge preview into main
- !319 Merge preview into main
- !279 Draft: Machine Learning restructuring
- !258 Data Analytics restructuring
Showing 1 changed file with 181 additions and 0 deletions:

doc.zih.tu-dresden.de/docs/archive/deep_learning.md → doc.zih.tu-dresden.de/docs/archive/install_jupyter.md (+181 −0)
# Deep learning
**Prerequisites**: To work with deep learning tools, you need [login](../access/ssh_login.md) access to the ZIH system and basic knowledge of Python and the Slurm workload manager.

The **aim** of this page is to introduce users to getting started with deep learning software in both the ml environment and the scs5 environment of the system.
## Deep Learning Software
### TensorFlow
[TensorFlow](https://www.tensorflow.org/guide/) is a free end-to-end open-source software library for dataflow and differentiable programming across a range of tasks.

TensorFlow is available in both [ml environment and scs5 environment](modules.md#module-environments) under the module name "TensorFlow". For example:
```bash
module load TensorFlow
```
There are numerous ways to work with [TensorFlow](tensorflow.md) on the ZIH system. On this page, the scs5 partition is used by default for all examples. Generally, the easiest way is to use the [module system](modules.md) and a Python virtual environment (test case). However, in some cases you may need a directly installed TensorFlow stable or nightly release. For this purpose, use [EasyBuild](custom_easy_build_environment.md) or [containers](tensorflow_container_on_hpcda.md), and see [the example](https://www.tensorflow.org/install/pip). For examples of using TensorFlow on the ml partition with the module system, see the [TensorFlow page](../software/tensorflow.md).

Note: If you are going to use a manually installed TensorFlow release, we recommend using only stable versions.
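For the virtual-environment route, a minimal sketch follows. The `modenv/scs5` module mirrors the job file later on this page, while the `Python` module name and the environment path are assumptions for the example:

```bash
# Load the scs5 module environment and a Python interpreter
module purge
module load modenv/scs5
module load Python            # module name is an assumption for this sketch

# Create and activate a virtual environment (path is an example)
python -m venv ~/venvs/tf-test
source ~/venvs/tf-test/bin/activate

# Install a stable TensorFlow release from PyPI and verify the import
pip install --upgrade pip
pip install tensorflow
python -c "import tensorflow as tf; print(tf.__version__)"
```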
### Keras
[Keras](https://keras.io/) is a high-level neural network API, written in Python and capable of running on top of [TensorFlow](https://github.com/tensorflow/tensorflow). Keras is available in both [ml environment and scs5 environment](modules.md#module-environments) under the module name "Keras".
On this page, the scs5 partition is used by default for all examples. There are numerous ways to work with [TensorFlow](../software/tensorflow.md) and Keras on the ZIH system. Generally, the easiest way is to use the [module system](modules.md) and a Python virtual environment; see the TensorFlow section above. For examples of using Keras on the ml partition with the module system, see the [Keras page](../software/keras.md).
Keras can use TensorFlow as its backend. As mentioned in the Keras documentation, Keras is also capable of running on a Theano backend. However, since Theano has been abandoned by its developers, we don't recommend using Theano anymore. If you wish to use the Theano backend, you need to install it manually. To use the TensorFlow backend, please don't forget to load the corresponding TensorFlow module; it should be loaded automatically as a dependency.
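To verify which backend Keras actually uses, a quick check such as the following can help (a minimal sketch; `KERAS_BACKEND` is the standard Keras mechanism for selecting a backend):

```bash
# Print the active Keras backend (requires the Keras module to be loaded)
python -c "from keras import backend as K; print(K.backend())"

# The backend can also be selected explicitly via an environment variable
export KERAS_BACKEND=tensorflow
```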
Test case: Keras with TensorFlow on MNIST data

Go to a directory on the ZIH system, clone the Keras repository to obtain the examples, and change into the examples directory:
```bash
git clone https://github.com/fchollet/keras.git
cd keras/examples/
```
If you do not specify a Keras backend, then TensorFlow is used as the default.

Job file (schedule the job with `sbatch`, check the status with `squeue -u <username>`):
```bash
#!/bin/bash
#SBATCH --gres=gpu:1       # 1 - using one GPU, 2 - using two GPUs
#SBATCH --mem=8000
#SBATCH -p gpu2            # select the type of nodes (options: haswell, smp, sandy, west, gpu, ml); K80 GPUs on Haswell nodes
#SBATCH --time=00:30:00
#SBATCH -o HLR_<name_of_your_script>.out  # save output under HLR_<name_of_your_script>.out
#SBATCH -e HLR_<name_of_your_script>.err  # save error messages under HLR_<name_of_your_script>.err

module purge               # purge if you already have modules loaded
module load modenv/scs5    # load scs5 environment
module load Keras          # load Keras module
module load TensorFlow     # load TensorFlow module

# if you see 'broken pipe' errors (might happen in an interactive session
# after the second srun command), uncomment the line below
# module load h5py

python mnist_cnn.py
```
Keep in mind that you need to put the batch script in the same folder as the script to be executed, or specify the path.
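Submitting the job file and checking the job status might then look like this (the file name is only an example):

```bash
sbatch keras_mnist.sbatch   # submit the job file shown above
squeue -u $USER             # check the status of your jobs
```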
Example output:
```
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
  128/60000 [..............................] - ETA: 12:08 - loss: 2.3064 - acc: 0.0781
  256/60000 [..............................] - ETA: 7:04 - loss: 2.2613 - acc: 0.1523
  384/60000 [..............................] - ETA: 5:22 - loss: 2.2195 - acc: 0.2005
...
60000/60000 [==============================] - 128s 2ms/step - loss: 0.0296 - acc: 0.9905 - val_loss: 0.0268 - val_acc: 0.9911
Test loss: 0.02677746053306255
Test accuracy: 0.9911
```
## Data Sets
There are many different data sets designed for research purposes. If you would like to download some of them, keep in mind that many machine learning libraries have direct access to public data sets without downloading them (for example, [TensorFlow data sets](https://www.tensorflow.org/datasets)).
If you still need to download a data set, first of all be careful with its size (some of them are a few terabytes large). Don't download anything you don't really need! Use the login nodes only for downloading small files (up to a few hundred megabytes). For huge files, use the [DataMover](../data_transfer/data_mover.md): for example, the command `dtwget` (an analogue of the general `wget` command) submits a download job to the data transfer machines. If you need to download or allocate massive files (more than one terabyte), please contact the support beforehand.
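A minimal sketch of such a transfer (the URL is a placeholder, not a real data set location):

```bash
# Submit a download job to the data transfer machines instead of a login node
dtwget https://example.org/datasets/some_dataset.tar.gz
```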
### The ImageNet Data Set
The [ImageNet](http://www.image-net.org/) project is a large visual database designed for use in visual object recognition software research. In order to save filesystem space by avoiding multiple duplicates lying around, we have put a copy of the ImageNet database (ILSVRC2012 and ILSVR2017) under `/scratch/imagenet`, which you can use without having to download it again. In the future, the ImageNet data set will be available in `/warm_archive`. ILSVR2017 also includes a data set for recognizing objects from videos. Please respect the corresponding [Terms of Use](https://image-net.org/download.php).
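Using the shared copy might look like the sketch below; only the path `/scratch/imagenet` comes from this page, the script name and its flag are hypothetical:

```bash
ls /scratch/imagenet                            # inspect the available data
python train.py --data-dir /scratch/imagenet    # 'train.py' and '--data-dir' are hypothetical
```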
## Jupyter Notebook
# Jupyter Installation
Jupyter notebooks are a great way for interactive computing in your web browser. Jupyter allows
working with data cleaning and transformation, numerical simulation, statistical modelling, data
...
...
@@ -148,7 +17,7 @@ analytics tools are available.
The remote Jupyter server is able to offer more freedom with settings and approaches.
-### Preparation phase (optional)
+## Preparation phase (optional)
On the ZIH system, start an interactive session for setting up the environment:
...
...
@@ -189,7 +58,7 @@ directory (/home/userxx/anaconda3). Create a new anaconda environment with the n
conda create --name jnb
```
-### Set environmental variables
+## Set environmental variables
In the shell, activate the previously created Python environment (you can also deactivate it manually) and install Jupyter packages for this Python environment:
...
...
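The installation commands themselves are collapsed in this diff; a minimal sketch of this step, assuming the conda environment `jnb` created above, might look like:

```bash
# Activate the environment created above and install Jupyter into it
# (older conda versions use 'source activate jnb' instead)
conda activate jnb
conda install jupyter
```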
@@ -247,7 +116,7 @@ hashed password here>' c.NotebookApp.port = 9999 c.NotebookApp.allow_remote_acce
Note: `<path-to-cert>` is the path to the key and certificate files, for example: `/home/<username>/mycert.pem`
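If you do not yet have a certificate, a self-signed one can be generated as sketched below (file names are examples; this is standard `openssl` usage, not a ZIH-specific requirement):

```bash
# Generate a self-signed certificate and key valid for one year
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout ~/mykey.key -out ~/mycert.pem
```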
-### Slurm job file to run the Jupyter server on ZIH system with GPU (1x K80) (also works on K20)
+## Slurm job file to run the Jupyter server on ZIH system with GPU (1x K80) (also works on K20)
```bash
#!/bin/bash -l
#SBATCH --gres=gpu:1          # request GPU
#SBATCH --partition=gpu2      # use GPU partition
...
```
@@ -310,20 +179,3 @@ To login into the Jupyter notebook site, you have to enter the **token**.
If you would like to use [JupyterHub](../access/jupyterhub.md) after using a remote, manually configured Jupyter server (example above), you need to rename the configuration file (`/home//.jupyter/jupyter_notebook_config.py`) to something else.
### F.A.Q
**Q:** I get an error connecting to the Jupyter server (e.g. "open failed: administratively prohibited: open failed").

**A:** Check the settings in your Jupyter configuration file: are all necessary lines uncommented, is the path to the certificate and key files right, and is the hashed password from the `.json` file correct? Check whether the used local port is [available](https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers), and check local settings (e.g. `/etc/ssh/sshd_config`, `/etc/hosts`).
**Q:** I get an error during the start of the interactive session (e.g. "PMI2_Init failed to initialize. Return code: 1").

**A:** You probably need to pass `--mpi=none` to avoid Open MPI errors:

`srun --mpi=none --reservation <...> -A <...> -t 90 --mem=4000 --gres=gpu:1 --partition=gpu2-interactive --pty bash -l`