Commit 4afbc1e0 authored by Taras Lazariv

Move content to install_jupyter.md and delete deep_learning.md
# Deep learning
**Prerequisites**: To work with deep learning tools, you need [login](../access/ssh_login.md)
access to the ZIH system and basic knowledge of Python and the Slurm workload manager.
The **aim** of this page is to introduce users to working with deep learning software in
both the ml environment and the scs5 environment of the system.
## Deep Learning Software
### TensorFlow
[TensorFlow](https://www.tensorflow.org/guide/) is a free end-to-end open-source software library
for dataflow and differentiable programming across a range of tasks.
TensorFlow is available in both [ml environment and scs5 environment](modules.md#module-environments)
under the module name "TensorFlow". For example:
```Bash
module load TensorFlow
```
There are numerous ways to work with [TensorFlow](tensorflow.md) on the ZIH system. On this page,
the default scs5 partition is used for all examples. Generally, the easiest way is to use the
[module system](modules.md) and a Python virtual environment (test case). However, in some cases,
you may need a directly installed TensorFlow stable or nightly release. For this purpose, use
[EasyBuild](custom_easy_build_environment.md) or [containers](tensorflow_container_on_hpcda.md),
and see [the example](https://www.tensorflow.org/install/pip). For examples of using TensorFlow on
the ml partition with the module system, see the [TensorFlow page](../software/tensorflow.md).
Note: If you are going to use a manually installed TensorFlow release, we recommend using only
stable versions.
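The module-plus-virtual-environment approach can be sketched as follows; the module names match
this page, while the virtual environment path and the `Python` module are assumptions you may need
to adapt:

```Bash
module load modenv/scs5      # load the scs5 environment
module load TensorFlow       # easiest way: use the provided TensorFlow module

# alternatively, install TensorFlow yourself into a virtual environment
# (example path; adjust to your home directory):
module load Python
python -m venv ~/venvs/tf
source ~/venvs/tf/bin/activate
pip install tensorflow       # stable release; we recommend avoiding nightly builds
```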
### Keras
[Keras](https://keras.io/) is a high-level neural network API, written in Python and capable of
running on top of [TensorFlow](https://github.com/tensorflow/tensorflow). Keras is available in
both environments [ml environment and scs5 environment](modules.md#module-environments) under the
module name "Keras".
On this page, the default scs5 partition is used for all examples. There are numerous ways to work
with [TensorFlow](../software/tensorflow.md) and Keras on the ZIH system. Generally, the easiest
way is to use the [module system](modules.md) and a Python virtual environment (test case); see the
TensorFlow section above.
For examples of using Keras on the ml partition with the module system, see the
[Keras page](../software/keras.md).
Keras uses TensorFlow as its backend. As mentioned in the Keras documentation, Keras is also
capable of running on the Theano backend. However, since Theano has been abandoned by its
developers, we do not recommend using Theano anymore. If you wish to use the Theano backend, you
need to install it manually. To use the TensorFlow backend, please do not forget to load the
corresponding TensorFlow module. TensorFlow should be loaded automatically as a dependency.
**Test case**: Keras with TensorFlow on MNIST data
Go to a directory on the ZIH system, clone the Keras repository for the examples, and change into
the examples directory:
```Bash
git clone https://github.com/fchollet/keras.git
cd keras/examples/
```
If you do not specify a Keras backend, TensorFlow is used by default.
Job file (schedule the job with `sbatch`, check the status with `squeue -u <username>`):
```Bash
#!/bin/bash
#SBATCH --gres=gpu:1 # 1 - using one gpu, 2 - for using 2 gpus
#SBATCH --mem=8000
#SBATCH -p gpu2 # select the type of nodes (options: haswell, smp, sandy, west, gpu, ml); K80 GPUs on Haswell nodes
#SBATCH --time=00:30:00
#SBATCH -o HLR_<name_of_your_script>.out # save output messages
#SBATCH -e HLR_<name_of_your_script>.err # save error messages
module purge # purge if you already have modules loaded
module load modenv/scs5 # load scs5 environment
module load Keras # load Keras module
module load TensorFlow # load TensorFlow module
# if you see 'broken pipe error's (might happen in interactive session after the second srun command) uncomment line below
# module load h5py
python mnist_cnn.py
```
Keep in mind that the batch script must be in the same folder as the executable file, or you have
to specify the path.
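The job file above can then be submitted and monitored like this (the file name
`keras_mnist.sbatch` is only an example; any name works):

```Bash
sbatch keras_mnist.sbatch   # submit the job file to Slurm
squeue -u $USER             # check the status of your jobs
```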
Example output:
```Bash
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
  128/60000 [..............................] - ETA: 12:08 - loss: 2.3064 - acc: 0.0781
  256/60000 [..............................] - ETA: 7:04 - loss: 2.2613 - acc: 0.1523
  384/60000 [..............................] - ETA: 5:22 - loss: 2.2195 - acc: 0.2005
...
60000/60000 [==============================] - 128s 2ms/step - loss: 0.0296 - acc: 0.9905 - val_loss: 0.0268 - val_acc: 0.9911
Test loss: 0.02677746053306255
Test accuracy: 0.9911
```
## Data Sets
There are many different data sets designed for research purposes. If you would like to download
some of them, keep in mind that many machine learning libraries have direct access to public data
sets without downloading them (for example,
[TensorFlow data sets](https://www.tensorflow.org/datasets)).
If you still need to download some data sets, be careful with their size (some of them are a few
terabytes in size). Don't download data you do not really need! Use the login nodes only for
downloading small files (up to hundreds of megabytes). For downloading huge files, use the
[DataMover](../data_transfer/data_mover.md). For example, you can use the command `dtwget` (an
analogue of the general `wget` command). This command submits a job to the data transfer machines.
If you need to download or allocate massive files (more than one terabyte), please contact the
support beforehand.
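A `dtwget` download might look like the following sketch (the URL is a placeholder):

```Bash
# Submits a wget-style download as a job on the data transfer machines.
dtwget https://example.com/some_dataset.tar.gz
```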
### The ImageNet Data Set
The [ImageNet](http://www.image-net.org/) project is a large visual database designed for use in
visual object recognition software research. In order to save space in the filesystem by avoiding
multiple duplicates lying around, we have put a copy of the ImageNet database (ILSVRC2012 and
ILSVRC2017) under `/scratch/imagenet`, which you can use without having to download it again. In
the future, the ImageNet data set will be available in `/warm_archive`. ILSVRC2017 also includes a
data set for object recognition from video. Please respect the corresponding
[Terms of Use](https://image-net.org/download.php).
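To see which parts of the data set are present before wiring a path into your scripts, you can
simply list the directory on the ZIH system (the output depends on the current contents):

```Bash
ls /scratch/imagenet
```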
## Jupyter Notebook
Jupyter notebooks are a great way for interactive computing in your web browser. Jupyter allows
working with data cleaning and transformation, numerical simulation, statistical modelling, and
data analytics; many analytics tools are available.
A remote Jupyter server offers more freedom with settings and approaches.
### Preparation phase (optional)
On the ZIH system, start an interactive session for setting up the environment:
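The exact command is truncated in this page view; an interactive allocation might look like the
following sketch (partition, time, and memory values are assumptions you must adapt):

```Bash
srun -p ml --time=01:00:00 --mem=4000 --pty bash -l
```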
directory (`/home/userxx/anaconda3`). Create a new Anaconda environment with the name `jnb`:
```Bash
conda create --name jnb
```
### Set environment variables
In the shell, activate the previously created Python environment (you can also deactivate it
manually) and install Jupyter packages for this Python environment:
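A minimal sketch of these steps (the environment name `jnb` comes from above; installing Jupyter
via conda is one possible choice):

```Bash
conda activate jnb      # activate the environment created above
conda install jupyter   # install Jupyter packages into this environment
```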
In the configuration file, paste your hashed password into `c.NotebookApp.password`, and set, for
example, `c.NotebookApp.port = 9999` and `c.NotebookApp.allow_remote_access = True`.
Note: `<path-to-cert>` is the path to the key and certificate files, for example:
`/home/<username>/mycert.pem`
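If you do not have a certificate yet, a self-signed pair can, for example, be generated
non-interactively with `openssl`; the file names and the subject below are placeholders:

```Bash
# Generate a self-signed certificate (mycert.pem) and private key (mykey.key),
# valid for 365 days; the subject /CN=localhost is a placeholder.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout mykey.key -out mycert.pem -subj "/CN=localhost"
```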
### Slurm job file to run the Jupyter server on the ZIH system with GPU (1x K80) (also works on K20)
```Bash
#!/bin/bash -l
#SBATCH --gres=gpu:1       # request GPU
#SBATCH --partition=gpu2   # use GPU partition
```

To log in to the Jupyter notebook site, you have to enter the **token**.
If you would like to use [JupyterHub](../access/jupyterhub.md) after using a manually configured
remote Jupyter server (example above), you need to change the name of the configuration file
(`/home/<username>/.jupyter/jupyter_notebook_config.py`) to any other name.
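Renaming might, for example, look like this (the `.bak` suffix is only a convention; any other
name works):

```Bash
mv ~/.jupyter/jupyter_notebook_config.py ~/.jupyter/jupyter_notebook_config.py.bak
```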
### F.A.Q
**Q:** I get an error when connecting to the Jupyter server (e.g. "open failed: administratively
prohibited: open failed").
**A:** Check the settings in your Jupyter configuration file: are all necessary lines uncommented,
are the paths to the certificate and key files correct, and is the hashed password from the
`.json` file correct? Check whether the used local port is
[available](https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers).
Also check your local settings, e.g. `/etc/ssh/sshd_config` and `/etc/hosts`.
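This error often points to broken SSH port forwarding; a tunnel to the Jupyter port might look
like the following sketch (the login node name and the port are placeholders matching the config
example above):

```Bash
ssh -N -L 9999:localhost:9999 <username>@<zih-login-node>
```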
**Q:** I get an error during the start of the interactive session (e.g. "PMI2_Init failed to
initialize. Return code: 1").
**A:** You probably need to pass `--mpi=none` to avoid Open MPI errors:
`srun --mpi=none --reservation <...> -A <...> -t 90 --mem=4000 --gres=gpu:1 --partition=gpu2-interactive --pty bash -l`