# Deep learning

**Prerequisites**: To work with deep learning tools, you need [access](../access/Login.md)
to the Taurus system and basic knowledge of Python and the SLURM batch manager.

The **aim** of this page is to introduce users to working with deep learning software on
both the ml environment and the scs5 environment of the Taurus system.
## Deep Learning Software
### TensorFlow
[TensorFlow](https://www.tensorflow.org/guide/) is a free end-to-end open-source software library
for dataflow and differentiable programming across a range of tasks.
TensorFlow is available in both main partitions
[ml environment and scs5 environment](modules.md#module-environments)
under the module name "TensorFlow". However, for machine learning and deep learning purposes, we
recommend the ml partition [HPC-DA](../jobs/HPCDA.md). For example:
```Bash
module load TensorFlow
```
There are numerous possibilities for working with [TensorFlow](TensorFlow.md) on
Taurus. On this page, the default scs5 partition is used for all examples. Generally, the easiest way
is using the [modules system](modules.md)
and a Python virtual environment (test case). However, in some cases, you may need a directly
installed TensorFlow stable or nightly release. For this purpose, use
[EasyBuild](CustomEasyBuildEnvironment.md) or [Containers](TensorFlowContainerOnHPCDA.md), and see
[the example](https://www.tensorflow.org/install/pip). For examples of using TensorFlow on the ml partition
with the module system, see the [TensorFlow page for HPC-DA](TensorFlow.md).
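The module-plus-virtual-environment workflow can be sketched as follows. This is only an illustration: the module names and the environment path are placeholders, so check `module avail` on Taurus for the versions actually installed.

```Bash
# load the module environment and the TensorFlow module (names are examples)
module load modenv/scs5
module load TensorFlow

# create a virtual environment that can still see the module's site-packages
python -m venv --system-site-packages ~/venvs/tf-test
source ~/venvs/tf-test/bin/activate

# quick sanity check that TensorFlow is importable
python -c "import tensorflow as tf; print(tf.__version__)"
```

Additional project-specific packages can then be installed into the virtual environment with `pip` without touching the module installation.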
Please refer to our [List of Modules](SoftwareModulesList.md) page for a
daily-updated list of the respective software versions that are
currently installed.
Note: If you are going to use a manually installed TensorFlow release, we recommend using only
stable versions.
## Keras
[Keras](https://keras.io/) is a high-level neural network API, written in Python and capable of
running on top of [TensorFlow](https://github.com/tensorflow/tensorflow). Keras is available in both
environments [ml environment and scs5 environment](modules.md#module-environments) under the module
name "Keras".
On this page, the default scs5 partition is used for all examples. There are numerous
possibilities for working with [TensorFlow](TensorFlow.md) and Keras
on Taurus. Generally, the easiest way is using the [module system](modules.md) and a Python
virtual environment (test case); see the TensorFlow section above.
For examples of using Keras on the ml partition with the module system, see the
[Keras page for HPC-DA](Keras.md).
Keras uses TensorFlow as its backend. As mentioned in the Keras documentation, Keras is also capable
of running on a Theano backend. However, since Theano has been abandoned by its developers, we do
not recommend using it anymore. If you wish to use the Theano backend, you need to install it
manually. To use the TensorFlow backend, please do not forget to load the corresponding TensorFlow
module. TensorFlow should be loaded automatically as a dependency.
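If you want to make the backend choice explicit, Keras honours the `KERAS_BACKEND` environment variable (a sketch; with only the TensorFlow module loaded, `tensorflow` is the only backend that will actually work):

```Bash
# select the Keras backend explicitly before starting Python
export KERAS_BACKEND=tensorflow
echo "Keras backend set to: ${KERAS_BACKEND}"
```

If the environment variable is not set, Keras falls back to the `backend` entry in `~/.keras/keras.json`.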
### Test case: Keras with TensorFlow on MNIST data
Go to a directory on Taurus, get Keras for the examples and go to the examples:
```Bash
git clone https://github.com/fchollet/keras.git
cd keras/examples/
```
If you do not specify the Keras backend, then TensorFlow is used as the default.
Job file (schedule the job with `sbatch`, check the status with `squeue -u <username>`):
```Bash
#!/bin/bash
#SBATCH --gres=gpu:1    # 1 - using one gpu, 2 - for using 2 gpus
#SBATCH --mem=8000
#SBATCH -p gpu2         # select the type of nodes (options: haswell, smp, sandy, west, gpu, ml); K80 GPUs on Haswell nodes
#SBATCH --time=00:30:00
#SBATCH -o HLR_<name_of_your_script>.out    # save output under HLR_${SLURMJOBID}.out
#SBATCH -e HLR_<name_of_your_script>.err    # save error messages under HLR_${SLURMJOBID}.err

module purge # purge if you already have modules loaded
module load modenv/scs5 # load scs5 environment
module load Keras # load Keras module
module load TensorFlow # load TensorFlow module
# if you see 'broken pipe' errors (might happen in interactive session after the second srun command) uncomment the line below
# module load h5py

python mnist_cnn.py
```
Keep in mind that you need to put the batch script in the same folder as the executable file, or
specify the path.
Example output:
```Bash
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
128/60000 [..............................] - ETA: 12:08 - loss: 2.3064 - acc: 0.0781
256/60000 [..............................] - ETA: 7:04 - loss: 2.2613 - acc: 0.1523
384/60000 [..............................] - ETA: 5:22 - loss: 2.2195 - acc: 0.2005
...
60000/60000 [==============================] - 128s 2ms/step - loss: 0.0296 - acc: 0.9905 - val_loss: 0.0268 - val_acc: 0.9911
Test loss: 0.02677746053306255
Test accuracy: 0.9911
```
## Datasets
There are many different datasets designed for research purposes. If you would like to download some
of them, first of all, keep in mind that many machine learning libraries have direct access to
public datasets without downloading them (for example,
[TensorFlow Datasets](https://www.tensorflow.org/datasets)).

If you still need to download some datasets, be careful with their size (some of them are a few
terabytes large). Don't download anything you do not really need! Use login nodes only for
downloading small files (hundreds of megabytes). For downloading huge files, use the
[DataMover](../data_moving/DataMover.md).
For example, you can use the command `dtwget` (an analogue of the general `wget` command). This
command submits a job to the data transfer machines. If you need to download or allocate massive
files (more than one terabyte), please contact the support beforehand.
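For illustration, a `dtwget` call looks just like a regular `wget` call; the URL below is a placeholder for your dataset:

```Bash
# submits the download as a job to the data transfer machines
# instead of running it on a login node
dtwget https://example.com/some_dataset.tar.gz
```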
### The ImageNet dataset
The [ImageNet](http://www.image-net.org/) project is a large visual database designed for use in
visual object recognition software research. To save space in the file system and avoid having
multiple duplicates of it lying around, we have put a copy of the ImageNet database
(ILSVRC2012 and ILSVR2017) under `/scratch/imagenet`, which you can use without having to download it
again. In the future, the ImageNet dataset will be available in `/warm_archive`. ILSVR2017 also
includes a dataset for recognizing objects in videos. Please respect the corresponding
[Terms of Use](https://image-net.org/download.php).
## Jupyter Notebook
Jupyter notebooks are a great way for interactive computing in your web browser. Jupyter allows
working with data cleaning and transformation, numerical simulation, statistical modelling, data
visualization and of course with machine learning.
There are two general options for working with Jupyter notebooks on HPC: a remote Jupyter server and
JupyterHub.

These sections show how to run and set up a remote Jupyter server within an sbatch GPU job and which
modules and packages you need for that.

**Note:** On Taurus, there is a [JupyterHub](JupyterHub.md) where you do not need the manual server
setup described below and can simply run your Jupyter notebook on HPC nodes. Keep in mind that with
JupyterHub you can't work with some special instruments. However, general data analytics tools are
available.
The remote Jupyter server is able to offer more freedom with settings and approaches.
Note: JupyterHub could be under construction.
### Preparation phase (optional)
On Taurus, start an interactive session for setting up the
environment:
```Bash
srun --pty -n 1 --cpus-per-task=2 --time=2:00:00 --mem-per-cpu=2500 --x11=first bash -l -i
```
Create a new subdirectory in your home, e.g. Jupyter
```Bash
mkdir Jupyter
cd Jupyter
```
There are two ways to run Anaconda. The easiest way is to load the Anaconda module. The second
one is to download Anaconda into your home directory.
1. Load Anaconda module (recommended):
```Bash
module load modenv/scs5
module load Anaconda3
```
2. Download the latest Anaconda release (see example below), change the rights to make it an
   executable script, and run the installation script:
```Bash
wget https://repo.continuum.io/archive/Anaconda3-2019.03-Linux-x86_64.sh
chmod 744 Anaconda3-2019.03-Linux-x86_64.sh
./Anaconda3-2019.03-Linux-x86_64.sh
```

(during the installation you have to confirm the licence agreement)
The next step will install the Anaconda environment into the home directory
(`/home/userxx/anaconda3`). Create a new Anaconda environment with the name "jnb":
```Bash
conda create --name jnb
```
### Set environment variables on Taurus
In the shell, activate the previously created Python environment (you can also deactivate it
manually) and install the Jupyter packages for this Python environment:
```Bash
source activate jnb
conda install jupyter
```
If you need to adjust the configuration, you should create the template. Generate the configuration
files for the Jupyter notebook server:
```Bash
jupyter notebook --generate-config
```
Find the path of the configuration file, usually in the home directory under `.jupyter`, e.g.
`/home//.jupyter/jupyter_notebook_config.py`
Set a password (choose an easy one for testing), which is needed later on to log into the server
in the browser session:
```Bash
jupyter notebook password
Enter password:
Verify password:
```
You will get a message like this:
```Bash
[NotebookPasswordApp] Wrote *hashed password* to /home/<zih_user>/.jupyter/jupyter_notebook_config.json
```
In order to create an SSL certificate for HTTPS connections, you can create a self-signed
certificate:
```Bash
openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mykey.key -out mycert.pem
```
Fill in the form with decent values.
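Alternatively, if you want to skip the interactive form, the subject can be passed directly on the command line. This is only a sketch: the country, organisation, and common name below are placeholder values, and `rsa:2048` is used instead of `rsa:1024` since 1024-bit keys are considered weak.

```Bash
# create key and self-signed certificate non-interactively
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -subj "/C=DE/O=TU Dresden/CN=taurus-jupyter" \
    -keyout mykey.key -out mycert.pem
```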
Possible entries for your Jupyter config (`.jupyter/jupyter_notebook_config.py`). Uncomment the
lines below:
```Bash
c.NotebookApp.certfile = u'<path-to-cert>/mycert.pem'
c.NotebookApp.keyfile = u'<path-to-cert>/mykey.key'
# set ip to '*' otherwise server is bound to localhost only
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
# copy hashed password from the jupyter_notebook_config.json
c.NotebookApp.password = u'<your hashed password here>'
c.NotebookApp.port = 9999
c.NotebookApp.allow_remote_access = True
```
Note: `<path-to-cert>` is the path to the key and certificate files, for example:
`/home/<username>/mycert.pem`
### SLURM job file to run the jupyter server on Taurus with GPU (1x K80) (also works on K20)
```Bash
#!/bin/bash -l
#SBATCH --gres=gpu:1 # request GPU
#SBATCH --partition=gpu2 # use GPU partition
#SBATCH --output=notebook_output.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=02:30:00
#SBATCH --mem=4000M
#SBATCH -J "jupyter-notebook" # job-name
#SBATCH -A <name_of_your_project>

unset XDG_RUNTIME_DIR # might be required when interactive instead of sbatch to avoid 'Permission denied' error

srun jupyter notebook
```
Start the script above (e.g. saved as `jnotebook.slurm`) with the `sbatch` command:
```Bash
sbatch jnotebook.slurm
```
If you have questions about the sbatch script, see the article about [Slurm](../jobs/Slurm.md).
Check the status and the **token** of the server with the command `tail notebook_output.txt`. It
should look like this:
```Bash
https://(taurusi2092.taurus.hrsk.tu-dresden.de or 127.0.0.1):9999/
```
You can see the **server node's hostname** by the command: `squeue -u <username>`.
### Remote connect to the server
There are two options on how to connect to the server:
1. You can create an ssh tunnel if you have problems with the solution above (recommended). Open
   another terminal and configure the ssh tunnel (look up the connection values in the output file
   of the slurm job, e.g.):
```Bash
node=taurusi2092    # see the name of the node with squeue -u <your_login>
localport=8887      # local port on your computer
remoteport=9999     # pay attention to the value; it should be the same as in notebook_output.txt
ssh -fNL ${localport}:${node}:${remoteport} <zih_user>@taurus.hrsk.tu-dresden.de    # configure the ssh tunnel for connection to your remote server
pgrep -f "ssh -fNL ${localport}"    # verify that the tunnel is alive
```
2. On your client (local machine), you can now connect to the server. You need to know the **node's
   hostname**, the **port** of the server and the **token** to log in (see paragraph above).
You can connect directly if you know the IP address (just ping the node's hostname while logged on
Taurus).
```Bash
# command on remote terminal
taurusi2092$> host taurusi2092
# copy IP address from output
# paste IP to your browser or call on local terminal e.g.
local$> firefox https://<IP>:<PORT>    # https important to use SSL cert
```
To log into the Jupyter notebook site, you have to enter the **token**
(`https://localhost:8887`). Now you can create and execute notebooks on Taurus with GPU support.
If you would like to use [JupyterHub](JupyterHub.md) after using a remote, manually configured
Jupyter server (example above), you need to rename the configuration file
(`/home//.jupyter/jupyter_notebook_config.py`) to any other name.
### F.A.Q
**Q:** I get an error when connecting to the Jupyter server (e.g. "open failed: administratively
prohibited: open failed").
**A:** Check the settings of your Jupyter config file: Are all necessary lines uncommented? Is the
path to the cert and key files right? Is the hashed password from the .json file correct? Check
whether the used local port is
[available](https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers).
Check local settings, e.g. `/etc/ssh/sshd_config`, `/etc/hosts`.
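To check whether a local port is already in use, a quick probe with bash's built-in `/dev/tcp` pseudo-device is enough (the port number below is the example tunnel port from above):

```Bash
# try to open a TCP connection to the port; success means something is listening
port=8887
if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    echo "port ${port} is already in use"
else
    echo "port ${port} is free"
fi
```

Alternatively, `ss -tln` lists all listening TCP ports on your machine.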
**Q:** I have an error during the start of the interactive session (e.g. PMI2_Init failed to
initialize. Return code: 1)
**A:** Probably you need to provide `--mpi=none` to avoid OpenMPI errors:

`srun --mpi=none --reservation <...> -A <...> -t 90 --mem=4000 --gres=gpu:1
--partition=gpu2-interactive --pty bash -l`