Commit d779ca8f authored by Elias Werner

add first draft for tensorboard

fixes in tensorflow
parent 1fb4fc87
5 merge requests: !333 Draft: update NGC containers, !322 Merge preview into main, !319 Merge preview into main, !279 Draft: Machine Learning restructuring, !258 Data Analytics restructuring
# TensorBoard
TensorBoard is a visualization toolkit for TensorFlow. It offers functionality such as plotting
loss and accuracy, visualizing the model graph, and profiling the application.
On ZIH systems, TensorBoard is only available as an extension of the TensorFlow module. To check
whether a specific TensorFlow module provides TensorBoard, use the following command:
```console
marie@compute$ module spider TensorFlow/2.3.1
```
If TensorBoard appears in the `Included extensions` section of the output, it is available in that module.
## Using TensorBoard
To use TensorBoard, connect to taurus via SSH as usual, schedule an interactive job, and
load a TensorFlow module:
```console
marie@login$ srun -p alpha -n 1 -c 1 --pty --mem-per-cpu=8000 bash # job submission on an alpha node
marie@alpha$ module load TensorFlow/2.3.1
```
Then create a workspace for the event data that should be visualized in TensorBoard. If you already
have an event data directory, you can skip this step.
```console
marie@alpha$ ws_allocate -F scratch tensorboard_logdata 1
```
Now you can run your TensorFlow application. Note that you might have to adapt your code so that it
writes event data that TensorBoard can read. Further information is available on the official [TensorBoard website](https://www.tensorflow.org/tensorboard/get_started).
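As a rough sketch of such an adaptation, assuming your application trains a compiled Keras model: the Keras callback `tf.keras.callbacks.TensorBoard` writes event data to a directory during `model.fit`. The helper names `run_log_dir` and `fit_with_tensorboard`, the run name, and the number of epochs are made up for illustration and are not part of TensorFlow or of the ZIH documentation.

```python
from pathlib import Path


def run_log_dir(workspace: str, run_name: str) -> Path:
    """Return a per-run subdirectory of the event data workspace.

    Hypothetical helper: separate subdirectories let TensorBoard show
    several runs side by side when --logdir points at the workspace.
    """
    return Path(workspace) / run_name


def fit_with_tensorboard(model, x, y, log_dir: Path, epochs: int = 5):
    """Train a compiled Keras model and write event data to log_dir."""
    # Imported here (an illustrative choice) so that the file can be read
    # before a TensorFlow module has been loaded.
    import tensorflow as tf

    # The callback logs loss and metrics each epoch; histogram_freq=1
    # additionally records weight histograms every epoch.
    callback = tf.keras.callbacks.TensorBoard(log_dir=str(log_dir), histogram_freq=1)
    return model.fit(x, y, epochs=epochs, callbacks=[callback])
```

With the workspace from above, `run_log_dir("/scratch/ws/1/marie-tensorboard_logdata", "run1")` would give the directory to pass as `log_dir`.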
Then you can start TensorBoard and pass the directory of the event data:
```console
marie@alpha$ tensorboard --logdir /scratch/ws/1/marie-tensorboard_logdata --bind_all
```
TensorBoard will then return a server address on taurus, e.g. `taurusi8034.taurus.hrsk.tu-dresden.de:6006`.
To access TensorBoard, set up port forwarding via SSH to your local
machine:
```console
marie@local$ ssh -N -f -L 6006:taurusi8034.taurus.hrsk.tu-dresden.de:6006 <zih-login>@taurus.hrsk.tu-dresden.de
```
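The forwarding command depends only on the address that TensorBoard printed and on your login. Purely as an illustration, a small Python helper could assemble the same `ssh` arguments. The function name `forward_command` and its parameters are hypothetical and not part of any ZIH tooling; the default gateway is the host name used in the command above.

```python
from typing import List


def forward_command(server_address: str, login: str,
                    gateway: str = "taurus.hrsk.tu-dresden.de",
                    local_port: int = 6006) -> List[str]:
    """Build the ssh argument list for forwarding a local port to TensorBoard.

    server_address is the "host:port" string that TensorBoard reports.
    """
    host, _, port = server_address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {server_address!r}")
    # -N: run no remote command, -f: go to background,
    # -L: forward local_port on this machine to host:port via the gateway.
    return ["ssh", "-N", "-f", "-L", f"{local_port}:{host}:{port}", f"{login}@{gateway}"]
```

The returned list can be passed to `subprocess.run` on your local machine, or joined with spaces to print the command for copying.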
Now you can open TensorBoard in your browser at `http://localhost:6006/`.
Note that you can also use TensorBoard in an [sbatch file](../jobs_and_resources/batch_systems.md).
@@ -8,7 +8,7 @@ resources.
 Please check the software modules list via
 ```console
-marie@login$ module spider TensorFlow
+marie@compute$ module spider TensorFlow
 ```
 to find out, which TensorFlow modules are available on your partition.
@@ -26,7 +26,7 @@ On the **Alpha** partition load the module environment:
 ```console
 marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission on alpha nodes with 1 gpu on 1 node with 8000 Mb per CPU
-marie@romeo$ module load modenv/scs5
+marie@alpha$ module load modenv/scs5
 ```
 On the **ML** partition load the module environment:
......