From d779ca8fe2b1621710d4d29671ec6856d1bb01c6 Mon Sep 17 00:00:00 2001
From: Elias Werner <eliwerner3@googlemail.com>
Date: Thu, 26 Aug 2021 15:32:33 +0200
Subject: [PATCH] add first draft for tensorboard fixes in tensorflow

---
 .../docs/software/tensorboard.md | 50 +++++++++++++++++++
 .../docs/software/tensorflow.md  |  4 +-
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/software/tensorboard.md b/doc.zih.tu-dresden.de/docs/software/tensorboard.md
index e69de29bb..7272c7f2b 100644
--- a/doc.zih.tu-dresden.de/docs/software/tensorboard.md
+++ b/doc.zih.tu-dresden.de/docs/software/tensorboard.md
@@ -0,0 +1,50 @@
+# TensorBoard
+
+TensorBoard is a visualization toolkit for TensorFlow. It offers a variety of features such as
+plotting loss and accuracy, visualizing the model graph, and profiling the application.
+On ZIH systems, TensorBoard is only available as an extension of the TensorFlow module. To check
+whether a specific TensorFlow module provides TensorBoard, use the following command:
+
+```console
+marie@compute$ module spider TensorFlow/2.3.1
+```
+
+If TensorBoard is listed in the `Included extensions` section of the output, it is available.
+
+## Using TensorBoard
+
+To use TensorBoard, connect to Taurus via ssh as usual, start an interactive job, and load a
+TensorFlow module:
+
+```console
+marie@login$ srun -p alpha -n 1 -c 1 --pty --mem-per-cpu=8000 bash #Job submission on alpha node
+marie@alpha$ module load TensorFlow/2.3.1
+```
+
+Then create a workspace for the event data that should be visualized in TensorBoard. If you already
+have an event data directory, you can skip this step.
+
+```console
+marie@alpha$ ws_allocate -F scratch tensorboard_logdata 1
+```
+
+Now you can run your TensorFlow application. Note that you might have to adapt your code so that it
+writes its event data into this directory. Further information can be found on the official
+[TensorBoard website](https://www.tensorflow.org/tensorboard/get_started).
+Then start TensorBoard and pass it the workspace directory containing the event data:
+
+```console
+marie@alpha$ tensorboard --logdir /scratch/ws/1/marie-tensorboard_logdata --bind_all
+```
+
+TensorBoard then prints a server address on Taurus, e.g. `taurusi8034.taurus.hrsk.tu-dresden.de:6006`.
+
+To access TensorBoard, set up port forwarding via ssh from your local machine:
+
+```console
+marie@local$ ssh -N -f -L 6006:taurusi8034.taurus.hrsk.tu-dresden.de:6006 <zih-login>@taurus.hrsk.tu-dresden.de
+```
+
+Now you can open TensorBoard in your browser at `http://localhost:6006/`.
+
+Note that you can also use TensorBoard in an [sbatch file](../jobs_and_resources/batch_systems.md).
diff --git a/doc.zih.tu-dresden.de/docs/software/tensorflow.md b/doc.zih.tu-dresden.de/docs/software/tensorflow.md
index c4101a569..f8a815c8b 100644
--- a/doc.zih.tu-dresden.de/docs/software/tensorflow.md
+++ b/doc.zih.tu-dresden.de/docs/software/tensorflow.md
@@ -8,7 +8,7 @@ resources.
 Please check the software modules list via
 
 ```console
-marie@login$ module spider TensorFlow
+marie@compute$ module spider TensorFlow
 ```
 
 to find out, which TensorFlow modules are available on your partition.
@@ -26,7 +26,7 @@ On the **Alpha** partition load the module environment:
 
 ```console
 marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission on alpha nodes with 1 gpu on 1 node with 8000 Mb per CPU
-marie@romeo$ module load modenv/scs5
+marie@alpha$ module load modenv/scs5
 ```
 
 On the **ML** partition load the module environment:
-- 
GitLab
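The port-forwarding step can be derived mechanically from the `host:port` address that TensorBoard prints on the compute node. The following is a minimal sketch, not an existing ZIH tool: a hypothetical bash function `tensorboard_forward_cmd` that splits the address and prints the matching `ssh -N -f -L` command. It assumes the login host `taurus.hrsk.tu-dresden.de` and that the same port number is used on the local machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a ZIH tool): build the ssh port-forwarding command
# from the "host:port" address that TensorBoard prints on the compute node.
# Assumptions: the login host is taurus.hrsk.tu-dresden.de and the same port
# number is used on the local machine.
tensorboard_forward_cmd() {
  local addr="$1"   # e.g. taurusi8034.taurus.hrsk.tu-dresden.de:6006
  local user="$2"   # your ZIH login
  if [[ "$addr" != *:* || -z "$user" ]]; then
    echo "usage: tensorboard_forward_cmd <host:port> <zih-login>" >&2
    return 1
  fi
  local host="${addr%:*}"   # drop the trailing ":port"
  local port="${addr##*:}"  # keep only the port number
  # -N: run no remote command, -f: go to background after authentication,
  # -L: forward the local port to host:port via the login node
  printf 'ssh -N -f -L %s:%s:%s %s@taurus.hrsk.tu-dresden.de\n' \
    "$port" "$host" "$port" "$user"
}
```

For example, `tensorboard_forward_cmd taurusi8034.taurus.hrsk.tu-dresden.de:6006 marie` prints `ssh -N -f -L 6006:taurusi8034.taurus.hrsk.tu-dresden.de:6006 marie@taurus.hrsk.tu-dresden.de`, which you then run on your local machine. Printing instead of executing keeps the helper free of side effects, so the command can be checked before the tunnel is opened.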