Commit be848510 authored by Michael Müller

Merge branch 'TensorFlow.md' into 'preview'

Correct Tensorflow.md

See merge request zih/hpc-compendium/hpc-compendium!85
parents 55f85416 ccdd18eb
3 merge requests: !322 Merge preview into main, !319 Merge preview into main, !85 Correct Tensorflow.md
 # TensorFlow
 ## Introduction
 This is an introduction of how to start working with TensorFlow and run
-machine learning applications on the [HPC-DA](HPCDA) system of Taurus.
+machine learning applications on the [HPC-DA](../jobs/HPCDA.md) system of Taurus.
 \<span style="font-size: 1em;">On the machine learning nodes (machine
-learning partition), you can use the tools from\</span> [IBM Power
-AI](PowerAI)\<span style="font-size: 1em;"> or the other
-modules.\</span> \<span style="font-size: 1em;">PowerAI is an enterprise
-software distribution that combines popular open-source deep learning
-frameworks, efficient AI development tools (Tensorflow, Caffe, etc). For
-this page and examples was used \</span>\<a
-href="<https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.4/navigation/pai_software_pkgs.html>"
-target="\_blank">PowerAI version 1.5.4\</a>\<span style="font-size:
-1em;">.\</span>
-\<a href="<https://www.tensorflow.org/guide/>"
-target="\_blank">TensorFlow\</a> is a free end-to-end open-source
+learning partition), you can use the tools from [IBM PowerAI](PowerAI.md) or the other
+modules. PowerAI is an enterprise software distribution that combines popular open-source
+deep learning frameworks, efficient AI development tools (Tensorflow, Caffe, etc). For
+this page and examples was used [PowerAI version 1.5.4](https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.4/navigation/pai_software_pkgs.html)
+
+[TensorFlow](https://www.tensorflow.org/guide/) is a free end-to-end open-source
 software library for dataflow and differentiable programming across many
 tasks. It is a symbolic math library, used primarily for machine
-learning applications. \<span style="font-size: 1em;">It has a
-comprehensive, flexible ecosystem of tools, libraries and community
-resources. It is available on taurus along with other common machine
-learning packages like Pillow, SciPY, Numpy.\</span>
+learning applications. It has a comprehensive, flexible ecosystem of tools, libraries and
+community resources. It is available on taurus along with other common machine
+learning packages like Pillow, SciPY, Numpy.
 **Prerequisites:** To work with Tensorflow on Taurus, you obviously need
-\<a href="Login" target="\_blank">access\</a> for the Taurus system and
-basic knowledge about Python, SLURM system.
+[access](../access/Login.md) for the Taurus system and basic knowledge about Python, SLURM system.
 **Aim** of this page is to introduce users on how to start working with
 TensorFlow on the \<a href="HPCDA" target="\_self">HPC-DA\</a> system -
 part of the TU Dresden HPC system.
 There are three main options on how to work with Tensorflow on the
-HPC-DA: **1.** **Modules,** **2.** **JupyterNotebook, 3. Containers**.
-The main way using the \<a href="RuntimeEnvironment#Module_Environments"
-target="\_blank">Modules system\</a> and Python virtual environment.
-Please see the next chapters and the [Python page](Python) for the
+HPC-DA: **1.** **Modules,** **2.** **JupyterNotebook, 3. Containers**. The best option is
+to use [module system](../data_management/RuntimeEnvironment.md#Module_Environments) and
+Python virtual environment. Please see the next chapters and the [Python page](Python.md) for the
 HPC-DA system.
 The information about the Jupyter notebook and the **JupyterHub** could
-be found \<a href="JupyterHub" target="\_blank">here\</a>. The use of
-Containers is described \<a href="TensorFlowContainerOnHPCDA"
-target="\_blank">here\</a>.
-\<span
-style`"font-size: 1em;">On Taurus, there exist different module environments, each containing a set of software modules. The default is *modenv/scs5* which is already loaded, however for the HPC-DA system using the "ml" partition you need to use *modenv/ml*. To find out which partition are you using use: =ml list.`
-You can change the module environment with the command: \</span>
+be found [here](JupyterHub.md). The use of
+Containers is described [here](TensorFlowContainerOnHPCDA.md).
+
+On Taurus, there exist different module environments, each containing a set
+of software modules. The default is *modenv/scs5* which is already loaded,
+however for the HPC-DA system using the "ml" partition you need to use *modenv/ml*.
+To find out which partition are you using use: `ml list`.
+You can change the module environment with the command:
 module load modenv/ml
-\<span style="font-size: 1em;">The machine learning partition is based
-on the PowerPC Architecture (ppc64le) (Power9 processors), which means
-that the software built for x86_64 will not work on this partition, so
-you most likely can't use your already locally installed packages on
-Taurus. Also, users need to use the modules which are specially made for
-the ml partition (from modenv/ml) and not for the rest of taurus (e.g.
-from modenv/scs5). \</span>
-\<span style="font-size: 1em;">Each node on the ml partition has 6x
-Tesla V-100 GPUs, with 176 parallel threads on 44 cores per node
-(Simultaneous multithreading (SMT) enabled) and 256GB RAM. The
-specification could be found [here](Power9).\</span>
+The machine learning partition is based on the PowerPC Architecture (ppc64le)
+(Power9 processors), which means that the software built for x86_64 will not
+work on this partition, so you most likely can't use your already locally
+installed packages on Taurus. Also, users need to use the modules which are
+specially made for the ml partition (from modenv/ml) and not for the rest
+of Taurus (e.g. from modenv/scs5).
+
+Each node on the ml partition has 6x Tesla V-100 GPUs, with 176 parallel threads
+on 44 cores per node (Simultaneous multithreading (SMT) enabled) and 256GB RAM.
+The specification could be found [here](../use_of_hardware/Power9.md).
 %RED%Note:<span class="twiki-macro ENDCOLOR"></span> Users should not
 reserve more than 28 threads per each GPU device so that other users on
@@ -273,5 +261,4 @@ else stay with the default of modenv/scs5.
 Q: How to change the module environment and know more about modules?
-A:
-[https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/RuntimeEnvironment#Modules](RuntimeEnvironment#Modules)
+A: [Modules](../data_management/RuntimeEnvironment.md#Modules)