Taras Lazariv authored
TensorFlow
TensorFlow is a free end-to-end open-source software library for dataflow and differentiable programming across many tasks. It is a symbolic math library, used primarily for machine learning applications. It has a comprehensive, flexible ecosystem of tools, libraries and community resources.
Please check the software modules list via
marie@login$ module spider TensorFlow
to find out which TensorFlow modules are available on your partition.
On ZIH systems, TensorFlow 2 is the default module version. For compatibility hints between TF2 and TF1, see the corresponding section below.
We recommend using the Alpha and/or ML partitions for machine learning workflows with the TensorFlow library. You can find the detailed hardware specifications here.
TensorFlow Console
On the Alpha partition load the module environment:
marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #interactive job on an Alpha node with 1 GPU, 1 task, 7 cores and 8000 MB per CPU
marie@alpha$ module load modenv/scs5
On the ML partition load the module environment:
marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #interactive job on an ML node with 1 GPU, 1 task, 7 cores and 8000 MB per CPU
marie@ml$ module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
This example shows how to load TensorFlow through the module system and start working with it:
marie@ml$ module load TensorFlow
Module TensorFlow/1.10.0-PythonAnaconda-3.6 and 1 dependency loaded.
Now we check that we can access TensorFlow. One example is tensorflow-test:
marie@ml$ tensorflow-test
Basic test of tensorflow - A Hello World!!!...
The following example shows how to create a Python virtual environment and import TensorFlow.
marie@ml$ mkdir python-environments #create folder
marie@ml$ which python #check which Python you are using
/sw/installed/Python/3.7.4-GCCcore-8.3.0/bin/python
marie@ml$ virtualenv --system-site-packages python-environments/env #create virtual environment "env", which inherits the global site packages
[...]
marie@ml$ source python-environments/env/bin/activate #activate virtual environment "env". Example output: (env) bash-4.2$
marie@ml$ python -c "import tensorflow as tf; print(tf.__version__)"
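Beyond printing the version, it can help to confirm that the GPU requested with `--gres=gpu:1` is actually visible to TensorFlow. The following is a minimal sketch, assuming a TensorFlow 2.1 or newer module is loaded (`tf.config.list_physical_devices` is not available in TF1 modules). The helper name `describe_tensorflow` is hypothetical and not part of TensorFlow.

```python
def describe_tensorflow():
    """Return the TensorFlow version and visible GPUs, or report that it is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # No TensorFlow module loaded or installed in this environment.
        return {"installed": False, "version": None, "gpus": []}

    # Assumption: TensorFlow >= 2.1, where this call is part of the stable API.
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    return {"installed": True, "version": tf.__version__, "gpus": gpus}


if __name__ == "__main__":
    print(describe_tensorflow())
```

An empty `gpus` list inside a GPU job usually means that the loaded TensorFlow build or the module environment does not match the node, so it is worth checking before starting a long run.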
TensorFlow in JupyterHub
In addition to interactive and batch jobs, it is possible to work with TensorFlow using JupyterHub. The production and test environments of JupyterHub contain Python and R kernels that both come with TensorFlow support.
TensorFlow in Containers
Another option for using TensorFlow is containers. In the HPC domain, the Singularity container system is a widely used tool. In the following example, we run tensorflow-test in a Singularity container:
marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash
marie@ml$ singularity shell --nv /scratch/singularity/powerai-1.5.3-all-ubuntu16.04-py3.img
marie@ml$ export PATH=/opt/anaconda3/bin:$PATH
marie@ml$ source activate /opt/anaconda3 #activate conda environment
marie@ml$ . /opt/DL/tensorflow/bin/tensorflow-activate
marie@ml$ tensorflow-test
Basic test of tensorflow - A Hello World!!!...
TensorFlow with Python or R
For further information on TensorFlow in combination with Python see here, for R see here.
Distributed TensorFlow
For details on how to run TensorFlow with multiple GPUs and/or multiple nodes, see distributed training.
Compatibility TF2 and TF1
TensorFlow 2.0 includes many API changes, such as reordered arguments, renamed symbols, and changed default values for parameters. As a result, code written for TensorFlow 1 is in some cases not compatible with TensorFlow 2. However, if you are using the high-level APIs (tf.keras), there may be little or no action you need to take to make your code fully TensorFlow 2.0 compatible. It is still possible to run 1.X code unmodified (except for contrib) in TensorFlow 2.0:
import tensorflow.compat.v1 as tf #instead of "import tensorflow as tf"
tf.disable_v2_behavior() #switch back to TF1 behavior (graph mode, sessions)
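To illustrate what these two lines enable, here is a minimal sketch of TF1-style code (placeholders and a session) running under a TensorFlow 2 installation. The function name `run_tf1_style_sum` is hypothetical. The import sits inside the function only so that the file can still be imported when no TensorFlow module is loaded; in a real script it would normally be at the top.

```python
def run_tf1_style_sum(a, b):
    """Add two floats with TF1-style graph code under TensorFlow 2."""
    import tensorflow.compat.v1 as tf  # instead of "import tensorflow as tf"

    tf.disable_v2_behavior()  # TF1 semantics: graph mode instead of eager execution

    graph = tf.Graph()
    with graph.as_default():
        # Placeholders and sessions exist only in the TF1 API.
        x = tf.placeholder(tf.float32, shape=(), name="x")
        y = tf.placeholder(tf.float32, shape=(), name="y")
        total = tf.add(x, y)
        with tf.Session(graph=graph) as sess:
            return float(sess.run(total, feed_dict={x: a, y: b}))
```

Calling `run_tf1_style_sum(2.0, 3.0)` returns `5.0`. Code that relies on `tf.contrib` is not covered by this compatibility module and has to be ported separately.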
To make the transition to TF 2.0 as seamless as possible, the TensorFlow team has created the tf_upgrade_v2 utility to help transition legacy code to the new API.
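As a rough sketch of how the utility might be driven from Python, the helper below builds and runs a `tf_upgrade_v2` call for a single file. The flags `--infile`, `--outfile` and `--reportfile` belong to the tool; the function names and the file names used in the usage note are hypothetical.

```python
import shutil
import subprocess


def build_upgrade_command(infile, outfile, reportfile="report.txt"):
    """Build the tf_upgrade_v2 command line for converting a single file."""
    return [
        "tf_upgrade_v2",
        "--infile", infile,
        "--outfile", outfile,
        "--reportfile", reportfile,
    ]


def upgrade_script(infile, outfile, reportfile="report.txt"):
    """Run tf_upgrade_v2 if it is on PATH; return True when it exits successfully."""
    if shutil.which("tf_upgrade_v2") is None:
        # The tool is installed together with TensorFlow; load a TensorFlow module first.
        return False
    result = subprocess.run(build_upgrade_command(infile, outfile, reportfile))
    return result.returncode == 0
```

For example, `upgrade_script("train_v1.py", "train_v2.py")` would write the converted script to `train_v2.py` and a list of the applied changes to `report.txt`. The report is worth reading, because the tool flags calls it cannot convert automatically.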
Additional libraries
The following NVIDIA libraries are available on all nodes:
Library | Path |
NCCL | /usr/local/cuda/targets/ppc64le-linux |
cuDNN | /usr/local/cuda/targets/ppc64le-linux |
Note: For optimal NCCL performance, it is recommended to set the NCCL_MIN_NRINGS environment variable during execution. You can try different values, but 4 is a reasonable starting point.
export NCCL_MIN_NRINGS=4
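If exporting the variable in the job script is not convenient, it can also be set from within the Python training script, provided this happens before TensorFlow initializes any NCCL communication. The following is a minimal sketch; the helper name `configure_nccl` is hypothetical.

```python
import os


def configure_nccl(min_nrings=4):
    """Set NCCL_MIN_NRINGS unless the job environment already defines it."""
    # setdefault keeps a value that was exported in the shell or job script.
    return os.environ.setdefault("NCCL_MIN_NRINGS", str(min_nrings))


# Call this at the top of the training script, before TensorFlow sets up NCCL.
effective = configure_nccl()
print(f"NCCL_MIN_NRINGS={effective}")
```

Using `setdefault` means a value exported in the shell takes precedence, so different settings can be compared between runs without editing the script.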