From 0b688b5f21dd220ae5e02d2e8ca67e4729b5b563 Mon Sep 17 00:00:00 2001
From: Elias Werner <eliwerner3@googlemail.com>
Date: Thu, 26 Aug 2021 13:39:41 +0200
Subject: [PATCH] reviewed machine learning overview

---
 .../docs/software/machine_learning.md | 64 ++++++-------------
 1 file changed, 20 insertions(+), 44 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/software/machine_learning.md b/doc.zih.tu-dresden.de/docs/software/machine_learning.md
index e3ca23e17..3af102cb9 100644
--- a/doc.zih.tu-dresden.de/docs/software/machine_learning.md
+++ b/doc.zih.tu-dresden.de/docs/software/machine_learning.md
@@ -6,15 +6,15 @@ For machine learning purposes, we recommend to use the **Alpha** and/or **ML** p
 ## ML partition
 
 The compute nodes of the ML partition are built on the base of [Power9](https://www.ibm.com/it-infrastructure/power/power9)
-architecture from IBM. The system was created for AI challenges, analytics and working with,
-Machine learning, data-intensive workloads, deep-learning frameworks and accelerated databases.
+architecture from IBM. The system was created for AI challenges, analytics and working with
+data-intensive workloads and accelerated databases.
 The main feature of the nodes is the ability to work with the
 [NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NV-Link**
 support that allows a total bandwidth with up to 300 gigabytes per second (GB/sec). Each node on the
 ml partition has 6x Tesla V-100 GPUs. You can find a detailed specification of the partition
 [here](../jobs_and_resources/power9.md).
 
-**Note:** The ML partition is based on the PowerPC Architecture, which means that the software built
+**Note:** The ML partition is based on the Power9 architecture, which means that software built
 for x86_64 will not work on this partition. Also, users need to use the modules which are
 specially made for the ml partition (from modenv/ml).
 
@@ -29,7 +29,9 @@ marie@ml$ module load modenv/ml    #example output: The following have been relo
 
 ## Alpha partition
 
-- describe alpha partition
+Another partition for machine learning tasks is Alpha. It is mainly dedicated to [ScaDS.AI](https://scads.ai/)
+topics. Each node on Alpha has 2x AMD EPYC CPUs, 8x NVIDIA A100-SXM4 GPUs, 1TB RAM and 3.5TB local
+space (/tmp) on an NVMe device. You can find more details about the partition [here](../jobs_and_resources/alpha_centauri.md).
 
 ### Modules
 
@@ -40,51 +42,22 @@ marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash
 marie@romeo$ module load modenv/scs5
 ```
 
-## Machine Learning Console and Virtual Environment
+## Machine Learning via Console
 
-A virtual environment is a cooperatively isolated runtime environment that allows Python users and
-applications to install and update Python distribution packages without interfering with the
-behaviour of other Python applications running on the same system. At its core, the main purpose of
-Python virtual environments is to create an isolated environment for Python projects.
+### Python and Virtual Environments
 
-### Conda virtual environment
+Python users should use a virtual environment when conducting machine learning tasks via the console.
+If you use [sbatch files](../jobs_and_resources/batch_systems.md) to submit your jobs, you usually
+do not need a virtual environment.
 
-[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
-is an open-source package management system and environment management system from the Anaconda.
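+For example, a virtual environment can be created and activated like this (a minimal sketch for the
+ml partition, using the commands from the examples below; adjust partition and requested resources
+to your needs):
+
+```console
+marie@login$ srun -p ml -N 1 -n 1 -c 2 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=8000 bash
+marie@ml$ module load modenv/ml                        #load the module environment for the ml partition
+marie@ml$ python3 -m venv --system-site-packages env   #create virtual environment "env" inheriting the global site packages
+marie@ml$ source env/bin/activate                      #activate "env"; the prompt is now prefixed with (env)
+```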
+For more details on machine learning or data science with Python, see [here](data_analytics_with_python.md).
 
-```console
-marie@login$ srun -p ml -N 1 -n 1 -c 2 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=8000 bash   #job submission in ml nodes with allocating: 1 node, 1 task per node, 2 CPUs per task, 1 gpu per node, with 8000 mb on 1 hour.
-marie@ml$ module load modenv/ml    #example output: The following have been reloaded with a version change:  1) modenv/scs5 => modenv/ml
-marie@ml$ mkdir python-virtual-environments   #create folder for your environments
-marie@ml$ cd python-virtual-environments      #go to folder
-marie@ml$ which python   #check which python are you using
-marie@ml$ python3 -m venv --system-site-packages env   #create virtual environment "env" which inheriting with global site packages
-marie@ml$ source env/bin/activate   #activate virtual environment "env". Example output: (env) bash-4.2$
-```
-
-The inscription (env) at the beginning of each line represents that now you are in the virtual
-environment.
+### R
 
-### Python virtual environment
+R also supports machine learning via the console. It does not require a virtual environment because
+its packages are managed differently.
 
-**Virtualenv (venv)** is a standard Python tool to create isolated Python environments.
-It has been integrated into the standard library under the [venv module](https://docs.python.org/3/library/venv.html).
-
-```console
-marie@login$ srun -p ml -N 1 -n 1 -c 2 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=8000 bash   #job submission in ml nodes with allocating: 1 node, 1 task per node, 2 CPUs per task, 1 gpu per node, with 8000 mb on 1 hour.
-marie@ml$ module load modenv/ml    #example output: The following have been reloaded with a version change:  1) modenv/scs5 => modenv/ml
-marie@ml$ mkdir python-virtual-environments   #create folder for your environments
-marie@ml$ cd python-virtual-environments      #go to folder
-marie@ml$ which python   #check which python are you using
-marie@ml$ python3 -m venv --system-site-packages env   #create virtual environment "env" which inheriting with global site packages
-marie@ml$ source env/bin/activate   #activate virtual environment "env". Example output: (env) bash-4.2$
-```
-
-The inscription (env) at the beginning of each line represents that now you are in the virtual
-environment.
-
-Note: However in case of using [sbatch files](link) to send your job you usually don't need a
-virtual environment.
+For more details on machine learning or data science with R, see [here](../data_analytics_with_r/#r-console).
 
 ## Machine Learning with Jupyter
 
@@ -96,6 +69,9 @@ your Jupyter notebooks on HPC nodes. After accessing JupyterHub, you can start a
 new session and configure it. For machine learning purposes, select either **Alpha** or **ML** partition
 and the resources, your application requires.
 
+In your session you can use [Python](../data_analytics_with_python/#jupyter-notebooks), [R](../data_analytics_with_r/#r-in-jupyterhub)
+or [RStudio](data_analytics_with_rstudio) for your machine learning and data science tasks.
+
 ## Machine Learning with Containers
 
 Some machine learning tasks require using containers. In the HPC domain, the [Singularity](https://singularity.hpcng.org/)
@@ -140,7 +116,7 @@ different values but 4 should be a pretty good starting point.
 marie@compute$ export NCCL_MIN_NRINGS=4
 ```
 
-### HPC
+### HPC-Related Software
 
 The following HPC related software is installed on all nodes:
 
--
GitLab