Commit 59dac9b6 authored by Natalie Breidenbach

Update machine_learning.md

parent e33ceeb6
2 merge requests: !938 Automated merge from preview to main, !936 Update to Five-Cluster-Operation
# Machine Learning
This is an introduction to running machine learning applications on ZIH systems.
For machine learning purposes, we recommend using the partitions `alpha` and/or `ml`.
For machine learning purposes, we recommend using the clusters `alpha` and/or `power`.
## Partition `ml`
## Cluster: `power`
The compute nodes of the partition `ml` are based on the
The compute nodes of the cluster `power` are based on the
[Power9 architecture](https://www.ibm.com/it-infrastructure/power/power9) from IBM. The system was created
for AI challenges, analytics and working with data-intensive workloads and accelerated databases.
The main feature of the nodes is the ability to work with the
[NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NV-Link**
support that allows a total bandwidth of up to 300 GB/s. Each node on the
partition `ml` has 6x Tesla V100 GPUs. You can find a detailed specification of the partition in our
cluster `power` has 6x Tesla V100 GPUs. You can find a detailed specification of the cluster in our
[Power9 documentation](../jobs_and_resources/hardware_overview.md).
!!! note
The partition `ml` is based on the Power9 architecture, which means that the software built
for x86_64 will not work on this partition. Also, users need to use the modules which are
The cluster `power` is based on the Power9 architecture, which means that the software built
for x86_64 will not work on this cluster. Also, users need to use the modules which are
specially built for this architecture (from `modenv/ml`).
### Modules
On the partition `ml` load the module environment:
On the cluster `power` load the module environment:
```console
marie@ml$ module load modenv/ml
marie@power$ module load modenv/ml
The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
```
### Power AI
There are tools provided by IBM that work on partition `ml` and are related to AI tasks.
There are tools provided by IBM that work on cluster `power` and are related to AI tasks.
For more information see our [Power AI documentation](power_ai.md).
## Partition: Alpha
## Cluster: Alpha
Another partition for machine learning tasks is `alpha`. It is mainly dedicated to
[ScaDS.AI](https://scads.ai/) topics. Each node on partition `alpha` has 2x AMD EPYC CPUs, 8x NVIDIA
Another cluster for machine learning tasks is `alpha`. It is mainly dedicated to
[ScaDS.AI](https://scads.ai/) topics. Each node on the cluster `alpha` has 2x AMD EPYC CPUs, 8x NVIDIA
A100-SXM4 GPUs, 1 TB RAM and 3.5 TB local space (`/tmp`) on an NVMe device. You can find more
details of the partition in our [Alpha Centauri](../jobs_and_resources/alpha_centauri.md)
details of the cluster in our [Alpha Centauri](../jobs_and_resources/alpha_centauri.md)
documentation.
### Modules
On the partition `alpha` load the module environment:
On the cluster `alpha` load the module environment:
```console
marie@alpha$ module load modenv/hiera
......@@ -54,7 +54,7 @@ The following have been reloaded with a version change: 1) modenv/ml => modenv/
!!! note
On partition `alpha`, the most recent modules are built in `hiera`. Alternative modules might be
On cluster `alpha`, the most recent modules are built in `hiera`. Alternative modules might be
built in `scs5`.
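
Because software built for x86_64 does not run on the Power9 nodes, a job script shared between both
clusters has to load a matching module environment. The following is a minimal sketch of how that
choice could be automated, not part of the documented workflow. The function name `pick_modenv` is
hypothetical, and it assumes that `uname -m` reports `ppc64le` on the Power9 nodes and `x86_64` on
the AMD EPYC nodes of `alpha`.

```bash
#!/bin/bash
# Hypothetical helper (not a ZIH tool): map a CPU architecture string
# to the module environment named in this documentation.
pick_modenv() {
    case "$1" in
        ppc64le) echo "modenv/ml" ;;     # Power9 nodes (cluster `power`)
        x86_64)  echo "modenv/hiera" ;;  # AMD EPYC nodes (cluster `alpha`)
        *)       echo "unknown" ;;       # any other architecture
    esac
}

# Print the module environment matching the node this runs on.
pick_modenv "$(uname -m)"
```

The printed name could then be passed to `module load`. On `alpha`, `modenv/scs5` may be the better
choice if the module you need is not yet built in `hiera`.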
## Machine Learning via Console
......@@ -83,7 +83,7 @@ create documents containing live code, equations, visualizations, and narrative
TensorFlow or PyTorch) on ZIH systems and to run your Jupyter notebooks on HPC nodes.
After accessing JupyterHub, you can start a new session and configure it. For machine learning
purposes, select either partition `alpha` or `ml` and the resources your application requires.
purposes, select either cluster `alpha` or `power` and the resources your application requires.
In your session you can use [Python](data_analytics_with_python.md#jupyter-notebooks),
[R](data_analytics_with_r.md#r-in-jupyterhub) or [RStudio](data_analytics_with_rstudio.md) for your
......@@ -158,7 +158,7 @@ still need to download some datasets use [Datamover](../data_transfer/datamover.
The ImageNet project is a large visual database designed for use in visual object recognition
software research. To save filesystem space by avoiding multiple duplicate copies, we provide a
copy of the ImageNet database (ILSVRC2012 and ILSVRC2017) under
`/scratch/imagenet` which you can use without having to download it again. For the future, the
`/data/horse/imagenet` which you can use without having to download it again. For the future, the
ImageNet dataset will be available in
[Warm Archive](../data_lifecycle/workspaces.md#mid-term-storage). ILSVRC2017 also includes a dataset
for recognizing objects in video. Please respect the corresponding
......