Commit 59dac9b6, authored 1 year ago by Natalie Breidenbach
Update machine_learning.md
parent e33ceeb6
2 merge requests: !938 Automated merge from preview to main, !936 Update to Five-Cluster-Operation
Showing 1 changed file with 17 additions and 17 deletions:
doc.zih.tu-dresden.de/docs/software/machine_learning.md
# Machine Learning
This is an introduction of how to run machine learning applications on ZIH systems.
-For machine learning purposes, we recommend to use the partitions `alpha` and/or `ml`.
+For machine learning purposes, we recommend to use the cluster `alpha` and/or `power`.
-## Partition `ml`
+## Cluster: `power`
-The compute nodes of the partition `ml` are built on the base of
+The compute nodes of the cluster `power` are built on the base of
[Power9 architecture](https://www.ibm.com/it-infrastructure/power/power9) from IBM. The system was created
for AI challenges, analytics and working with data-intensive workloads and accelerated databases.
The main feature of the nodes is the ability to work with the
[NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NV-Link**
support that allows a total bandwidth with up to 300 GB/s. Each node on the
-partition `ml` has 6x Tesla V-100 GPUs. You can find a detailed specification of the partition in our
+cluster `power` has 6x Tesla V-100 GPUs. You can find a detailed specification of the cluster in our
[Power9 documentation](../jobs_and_resources/hardware_overview.md).
!!! note
-The partition `ml` is based on the Power9 architecture, which means that the software built
-for x86_64 will not work on this partition. Also, users need to use the modules which are
+The cluster `power` is based on the Power9 architecture, which means that the software built
+for x86_64 will not work on this cluster. Also, users need to use the modules which are
specially build for this architecture (from `modenv/ml`).
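The architecture constraint described in this note can be checked before loading a module environment. The following is a minimal sketch and not part of the commit: the helper name `pick_modenv` is hypothetical, the mapping of `x86_64` to `modenv/hiera` is an assumption taken from the `alpha` section of this file, and it assumes that the Power9 nodes report `ppc64le` from `uname -m`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the ZIH documentation):
# map a machine architecture string to a module environment name.
pick_modenv() {
    case "$1" in
        ppc64le) echo "modenv/ml" ;;      # Power9 nodes (assumed to report ppc64le)
        x86_64)  echo "modenv/hiera" ;;   # assumption: x86_64 nodes such as alpha
        *)       echo "unknown"; return 1 ;;
    esac
}

# Detect the architecture of the current node and print a suggestion.
arch="$(uname -m)"
if modenv="$(pick_modenv "$arch")"; then
    echo "Detected ${arch}: module load ${modenv}"
else
    echo "Detected ${arch}: no module environment mapped in this sketch" >&2
fi
```

The sketch only prints a suggestion instead of calling `module load` itself, so it can be run on any node without side effects.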
### Modules
-On the partition `ml` load the module environment:
+On the cluster `power` load the module environment:
```console
-marie@ml$ module load modenv/ml
+marie@power$ module load modenv/ml
The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml
```
### Power AI
-There are tools provided by IBM, that work on partition `ml` and are related to AI tasks.
+There are tools provided by IBM, that work on cluster `power` and are related to AI tasks.
For more information see our [Power AI documentation](power_ai.md).
-## Partition: Alpha
+## Cluster: Alpha
-Another partition for machine learning tasks is `alpha`. It is mainly dedicated to
-[ScaDS.AI](https://scads.ai/) topics. Each node on partition `alpha` has 2x AMD EPYC CPUs, 8x NVIDIA
+Another cluster for machine learning tasks is `alpha`. It is mainly dedicated to
+[ScaDS.AI](https://scads.ai/) topics. Each node on the cluster `alpha` has 2x AMD EPYC CPUs, 8x NVIDIA
A100-SXM4 GPUs, 1 TB RAM and 3.5 TB local space (`/tmp`) on an NVMe device. You can find more
-details of the partition in our [Alpha Centauri](../jobs_and_resources/alpha_centauri.md)
+details of the cluster in our [Alpha Centauri](../jobs_and_resources/alpha_centauri.md)
documentation.
### Modules
-On the partition `alpha` load the module environment:
+On the cluster `alpha` load the module environment:
```console
marie@alpha$ module load modenv/hiera
...
...
@@ -54,7 +54,7 @@ The following have been reloaded with a version change: 1) modenv/ml => modenv/
!!! note
-On partition `alpha`, the most recent modules are build in `hiera`. Alternative modules might be
+On cluster `alpha`, the most recent modules are build in `hiera`. Alternative modules might be
build in `scs5`.
## Machine Learning via Console
...
...
@@ -83,7 +83,7 @@ create documents containing live code, equations, visualizations, and narrative
TensorFlow or PyTorch) on ZIH systems and to run your Jupyter notebooks on HPC nodes.
After accessing JupyterHub, you can start a new session and configure it. For machine learning
-purposes, select either partition `alpha` or `ml` and the resources, your application requires.
+purposes, select either cluster `alpha` or `power` and the resources, your application requires.
In your session you can use [Python](data_analytics_with_python.md#jupyter-notebooks),
[R](data_analytics_with_r.md#r-in-jupyterhub) or [RStudio](data_analytics_with_rstudio.md) for your
...
...
@@ -158,7 +158,7 @@ still need to download some datasets use [Datamover](../data_transfer/datamover.
The ImageNet project is a large visual database designed for use in visual object recognition
software research. In order to save space in the filesystem by avoiding to have multiple duplicates
of this lying around, we have put a copy of the ImageNet database (ILSVRC2012 and ILSVR2017) under
-`/scratch/imagenet` which you can use without having to download it again. For the future, the
+`/data/horse/imagenet` which you can use without having to download it again. For the future, the
ImageNet dataset will be available in
[Warm Archive](../data_lifecycle/workspaces.md#mid-term-storage). ILSVR2017 also includes a dataset
for recognition objects from a video. Please respect the corresponding
...
...