Commit 1b49c43e authored by Martin Schroschk

Brief review

parent c35e08b6
4 merge requests: !392 Merge preview into contrib guide for browser users, !333 Draft: update NGC containers, !327 Merge preview into main, !317 Jobs and resources
# Alpha Centauri - Multi-GPU sub-cluster # Alpha Centauri - Multi-GPU Sub-Cluster
The sub-cluster "AlphaCentauri" had been installed for AI-related computations (ScaDS.AI). The sub-cluster "Alpha Centauri" had been installed for AI-related computations (ScaDS.AI).
It has 34 nodes, each with: It has 34 nodes, each with:
- 8 x NVIDIA A100-SXM4 (40 GB RAM) * 8 x NVIDIA A100-SXM4 (40 GB RAM)
- 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz with multithreading enabled * 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz with multi-threading enabled
- 1 TB RAM 3.5 TB `/tmp` local NVMe device * 1 TB RAM 3.5 TB `/tmp` local NVMe device
- Hostnames: `taurusi[8001-8034]` * Hostnames: `taurusi[8001-8034]`
- Slurm partition `alpha` for batch jobs and `alpha-interactive` for interactive jobs * Slurm partition `alpha` for batch jobs and `alpha-interactive` for interactive jobs
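As a rough illustration of the hardware figures listed above, the following sketch derives the sub-cluster totals and expands the hostname range `taurusi[8001-8034]` into individual names. The variable names are illustrative assumptions and not part of the documentation; only the node count, GPU count, core count, and hostname range are taken from the list.

```bash
#!/usr/bin/env bash
# Figures taken from the node list above; variable names are illustrative only.
nodes=34
gpus_per_node=8
cores_per_node=$((2 * 24))   # two 24-core CPUs per node (physical cores)

total_gpus=$((nodes * gpus_per_node))
total_cores=$((nodes * cores_per_node))

# Expand the hostname range taurusi[8001-8034] into one name per line.
hostnames=$(for i in $(seq 8001 8034); do echo "taurusi${i}"; done)

echo "GPUs: ${total_gpus}, physical cores: ${total_cores}"
echo "First host: $(echo "${hostnames}" | head -n 1)"
echo "Last host:  $(echo "${hostnames}" | tail -n 1)"
```

The totals count physical cores only; with multi-threading enabled, the number of hardware threads visible to the operating system will be higher.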
!!! note !!! note
...@@ -23,8 +23,8 @@ The software for the `alpha` partition is available in `modenv/hiera` module env ...@@ -23,8 +23,8 @@ The software for the `alpha` partition is available in `modenv/hiera` module env
To check the available modules for `modenv/hiera`, use the command To check the available modules for `modenv/hiera`, use the command
```bash ```console
module spider <module_name> marie@alpha$ module spider <module_name>
``` ```
For example, to check whether PyTorch is available in version 1.7.1: For example, to check whether PyTorch is available in version 1.7.1:
...@@ -95,11 +95,11 @@ Successfully installed torchvision-0.10.0 ...@@ -95,11 +95,11 @@ Successfully installed torchvision-0.10.0
### JupyterHub ### JupyterHub
[JupyterHub](../access/jupyterhub.md) can be used to run Jupyter notebooks on AlphaCentauri [JupyterHub](../access/jupyterhub.md) can be used to run Jupyter notebooks on Alpha Centauri
sub-cluster. As a starting configuration, a "GPU (NVIDIA Ampere A100)" preset can be used sub-cluster. As a starting configuration, a "GPU (NVIDIA Ampere A100)" preset can be used
in the advanced form. In order to use latest software, it is recommended to choose in the advanced form. In order to use latest software, it is recommended to choose
`fosscuda-2020b` as a standard environment. Already installed modules from `modenv/hiera` `fosscuda-2020b` as a standard environment. Already installed modules from `modenv/hiera`
can be pre-loaded in "Preload modules (modules load):" field. can be preloaded in "Preload modules (modules load):" field.
### Containers ### Containers
...@@ -109,6 +109,6 @@ Detailed information about containers can be found [here](../software/containers ...@@ -109,6 +109,6 @@ Detailed information about containers can be found [here](../software/containers
Nvidia Nvidia
[NGC](https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/) [NGC](https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/)
containers can be used as an effective solution for machine learning related tasks. (Downloading containers can be used as an effective solution for machine learning related tasks. (Downloading
containers requires registration). Nvidia-prepared containers with software solutions for specific containers requires registration). Nvidia-prepared containers with software solutions for specific
scientific problems can simplify the deployment of deep learning workloads on HPC. NGC containers scientific problems can simplify the deployment of deep learning workloads on HPC. NGC containers
have shown consistent performance compared to directly run code. have shown consistent performance compared to directly run code.
...@@ -47,6 +47,7 @@ ecryptfs ...@@ -47,6 +47,7 @@ ecryptfs
engl engl
english english
env env
EPYC
Espresso Espresso
ESSL ESSL
fastfs fastfs
...@@ -78,6 +79,7 @@ HDFS ...@@ -78,6 +79,7 @@ HDFS
HDFView HDFView
Horovod Horovod
hostname hostname
Hostnames
HPC HPC
HPL HPL
html html
...@@ -133,11 +135,13 @@ natively ...@@ -133,11 +135,13 @@ natively
NCCL NCCL
Neptun Neptun
NFS NFS
NGC
NRINGS NRINGS
NUMA NUMA
NUMAlink NUMAlink
NumPy NumPy
Nutzungsbedingungen Nutzungsbedingungen
Nvidia
NVMe NVMe
NWChem NWChem
OME OME
...@@ -169,6 +173,8 @@ PMI ...@@ -169,6 +173,8 @@ PMI
png png
PowerAI PowerAI
ppc ppc
Preload
preloaded
PSOCK PSOCK
Pthreads Pthreads
pymdownx pymdownx
...@@ -220,6 +226,7 @@ stdout ...@@ -220,6 +226,7 @@ stdout
subdirectories subdirectories
subdirectory subdirectory
SUSE SUSE
SXM
TBB TBB
TCP TCP
TensorBoard TensorBoard
......