diff --git a/doc.zih.tu-dresden.de/docs/data_transfer/overview.md b/doc.zih.tu-dresden.de/docs/data_transfer/overview.md
index c82a68ba50c9ab75c2fdc31d647d8a8827db694c..28b458b6b831a36469f30018755bd89fa011a20a 100644
--- a/doc.zih.tu-dresden.de/docs/data_transfer/overview.md
+++ b/doc.zih.tu-dresden.de/docs/data_transfer/overview.md
@@ -15,7 +15,7 @@ copy data to/from ZIH systems. Please follow the link to the documentation on
 !!! warning "Note"
 
     The former **export nodes** are still available as long as the outdated filesystems (`scratch`,
-    etc.) are accessible. Their end of life is considered for May 2024.
+    `ssd`, etc.) are accessible. Their operation will end on January 3rd, 2024.
 
 ## Data Transfer Inside ZIH Systems: Datamover
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md
index 181497887d33ab24143bb5d4f32d6025c1fa84b3..32e6605041461b481554a2f970da1c39ba7b40a9 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md
@@ -7,37 +7,88 @@ processing extremely large data sets. Moreover it is also a perfect platform
 for data-intensive and compute-intensive applications and has extensive capabilities for energy
 measurement and performance monitoring. Therefore provides ideal conditions to achieve the
 ambitious research goals of the users and the ZIH.
 
 The HPC system, redesigned in December 2023, consists of five homogeneous clusters with their own
 [Slurm](../jobs_and_resources/slurm.md) instances and cluster specific
 [login nodes](hardware_overview.md#login-nodes). The clusters share one
-[filesystem](../data_lifecycle/file_systems.md).
-This setup enables users to easily switch
-between [the components](hardware_overview.md), each specialized for different application
-scenarios.
+[filesystem](../data_lifecycle/file_systems.md) which enables users to easily switch
+between the components.
 
 ## Selection of Suitable Hardware
 
-The five clusters `barnard`, `alpha`,`romeo`, `power` and `julia` differ, among others, in number of nodes, cores per node, and GPUs and memory. The particular characteristica qualify them for specific applications.
-
-## Which cluster do I need?
-overview of clusters:
-
-<!-- partitions_and_limits_table -->
-| Partition | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per Core [in MB] | GPUs per Node
-|:--------|:------|--------:|---------------:|------------:|------------:|--------------:|--------------:|
-| barnard | taurusi[2045-2103] | 59 | 24 | 1 | 62,000 | 2,583 | 4 |
-
-| haswell | taurusi[6001-6604] | 609 | | | | | |
-| haswell64 | taurusi[6001-6540,6559-6604] | 586 | 24 | 1 | 61,000 | 2,541 | |
-| haswell256 | taurusi[6541-6558] | 18 | 24 | 1 | 254,000 | 10,583 | |
-| interactive | taurusi[6605-6612] | 8 | 24 | 1 | 61,000 | 2,541 | |
-| hpdlf | taurusa[3-16] | 14 | 12 | 1 | 95,000 | 7,916 | 3 |
-| power | taurusml[3-32] | 30 | 44 | 4 | 254,000 | 1,443 | 6 |
-| ml-interactive | taurusml[1-2] | 2 | 44 | 4 | 254,000 | 1,443 | 6 |
-| romeo | taurusi[7003-7192] | 190 | 128 | 2 | 505,000 | 1,972 | |
-| romeo-interactive | taurusi[7001-7002] | 2 | 128 | 2 | 505,000 | 1,972 | |
-| julia | taurussmp8 | 1 | 896 | 1 | 48,390,000 | 54,006 | |
-| alpha | taurusi[8003-8034] | 32 | 48 | 2 | 990,000 | 10,312 | 8 |
-| alpha-interactive | taurusi[8001-8002] | 2 | 48 | 2 | 990,000 | 10,312 | 8 |
-{: summary="Partitions and limits table" align="bottom"}
+The five clusters [`barnard`](../jobs_and_resources/barnard.md), [`alpha`](../jobs_and_resources/alpha_centauri.md),
+[`romeo`](../jobs_and_resources/romeo.md), [`power`](../jobs_and_resources/power9.md) and
+[`julia`](../jobs_and_resources/julia.md) differ, among others, in the number of nodes, cores per
+node, GPUs and memory. Their particular [characteristics](hardware_overview.md) qualify them for
+different applications.
+
+### Which cluster do I need?
+
+The majority of basic tasks can be executed on conventional CPU nodes, e.g. on `barnard`. When
+logging in to ZIH systems, you are placed on a login node where you can execute short tests and
+compile moderate projects. The login nodes cannot be used for real experiments and computations.
+Long and extensive computational work and experiments have to be encapsulated into so-called
+**jobs** and scheduled to the compute nodes.
+
+There is no such thing as a free lunch at ZIH systems. Since compute nodes are operated in
+multi-user mode by default, jobs of several users can run at the same time on the very same node,
+sharing resources like memory (but not CPU cores). On the other hand, a higher throughput can be
+achieved with smaller jobs. Thus, restrictions w.r.t. [memory](#memory-limits) and
+[runtime limits](#runtime-limits) have to be respected when submitting jobs.
+
+The following questions may help you to decide which cluster to use:
+
+- My application
+    - is [interactive or a batch job](slurm.md)?
+    - requires [parallelism](#parallel-jobs)?
+    - requires [multithreading (SMT)](#multithreading)?
+- Do I need [GPUs](#what-do-i-need-a-cpu-or-gpu)?
+- How much [run time](#runtime-limits) do I need?
+- How many [cores](#how-many-cores-do-i-need) do I need?
+- How much [memory](#how-much-memory-do-i-need) do I need?
+- Which [software](#available-software) is required?
+
+<!-- cluster_overview_table -->
+| Name | Description | DNS | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per Core [in MB] | GPUs per Node |
+|---|---|---|---|---:|---:|---:|---:|---:|---:|
+| **Barnard**<br>_2023_ | CPU | `n[node].barnard.hpc.tu-dresden.de` | n[1001-1630] | 630 | 104 | 2 | 515,000 | 2,475 | 0 |
+| **Alpha**<br>_2021_ | GPU | `i[node].alpha.hpc.tu-dresden.de` | taurusi[8001-8034] | 34 | 48 | 2 | 990,000 | 10,312 | 8 |
+| **Romeo**<br>_2020_ | CPU | `i[node].romeo.hpc.tu-dresden.de` | taurusi[7001-7192] | 192 | 128 | 2 | 505,000 | 1,972 | 0 |
+| **Julia**<br>_2021_ | single SMP system | `smp8.julia.hpc.tu-dresden.de` | taurussmp8 | 1 | 896 | 1 | 48,390,000 | 54,006 | 0 |
+| **Power**<br>_2018_ | IBM Power/GPU system | `ml[node].power9.hpc.tu-dresden.de` | taurusml[3-32] | 30 | 44 | 4 | 254,000 | 1,443 | 6 |
+{: summary="Cluster overview table" align="bottom"}
+
+### Interactive or Batch Mode
+
+**Interactive jobs:** An interactive job is the best choice for testing and development. See
+[interactive jobs](slurm.md). Slurm can forward your X11 credentials to the first node (or even all
+nodes) of a job with the `--x11` option. To use an interactive job, you have to specify the `-X`
+flag for the ssh login.
+
+Using `srun` directly on the shell blocks the shell and launches an interactive job. Apart from
+short test runs, it is recommended to encapsulate your experiments and computational tasks into
+batch jobs and submit them to the batch system. For that, you can conveniently put the parameters
+directly into the job file which you can submit using `sbatch [options] <job file>`.
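+
+For illustration, the following minimal examples show both modes. All values (cores, time, module
+and program names) are placeholders and have to be adapted to your use case. An interactive
+allocation can be requested with `srun`:
+
+```console
+marie@login$ srun --ntasks=1 --cpus-per-task=4 --time=01:00:00 --pty bash -l
+```
+
+A batch job is described by a job file and submitted with `sbatch`:
+
+```bash
+#!/bin/bash
+#SBATCH --job-name=my_test         # arbitrary job name
+#SBATCH --ntasks=1                 # one task
+#SBATCH --cpus-per-task=4          # four cores for this task
+#SBATCH --time=01:00:00            # requested maximum run time
+
+# load the required software environment (placeholder module name)
+module load my_application
+
+srun ./my_program
+```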
+
+### Parallel Jobs
+
+**MPI jobs:** For MPI jobs, typically one core per task is allocated. Several nodes can be
+allocated if necessary. The batch system [Slurm](slurm.md) will automatically find suitable
+hardware.
+
+**OpenMP jobs:** SMP-parallel applications can only run **within a node**, so it is necessary to
+include the [batch system](slurm.md) options `-N 1` and `-n 1`. Using `--cpus-per-task N` Slurm
+will start one task and you will have `N` CPUs. The maximum number of processors for an
+SMP-parallel program is 896 on the cluster `julia` (be aware that the application has to be
+developed with that large number of threads in mind).
+
+Clusters with GPUs are best suited for **repetitive** and **highly-parallel** computing tasks. If
+you have a task with potential [data parallelism](../software/gpu_programming.md), most likely you
+need the GPUs. Beyond video rendering, GPUs excel in tasks such as machine learning, financial
+simulations and risk modeling. Use the cluster `power` only if you need GPUs! Otherwise, using the
+x86-based clusters most likely would be more beneficial.
+
+### Multithreading
+
+Some clusters/nodes have Simultaneous Multithreading (SMT) enabled, e.g. [`alpha`](alpha_centauri.md).
+You can request these additional threads using the Slurm option `--hint=multithread` or by setting
+the environment variable `SLURM_HINT=multithread`. Besides using the threads to speed up the
+computations, the memory of the other threads is allocated implicitly, too, and you will always
+get `Memory per Core` * `number of threads` as memory pledge.
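+
+For illustration, the following sketches show typical resource requests for both models; the node
+and core counts as well as the executable names are placeholders and have to be adapted to your
+application. An MPI job requests tasks, with one core per task:
+
+```bash
+#!/bin/bash
+#SBATCH --nodes=4                  # example: four nodes
+#SBATCH --ntasks-per-node=104      # example: one MPI rank per physical core of a `barnard` node
+#SBATCH --time=04:00:00
+
+srun ./my_mpi_program              # placeholder executable
+```
+
+An OpenMP job requests one task with several CPUs and can optionally use the SMT threads:
+
+```bash
+#!/bin/bash
+#SBATCH --nodes=1                  # SMP-parallel programs cannot span nodes
+#SBATCH --ntasks=1
+#SBATCH --cpus-per-task=16         # example: 16 threads
+#SBATCH --hint=multithread         # use SMT hardware threads where available
+#SBATCH --time=04:00:00
+
+export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
+srun ./my_openmp_program           # placeholder executable
+```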
warning "Runtime limits on login nodes" -When log in to ZIH systems, you are placed on a login node where you can -[manage data life cycle](../data_lifecycle/overview.md), -setup experiments, -execute short tests and compile moderate projects. The login nodes cannot be used for real -experiments and computations. Long and extensive computational work and experiments have to be -encapsulated into so called **jobs** and scheduled to the compute nodes. + There is a time limit of 600 seconds set for processes on login nodes. Each process running + longer than this time limit is automatically killed. The login nodes are shared ressources + between all users of ZIH system and thus, need to be available and cannot be used for productive + runs. + + ``` + CPU time limit exceeded + ``` + + Please submit extensive application runs to the compute nodes using the [batch system](slurm.md). + +!!! note "Runtime limits are enforced." + + A job is canceled as soon as it exceeds its requested limit. Currently, the maximum run time + limit is 7 days. + +Shorter jobs come with multiple advantages: + +- lower risk of loss of computing time, +- shorter waiting time for scheduling, +- higher job fluctuation; thus, jobs with high priorities may start faster. + +To bring down the percentage of long running jobs we restrict the number of cores with jobs longer +than 2 days to approximately 50% and with jobs longer than 24 to 75% of the total number of cores. +(These numbers are subject to change.) As best practice we advise a run time of about 8h. + +!!! hint "Please always try to make a good estimation of your needed time limit." + + For this, you can use a command line like this to compare the requested timelimit with the + elapsed time for your completed jobs that started after a given date: + + ```console + marie@login$ sacct -X -S 2021-01-01 -E now --format=start,JobID,jobname,elapsed,timelimit -s COMPLETED + ``` + +Instead of running one long job, you should split it up into a chain job. Even applications that are +not capable of checkpoint/restart can be adapted. Please refer to the section +[Checkpoint/Restart](../jobs_and_resources/checkpoint_restart.md) for further documentation. + +### How many cores do I need? + +ZIH systems are focused on data-intensive computing. They are meant to be used for highly +parallelized code. Please take that into account when migrating sequential code from a local +machine to our HPC systems. To estimate your execution time when executing your previously +sequential program in parallel, you can use [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law). +Think in advance about the parallelization strategy for your project and how to effectively use HPC resources. + +However, this is highly depending on the used software, investigate if your application supports a parallel execution. + +### How much memory do I need? +#### Memory Limits + +!!! note "Memory limits are enforced." + + Jobs which exceed their per-node memory limit are killed automatically by the batch system. + +Memory requirements for your job can be specified via the `sbatch/srun` parameters: + +`--mem-per-cpu=<MB>` or `--mem=<MB>` (which is "memory per node"). The **default limit** regardless +of the partition it runs on is quite low at **300 MB** per CPU. If you need more memory, you need +to request it. + +ZIH systems comprise different sets of nodes with different amount of installed memory which affect +where your job may be run. 
 Follow the page [Slurm](slurm.md) for comprehensive documentation using the batch system at ZIH systems.
 There is also a page with extensive set of [Slurm examples](slurm_examples.md).
+
+### Which software is required?
+
+#### Available software
+
+Pre-installed software on our HPC systems is managed via [modules](../software/modules.md).
+You can see the
+[list of software that's already installed and accessible via modules](https://gauss-allianz.de/de/application?organizations%5B0%5D=1200).
+However, there are many different variants of these modules available. Each cluster has its own
+set of installed modules, depending on its purpose.
+
+Specific modules can be found with:
+
+```console
+marie@compute$ module spider <software_name>
+```
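+
+Once you have identified a suitable module, it can be loaded and checked as follows (module name
+and version are placeholders):
+
+```console
+marie@compute$ module load <software_name>/<version>
+marie@compute$ module list
+```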
+
 ### Available Hardware
 
 ZIH provides a broad variety of compute resources ranging from normal server CPUs of different
 manufactures, large shared memory nodes, GPU-assisted nodes up to highly specialized resources for
 [Machine Learning](../software/machine_learning.md) and AI.
-The page [ZIH Systems](hardware_overview.md) holds a comprehensive overview.
-The desired hardware can be specified by the partition `-p, --partition` flag in Slurm.
-The majority of the basic tasks can be executed on the conventional nodes like a Haswell. Slurm will
-automatically select a suitable partition depending on your memory and GPU requirements.
 
-### Parallel Jobs
+## Barnard
 
-**MPI jobs:** For MPI jobs typically allocates one core per task. Several nodes could be allocated
-if it is necessary. The batch system [Slurm](slurm.md) will automatically find suitable hardware.
+The cluster **Barnard** is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids
+CPUs.
 
-**OpenMP jobs:** SMP-parallel applications can only run **within a node**, so it is necessary to
-include the [batch system](slurm.md) options `-N 1` and `-n 1`. Using `--cpus-per-task N` Slurm will
-start one task and you will have `N` CPUs. The maximum number of processors for an SMP-parallel
-program is 896 on partition `julia`, see [partitions](partitions_and_limits.md) (be aware that
-the application has to be developed with that large number of threads in mind).
+- 630 diskless nodes, each with
+    - 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled
+    - 512 GB RAM
+- Hostnames: `n[1001-1630].barnard.hpc.tu-dresden.de`
+- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
 
-Partitions with GPUs are best suited for **repetitive** and **highly-parallel** computing tasks. If
-you have a task with potential [data parallelism](../software/gpu_programming.md) most likely that
-you need the GPUs. Beyond video rendering, GPUs excel in tasks such as machine learning, financial
-simulations and risk modeling. Use the partitions `gpu2` and `ml` only if you need GPUs! Otherwise
-using the x86-based partitions most likely would be more beneficial.
+## Alpha Centauri
 
-**Interactive jobs:** An interactive job is the best choice for testing and development. See
- [interactive-jobs](slurm.md).
-Slurm can forward your X11 credentials to the first node (or even all) for a job
-with the `--x11` option. To use an interactive job you have to specify `-X` flag for the ssh login.
+The cluster **Alpha Centauri** (short: **Alpha**) by NEC provides AMD Rome CPUs and NVIDIA A100
+GPUs and is designed for AI and ML tasks.
+
+- 34 nodes, each with
+    - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
+    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
+    - 1 TB RAM
+    - 3.5 TB local memory on NVMe device at `/tmp`
+- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
+- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
+- Further information on the usage is documented on the site [GPU Cluster Alpha Centauri](alpha_centauri.md)
+
+## Romeo
+
+The cluster **Romeo** is a general purpose cluster by NEC based on AMD Rome CPUs.
+
+- 192 nodes, each with
+    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
+    - 512 GB RAM
+    - 200 GB local memory on SSD at `/tmp`
+- Hostnames: `i[7001-7190].romeo.hpc.tu-dresden.de` (after the
+  [recabling phase](architecture_2023.md#migration-phase))
+- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
+- Further information on the usage is documented on the site [CPU Cluster Romeo](romeo.md)
+
+## Julia
+
+The cluster **Julia** is a large SMP (shared memory parallel) system by HPE based on the Superdome
+Flex architecture.
+
+- 1 node, with
+    - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
+    - 48 TB RAM (usable: 47 TB, one TB is used for cache coherence protocols)
+- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
+- Hostname: `smp8.julia.hpc.tu-dresden.de` (after the
+  [recabling phase](architecture_2023.md#migration-phase))
+- Further information on the usage is documented on the site [SMP System Julia](julia.md)
+
+## Power
+
+The cluster **Power** by IBM is based on Power9 CPUs and provides NVIDIA V100 GPUs. **Power** is
+specifically designed for machine learning tasks.
+
+- 32 nodes, each with
+    - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
+    - 256 GB RAM DDR4 2666 MHz
+    - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
+    - NVLINK bandwidth 150 GB/s between GPUs and host
+- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de` (after the
+  [recabling phase](architecture_2023.md#migration-phase))
+- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
+- Further information on the usage is documented on the site [GPU Cluster Power9](power9.md)
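+
+On the GPU clusters, GPUs have to be requested explicitly in addition to cores and memory. As an
+illustration, the following sketch requests a single GPU for one task; the values and the
+executable name are placeholders, please check the linked cluster pages for the recommended
+options:
+
+```bash
+#!/bin/bash
+#SBATCH --ntasks=1
+#SBATCH --cpus-per-task=6
+#SBATCH --gres=gpu:1               # request one GPU on the node
+#SBATCH --time=02:00:00
+
+srun ./my_gpu_program              # placeholder executable
+```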
 
-## Interactive vs. Batch Mode
 
-However, using `srun` directly on the Shell will lead to blocking and launch an interactive job.
-Apart from short test runs, it is recommended to encapsulate your experiments and computational
-tasks into batch jobs and submit them to the batch system. For that, you can conveniently put the
-parameters directly into the job file which you can submit using `sbatch [options] <job file>`.
 
 ## Processing of Data for Input and Output
diff --git a/doc.zih.tu-dresden.de/docs/software/visualization.md b/doc.zih.tu-dresden.de/docs/software/visualization.md
index 6c68e9a1a5891b92934ae600c57f23bbf1ebd0df..9295598646fa388add7b0b0be3935db7b14e1682 100644
--- a/doc.zih.tu-dresden.de/docs/software/visualization.md
+++ b/doc.zih.tu-dresden.de/docs/software/visualization.md
@@ -2,6 +2,8 @@
 
 ## ParaView
 
+_**-- currently under construction --**_
+
 [ParaView](https://paraview.org) is an open-source, multi-platform data analysis and visualization
 application. The ParaView package comprises different tools which are designed to meet interactive,
 batch and in-situ workflows.
@@ -9,6 +11,10 @@
 ParaView is available on ZIH systems from the [modules system](modules.md#module-environments).
 The following command lists the available versions
 
+_The module environments `/hiera`, `/scs5`, `/classic`, and `/ml` originating from the Taurus
+system are currently under construction. The script will be updated accordingly after completion
+of the redesign._
+
 ```console
 marie@login$ module avail ParaView