diff --git a/README.md b/README.md index f23e803c9f0d0c8a32b0361bfb8d50bfc6a1ffb3..17a8f9baadd7a1f34655f311518f8661eea754eb 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # ZIH HPC Documentation This repository contains the documentation of the HPC systems and services provided at -[TU Dresden/ZIH](https://tu-dresden.de/zih/). +https://doc.zih.tu-dresden.de/ by [TU Dresden/ZIH](https://tu-dresden.de/zih/). ## Setup diff --git a/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md b/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md index fc73348f258ce2e8e4264ded0e7eb6fcfa29ea25..56daf41c37faa0f446228b6afb1938b69ad92444 100644 --- a/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md +++ b/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md @@ -7,6 +7,8 @@ ## Login Nodes +### Taurus + The following hostnames can be used to access ZIH systems: - `taurus.hrsk.tu-dresden.de` @@ -40,6 +42,76 @@ shown matches one of the table. In this case, the fingerprint matches the one given in the table. Thus, one can proceed by typing 'yes'. +### Barnard + +The following hostnames can be used to access ZIH systems: + +- `login1.barnard.hpc.tu-dresden.de` +- `login2.barnard.hpc.tu-dresden.de` + +All of these login nodes share common keys. When connecting, please make sure that the fingerprint +shown matches one of the table. 
+ +#### login1.barnard.hpc.tu-dresden.de + +| Key type | Fingerprint | +|:---------|:----------------------------------------------------| +| RSA | SHA256:c3Gs+WH1STvogyR9jTAPH8vhc5Ue75XsYlyCFbLDhJU | +| RSA | MD5:73:dc:bc:79:dc:1b:bf:e8:cb:1b:f5:1f:5e:6d:64:19 | +| ECDSA | SHA256:8Coljw7yoVH6HA8u+K3makRK9HfOSfe+BG8W/CUEPp0 | +| ECDSA | MD5:63:9a:67:68:37:85:31:77:a4:e6:0b:da:8c:d9:2f:96 | +| ED25519 | SHA256:Ws/vbrp5e/Ay+fcVzhsL0jupjGkDdn1cJ+SX6gQB6Bs | +| ED25519 | MD5:7f:5c:e6:2b:6f:94:24:9b:0f:2f:1d:bc:40:6b:59:c7 | +{: summary="List of valid fingerprints for Barnard login1 node"} + +#### login2.barnard.hpc.tu-dresden.de + +| Key type | Fingerprint | +|:---------|:----------------------------------------------------| +| RSA | SHA256:ZYQvEUc493+BETiJZ/jwqp/rHCM+IW1Trf6FdcSrqX4 | +| RSA | MD5:f8:1f:b6:f2:02:81:b9:ee:ce:99:71:ee:35:ad:32:fc | +| ECDSA | SHA256:01KP44mEMzswqdrkgA7ViO62Atdk8pAib/pmLBgetyk | +| ECDSA | MD5:d9:33:27:26:00:c9:81:cf:bb:45:43:dc:05:e8:1f:43 | +| ED25519 | SHA256:BewwkydtP2riZPShvNzAOWm+dQtdOq535j7Vow1HbRQ | +| ED25519 | MD5:18:8b:cd:1e:2e:9a:6c:8c:ee:b5:c9:3e:68:a3:4a:3f | +{: summary="List of valid fingerprints for Barnard login2 node"} + +#### login3.barnard.hpc.tu-dresden.de + +| Key type | Fingerprint | +|:---------|:----------------------------------------------------| +| RSA | SHA256:+VSqQp+6LZrZXOHPuDhxd2ti9mam/gDLSbn5kH0S2UI | +| RSA | MD5:19:16:ce:34:0e:2c:5f:37:42:06:f7:55:7d:19:cf:1a | +| ECDSA | SHA256:qZbC5BDKrTvE3J6qgGJLQwxtjfYy6pmrI7teEjFnHiE | +| ECDSA | MD5:b1:19:a6:bf:9e:95:ce:ee:fd:ab:b3:ee:5e:d7:e0:a7 | +| ED25519 | SHA256:ATNHOAZNjWHAXMwTWgxMvB9DIZ5bZurneN4sBKGSsz8 | +| ED25519 | MD5:ee:cb:cc:ff:be:15:f2:e8:8e:ac:ef:da:a1:f9:48:33 | +{: summary="List of valid fingerprints for Barnard login3 node"} + +#### login4.barnard.hpc.tu-dresden.de + +| Key type | Fingerprint | +|:---------|:----------------------------------------------------| +| RSA | SHA256:IYpo+qHKOIs4TEftlDp63QlQr85xlcgbapfMsbCeZDE | +| RSA | 
MD5:a7:3d:c3:be:53:62:7f:fc:5a:b5:6b:ba:8c:83:6e:4c | +| ECDSA | SHA256:nnUzS1Zu9+yaXf8ayDIwmfXabPtyvdr5c3Hvp+/zXhs | +| ECDSA | MD5:69:f9:54:60:24:79:22:cb:7f:ba:d0:90:f5:0f:4a:5d | +| ED25519 | SHA256:1QXw+IC51iT55LiE/7JJEXL7Jm1GZjk+/7OjaYfWXUY | +| ED25519 | MD5:17:8c:ea:26:dc:f0:43:61:a8:4d:06:e3:8e:f7:27:29 | +{: summary="List of valid fingerprints for Barnard login4 node"} + +??? example "Connecting with SSH" + + ```console + marie@local$ ssh login1.barnard.hpc.tu-dresden.de + The authenticity of host 'login1.barnard.hpc.tu-dresden.de (172.24.95.28)' can't be established. + ECDSA key fingerprint is SHA256:8Coljw7yoVH6HA8u+K3makRK9HfOSfe+BG8W/CUEPp0. + Are you sure you want to continue connecting (yes/no)? + ``` + + In this case, the fingerprint matches the one given in the table. Thus, one can proceed by + typing 'yes'. + ## Export Nodes The following hostnames can be used to transfer files to/from ZIH systems: diff --git a/doc.zih.tu-dresden.de/docs/archive/hardware_overview_2022.md b/doc.zih.tu-dresden.de/docs/archive/hardware_overview_2022.md new file mode 100644 index 0000000000000000000000000000000000000000..3974a2524de36f7dd2b2fdb03f5a54fa3375f6a0 --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/archive/hardware_overview_2022.md @@ -0,0 +1,109 @@ +# HPC Resources + +HPC resources in ZIH systems comprise the *High Performance Computing and Storage Complex* and its +extension *High Performance Computing – Data Analytics*. In total it offers scientists +about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point +operations per second. The architecture specifically tailored to data-intensive computing, Big Data +analytics, and artificial intelligence methods with extensive capabilities for energy measurement +and performance monitoring provides ideal conditions to achieve the ambitious research goals of the +users and the ZIH. 
+ +## Login and Export Nodes + +- 4 Login-Nodes `tauruslogin[3-6].hrsk.tu-dresden.de` + - Each login node is equipped with 2x Intel(R) Xeon(R) CPU E5-2680 v3 with 24 cores in total @ + 2.50 GHz, Multithreading disabled, 64 GB RAM, 128 GB SSD local disk + - IPs: 141.30.73.\[102-105\] +- 2 Data-Transfer-Nodes `taurusexport[3-4].hrsk.tu-dresden.de` + - DNS Alias `taurusexport.hrsk.tu-dresden.de` + - 2 Servers without interactive login, only available via file transfer protocols + (`rsync`, `ftp`) + - IPs: 141.30.73.\[82,83\] + - Further information on the usage is documented on the site + [Export Nodes](../data_transfer/export_nodes.md) + +## AMD Rome CPUs + NVIDIA A100 + +- 34 nodes, each with + - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs + - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available + - 1 TB RAM + - 3.5 TB local memory on NVMe device at `/tmp` +- Hostnames: `taurusi[8001-8034]` +- Slurm partition: `alpha` +- Further information on the usage is documented on the site [Alpha Centauri Nodes](../jobs_and_resources/alpha_centauri.md) + +## Island 7 - AMD Rome CPUs + +- 192 nodes, each with + - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available + - 512 GB RAM + - 200 GB local memory on SSD at `/tmp` +- Hostnames: `taurusi[7001-7192]` +- Slurm partition: `romeo` +- Further information on the usage is documented on the site [AMD Rome Nodes](../jobs_and_resources/rome_nodes.md) + +## Large SMP System HPE Superdome Flex + +- 1 node, with + - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores) + - 47 TB RAM +- Configured as one single node +- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols) +- 370 TB of fast NVME storage available at `/nvme/<projectname>` +- Hostname: `taurussmp8` +- Slurm partition: `julia` +- Further information on the usage is documented on the site [HPE Superdome Flex](../jobs_and_resources/sd_flex.md) + +## IBM Power9 Nodes for Machine Learning + +For machine learning, we have IBM 
AC922 nodes installed with this configuration: + +- 32 nodes, each with + - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores) + - 256 GB RAM DDR4 2666 MHz + - 6 x NVIDIA VOLTA V100 with 32 GB HBM2 + - NVLINK bandwidth 150 GB/s between GPUs and host +- Hostnames: `taurusml[1-32]` +- Slurm partition: `ml` + +## Island 6 - Intel Haswell CPUs + +- 612 nodes, each with + - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled + - 128 GB local memory on SSD +- Varying amounts of main memory (selected automatically by the batch system for you according to + your job requirements) + - 594 nodes with 2.67 GB RAM per core (64 GB in total): `taurusi[6001-6540,6559-6612]` + - 18 nodes with 10.67 GB RAM per core (256 GB in total): `taurusi[6541-6558]` +- Hostnames: `taurusi[6001-6612]` +- Slurm partition: `haswell` + +??? hint "Node topology" + +  + {: align=center} + +## Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs + +- 64 nodes, each with + - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled + - 64 GB RAM (2.67 GB per core) + - 128 GB local memory on SSD + - 4 x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs +- Hostnames: `taurusi[2045-2108]` +- Slurm partition: `gpu2` +- Node topology, same as [island 4 - 6](#island-6-intel-haswell-cpus) + +## SMP Nodes - up to 2 TB RAM + +- 5 Nodes, each with + - 4 x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ 2.20 GHz, Multithreading disabled + - 2 TB RAM +- Hostnames: `taurussmp[3-7]` +- Slurm partition: `smp2` + +??? 
hint "Node topology" + +  + {: align=center} diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/hdfview_memory.png b/doc.zih.tu-dresden.de/docs/archive/misc/hdfview_memory.png similarity index 100% rename from doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/hdfview_memory.png rename to doc.zih.tu-dresden.de/docs/archive/misc/hdfview_memory.png diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/i4000.png b/doc.zih.tu-dresden.de/docs/archive/misc/i4000.png similarity index 100% rename from doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/i4000.png rename to doc.zih.tu-dresden.de/docs/archive/misc/i4000.png diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/smp2.png b/doc.zih.tu-dresden.de/docs/archive/misc/smp2.png similarity index 100% rename from doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/smp2.png rename to doc.zih.tu-dresden.de/docs/archive/misc/smp2.png diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_profiling.md b/doc.zih.tu-dresden.de/docs/archive/slurm_profiling.md similarity index 96% rename from doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_profiling.md rename to doc.zih.tu-dresden.de/docs/archive/slurm_profiling.md index deb9e3331aa774d11d4f167225074475eff6383f..3ca0a8e2b6e4618923a379ed8bcec854256b7fbf 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_profiling.md +++ b/doc.zih.tu-dresden.de/docs/archive/slurm_profiling.md @@ -70,7 +70,8 @@ More information about profiling with Slurm: ## Memory Consumption of a Job If you are only interested in the maximal memory consumption of your job, you don't need profiling -at all. This information can be retrieved from within [job files](slurm.md#batch-jobs) as follows: +at all. 
This information can be retrieved from within +[job files](../jobs_and_resources/slurm.md#batch-jobs) as follows: ```bash #!/bin/bash diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md index 615282e59c23aa93844116a5a58939274bf5f12f..17394c63f91dc6536cc6351a91d52f6a972cb278 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md @@ -180,7 +180,7 @@ Useful options: To list your personal filesystem usage and limits (quota), invoke ```console -marie@login$ lfs quota -h -u $LOGIN <filesystem> +marie@login$ lfs quota -h -u $USER <filesystem> ``` Useful options: diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md index 3db16f94c248bfb726f6f80a207446434a69b29d..924d98077b2489ba5f2516f3e21fe49004747ad2 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md @@ -28,10 +28,10 @@ times. ### List Available Filesystems To list all available filesystems for using workspaces, you can either invoke `ws_list -l` or -`ws_find -l`, e.g., +`ws_find --list`, e.g., ```console -marie@login$ ws_find -l +marie@login$ ws_find --list available filesystems: scratch (default) warm_archive @@ -44,7 +44,7 @@ beegfs The default filesystem is `scratch`. If you prefer another filesystem (cf. section [List Available Filesystems](#list-available-filesystems)), you have to explictly - provide the option `-F <fs>` to the workspace commands. + provide the option `--filesystem=<filesystem>` to the workspace commands. ### List Current Workspaces @@ -67,7 +67,7 @@ overview of some of these options. 
All available options can be queried by `ws_l === "Certain filesystem" ``` - marie@login$ ws_list --filesystem scratch_fast + marie@login$ ws_list --filesystem=scratch_fast id: numbercrunch workspace directory : /lustre/ssd/ws/marie-numbercrunch remaining time : 2 days 23 hours @@ -135,7 +135,7 @@ overview of some of these options. All available options can be queried by `ws_l ### Allocate a Workspace To allocate a workspace in one of the listed filesystems, use `ws_allocate`. It is necessary to -specify a unique name and the duration of the workspace. +specify a unique name and the duration (in days) of the workspace. ```console ws_allocate: [options] workspace_name duration @@ -154,31 +154,54 @@ Options: -c [ --comment ] arg comment ``` -!!! example +!!! example "Simple workspace allocation" + + The simple way to allocate a workspace is calling `ws_allocate` command with two arguments, + where the first specifies the workspace name and the second the duration. This allocates a + workspace on the default filesystem with no e-mail reminder. ```console - marie@login$ ws_allocate -F scratch -r 7 -m marie.testuser@tu-dresden.de test-workspace 90 + marie@login$ ws_allocate test-workspace 90 Info: creating workspace. /scratch/ws/marie-test-workspace remaining extensions : 10 remaining time in days: 90 ``` -This will create a workspace with the name `test-workspace` on the `/scratch` filesystem for 90 -days with an email reminder for 7 days before the expiration. +!!! example "Workspace allocation on specific filesystem" + + In order to allocate a workspace on a non-default filesystem, the option + `--filesystem=<filesystem>` is required. + + ```console + marie@login$ ws_allocate --filesystem=scratch_fast test-workspace 3 + Info: creating workspace. + /lustre/ssd/ws/marie-test-workspace + remaining extensions : 2 + remaining time in days: 3 + ``` + +!!! example "Workspace allocation with e-mail reminder" -!!! 
Note "Email reminder" + This command will create a workspace with the name `test-workspace` on the `/scratch` filesystem + with a duration of 90 days and send an e-mail reminder. The e-mail reminder will be sent every + day starting 7 days prior to expiration. We strongly recommend setting this e-mail reminder. - Setting the reminder to `7` means you will get a reminder email on every day starting `7` days - prior to expiration date. We strongly recommend to set this email reminder. + ```console + marie@login$ ws_allocate --reminder=7 --mailaddress=marie.testuser@tu-dresden.de test-workspace 90 + Info: creating workspace. + /scratch/ws/marie-test-workspace + remaining extensions : 10 + remaining time in days: 90 + ``` !!! Note "Name of a workspace" - The workspace name should help you to remember the experiment and data stored here. It has to - be unique on a certain filesystem. On the other hand it is possible to use the very same name - for workspaces on different filesystems. + The workspace name should help you to remember the experiment and data stored here. It has to + be unique on a certain filesystem. On the other hand it is possible to use the very same name + for workspaces on different filesystems. -Please refer to the section [section Cooperative Usage](#cooperative-usage-group-workspaces) for +Please refer to the [section Cooperative Usage](#cooperative-usage-group-workspaces) for group workspaces. ### Extension of a Workspace @@ -186,7 +209,7 @@ group workspaces. The lifetime of a workspace is finite and different filesystems (storage systems) have different maximum durations. A workspace can be extended multiple times, depending on the filesystem. 
-| Filesystem (use with parameter `-F <fs>`) | Duration, days | Extensions | [Filesystem Feature](../jobs_and_resources/slurm.md#filesystem-features) | Remarks | +| Filesystem (use with parameter `--filesystem=<filesystem>`) | Duration, days | Extensions | [Filesystem Feature](../jobs_and_resources/slurm.md#filesystem-features) | Remarks | |:-------------------------------------|---------------:|-----------:|:-------------------------------------------------------------------------|:--------| | `scratch` (default) | 100 | 10 | `fs_lustre_scratch2` | Scratch filesystem (`/lustre/scratch2`, symbolic link: `/scratch`) with high streaming bandwidth, based on spinning disks | | `ssd` | 30 | 2 | `fs_lustre_ssd` | High-IOPS filesystem (`/lustre/ssd`, symbolic link: `/ssd`) on SSDs. | @@ -205,7 +228,7 @@ remaining extensions : 1 remaining time in days: 100 ``` -Mail reminder settings are retained. I.e., previously set mail alerts apply to the extended +E-mail reminder settings are retained. I.e., previously set e-mail alerts apply to the extended workspace, too. !!! attention @@ -221,28 +244,57 @@ marie@login$ ws_extend -F scratch my-workspace 40 it will now expire in 40 days **not** 130 days. -### Send Reminder for Workspace Expiry Date +### Send Reminder for Workspace Expiration Date + +We strongly recommend using one of the two provided ways to ensure that the expiration date of a +workspace is not forgotten. -Send a calendar invitation by Email to ensure that the expiration date of a workspace is not -forgotten +#### Send Daily Reminder + +An e-mail reminder can be set at workspace allocation using ```console -marie@login$ ws_send_ical -F scratch my-workspace -m marie.testuser@tu-dresden.de +ws_allocate --reminder=<N> --mailaddress=<mail> [...] ``` +This will send an e-mail every day starting `N` days prior to the expiration date. +See the [example above](#allocate-a-workspace) for reference. 
+ +If you missed setting an e-mail reminder at workspace allocation, you can add a reminder later, e.g. + +``` +marie@login$ ws_allocate --name=FancyExp --duration=17 +[...] +marie@login$ ws_allocate --name=FancyExp --duration=17 --reminder=7 --mailaddress=marie@dlr.de +--extension +``` + +This will reallocate the workspace, which counts against your maximum number of reallocations (Note: +No data is deleted, but the database entry is modified). + +#### Send Calendar Invitation + +The command `ws_send_ical` sends you an iCal event on the expiration date of a specified workspace. This + calendar invitation can be further managed according to your personal preferences. The syntax is + as follows: + + ```console + ws_send_ical --filesystem=<filesystem> --mail=<e-mail-address> --workspace=<workspace name> + ``` + ### Deletion of a Workspace To delete a workspace use the `ws_release` command. It is mandatory to specify the name of the workspace and the filesystem in which it is located: ```console -marie@login$ ws_release -F scratch my-workspace +marie@login$ ws_release --filesystem=scratch --name=my-workspace ``` -You can list your already released or expired workspaces using the `ws_restore -l` command. +You can list your already released or expired workspaces using the `ws_restore --list` command. ```console -marie@login$ ws_restore -l +marie@login$ ws_restore --list warm_archive: scratch: marie-my-workspace-1665014486 @@ -257,9 +309,9 @@ beegfs: ``` In this example, the user `marie` has three inactive, i.e., expired, workspaces namely -`my-workspace` in `scratch`, as well as `foo` and `bar` in `ssd` filesystem. The command `ws_restore --l` lists the name of the workspace and the expiration date. As you can see, the expiration date is -added to the workspace name as Unix timestamp. +`my-workspace` in `scratch`, as well as `foo` and `bar` in `ssd` filesystem. The command +`ws_restore --list` lists the name of the workspace and the expiration date. 
As you can see, the +expiration date is added to the workspace name as Unix timestamp. !!! hint "Deleting data in in an expired workspace" @@ -275,7 +327,7 @@ It performs the following steps once per day and filesystem: - Check for remaining life time of all workspaces. - If the workspaces expired, move it to a hidden directory so that it becomes inactive. -- Send reminder Emails to users if the reminder functionality was configured for their particular +- Send reminder e-mails to users if the reminder functionality was configured for their particular workspaces. - Scan through all workspaces in grace period. - If a workspace exceeded the grace period, the workspace and its data are deleted. @@ -295,7 +347,7 @@ warm_archive: 2 months), you can still restore your data **into an existing work Use ```console -marie@login$ ws_restore -l -F scratch +marie@login$ ws_restore --list --filesystem=scratch scratch: marie-my-workspace-1665014486 unavailable since Thu Oct 6 02:01:26 2022 @@ -305,12 +357,12 @@ to get a list of your expired workspaces, and then restore them like that into a workspace 'new_ws': ```console -marie@login$ ws_restore -F scratch marie-my-workspace-1665014486 new_ws +marie@login$ ws_restore --filesystem=scratch marie-my-workspace-1665014486 new_ws ``` -The expired workspace has to be specified by its full name as listed by `ws_restore -l`, including -username prefix and timestamp suffix (otherwise, it cannot be uniquely identified). The target -workspace, on the other hand, must be given with just its short name, as listed by `ws_list`, +The expired workspace has to be specified by its full name as listed by `ws_restore --list`, +including username prefix and timestamp suffix (otherwise, it cannot be uniquely identified). The +target workspace, on the other hand, must be given with just its short name, as listed by `ws_list`, without the username prefix. Both workspaces must be on the same filesystem. 
The data from the old workspace will be moved into @@ -381,7 +433,7 @@ the following example (which works [for the program g16](../software/nanoscale_s # Allocate workspace for this job. Adjust time span to time limit of the job (-d <N>). WSNAME=computation_$SLURM_JOB_ID - export WSDDIR=$(ws_allocate -F ssd -n ${WSNAME} -d 2) + export WSDIR=$(ws_allocate --filesystem=ssd --name=${WSNAME} --duration=2) echo ${WSDIR} # Check allocation @@ -424,7 +476,7 @@ For a series of jobs or calculations that work on the same data, you should allo once, e.g., in `scratch` for 100 days: ```console -marie@login$ ws_allocate -F scratch my_scratchdata 100 +marie@login$ ws_allocate --filesystem=scratch my_scratchdata 100 Info: creating workspace. /scratch/ws/marie-my_scratchdata remaining extensions : 2 @@ -453,7 +505,7 @@ this is mounted read-only on the compute nodes, so you cannot use it as a work d jobs! ```console -marie@login$ ws_allocate -F warm_archive my_inputdata 365 +marie@login$ ws_allocate --filesystem=warm_archive my_inputdata 365 /warm_archive/ws/marie-my_inputdata remaining extensions : 2 remaining time in days: 365 @@ -499,7 +551,7 @@ to others (if in the same group) via `ws_list -g`. in the project `p_number_crunch`, she can allocate a so-called group workspace. ```console - marie@login$ ws_allocate --group --name numbercrunch --duration 30 + marie@login$ ws_allocate --group --name=numbercrunch --duration=30 Info: creating workspace. /scratch/ws/0/marie-numbercrunch remaining extensions : 10 @@ -555,9 +607,14 @@ wrong name. Use only the short name that is listed after `id:` when using `ws_li ---- -**Q**: Man, I've missed to specify mail alert when allocating my workspace. How can I add the mail -alert functionality to an existing workspace? +**Q**: I forgot to specify an e-mail alert when allocating my workspace. How can I add the +e-mail alert functionality to an existing workspace? 
+ +**A**: You can add the e-mail alert by "overwriting" the workspace settings via + +```console +marie@login$ ws_allocate --extension --mailaddress=<mail address> --reminder=<days> \ + --name=<workspace-name> --duration=<duration> --filesystem=<filesystem> +``` -**A**: You can add the mail alert by "overwriting" the workspace settings via `ws_allocate -x -m -<mail address> -r <days> -n <ws-name> -d <duration> -F <fs>`. (This will lower the remaining -extensions by one.) +This will lower the remaining extensions by one. diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md index c22ef202a4408ab09d938219fa9be8b896cd7ae1..64802daa9ae83761fb147961185fec3322880f0b 100644 --- a/doc.zih.tu-dresden.de/docs/index.md +++ b/doc.zih.tu-dresden.de/docs/index.md @@ -31,8 +31,10 @@ Please also find out the other ways you could contribute in our ## News +* **2023-11-16** [OpenMPI 4.1.x - Workaround for MPI-IO Performance Loss](jobs_and_resources/mpi_issues/#openmpi-v41x-performance-loss-with-mpi-io-module-ompio) +* **2023-10-04** [User tests on Barnard](jobs_and_resources/barnard_test.md) +* **2023-06-01** [New hardware and complete re-design](jobs_and_resources/architecture_2023.md) * **2023-01-04** [New hardware: NVIDIA Arm HPC Developer Kit](jobs_and_resources/arm_hpc_devkit.md) -* **2022-01-13** [Supercomputing extension for TU Dresden](https://tu-dresden.de/zih/die-einrichtung/news/supercomputing-cluster-2022) ## Training and Courses @@ -41,4 +43,9 @@ We offer a rich and colorful bouquet of courses from classical *HPC introduction [Training Offers](https://tu-dresden.de/zih/hochleistungsrechnen/nhr-training) for a detailed overview of the courses and the respective dates at ZIH. -* [HPC introduction slides](misc/HPC-Introduction.pdf) (Nov. 2022) +* [HPC introduction slides](misc/HPC-Introduction.pdf) Sep. 
2023 + +Furthermore, the Center for Scalable Data Analytics and Artificial Intelligence +[ScaDS.AI](https://scads.ai) Dresden/Leipzig offers various trainings with HPC focus. +The current schedule and registration are available at the +[ScaDS.AI trainings page](https://scads.ai/transfer-2/teaching-and-training/). diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/architecture_2023.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/architecture_2023.md new file mode 100644 index 0000000000000000000000000000000000000000..b0d23e2e789719ed0ff95a84f8f1056753cbb60c --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/architecture_2023.md @@ -0,0 +1,58 @@ +# Architectural Re-Design 2023 + +With the replacement of the Taurus system by the cluster `Barnard` in 2023, +the rest of the installed hardware had to be re-connected, both with +Infiniband and with Ethernet. + + +{: align=center} + +## Compute Systems + +All compute clusters now act as separate entities having their own +login nodes of the same hardware and their very own Slurm batch systems. The different hardware, +e.g. Romeo and Alpha Centauri, is no longer managed via a single Slurm instance with +corresponding partitions. Instead, you as a user now choose the hardware by selecting the +corresponding login node. + +The login nodes can be used for smaller interactive jobs on the clusters. There are +restrictions in place, though, regarding usable resources and time per user. For larger +computations, please use interactive jobs. + +## Storage Systems + +### Permanent Filesystems + +We now have `/home`, `/projects` and `/software` in a Lustre filesystem. Snapshots +and tape backup are configured. For convenience, we will make the old home available +read-only as `/home_old` on the data mover nodes for the data migration period. + +`/warm_archive` is mounted on the data movers, only. 
+ +### Work Filesystems + +With new players with new software in the filesystem market it is getting more and more +complicated to identify the best suited filesystem for temporary data. In many cases, +only tests can provide the right answer, for a short time. + +For an easier grasp on the major categories (size, speed), the work filesystems now come +with the names of animals: + +* `/data/horse` - 20 PB - high bandwidth (Lustre) +* `/data/octopus` - 0.5 PB - for interactive usage (Lustre) +* `/data/weasel` - 1 PB - for high IOPS (WEKA) - coming soon + +### Difference Between "Work" And "Permanent" + +A large number of changing files is a challenge for any backup system. To protect +our snapshots and backup from work data, +`/projects` cannot be used for temporary data on the compute nodes - it is mounted read-only. + +Please use our data mover mechanisms to transfer worthy data to permanent +storages. + +## Migration Phase + +For about one month, the new cluster Barnard, and the old cluster Taurus +will run side-by-side - both with their respective filesystems. You can find a comprehensive +[description of the migration phase here](migration_2023.md). diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/barnard_test.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/barnard_test.md new file mode 100644 index 0000000000000000000000000000000000000000..1529565f8555712da22f15e16141d8be3ad7d301 --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/barnard_test.md @@ -0,0 +1,163 @@ +# Tests on Barnard + +All HPC users are invited to test our new HPC system Barnard and prepare your software +and workflows for production there. For general hints please refer to these sites: + +* [Details on architecture](/jobs_and_resources/architecture_2023), +* [Description of the migration](migration_2023.md). + +We value your feedback. Please provide it directly via our ticket system. 
For better processing, +please add "Barnard:" as a prefix to the subject of the [support ticket](../support/support). + +Here, you can find a few hints which might help you with the first steps. + +## Login to Barnard + +All users and projects from Taurus can now work on Barnard. + +They can use `login[2-4].barnard.hpc.tu-dresden.de` to access the system +from campus (or VPN). [Fingerprints](/access/key_fingerprints/#barnard) + +All users have **new empty HOME** file systems. This means you first have to... + +??? "... install your public ssh key on the system" + + - Please create a new SSH key pair of type ed25519, secured with + a passphrase. Please refer to this + [page for instructions](../../access/ssh_login#before-your-first-connection). + - After login, add the public key to your `.ssh/authorized_keys` file + on Barnard. + +## Data Management + +* The `/project` filesystem is the same on Taurus and Barnard +(mounted read-only on the compute nodes). +* The new work filesystem is `/data/horse`. +* The slower `/data/walrus` can be considered as a substitute for the old + `/warm_archive`, mounted **read-only** on the compute nodes. + It can be used to store e.g. results. + +Both `/data/horse` and `/data/walrus` can be accessed via workspaces. Please refer to the +[workspace page](../../data_lifecycle/workspaces/), if you are not familiar with workspaces. + +??? "Tips on workspaces" + * To list all available workspace filesystems, invoke the command `ws_list -l`. + * Please use the command `dtinfo` to get the current mount points: + ``` + marie@login1> dtinfo + [...] + directory on datamover mounting clusters directory on cluster + + /data/old/home Taurus /home + /data/old/lustre/scratch2 Taurus /scratch + /data/old/lustre/ssd Taurus /lustre/ssd + [...] + ``` + +!!! Warning + + All old filesystems will be shut down by the end of 2023. + + To work with your data from Taurus you might have to move/copy it to the new storage systems. 
+ +For this, we have four new [datamover nodes](/data_transfer/datamover) that have mounted all filesystems +of the old and new systems. (Do not use the datamovers from Taurus!) + +??? "Migration from Home Directory" + + Your personal (old) home directory at Taurus will not be automatically transferred to the new Barnard + system. **You are responsible for this task.** Please do not copy your entire home, but consider + this opportunity for cleaning up your data. E.g., it might make sense to delete outdated scripts, old + log files, etc., and move other files to an archive filesystem. Thus, please transfer only selected + directories and files that you need on the new system. + + The well-known [datamover tools](../../data_transfer/datamover/) are available to run such transfer + jobs under Slurm. The steps are as follows: + + 1. Log in to Barnard: `ssh login[1-4].barnard.hpc.tu-dresden.de` + 1. The command `dtinfo` will show you the mount points + + ```console + marie@barnard$ dtinfo + [...] + directory on datamover mounting clusters directory on cluster + + /data/old/home Taurus /home + /data/old/lustre/scratch2 Taurus /scratch + /data/old/lustre/ssd Taurus /lustre/ssd + [...] + ``` + + 1. Use the `dtls` command to list your files in the old home directory: `marie@barnard$ dtls + /data/old/home/marie` + 1. Use the `dtcp` command to invoke a transfer job, e.g., + + ```console + marie@barnard$ dtcp --recursive /data/old/home/marie/<useful data> /home/marie/ + ``` + + **Note**, please adapt the source and target paths to your needs. All available options can be + queried via `dtinfo --help`. + + !!! warning + + Please be aware that there is **no synchronisation process** between your home directories at + Taurus and Barnard. Thus, after the very first transfer, they will become divergent. + + We recommend **taking a few minutes to plan the transfer process**. Do not act in + haste. + +??? 
"Migration from `/lustre/ssd` or `/beegfs`" + + **You** are entirely responsible for the transfer of these data to the new location. + Start the `dtrsync` process as soon as possible. (And maybe repeat it at a later time.) + +??? "Migration from `/lustre/scratch2` aka `/scratch`" + + We are synchronizing this (**last: October 18**) to `/data/horse/lustre/scratch2/`. + + Please do **NOT** copy those data yourself. Instead check if it is already synchronized + to `/data/walrus/warm_archive/ws`. + + In case you need to update this (Gigabytes, not Terabytes!) please run `dtrsync` as in + `dtrsync -a /data/old/lustre/scratch2/ws/0/my-workspace/newest/ /data/horse/lustre/scratch2/ws/0/my-workspace/newest/` + +??? "Migration from `/warm_archive`" + + The process of syncing data from `/warm_archive` to `/data/walrus/warm_archive` is still ongoing. + + Please do **NOT** copy those data yourself. Instead check if it is already synchronized + to `/data/walrus/warm_archive/ws`. + + In case you need to update this (Gigabytes, not Terabytes!) please run `dtrsync` as in + `dtrsync -a /data/old/warm_archive/ws/my-workspace/newest/ /data/walrus/warm_archive/ws/my-workspace/newest/` + +Once the last compute system has been migrated, the old file systems will be +set write-protected and we will start a final synchronization (scratch+walrus). +The target directories for synchronization `/data/horse/lustre/scratch2/ws` and +`/data/walrus/warm_archive/ws/` will not be deleted automatically in the meantime. + +## Software + +Please use `module spider` to identify the software modules you need to load, like +on Taurus. + +The default release version is 23.10. + +## Slurm + +* We are running the most recent Slurm version. +* You must not use the old partition names. +* Not all things are tested. 
+ +## Updates after your feedback (state: October 19) + +* A **second synchronization** from `/scratch` started on **October 18** and is + now nearly done. +* A first, incomplete synchronization from `/warm_archive` has been done (see above). + With support from NEC we are transferring the rest in the next weeks. +* The **data transfer tools** now work fine. +* After fixing overly tight security restrictions, **all users can log in** now. +* **ANSYS** now starts: please check if your specific use case works. +* **login1** is under construction, do not use it at the moment. Workspace creation does + not work there. diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md index 538296b4ea52aee6c99f132811af8112803adcf9..bf5f25146730bceb6b442bd20d4e08e73e0863fc 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md @@ -81,7 +81,7 @@ For machine learning, we have IBM AC922 nodes installed with this configuration: ??? hint "Node topology" -  +  {: align=center} ## Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs @@ -105,5 +105,5 @@ For machine learning, we have IBM AC922 nodes installed with this configuration: ??? 
hint "Node topology" -  +  {: align=center} diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview_2023.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview_2023.md new file mode 100644 index 0000000000000000000000000000000000000000..c888857b47414e2c068cac78f9ca9804efb056b5 --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview_2023.md @@ -0,0 +1,81 @@ +# HPC Resources + +The architecture specifically tailored to data-intensive computing, Big Data +analytics, and artificial intelligence methods with extensive capabilities +for performance monitoring provides ideal conditions to achieve the ambitious +research goals of the users and the ZIH. + +## Overview + +From the users' perspective, there are separate clusters, all of them with their subdomains: + +| Name | Description | Year| DNS | +| --- | --- | --- | --- | +| **Barnard** | CPU cluster |2023| n[1001-1630].barnard.hpc.tu-dresden.de | +| **Romeo** | CPU cluster |2020|i[8001-8190].romeo.hpc.tu-dresden.de | +| **Alpha Centauri** | GPU cluster |2021|i[8001-8037].alpha.hpc.tu-dresden.de | +| **Julia** | single SMP system |2021|smp8.julia.hpc.tu-dresden.de | +| **Power** | IBM Power/GPU system |2018|ml[1-29].power9.hpc.tu-dresden.de | + +They run with their own Slurm batch system. Job submission is possible only from +their respective login nodes. 
+ +All clusters will have access to these shared parallel filesystems: + +| Filesystem | Usable directory | Type | Capacity | Purpose | +| --- | --- | --- | --- | --- | +| Home | `/home` | Lustre | quota per user: 20 GB | permanent user data | +| Project | `/projects` | Lustre | quota per project | permanent project data | +| Scratch for large data / streaming | `/data/horse` | Lustre | 20 PB | | + +## Barnard - Intel Sapphire Rapids CPUs + +- 630 diskless nodes, each with + - 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled + - 512 GB RAM +- Hostnames: `n[1001-1630].barnard.hpc.tu-dresden.de` +- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de` + +## AMD Rome CPUs + NVIDIA A100 + +- 34 nodes, each with + - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs + - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available + - 1 TB RAM + - 3.5 TB local memory on NVMe device at `/tmp` +- Hostnames: `taurusi[8001-8034]` -> `i[8001-8037].alpha.hpc.tu-dresden.de` +- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de` +- Further information on the usage is documented on the site [Alpha Centauri Nodes](alpha_centauri.md) + +## Island 7 - AMD Rome CPUs + +- 192 nodes, each with + - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available + - 512 GB RAM + - 200 GB local memory on SSD at `/tmp` +- Hostnames: `taurusi[7001-7192]` -> `i[7001-7190].romeo.hpc.tu-dresden.de` +- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de` +- Further information on the usage is documented on the site [AMD Rome Nodes](rome_nodes.md) + +## Large SMP System HPE Superdome Flex + +- 1 node, with + - 32 x Intel Xeon Platinum 8276M CPU @ 2.20 GHz (28 cores) + - 47 TB RAM +- Configured as one single node +- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols) +- 370 TB of fast NVME storage available at `/nvme/<projectname>` +- Hostname: `taurussmp8` -> `smp8.julia.hpc.tu-dresden.de` +- Further information on the usage is documented on the site [HPE 
Superdome Flex](sd_flex.md) + +## IBM Power9 Nodes for Machine Learning + +For machine learning, we have IBM AC922 nodes installed with this configuration: + +- 32 nodes, each with + - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores) + - 256 GB RAM DDR4 2666 MHz + - 6 x NVIDIA VOLTA V100 with 32 GB HBM2 + - NVLINK bandwidth 150 GB/s between GPUs and host +- Hostnames: `taurusml[1-32]` -> `ml[1-29].power9.hpc.tu-dresden.de` +- Login nodes: `login[1-2].power9.hpc.tu-dresden.de` diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md new file mode 100644 index 0000000000000000000000000000000000000000..3a6749cff0814d2dbf54d53288fbcaa7fcb85818 --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md @@ -0,0 +1,82 @@ +# Migration 2023 + +## Brief Overview of Coming Changes + +All components of Taurus will be dismantled step by step. + +### New Hardware + +The new HPC system "Barnard" from Bull comes with these main properties: + +* 630 compute nodes based on Intel Sapphire Rapids +* new Lustre-based storage systems +* HDR Infiniband network large enough to integrate existing and near-future non-Bull hardware +* To help our users find the best location for their data, we now use the names of +animals (size, speed) as mnemonics. + +More details can be found in the [overview](/jobs_and_resources/hardware_overview_2023). + +### New Architecture + +Over the last decade we have been running our highly heterogeneous HPC system with a single +Slurm batch system. This made things very complicated, especially for inexperienced users. +To lower this hurdle we now create homogeneous clusters with their own Slurm instances and with +cluster-specific login nodes running on the same CPU. Job submission is possible only +from within the cluster (compute or login node).
+ +All clusters will be integrated into the new Infiniband fabric and will then have the same access to +the shared filesystems. This recabling requires a brief downtime of a few days. + +[Details on architecture](/jobs_and_resources/architecture_2023). + +### New Software + +The new nodes run on Linux RHEL 8.7. For a seamless integration of other compute hardware, +all operating systems will be updated to the same versions of OS, Mellanox and Lustre drivers. +As part of this, all application software was rebuilt consistently, using Git and CI for handling +the multitude of versions. + +We start with `release/23.10`, which is based on software requests from the feedback of our +HPC users. Most major software versions exist on all hardware platforms. + +## Migration Path + +Please make sure to have read [Details on architecture](/jobs_and_resources/architecture_2023) before +reading further. + +The migration can only be successful as a joint effort of the HPC team and users. Here is a description +of the action items. + +|When?|TODO ZIH |TODO users |Remark | +|---|---|---|---| +| done (May 2023) |first sync /scratch to /data/horse/old_scratch2| |copied 4 PB in about 3 weeks| +| done (June 2023) |enable access to Barnard| |initialized LDAP tree with Taurus users| +| done (July 2023) | |install new software stack|tedious work | +| ASAP | |adapt scripts|new Slurm version, new resources, no partitions| +| August 2023 | |test new software stack on Barnard|new versions sometimes require different prerequisites| +| August 2023| |test new software stack on other clusters|a few nodes will be made available with the new sw stack, but with the old filesystems| +| ASAP | |prepare data migration|The small filesystems `/beegfs` and `/lustre/ssd`, and `/home` are mounted on the old systems "until the end". 
They will *not* be migrated to the new system.| +| July 2023 | sync `/warm_archive` to new hardware| |using datamover nodes with Slurm jobs | +| September 2023 |prepare recabling of older hardware (Bull)| |integrate other clusters in the IB infrastructure | +| Autumn 2023 |finalize integration of other clusters (Bull)| |**~2 days downtime**, final rsync and migration of `/projects`, `/warm_archive`| +| Autumn 2023 ||transfer last data from old filesystems | `/beegfs`, `/lustre/scratch`, `/lustre/ssd` are no longer available on the new systems| + +### Data Migration + +Why do users need to copy their data? Why only some? How to do it best? + +* The sync of hundreds of terabytes (`/scratch`, `/warm_archive`, `/projects`) +must be planned and carried out carefully. The HPC team will use multiple syncs +so that the last bytes are not forgotten. During the downtime, `/projects` will be migrated. +* User homes (`/home`) are relatively small and can be copied by the scientists. +Keep in mind that deleting and archiving may be a better choice. +* For this, datamover nodes are available to run transfer jobs under Slurm. Please refer to the +section [Transfer Data to New Home Directory](../barnard_test#transfer-data-to-new-home-directory) +for more detailed instructions.
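The home-directory transfer mentioned in the last bullet can be sketched as a small shell helper. This is only a sketch under assumptions: the function name `print_transfer_cmds`, the user name `marie`, and the directory names `scripts` and `results` are hypothetical. The helper merely prints `dtcp --recursive` commands following the `/data/old/home` to `/home` layout from the Barnard test instructions, so the selection can be reviewed before anything is copied.

```shell
#!/bin/sh
# Sketch: print (do not run) one dtcp command per selected directory.
# $1 is the user name, all further arguments are directories below the
# old home. All names used here are hypothetical placeholders.
print_transfer_cmds() {
  user="$1"
  shift
  for dir in "$@"; do
    printf 'dtcp --recursive /data/old/home/%s/%s /home/%s/\n' "$user" "$dir" "$user"
  done
}

# Example call with two hypothetical directories.
print_transfer_cmds marie scripts results
```

Printing first fits the advice to transfer only selected directories. Once the list looks right, the printed lines could be run on a Barnard login node.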
+ +### A Graphical Overview + +(red: user action required): + + +{: align=center} diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2023.drawio b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2023.drawio new file mode 100644 index 0000000000000000000000000000000000000000..59ba4ad3b2f8b9387de2ede22079055aa3c9b864 --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2023.drawio @@ -0,0 +1 @@ +<mxfile host="Electron" modified="2023-06-05T08:54:45.184Z" agent="5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/20.8.10 Chrome/106.0.5249.199 Electron/21.3.5 Safari/537.36" etag="-HTyYpm7KikI7TLmpzGH" version="20.8.10" type="device"><diagram name="Seite-1" id="tXkwnWAt3AUjisLzzv1-"></diagram></mxfile> \ No newline at end of file diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2023.png b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2023.png new file mode 100644 index 0000000000000000000000000000000000000000..bc1083880f5172240dd78f57dd8b1a7bac39dab5 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2023.png differ diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/migration_2023.png b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/migration_2023.png new file mode 100644 index 0000000000000000000000000000000000000000..4ec6dc7bf0c9b37c02c3a5058fd998e6376ec42b Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/migration_2023.png differ diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/mpi_issues.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/mpi_issues.md index ccb34da378591594991ab746915fd90e9847920b..95f6eb58990233e85c5dfa535e0c1bde0c29ade6 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/mpi_issues.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/mpi_issues.md @@ -2,7 +2,33 @@ This pages holds known issues observed 
with MPI and concrete MPI implementations. -## Mpirun on partition `alpha`and `ml` +## OpenMPI v4.1.x - Performance Loss with MPI-IO-Module OMPIO + +OpenMPI v4.1.x introduced a couple of major enhancements, e.g., the `OMPIO` module is now the +default module for MPI-IO on **all** filesystems incl. Lustre (cf. +[NEWS file in OpenMPI source code](https://raw.githubusercontent.com/open-mpi/ompi/v4.1.x/NEWS)). +Prior to this, `ROMIO` was the default MPI-IO module for Lustre. + +Colleagues of ZIH have found that some MPI-IO access patterns suffer a significant performance loss +using `OMPIO` as MPI-IO module with OpenMPI/4.1.x modules on ZIH systems. At the moment, the root +cause is unclear and needs further investigation. + +**A workaround** for this performance loss is to use "old", i.e., `ROMIO` MPI-IO-module. This +is achieved by setting the environment variable `OMPI_MCA_io` before executing the application as +follows + +```console +export OMPI_MCA_io=^ompio +srun ... +``` + +or setting the option as argument, in case you invoke `mpirun` directly + +```console +mpirun --mca io ^ompio ... +``` + +## Mpirun on partition `alpha` and `ml` Using `mpirun` on partitions `alpha` and `ml` leads to wrong resource distribution when more than one node is involved. This yields a strange distribution like e.g. `SLURM_NTASKS_PER_NODE=15,1` diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md index d3e9674ced87e57dc6f45ccafccd71383c86d921..20a542d3abed3cd59b299c5d6560bc451f3eead0 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md @@ -1,7 +1,7 @@ # HPC Resources and Jobs -ZIH operates a high performance computing (HPC) system with more than 60.000 cores, 720 GPUs, and a -flexible storage hierarchy with about 16 PB total capacity. 
The HPC system provides an optimal +ZIH operates high performance computing (HPC) systems with more than 90.000 cores, 500 GPUs, and +a flexible storage hierarchy with about 20 PB total capacity. The HPC system provides an optimal research environment especially in the area of data analytics and machine learning as well as for processing extremely large data sets. Moreover it is also a perfect platform for highly scalable, data-intensive and compute-intensive applications. @@ -58,12 +58,12 @@ automatically select a suitable partition depending on your memory and GPU requi **MPI jobs:** For MPI jobs typically allocates one core per task. Several nodes could be allocated if it is necessary. The batch system [Slurm](slurm.md) will automatically find suitable hardware. -Normal compute nodes are perfect for this task. **OpenMP jobs:** SMP-parallel applications can only run **within a node**, so it is necessary to include the [batch system](slurm.md) options `-N 1` and `-n 1`. Using `--cpus-per-task N` Slurm will start one task and you will have `N` CPUs. The maximum number of processors for an SMP-parallel -program is 896 on partition `julia`, see [partitions](partitions_and_limits.md). +program is 896 on partition `julia`, see [partitions](partitions_and_limits.md) (be aware that +the application has to be developed with that large number of threads in mind). Partitions with GPUs are best suited for **repetitive** and **highly-parallel** computing tasks. If you have a task with potential [data parallelism](../software/gpu_programming.md) most likely that @@ -71,7 +71,9 @@ you need the GPUs. Beyond video rendering, GPUs excel in tasks such as machine simulations and risk modeling. Use the partitions `gpu2` and `ml` only if you need GPUs! Otherwise using the x86-based partitions most likely would be more beneficial. 
-**Interactive jobs:** Slurm can forward your X11 credentials to the first node (or even all) for a job +**Interactive jobs:** An interactive job is the best choice for testing and development. See + [interactive-jobs](slurm.md). +Slurm can forward your X11 credentials to the first node (or even all) for a job with the `--x11` option. To use an interactive job you have to specify `-X` flag for the ssh login. ## Interactive vs. Batch Mode diff --git a/doc.zih.tu-dresden.de/docs/misc/HPC-Introduction.pdf b/doc.zih.tu-dresden.de/docs/misc/HPC-Introduction.pdf index 955c681758a3830afa88f053d7342809c9785b26..3d2e0beb8eb0ce083218e7fd28a7e2993bbea191 100644 Binary files a/doc.zih.tu-dresden.de/docs/misc/HPC-Introduction.pdf and b/doc.zih.tu-dresden.de/docs/misc/HPC-Introduction.pdf differ diff --git a/doc.zih.tu-dresden.de/docs/misc/migration2023.drawio b/doc.zih.tu-dresden.de/docs/misc/migration2023.drawio new file mode 100644 index 0000000000000000000000000000000000000000..9679fffb77fa898a7e01fb939c258ec991808b6a --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/misc/migration2023.drawio @@ -0,0 +1 @@ +<mxfile host="Electron" modified="2023-06-22T11:44:28.169Z" agent="5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/20.8.16 Chrome/106.0.5249.199 Electron/21.4.0 Safari/537.36" etag="Y2qdBmcmrf7lYsppJ_dS" version="20.8.16" type="device"><diagram name="Seite-1" id="-EQ8Qw6o7c4L1Kgc9zBM"></diagram></mxfile> \ No newline at end of file diff --git a/doc.zih.tu-dresden.de/docs/misc/migration2023.png b/doc.zih.tu-dresden.de/docs/misc/migration2023.png new file mode 100644 index 0000000000000000000000000000000000000000..6f74947fa72520a8d3273a2e58ef17aeb478d745 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/misc/migration2023.png differ diff --git a/doc.zih.tu-dresden.de/docs/software/cicd.md b/doc.zih.tu-dresden.de/docs/software/cicd.md new file mode 100644 index 0000000000000000000000000000000000000000..7294622a292acb18d586ccee3108332f6e555272 --- 
/dev/null +++ b/doc.zih.tu-dresden.de/docs/software/cicd.md @@ -0,0 +1,126 @@ +# CI/CD on HPC + +We provide a **GitLab Runner** that allows you to run a GitLab pipeline on the ZIH systems. With +that you can continuously build, test, and benchmark your HPC software in the target environment. + +## Requirements + +- You (and ideally every involved developer) need an [HPC-Login](../application/overview.md). +- You manage your source code in a repository at the [TU Chemnitz GitLab instance](https://gitlab.hrz.tu-chemnitz.de) + +## Setup process + +1. Open your repository in the browser. + +2. Hover over *Settings* and then click on *CI/CD* + +  + { align=center } + +3. *Expand* the *Runners* section + +  + { align=center } + +4. Copy the *registration token* + +  + { align=center } + +5. Now, you can request the registration of your repository with the + [HPC-Support](../support/support.md). In the ticket, you need to add the URL of the GitLab + repository and the registration token. + +!!! warning + + At the moment, only repositories hosted at the TU Chemnitz GitLab are supported. + +## GitLab pipelines + +As the ZIH provides the CI/CD as a GitLab runner, you can run any pipeline already working on other +runners with the CI/CD at the ZIH systems. This also means that, to configure the actual steps performed +once your pipeline runs, you need to define the `.gitlab-ci.yml` file in the root of your +repository. There is a [comprehensive +documentation](https://gitlab.hrz.tu-chemnitz.de/help/ci/index.md) and a [reference for the +`.gitlab-ci.yml` file](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index) available at every +GitLab instance. There's also a [quick start +guide](https://gitlab.hrz.tu-chemnitz.de/help/ci/quick_start/index.md). + +The main difference from other GitLab runners is that every pipeline job will be scheduled as an +individual HPC job on the ZIH systems. Therefore, an important aspect is the possibility to set +Slurm parameters. 
While scheduling jobs allows you to run code directly on the target system, it also +means that a single pipeline has to wait for resource allocation. Hence, you may want to restrict +which commits run the complete pipeline and which commits run only a part of it. + +### Passing Slurm parameters + +You can pass Slurm parameters via the [`variables` +keyword](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#variables), either globally for the +whole YAML file, or on a per-job basis. + +Use the variable `SCHEDULER_PARAMETERS` and define the same parameters you would use for [`srun` or +`sbatch`](../jobs_and_resources/slurm.md). + +!!! warning + + The parameters `--job-name`, `--output`, and `--wait` are handled by the GitLab runner and must + not be used. If used, the run will fail. + +!!! tip + + Make sure to set `--account` such that the allocation of HPC resources is accounted + correctly. + +!!! example + + The following YAML file defines a configuration section `.test-job`, and two jobs, + `test-job-haswell` and `test-job-ml`, extending from that. The two jobs share the + `before_script`, `script`, and `after_script` configuration, but differ in the + `SCHEDULER_PARAMETERS`. The jobs `test-job-haswell` and `test-job-ml` are scheduled on the partition + `haswell` and partition `ml`, respectively. + + ``` yaml + .test-job: + before_script: + - date + - pwd + - hostname + script: + - date + - pwd + - hostname + after_script: + - date + - pwd + - hostname + + test-job-haswell: + extends: .test-job + variables: + SCHEDULER_PARAMETERS: -p haswell + + + test-job-ml: + extends: .test-job + variables: + SCHEDULER_PARAMETERS: -p ml + ``` + +## Current limitations + +- Every runner job is currently limited to **one hour**. Once this time limit passes, the runner job + gets canceled regardless of the runtime requested from Slurm. This time *includes* the waiting + time for HPC resources.
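The warning above about runner-handled options can be turned into a quick local check before committing a `.gitlab-ci.yml`. This is a minimal sketch, assuming you keep the intended `SCHEDULER_PARAMETERS` value in a shell string. The function name `check_scheduler_parameters` and the account `p_example` are hypothetical, and only the long option forms named in the warning are checked.

```shell
#!/bin/sh
# Sketch: reject a SCHEDULER_PARAMETERS value that contains one of the
# options handled by the GitLab runner (--job-name, --output, --wait).
# Short forms of these options are deliberately not covered here.
check_scheduler_parameters() {
  # Pad with spaces so options at the start or end of the string match too.
  case " $1 " in
    *" --job-name="*|*" --job-name "*|*" --output="*|*" --output "*|*" --wait "*)
      echo "not allowed in SCHEDULER_PARAMETERS: $1" >&2
      return 1
      ;;
  esac
  return 0
}

# Example: the first value passes, the second one is rejected.
check_scheduler_parameters "-p haswell --account=p_example" && echo "ok"
check_scheduler_parameters "-p ml --job-name=mytest" || echo "rejected"
```

Matching whole words keeps unrelated options such as `--wait-all-nodes` from being flagged by accident.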
+ +## Pitfalls and Recommendations + +- While the [`before_script`](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#before_script) + and [`script`](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#script) arrays of commands are + executed on the allocated resources, the + [`after_script`](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#after_script) runs on the + GitLab runner node. We recommend that you do not use `after_script`. + +- It is likely that each of your runner jobs will be executed in a slightly different directory on the + shared filesystem. Some build systems, for example CMake, expect that the configure and build steps are + executed in the same directory. In this case, we recommend using one job for configure and + build. diff --git a/doc.zih.tu-dresden.de/docs/software/containers.md b/doc.zih.tu-dresden.de/docs/software/containers.md index b93971011acbc7bf55eb790b00f57d15a16fc982..e40242e9a6531512965693e2a610c46e8eff02ef 100644 --- a/doc.zih.tu-dresden.de/docs/software/containers.md +++ b/doc.zih.tu-dresden.de/docs/software/containers.md @@ -165,12 +165,13 @@ https://github.com/singularityware/singularity/tree/master/examples. !!! hint As opposed to bootstrapping a container, importing from Docker does **not require root - privileges** and therefore works on ZIH systems directly. + privileges** and therefore works on ZIH systems directly. Please note that the singularity commands + are only available on the compute nodes and not on the login nodes. 
You can import an image directly from the Docker repository (Docker Hub): ```console -marie@login$ singularity build my-container.sif docker://ubuntu:latest +marie@compute$ singularity build my-container.sif docker://ubuntu:latest ``` Creating a singularity container directly from a local docker image is possible but not @@ -247,7 +248,7 @@ There are some notable changes between Singularity definitions and Dockerfiles: A read-only shell can be entered as follows: ```console -marie@login$ singularity shell my-container.sif +marie@compute$ singularity shell my-container.sif ``` !!! note @@ -259,7 +260,7 @@ marie@login$ singularity shell my-container.sif automatically and instead set up your binds manually via `-B` parameter. Example: ```console - marie@login$ singularity shell --contain -B /scratch,/my/folder-on-host:/folder-in-container my-container.sif + marie@compute$ singularity shell --contain -B /scratch,/my/folder-on-host:/folder-in-container my-container.sif ``` You can write into those folders by default. 
If this is not desired, add an `:ro` for read-only to @@ -287,7 +288,7 @@ While the `shell` command can be useful for tests and setup, you can also launch inside the container directly using "exec": ```console -marie@login$ singularity exec my-container.sif /opt/myapplication/bin/run_myapp +marie@compute$ singularity exec my-container.sif /opt/myapplication/bin/run_myapp ``` This can be useful if you wish to create a wrapper script that transparently calls a containerized @@ -328,20 +329,20 @@ singularity build my-container.sif example.def Then you can run your application via ```console -marie@login$ singularity run my-container.sif first_arg 2nd_arg +marie@compute$ singularity run my-container.sif first_arg 2nd_arg ``` Alternatively you can execute the container directly which is equivalent: ```console -marie@login$ ./my-container.sif first_arg 2nd_arg +marie@compute$ ./my-container.sif first_arg 2nd_arg ``` With this you can even masquerade an application with a singularity container as if it was an actual program by naming the container just like the binary: ```console -marie@login$ mv my-container.sif myCoolAp +marie@compute$ mv my-container.sif myCoolAp ``` ### Use-Cases @@ -353,6 +354,6 @@ binary-distributed applications didn't work on that anymore. You can use one of 7 container images (`/scratch/singularity/centos7.img`) to circumvent this problem. 
Example: ```console -marie@login$ singularity exec /scratch/singularity/centos7.img ldd --version +marie@compute$ singularity exec /scratch/singularity/centos7.img ldd --version ldd (GNU libc) 2.17 ``` diff --git a/doc.zih.tu-dresden.de/docs/software/misc/menu12_en.png b/doc.zih.tu-dresden.de/docs/software/misc/menu12_en.png new file mode 100644 index 0000000000000000000000000000000000000000..a46d3b590299ef478d3afb077b45fe8697480570 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/menu12_en.png differ diff --git a/doc.zih.tu-dresden.de/docs/software/misc/menu3_en.png b/doc.zih.tu-dresden.de/docs/software/misc/menu3_en.png new file mode 100644 index 0000000000000000000000000000000000000000..54356995d0a898fe8673d7c6d13ce57b8749df00 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/menu3_en.png differ diff --git a/doc.zih.tu-dresden.de/docs/software/misc/menu4_en.png b/doc.zih.tu-dresden.de/docs/software/misc/menu4_en.png new file mode 100644 index 0000000000000000000000000000000000000000..caebd286dd951d5865e5baf16596877e1e31c1d4 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/menu4_en.png differ diff --git a/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md b/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md index d7a90eb4bf0af3fe417fe9e6c89d7c44a400be28..392124c49f16e5fc4c2dcb1774782d70180682c6 100644 --- a/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md +++ b/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md @@ -128,6 +128,8 @@ However hereafter we have an example on how that might look like for Gaussian: #SBATCH --ntasks=1 #SBATCH --constraint=fs_lustre_ssd #SBATCH --cpus-per-task=24 + #SBATCH --mem-per-cpu 2050 + # only 2050 MB RAM for haswell, as Gaussian somehow crashes if we try using the full 2541 of it # Load the software you need here module purge diff --git a/doc.zih.tu-dresden.de/docs/software/performance_engineering_overview.md 
b/doc.zih.tu-dresden.de/docs/software/performance_engineering_overview.md index 0807db9ac0b6474b9a8051639fbad6466a03df09..b8c79afa6c2ab2b224eed7361b84a1e8a57501ae 100644 --- a/doc.zih.tu-dresden.de/docs/software/performance_engineering_overview.md +++ b/doc.zih.tu-dresden.de/docs/software/performance_engineering_overview.md @@ -44,7 +44,6 @@ software performance engineering or application performance engineering within s | [Perf](#perf-tools) | Produce and visualize [profile](#profile) | easy | medium | low | (no)[^2] | | [PIKA](#pika) | Show performance [profile](#profile) and [trace](#trace) | very easy | low | very low | no | | [Score-P](#score-p) | Create performance [trace](#trace) | complex | high | variable | yes | -| [Slurm](#slurm-profiler) | Produce and visualize simple [trace](#trace)| easy | low | low | no | | [Vampir](#vampir) | Visualize performance [trace](#trace) | complex | high | n.a. | n.a. | [^2]: Re-compilation is not required. Yet, to obtain more details it is recommended to re-compile with the `-g` compiler option, which adds debugging information to the executable of an application. @@ -248,19 +247,6 @@ Many raw data sources are supported by Score-P. It requires some time, training, and practice to fully benefit from the tool's features. See [Score-P](scorep.md) for further details. -### Slurm Profiler - -!!! hint "Easy to use performance visualization of entire batch jobs" - -The [Slurm Profiler](../jobs_and_resources/slurm_profiling.md) gathers performance data from every -task/node of a given [batch job](../jobs_and_resources/slurm.md). -It records a coarse-grained [trace](#trace) for subsequent analysis. -[Instrumentation](#instrumentation) of the applications under test is not needed. -The data analysis of the given set of system metrics needs to be initiated by the user with a -command line interface. -The resulting performance metrics are accessible in a simple graphical front-end that provides -time/performance graphs. 
- ### Vampir !!! hint "Complex and powerful performance data visualization of parallel applications" diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml index 3e424e9894e8ee37e91de55edfd3ecc4d679a239..48a697afe5fede6bdadb7e4d033081e3d5a9f4bb 100644 --- a/doc.zih.tu-dresden.de/mkdocs.yml +++ b/doc.zih.tu-dresden.de/mkdocs.yml @@ -54,6 +54,7 @@ nav: - Singularity for Power9 Architecture: software/singularity_power9.md - Virtual Machines: software/virtual_machines.md - GPU-accelerated Containers for Deep Learning (NGC Containers): software/ngc_containers.md + - CI/CD: software/cicd.md - External Licenses: software/licenses.md - Computational Fluid Dynamics (CFD): software/cfd.md - Mathematics Applications: software/mathematics.md @@ -92,7 +93,6 @@ nav: - Produce Performance Overview with Perf: software/perf_tools.md - Track Slurm Jobs with PIKA: software/pika.md - Record Course of Events with Score-P: software/scorep.md - - Profile Jobs with Slurm: jobs_and_resources/slurm_profiling.md - Study Course of Events with Vampir: software/vampir.md - Measure Energy Consumption: software/energy_measurement.md - Compare System Performance with SPEChpc: software/spec.md @@ -101,6 +101,11 @@ nav: - Overview: jobs_and_resources/overview.md - HPC Resources: - Overview: jobs_and_resources/hardware_overview.md + - New Systems 2023: + - Architectural Re-Design 2023: jobs_and_resources/architecture_2023.md + - Overview 2023: jobs_and_resources/hardware_overview_2023.md + - Migration 2023: jobs_and_resources/migration_2023.md + - Tests 2023: jobs_and_resources/barnard_test.md - AMD Rome Nodes: jobs_and_resources/rome_nodes.md - NVMe Storage: jobs_and_resources/nvme_storage.md - Alpha Centauri: jobs_and_resources/alpha_centauri.md @@ -112,7 +117,6 @@ nav: - Partitions and Limits: jobs_and_resources/partitions_and_limits.md - Slurm Job File Generator: jobs_and_resources/slurm_generator.md - Checkpoint/Restart: jobs_and_resources/checkpoint_restart.md - - Job 
Profiling: jobs_and_resources/slurm_profiling.md - Binding and Distribution of Tasks: jobs_and_resources/binding_and_distribution_of_tasks.md - User Support: support/support.md - Archive: @@ -125,7 +129,9 @@ nav: - Platform LSF: archive/platform_lsf.md - BeeGFS Filesystem on Demand: archive/beegfs_on_demand.md - Jupyter Installation: archive/install_jupyter.md + - Profile Jobs with Slurm: archive/slurm_profiling.md - Switched-Off Systems: + - Overview 2022: archive/hardware_overview_2022.md - Overview: archive/systems_switched_off.md - Migration From Deimos to Atlas: archive/migrate_to_atlas.md - System Altix: archive/system_altix.md