Commit d180f03a authored by Martin Schroschk

Spell: Make it InfiniBand (capital B)

parent f288c57e
2 merge requests: !919 "Automated merge from preview to main", !915 "Spell: Make it InfiniBand (capital B)"
Showing 16 additions and 16 deletions
@@ -20,7 +20,7 @@ search:
 At the moment when parts of the IB stop we will start batch system plugins to parse for this batch
 system option: `--comment=NO_IB`. Jobs with this option set can run on nodes without
-Infiniband access if (and only if) they have set the `--tmp`-option as well:
+InfiniBand access if (and only if) they have set the `--tmp`-option as well:
 *From the Slurm documentation:*
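The hunk above documents the `--comment=NO_IB` batch option for jobs that may run on nodes without InfiniBand access. A minimal job-script sketch, assuming the site exports `TMPDIR` to the node-local scratch; the resource values and the `./my_application` binary are placeholders, not taken from this commit:

```bash
#!/bin/bash
#SBATCH --comment=NO_IB   # mark the job as runnable on nodes without InfiniBand access
#SBATCH --tmp=10G         # required together with NO_IB: request node-local /tmp space
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

cd "$TMPDIR"              # work in node-local scratch instead of network filesystems
srun ./my_application
```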
@@ -17,7 +17,7 @@ Here are the major changes from the user's perspective:
 | Red Hat Enterprise Linux (RHEL) | 6.x | 7.x |
 | Linux kernel | 2.26 | 3.10 |
 | glibc | 2.12 | 2.17 |
-| Infiniband stack | OpenIB | Mellanox |
+| InfiniBand stack | OpenIB | Mellanox |
 | Lustre client | 2.5 | 2.10 |
 ## Host Keys
@@ -14,7 +14,7 @@ The following data can be gathered:
 * Task data, such as CPU frequency, CPU utilization, memory consumption (RSS and VMSize), I/O
 * Energy consumption of the nodes
-* Infiniband data (currently deactivated)
+* InfiniBand data (currently deactivated)
 * Lustre filesystem data (currently deactivated)
 The data is sampled at a fixed rate (i.e. every 5 seconds) and is stored in a HDF5 file.
@@ -32,7 +32,7 @@ node has 180 GB local disk space for scratch mounted on `/tmp`. The jobs for the
 scheduled by the [Platform LSF](platform_lsf.md) batch system from the login nodes
 `atlas.hrsk.tu-dresden.de` .
-A QDR Infiniband interconnect provides the communication and I/O infrastructure for low latency /
+A QDR InfiniBand interconnect provides the communication and I/O infrastructure for low latency /
 high throughput data traffic.
 Users with a login on the [SGI Altix](system_altix.md) can access their home directory via NFS
@@ -29,7 +29,7 @@ mounted on `/tmp`. The jobs for the compute nodes are scheduled by the
 [Platform LSF](platform_lsf.md)
 batch system from the login nodes `deimos.hrsk.tu-dresden.de` .
-Two separate Infiniband networks (10 Gb/s) with low cascading switches provide the communication and
+Two separate InfiniBand networks (10 Gb/s) with low cascading switches provide the communication and
 I/O infrastructure for low latency / high throughput data traffic. An additional gigabit Ethernet
 network is used for control and service purposes.
@@ -25,7 +25,7 @@ All nodes share a 4.4 TB SAN. Each node has additional local disk space mounted
 jobs for the compute nodes are scheduled by a [Platform LSF](platform_lsf.md) batch system running on
 the login node `phobos.hrsk.tu-dresden.de`.
-Two separate Infiniband networks (10 Gb/s) with low cascading switches provide the infrastructure
+Two separate InfiniBand networks (10 Gb/s) with low cascading switches provide the infrastructure
 for low latency / high throughput data traffic. An additional GB/Ethernetwork is used for control
 and service purposes.
@@ -2,7 +2,7 @@
 With the replacement of the Taurus system by the cluster `Barnard` in 2023,
 the rest of the installed hardware had to be re-connected, both with
-Infiniband and with Ethernet.
+InfiniBand and with Ethernet.
 ![Architecture overview 2023](../jobs_and_resources/misc/architecture_2023.png)
 {: align=center}
@@ -12,7 +12,7 @@ This Arm HPC Developer kit offers:
 * 512G DDR4 memory (8x 64G)
 * 6TB SAS/ SATA 3.5″
 * 2x NVIDIA A100 GPU
-* 2x NVIDIA BlueField-2 E-Series DPU: 200GbE/HDR single-port, both connected to the Infiniband network
+* 2x NVIDIA BlueField-2 E-Series DPU: 200GbE/HDR single-port, both connected to the InfiniBand network
 ## Further Information
@@ -10,7 +10,7 @@ The new HPC system "Barnard" from Bull comes with these main properties:
 * 630 compute nodes based on Intel Sapphire Rapids
 * new Lustre-based storage systems
-* HDR Infiniband network large enough to integrate existing and near-future non-Bull hardware
+* HDR InfiniBand network large enough to integrate existing and near-future non-Bull hardware
 * To help our users to find the best location for their data we now use the name of
 animals (size, speed) as mnemonics.
@@ -24,7 +24,7 @@ To lower this hurdle we now create homogenous clusters with their own Slurm inst
 cluster specific login nodes running on the same CPU. Job submission is possible only
 from within the cluster (compute or login node).
-All clusters will be integrated to the new Infiniband fabric and have then the same access to
+All clusters will be integrated to the new InfiniBand fabric and have then the same access to
 the shared filesystems. This recabling requires a brief downtime of a few days.
 [Details on architecture](/jobs_and_resources/architecture_2023).
@@ -4,7 +4,7 @@
 - 8x Intel NVMe Datacenter SSD P4610, 3.2 TB
 - 3.2 GB/s (8x 3.2 =25.6 GB/s)
-- 2 Infiniband EDR links, Mellanox MT27800, ConnectX-5, PCIe x16, 100
+- 2 InfiniBand EDR links, Mellanox MT27800, ConnectX-5, PCIe x16, 100
 Gbit/s
 - 2 sockets Intel Xeon E5-2620 v4 (16 cores, 2.10GHz)
 - 64 GB RAM
@@ -102,6 +102,6 @@ case on Rome. You might want to try `-mavx2 -fma` instead.
 ### Intel MPI
-We have seen only half the theoretical peak bandwidth via Infiniband between two nodes, whereas
+We have seen only half the theoretical peak bandwidth via InfiniBand between two nodes, whereas
 Open MPI got close to the peak bandwidth, so you might want to avoid using Intel MPI on partition
 `rome` if your application heavily relies on MPI communication until this issue is resolved.
@@ -22,7 +22,7 @@ project's quota can be increased or dedicated volumes of up to the full capacity
 - Granularity should be a socket (28 cores)
 - Can be used for OpenMP applications with large memory demands
 - To use Open MPI it is necessary to export the following environment
-variables, so that Open MPI uses shared-memory instead of Infiniband
+variables, so that Open MPI uses shared-memory instead of InfiniBand
 for message transport:
 ```
@@ -31,4 +31,4 @@ project's quota can be increased or dedicated volumes of up to the full capacity
 ```
 - Use `I_MPI_FABRICS=shm` so that Intel MPI doesn't even consider
-using Infiniband devices itself, but only shared-memory instead
+using InfiniBand devices itself, but only shared-memory instead
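The exact Open MPI export lines are collapsed in the first of the two hunks above, and the Intel MPI setting is quoted in the second. As an illustration only (the variables actually listed in the collapsed block may differ), both MPI implementations can be pinned to shared-memory transport like this:

```bash
# Open MPI: restrict to in-node transports via MCA parameters
# (illustrative settings, not the collapsed block verbatim)
export OMPI_MCA_pml=ob1          # point-to-point layer that honors the BTL list below
export OMPI_MCA_btl=self,vader   # loopback + shared-memory BTLs only, no InfiniBand

# Intel MPI: shared-memory fabric only, InfiniBand devices are never probed
export I_MPI_FABRICS=shm
```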
@@ -128,7 +128,7 @@ marie@login$ srun -n 4 --ntasks-per-node 2 --time=00:10:00 singularity exec ubun
 * Chosen CUDA version depends on installed driver of host
 * Open MPI needs PMI for Slurm integration
 * Open MPI needs CUDA for GPU copy-support
-* Open MPI needs `ibverbs` library for Infiniband
+* Open MPI needs `ibverbs` library for InfiniBand
 * `openmpi-mca-params.conf` required to avoid warnings on fork (OK on ZIH systems)
 * Environment variables `SLURM_VERSION` and `OPENMPI_VERSION` can be set to choose different
 version when building the container
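The `srun ... singularity exec` command in the hunk header is truncated. The general launch pattern looks like the following sketch, where the image name `ubuntu_mpi.sif` and the binary `./mpi_hello` are purely illustrative, not the names used in that document:

```bash
# 4 MPI ranks, 2 per node, each rank started inside the container
srun -n 4 --ntasks-per-node 2 --time=00:10:00 \
    singularity exec ubuntu_mpi.sif ./mpi_hello
```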
@@ -175,7 +175,7 @@ iDataPlex
 ifort
 ImageNet
 img
-Infiniband
+InfiniBand
 InfluxDB
 init
 inode