## News
* **2023-11-06** [Substantial update on "How-To: Migration to Barnard"](jobs_and_resources/migration_to_barnard.md)
* **2023-10-16** [Open MPI 4.1.x - Workaround for MPI-IO Performance Loss](jobs_and_resources/mpi_issues/#performance-loss-with-mpi-io-module-ompio)
* **2023-10-04** [User tests on Barnard](jobs_and_resources/barnard_test.md)
* **2023-06-01** [New hardware and complete re-design](jobs_and_resources/architecture_2023.md)
* **2023-01-04** [New hardware: NVIDIA Arm HPC Developer Kit](jobs_and_resources/arm_hpc_devkit.md)
# Architectural Re-Design 2023
With the replacement of the Taurus system by the cluster [Barnard](hardware_overview_2023.md#barnard-intel-sapphire-rapids-cpus)
in 2023, the rest of the installed hardware had to be re-connected, both with
InfiniBand and with Ethernet.
Over the last decade we have been running our highly heterogeneous HPC system with a single
Slurm batch system. This made things very complicated, especially for inexperienced users.
To lower this hurdle, we **now create homogeneous clusters with their own Slurm instances and with
cluster-specific login nodes** running on the same CPU. Job submission will be possible only from
within the cluster (compute or login node).
All clusters will be integrated into the new InfiniBand fabric and will then have the same access to
the shared filesystems. This recabling will require a brief downtime of a few days.
![Architecture overview 2023](../jobs_and_resources/misc/architecture_2023.png)
{: align=center}
## Migration Phase
For about one month, the new cluster Barnard and the old cluster Taurus
will run side-by-side - both with their respective filesystems. We provide a comprehensive
[description of the migration to Barnard](migration_to_barnard.md).
The following figure provides a graphical overview of the overall process (red: user action
required):
![Migration timeline 2023](../jobs_and_resources/misc/migration_2023.png)
{: align=center}
# Migration 2023
## Brief Overview of Coming Changes
All components of Taurus will be dismantled step by step.
### New Hardware
The new HPC system [Barnard](hardware_overview_2023.md#barnard-intel-sapphire-rapids-cpus) from Bull
comes with these main properties:
* 630 compute nodes based on Intel Sapphire Rapids
* new Lustre-based storage systems
* HDR InfiniBand network large enough to integrate existing and near-future non-Bull hardware
* To help our users find the best location for their data, we now use the names of
animals (size, speed) as mnemonics.
### New Architecture
Over the last decade we have been running our highly heterogeneous HPC system with a single
Slurm batch system. This made things very complicated, especially for inexperienced users.
To lower this hurdle, we **now create homogeneous clusters with their own Slurm instances and with
cluster-specific login nodes** running on the same CPU. Job submission is possible only
from within the cluster (compute or login node).
All clusters will be integrated into the new InfiniBand fabric and will then have the same access to
the shared filesystems. This recabling requires a brief downtime of a few days.
Please refer to the overview page [Architectural Re-Design 2023](architecture_2023.md)
for details on the new architecture.
### New Software
The new nodes run Linux RHEL 8.7. For a seamless integration of other compute hardware, all systems
will be updated to the same versions of operating system, Mellanox and Lustre drivers. Consequently,
all application software was re-built using Git and CI/CD pipelines to handle the multitude of
versions.
We start with `release/23.10`, which is based on software requests from our HPC users. Most major
software versions exist on all hardware platforms.
## Migration Path
Please make sure you have read the details on the [Architectural Re-Design 2023](architecture_2023.md)
before reading further.
!!! note
The migration can only be successful as a joint effort of the HPC team and users.
Here is a description of the action items.
|When?|TODO ZIH |TODO users |Remark |
|---|---|---|---|
| done (May 2023) |first sync `/scratch` to `/data/horse/old_scratch2`| |copied 4 PB in about 3 weeks|
| done (June 2023) |enable access to Barnard| |initialized LDAP tree with Taurus users|
| done (July 2023) | |install new software stack|tedious work |
| ASAP | |adapt scripts|new Slurm version, new resources, no partitions|
| August 2023 | |test new software stack on Barnard|new versions sometimes require different prerequisites|
| August 2023| |test new software stack on other clusters|a few nodes will be made available with the new software stack, but with the old filesystems|
| ASAP | |prepare data migration|The small filesystems `/beegfs` and `/lustre/ssd`, and `/home` are mounted on the old systems "until the end". They will *not* be migrated to the new system.|
| July 2023 | sync `/warm_archive` to new hardware| |using datamover nodes with Slurm jobs |
| September 2023 |prepare re-cabling of older hardware (Bull)| |integrate other clusters in the IB infrastructure |
| Autumn 2023 |finalize integration of other clusters (Bull)| |**~2 days downtime**, final rsync and migration of `/projects`, `/warm_archive`|
| Autumn 2023 ||transfer last data from old filesystems | `/beegfs`, `/lustre/scratch`, `/lustre/ssd` are no longer available on the new systems|
### Data Migration
Why do users need to copy their data? Why only some? How to do it best?
* The sync of hundreds of terabytes (`/scratch`, `/warm_archive`, `/projects`) can only be done in a
planned and careful manner. The HPC team will use multiple syncs so as not to miss the last bytes.
During the downtime, `/projects` will be migrated.
* User homes (`/home`) are relatively small and can be copied by the scientists themselves.
Keep in mind that deleting or archiving data might be the better choice.
* For this, datamover nodes are available to run transfer jobs under Slurm (see the sketch below).
Please refer to the
section [Transfer Data to New Home Directory](../barnard_test#transfer-data-to-new-home-directory)
for more detailed instructions.
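A minimal sketch of such a transfer job, using the generic login `marie` and a hypothetical
source directory `results` (please adapt both to your needs):
```console
marie@barnard$ dtcp --recursive /data/old/home/marie/results /home/marie/
```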
### A Graphical Overview
The following figure provides a graphical overview of the overall process (red: user action
required):
![Migration timeline 2023](../jobs_and_resources/misc/migration_2023.png)
{: align=center}
# How-To: Migration to Barnard
All HPC users are cordially invited to migrate to our new HPC system **Barnard** and to prepare
their software and workflows for production there.
!!! note "Migration Phase"
Please make sure you have read the details on the overall
[Architectural Re-Design 2023](architecture_2023.md#migration-phase) before reading further.
The migration from Taurus to Barnard comprises the following steps:
* [Prepare login to Barnard](#login-to-barnard)
* [Data management and data transfer to new filesystems](#data-management-and-data-transfer)
* [Update job scripts and workflow to new software](#software)
* [Update job scripts and workflow w.r.t. Slurm](#slurm)
!!! note
We highly recommend that you first read the entire page carefully and then execute the steps.
The migration can only be successful as a joint effort of the HPC team and users.
We value your feedback. Please provide it directly via our ticket system. For better processing,
please add "Barnard:" as a prefix to the subject of the [support ticket](../support/support.md).
## Login to Barnard
!!! hint
All users and projects from Taurus can now work on Barnard.
Use `login[1-4].barnard.hpc.tu-dresden.de` to access the system
from campus (or VPN). In order to verify the SSH fingerprints of the login nodes, please refer to
the page [Fingerprints](/access/key_fingerprints/#barnard).
All users have a **new, empty HOME** filesystem. This means you first have to ...
??? "... install your public SSH key on Barnard"
- Please create a new SSH keypair of type ed25519, secured with
a passphrase. Please refer to this
[page for instructions](../../access/ssh_login#before-your-first-connection).
- After login, add the public key to your `.ssh/authorized_keys` file on Barnard.
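A minimal sketch of these two steps, assuming that password-based login works for your very first
connection and using the hypothetical key file name `id_ed25519_barnard`:
```console
marie@local$ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_barnard
marie@local$ ssh-copy-id -i ~/.ssh/id_ed25519_barnard.pub marie@login1.barnard.hpc.tu-dresden.de
```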
## Data Management and Data Transfer
!!! warning
All data in the `/home` directory or in workspaces on the BeeGFS or Lustre SSD
filesystems will be deleted by the end of 2023, since these filesystems will be decommissioned.
Existing Taurus users who would like to keep some of their data need to copy it to the new system
manually, using the [steps described below](#data-management-and-data-transfer).
### Filesystems on Barnard
Our new HPC system Barnard also comes with **two new Lustre filesystems**, namely `/data/horse` and
`/data/walrus`. Both have a capacity of 20 PB, but differ in performance and intended usage, see
below. In order to support the data life cycle management, the well-known
[workspace concept](#workspaces-on-barnard) is applied.
* The `/projects` filesystem is the same on Taurus and Barnard
(mounted read-only on the compute nodes).
* The new work filesystem is `/data/horse`.
* The slower `/data/walrus` can be considered a substitute for the old
`/warm_archive`. It is mounted **read-only** on the compute nodes
and can be used to store e.g. results.
!!! Warning
All old filesystems, i.e., `ssd`, `beegfs`, and `scratch`, will be shut down by the end of 2023.
To work with your data from Taurus, you might have to move/copy it to the new storage systems.
Please carefully read the following documentation and instructions.
### Workspaces on Barnard
The filesystems `/data/horse` and `/data/walrus` can only be accessed via workspaces. Please refer
to the [workspace page](../../data_lifecycle/workspaces/), if you are not familiar with the
workspace concept and the corresponding commands. The following table provides the settings for
workspaces on these two filesystems.
| Filesystem (use with parameter `--filesystem=<filesystem>`) | Max. Duration in Days | Extensions | Keeptime in Days |
|:-------------------------------------|---------------:|-----------:|--------:|
| `/data/horse` (default) | 100 | 10 | 30 |
| `/data/walrus` | 365 | 2 | 60 |
{: summary="Settings for Workspace Filesystem `/data/horse` and `/data/walrus`."}
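For example, allocating a workspace on the non-default `/data/walrus` filesystem might look as
follows (generic workspace name `numbercrunch`, maximum duration taken from the table above):
```console
marie@barnard$ ws_allocate --filesystem=walrus numbercrunch 365
```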
### Data Migration to New Filesystems
Since all old filesystems of Taurus will be shut down by the end of 2023, your data needs to be
migrated to the new filesystems on Barnard. This migration comprises
* your personal `/home` directory,
* your workspaces on `/ssd`, `/beegfs` and `/scratch`.
!!! note "It's your turn"
**You are responsible for the migration of your data**. With the shutdown of the old
filesystems, all data will be deleted.
!!! note "Make a plan"
We highly recommend to **take some minutes to plan the transfer process**. Do not act
hastily.
Please **do not copy all your data** from the old to the new filesystems; instead, consider this
opportunity for **cleaning up your data**. E.g., it might make sense to delete outdated scripts,
old log files, etc., and to move other files, e.g., results, to the `/data/walrus` filesystem.
!!! hint "Generic login"
In the following we will use the generic login `marie` and workspace `numbercrunch`
([cf. content rules on generic names](../contrib/content_rules.md#data-privacy-and-generic-names)).
**Please make sure to replace it with your personal login.**
We have four new [datamover nodes](/data_transfer/datamover) that have all storages
of the old Taurus and the new Barnard system mounted. Do not use the datamovers from Taurus, i.e.,
all data transfers need to be invoked from Barnard! Thus, the very first step is to
[login to Barnard](#login-to-barnard).
The command `dtinfo` will provide you with the mountpoints of the old filesystems:
```console
marie@barnard$ dtinfo
[...]
directory on datamover mounting clusters directory on cluster
/data/old/home Taurus /home
/data/old/lustre/scratch2 Taurus /scratch
/data/old/lustre/ssd Taurus /lustre/ssd
[...]
```
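Before invoking any transfer, you can inspect your data at these old mountpoints with `dtls`, e.g.
for a workspace `numbercrunch` of the generic user `marie` on the old `ssd` filesystem:
```console
marie@barnard$ dtls /data/old/lustre/ssd/ws/marie-numbercrunch
```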
In the following, we provide instructions and comprehensive examples for transferring your data to
the new `/home` filesystem, as well as to the working filesystems `/data/horse` and
`/data/walrus`.
??? "Migration of Your Home Directory"
Your personal (old) home directory at Taurus will not be automatically transferred to the new
Barnard system. Please do not copy your entire home, but clean up your data. E.g., it might
make sense to delete outdated scripts, old log files, etc., and move other files to an archive
filesystem. Thus, please transfer only selected directories and files that you need on the new
system.
The steps are as follows:
1. Login to Barnard, i.e.,
```
ssh login[1-4].barnard.hpc.tu-dresden.de
```
1. The command `dtinfo` will provide you with the mountpoint
```console
marie@barnard$ dtinfo
[...]
directory on datamover mounting clusters directory on cluster
/data/old/home Taurus /home
[...]
```
1. Use the `dtls` command to list the files in your old home directory
```
marie@barnard$ dtls /data/old/home/marie
[...]
```
1. Use the `dtcp` command to invoke a transfer job, e.g.,
```console
marie@barnard$ dtcp --recursive /data/old/home/marie/<useful data> /home/marie/
```
**Note:** Please adapt the source and target paths to your needs. All available options of
`dtcp` can be queried via `dtcp --help`.
!!! warning
Please be aware that there is **no synchronisation process** between your home directories
at Taurus and Barnard. Thus, after the very first transfer, they will become divergent.
Please follow these instructions for transferring your data from `ssd`, `beegfs` and `scratch` to
the new filesystems. The instructions and examples are divided by the target, not the source,
filesystem.
This migration task requires a preliminary step: You need to allocate workspaces on the
target filesystems.
??? Note "Preliminary Step: Allocate a workspace"
Both `/data/horse/` and `/data/walrus` can only be used with
[workspaces](../data_lifecycle/workspaces.md). Before you invoke any data transfer from the old
working filesystems to the new ones, you need to allocate a workspace first.
The command `ws_list --list` lists the available filesystems and the default filesystem for workspaces.
```
marie@barnard$ ws_list --list
available filesystems:
horse (default)
walrus
```
As you can see, `/data/horse` is the default workspace filesystem at Barnard. I.e., if you
want to allocate, extend or release a workspace on `/data/walrus`, you need to pass the
option `--filesystem=walrus` explicitly to the corresponding workspace commands. Please
refer to our [workspace documentation](../data_lifecycle/workspaces.md), if you need to refresh
your knowledge.
The simplest command to allocate a workspace is as follows:
```
marie@barnard$ ws_allocate numbercrunch 90
```
Please refer to the table holding the settings
(cf. [subsection Workspaces on Barnard](#workspaces-on-barnard)) for the max. duration and to
`ws_allocate --help` for all available options.
??? "Migration to work filesystem `/data/horse`"
=== "Source: old `/scratch`"
If you transfer data from the old `/scratch` to `/data/horse`, it is sufficient to use
`dtmv` instead of `dtcp` since this data has already been copied to a special directory on
the new `horse` filesystem. Thus, you just need to move it to the right place (the Lustre
metadata system will update the corresponding entries).
```console
marie@barnard$ dtmv /data/horse/lustre/scratch2/0/marie-numbercrunch /data/horse/ws/marie-numbercrunch
```
=== "Source: old `/ssd`"
The old `ssd` filesystem is mounted at `/data/old/lustre/ssd` on the datamover nodes and the
workspaces are within the subdirectory `ws/`. A corresponding data transfer using `dtcp`
looks like:
```console
marie@barnard$ dtcp --recursive /data/old/lustre/ssd/ws/marie-numbercrunch /data/horse/ws/marie-numbercrunch
```
=== "Source: old `/beegfs`"
The old `beegfs` filesystem is mounted at `/data/old/beegfs` on the datamover nodes and the
workspaces are within the subdirectories `ws/0` and `ws/1`, respectively. A corresponding
data transfer using `dtcp` looks like
```console
marie@barnard$ dtcp --recursive /data/old/beegfs/ws/0/marie-numbercrunch /data/horse/ws/marie-numbercrunch
```
??? "Migration to `/data/walrus`"
=== "Source: old `/scratch`"
The old `scratch` filesystem is mounted at `/data/old/lustre/scratch2` on the datamover
nodes and the workspaces are within the subdirectories `ws/0` and `ws/1`, respectively. A
corresponding data transfer using `dtcp` looks like:
```console
marie@barnard$ dtcp --recursive /data/old/lustre/scratch2/ws/0/marie-numbercrunch /data/walrus/ws/marie-numbercrunch
```
=== "Source: old `/ssd`"
The old `ssd` filesystem is mounted at `/data/old/lustre/ssd` on the datamover nodes and the
workspaces are within the subdirectory `ws/`. A corresponding data transfer using `dtcp`
looks like:
```console
marie@barnard$ dtcp --recursive /data/old/lustre/ssd/ws/marie-numbercrunch /data/walrus/ws/marie-numbercrunch
```
=== "Source: old `/beegfs`"
The old `beegfs` filesystem is mounted at `/data/old/beegfs` on the datamover nodes and the
workspaces are within the subdirectories `ws/0` and `ws/1`, respectively. A corresponding
data transfer using `dtcp` looks like:
```console
marie@barnard$ dtcp --recursive /data/old/beegfs/ws/0/marie-numbercrunch /data/walrus/ws/marie-numbercrunch
```
??? "Migration from `/lustre/ssd` or `/beegfs`"
**You** are entirely responsible for the transfer of this data to the new location.
Start the `dtrsync` process as soon as possible (and maybe repeat it at a later time).
??? "Migration from `/lustre/scratch2` aka `/scratch`"
We are synchronizing this (**last: October 18**) to `/data/horse/lustre/scratch2/`.
Please do **NOT** copy these data yourself. Instead, check if they have already been synchronized
to `/data/horse/lustre/scratch2/ws`.
In case you need to update data (Gigabytes, not Terabytes!), please run `dtrsync` as in
`dtrsync -a /data/old/lustre/scratch2/ws/0/my-workspace/newest/ /data/horse/lustre/scratch2/ws/0/my-workspace/newest/`
??? "Migration from `/warm_archive`"
The process of syncing data from `/warm_archive` to `/data/walrus/warm_archive` is still ongoing.
Please do **NOT** copy these data yourself. Instead, check if they have already been synchronized
to `/data/walrus/warm_archive/ws`.
In case you need to update data (Gigabytes, not Terabytes!), please run `dtrsync` as in
`dtrsync -a /data/old/warm_archive/ws/my-workspace/newest/ /data/walrus/warm_archive/ws/my-workspace/newest/`
When the last compute system has been migrated, the old filesystems will be set to write-protected
and we will start a final synchronization (scratch + walrus).
The target directories for synchronization, `/data/horse/lustre/scratch2/ws` and
`/data/walrus/warm_archive/ws/`, will not be deleted automatically in the meantime.
## Software
Please use `module spider` to identify the software modules you need to load.
The default release version is 23.10.
The new nodes run Linux RHEL 8.7. For a seamless integration of other compute hardware, all systems
will be updated to the same versions of operating system, Mellanox and Lustre drivers. Consequently,
all application software was re-built using Git and CI/CD pipelines to handle the multitude of
versions.
We start with `release/23.10`, which is based on software requests from our HPC users. Most major
software versions exist on all hardware platforms.
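A minimal sketch of how to search for a package with `module spider` (the package name `GROMACS`
and the placeholder `<version>` are only examples, please adapt them to the software you need):
```console
marie@barnard$ module spider GROMACS            # list all available versions across releases
marie@barnard$ module spider GROMACS/<version>  # show how to load a specific version
```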
## Slurm
* We are running the most recent Slurm version.
* You must not use the old partition names (see the job script sketch below).
* Not everything has been tested yet.
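A minimal sketch of a job script without any partition option (the project name `p_number_crunch`,
the resource values, and the application are placeholders, please adapt them to your needs):
```bash
#!/bin/bash
#SBATCH --job-name=numbercrunch
#SBATCH --account=p_number_crunch   # placeholder project name
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=01:00:00
# Note: no --partition option; each cluster now runs its own Slurm instance.

srun ./my_application
```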
## Updates after your feedback (state: October 19)
* A **second synchronization** from `/scratch` started on **October 18** and is
now nearly done.
* A first, incomplete synchronization from `/warm_archive` has been done (see above).
With support from NEC, we are transferring the rest over the next weeks.
* The **data transfer tools** now work fine.
* After fixing overly tight security restrictions, **all users can log in** now.
* **ANSYS** now starts: please check if your specific use case works.
* **login1** is under construction, do not use it at the moment. Workspace creation does
not work there.
nav:
- Overview: jobs_and_resources/hardware_overview.md
- New Systems 2023:
- Architectural Re-Design 2023: jobs_and_resources/architecture_2023.md
- HPC Resources Overview 2023: jobs_and_resources/hardware_overview_2023.md
- "How-To: Migration to Barnard": jobs_and_resources/migration_to_barnard.md
- AMD Rome Nodes: jobs_and_resources/rome_nodes.md
- NVMe Storage: jobs_and_resources/nvme_storage.md
- Alpha Centauri: jobs_and_resources/alpha_centauri.md