diff --git a/doc.zih.tu-dresden.de/docs/archive/filesystems.md b/doc.zih.tu-dresden.de/docs/archive/filesystems.md
index dfceff411489113c04b179f954f688c07cf26fb4..089e1fa75b10fea0f438bee8ab27925b926246ff 100644
--- a/doc.zih.tu-dresden.de/docs/archive/filesystems.md
+++ b/doc.zih.tu-dresden.de/docs/archive/filesystems.md
@@ -57,3 +57,45 @@ set accordingly.
 !!! hint
     For certain projects, specific filesystems are provided. For those, additional features are
     available, like `fs_beegfs_<projectname>`.
+
+## Warm Archive
+
+!!! danger "Warm Archive is End of Life"
+
+    The `warm_archive` storage system will be decommissioned for good together with the Taurus
+    system (end of 2023). Thus, please **do not use** `warm_archive` any longer and **migrate your
+    data from** `warm_archive` to the new filesystems. We provide comprehensive documentation on the
+    [data migration process to the new filesystems](../jobs_and_resources/barnard.md#data-migration-to-new-filesystems).
+
+    You should consider the new `walrus` storage as a substitute for jobs with moderately low
+    bandwidth and low IOPS requirements.
+
+The warm archive is intended as a storage space for the duration of a running HPC project.
+It does **not** replace a long-term archive, though.
+
+This storage is best suited for large files (like `tgz`s of input data or intermediate results).
+
+The hardware consists of 20 storage nodes with a net capacity of 10 PiB on spinning disks.
+We have seen a total data rate of 50 GiB/s under benchmark conditions.
+
+A project can apply for storage space in the warm archive.
+This space is limited in capacity and duration.
+
+## Access
+
+### As Filesystem
+
+On ZIH systems, users can access the warm archive via [workspaces](../data_lifecycle/workspaces.md).
+Although the lifetime is considerably long, please be aware that the data will be
+deleted as soon as the user's login expires.
+
+!!! attention
+
+    These workspaces can **only** be written to from the login or export nodes.
+    On all compute nodes, the warm archive is mounted read-only.
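+
+If you still need to rescue data from a warm archive workspace, a transfer to the new filesystems
+via the [Datamover](../data_transfer/datamover.md) might look like the following sketch. The paths
+and workspace names are placeholders only; check the actual mount points with `dtinfo` and consult
+the migration documentation linked above.
+
+```console
+marie@login$ dtcp -r /warm_archive/ws/marie-input-data /data/walrus/ws/marie-input-data
+```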
+
+### S3
+
+Limited S3 functionality is available.
diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md
index 8355168e7b9f62570c39c46c564606af75f998b8..adf0792cac19bb8ebc1bd2bd6d302fe2bfc58fe3 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md
@@ -20,7 +20,7 @@ properly:
 * use a `/home` directory for the limited amount of personal data, simple examples and the results
   of calculations. The home directory is not a working directory! However, `/home` filesystem is
-  [backed up](#backup) using snapshots;
+  [backed up](#backup);
 * use `workspaces` as a place for working data (i.e. data sets); Recommendations of choosing the
   correct storage system for workspace is presented below.

@@ -34,11 +34,10 @@ filesystems.
 !!! hint "Recommendations to choose of storage system"

-    * For data that seldom changes but consumes a lot of space, the
-      [warm_archive](warm_archive.md) can be used.
-      (Note that this is mounted **read-only** on the compute nodes).
     * For a series of calculations that works on the same data please use a `scratch` based
       [workspace](workspaces.md).
+    * For data that seldom changes but consumes a lot of space, the
+      [`walrus` filesystem](working.md) can be used.
     * **SSD**, in its turn, is the fastest available filesystem made only for large parallel
       applications running with millions of small I/O (input, output operations).
     * If the batch job needs a directory for temporary data then **SSD** is a good choice as well.

@@ -51,8 +50,8 @@ important data should be [archived (long-term preservation)](longterm_preservati
 ### Backup

 The backup is a crucial part of any project. Organize it at the beginning of the project. The
-backup mechanism on ZIH systems covers **only** the `/home` and `/projects` filesystems. Backed up
-files can be restored directly by users, see [Snapshots](permanent.md#snapshots).
+backup mechanism on ZIH systems covers **only** the filesystems `/home` and `/projects`. The section
+[Backup](permanent.md#backup) provides further information.

 !!! warning

diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
index fd300eee4121f752ca6eaa85e12961342b821924..6ad3e771e633a4a0ef20e20a75e282aada8c3d3b 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
@@ -10,13 +10,12 @@
 | Filesystem Name | Usable Directory | Availability | Type | Quota |
 |:------------------|:------------------|:-------------|:---------|:-------------------|
-| Home | `/home` | global (w/o Power9) | Lustre | per user: 20 GB |
-| Projects | `/projects` | global (w/o Power9) | Lustre | per project |
-| (Taurus/old) Home | `/home` | [Power9](../jobs_and_resources/power9.md) | NFS | per user: 20 GB |
+| Home | `/home` | global | Lustre | per user: 50 GB |
+| Projects | `/projects` | global | NFS | per project |

 ## Global /home Filesystem

-Each user has 20 GiB in a `/home` directory independent of the granted capacity for the project.
+Each user has 50 GiB in a `/home` directory independent of the granted capacity for the project.
 The home directory is mounted with read-write permissions on all nodes of the ZIH system.

 Hints for the usage of the global home directory:

@@ -44,35 +43,11 @@ It can only be written to on the login and export nodes.
 On compute nodes, `/projects` is mounted as read-only, because it must not be used as work directory
 and heavy I/O.

-## Snapshots
-
-A changed file can always be recovered as it was at the time of the snapshot.
-These snapshots are taken (subject to changes):
-
-- from Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day
-  (7 snapshots)
-- from Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
-- every Sunday st 23:45 and kept for 26 weeks.
-
-To restore a previous version of a file:
-
-1. Go to the parent directory of the file you want to restore.
-1. Run `cd .snapshot` (this subdirectory exists in every directory on the `/home` filesystem
-   although it is not visible with `ls -a`).
-1. List the snapshots with `ls -l`.
-1. Just `cd` into the directory of the point in time you wish to restore and copy the file you
-   wish to restore to where you want it.
-
-!!! note
-
-    The `.snapshot` directory is embedded in a different directory structure. An `ls ../..` will not
-    show the directory where you came from. Thus, for your `cp`, you should *use an absolute path*
-    as destination.
-
 ## Backup

 Just for the eventuality of a major filesystem crash, we keep tape-based backups of our
-permanent filesystems for 180 days.
+permanent filesystems for 180 days. Please send a
+[ticket to the HPC support team](mailto:hpc-support@tu-dresden.de) in case you need data restored.

 ## Quotas

@@ -98,5 +73,5 @@ In case a quota is above its limits:
 - *Systematically* handle your important data:
     - For later use (weeks...months) at the ZIH systems, build and zip tar archives with meaningful
       names or IDs and store them, e.g., in a workspace in the
-      [warm archive](warm_archive.md) or an [archive](intermediate_archive.md)
+      [`walrus` filesystem](working.md) or an [archive](intermediate_archive.md) (see the sketch below)
     - Refer to the hints for [long-term preservation of research data](longterm_preservation.md)
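+
+The packing step mentioned above could, for instance, be done in a `walrus` workspace using the
+`dt*` commands of the [Datamover](../data_transfer/datamover.md). The following sketch assumes
+that `walrus` is offered as a workspace filesystem and that the `dttar` wrapper is available;
+workspace names, paths and duration are examples only:
+
+```console
+marie@login$ ws_allocate -F walrus archive2023 180
+marie@login$ dttar -czf /data/walrus/ws/marie-archive2023/input.tar.gz /data/horse/ws/marie-input
+```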
diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/warm_archive.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/warm_archive.md
deleted file mode 100644
index 04d84c53d38f3868179b88844c6678bfa0e66f2b..0000000000000000000000000000000000000000
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/warm_archive.md
+++ /dev/null
@@ -1,41 +0,0 @@
-# Warm Archive
-
-!!! danger "Warm Archive is End of Life"
-
-    The `warm_archive` storage system will be decommissioned for good together with the Taurus
-    system (end of 2023). Thus, please **do not use** `warm_archive` any longer and **migrate you
-    data from** `warm_archive` to the new filesystems. We provide a quite comprehensive
-    documentation on the
-    [data migration process to the new filesystems](../jobs_and_resources/barnard.md#data-migration-to-new-filesystems).
-
-    You should consider the new `walrus` storage as an substitue for jobs with moderately low
-    bandwidth, low IOPS.
-
-The warm archive is intended as a storage space for the duration of a running HPC project.
-It does **not** substitute a long-term archive, though.
-
-This storage is best suited for large files (like `tgz`s of input data data or intermediate results).
-
-The hardware consists of 20 storage nodes with a net capacity of 10 PiB on spinning disks.
-We have seen an total data rate of 50 GiB/s under benchmark conditions.
-
-A project can apply for storage space in the warm archive.
-This is limited in capacity and
-duration.
-
-## Access
-
-### As Filesystem
-
-On ZIH systems, users can access the warm archive via [workspaces](workspaces.md)).
-Although the lifetime is considerable long, please be aware that the data will be
-deleted as soon as the user's login expires.
-
-!!! attention
-
-    These workspaces can **only** be written to from the login or export nodes.
-    On all compute nodes, the warm archive is mounted read-only.
-
-### S3
-
-A limited S3 functionality is available.
diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md
index 180b425f58ee32ef06ec3c8a11db28f25c06bb58..d5a11bdfa5b0bb8d2ac648506734af38e1f9a590 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md
@@ -4,20 +4,6 @@ As soon as you have access to ZIH systems, you have to manage your data. Several
 available. Each filesystem serves for special purpose according to their respective capacity,
 performance and permanence.

-!!! danger "End of life of `scratch` and `ssd`"
-
-    The filesystem `/lustre/scratch` and `/lustre/ssd` will be turned off on January 3 2024 for good
-    (no data access afterwards!).
-
-    The `/beegfs` filesystem will remain available to
-    [Alpha Centauri](../jobs_and_resources/hardware_overview.md#alpha-centauri)
-    and
-    [Power](../jobs_and_resources/hardware_overview.md#power9)
-    users only.
-
-    All others need to migrate your data to Barnard’s new filesystem `/horse`. Please follow these
-    detailed instruction on how to [migrate to Barnard](../jobs_and_resources/barnard.md).
-
 | Filesystem Type | Usable Directory | Capacity | Availability | Remarks |
 |:----------------|:------------------|:---------|:-------------------|:----------------------------------------------------------|
 | `Lustre` | `/data/horse` | 20 PB | global | Only accessible via [Workspaces](workspaces.md). **The(!)** working directory to meet almost all demands |
@@ -27,13 +13,6 @@ performance and permanence.
 | `BeeGFS` | `/beegfs/.global1` | 232 TB | [Alpha](../jobs_and_resources/alpha_centauri.md) and [Power9](../jobs_and_resources/power9.md) | Only accessible via [Workspaces](workspaces.md). Fastest available filesystem, only for large parallel applications running with millions of small I/O operations |
 | `ext4` | `/tmp` | 95 GB | node local | Systems: tbd. Is cleaned up after the job automatically. |
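+
+Most of the filesystems above are only accessible via workspaces. A minimal sketch of allocating
+and later releasing a workspace on the `horse` filesystem might look as follows; the filesystem
+label, name and duration are examples, see [Workspaces](workspaces.md) for the full command set:
+
+```console
+marie@login$ ws_allocate -F horse number_crunch 90
+marie@login$ ws_release -F horse number_crunch
+```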

-??? "Outdated filesystems `/lustre/scratch` and `/lustre/ssd`"
-
-    | Filesystem | Usable directory | Capacity | Availability | Backup | Remarks |
-    |:------------|:------------------|:---------|:-------------|:-------|:---------------------------------------------------------------------------------|
-    | `Lustre` | `/scratch/` | 4 PB | global | No | Only accessible via [Workspaces](workspaces.md). Not made for billions of files! |
-    | `Lustre` | `/lustre/ssd` | 40 TB | global | No | Only accessible via [Workspaces](workspaces.md). For small I/O operations |
-
 ## Recommendations for Filesystem Usage

 To work as efficient as possible, consider the following points

@@ -53,8 +32,7 @@ Getting high I/O-bandwidth

 ## Cheat Sheet for Debugging Filesystem Issues

-Users can select from the following commands to get some idea about
-their data.
+Users can select from the following commands to get some idea about their data.

 ### General

diff --git a/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md b/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md
index cd6c63d6973532cb205f86d8650cd9c83fe26dee..71955aebc44f6f674a4270e69539c4903a562802 100644
--- a/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md
+++ b/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md
@@ -29,17 +29,14 @@ and jobs.
 To identify the mount points of the different filesystems on the data transfer machine, use
 `dtinfo`. It shows an output like this:

-| ZIH system | Local directory | Directory on data transfer machine |
-|:-------------------|:---------------------|:-----------------------------------|
-| **Barnard** | `/data/horse` | `/data/horse` |
-| | `/data/walrus` | `/data/walrus` |
-| *outdated: Taurus* | `/home` | `/data/old/home` |
-| | `/scratch/ws` | `/data/old/lustre/scratch2/ws` |
-| | `/ssd/ws` | `/data/old/lustre/ssd/ws` |
-| | `/beegfs/global0/ws` | `/data/old/beegfs/global0/ws` |
-| | `/warm_archive/ws` | `/data/old/warm_archive/ws` |
-| | `/projects` | `/projects` |
-| **Archive** | | `/data/archiv` |
+| Directory on Datamover | Mounting Clusters | Directory on Cluster |
+|:----------- |:--------- |:-------- |
+| `/home` | Alpha, Barnard, Julia, Power9, Romeo | `/home` |
+| `/projects` | Alpha, Barnard, Julia, Power9, Romeo | `/projects` |
+| `/data/horse` | Alpha, Barnard, Julia, Power9, Romeo | `/data/horse` |
+| `/data/walrus` | Alpha, Barnard, Julia, Power9 | `/data/walrus` |
+| `/data/octopus` | Alpha, Barnard, Power9, Romeo | `/data/octopus` |
+| `/data/archiv` | | |

 ## Usage of Datamover

@@ -82,8 +79,7 @@ To identify the mount points of the different filesystems on the data transfer m

 !!! note

-    The [warm archive](../data_lifecycle/warm_archive.md) and the `projects` filesystem are not
-    writable from within batch jobs.
+    The `projects` filesystem is not writable from within batch jobs.
     However, you can store the data in the [`walrus` filesystem](../data_lifecycle/working.md)
     using the Datamover nodes via `dt*` commands.

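+For example, results collected in a `horse` workspace could be transferred to a `walrus`
+workspace like this (the workspace paths are placeholders):
+
+```console
+marie@login$ dtcp -r /data/horse/ws/marie-run42/results /data/walrus/ws/marie-results/
+```
+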
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md
index c20878e60565d66b976f82f24ba18bec258077e0..fee65e5638a56503f4ab489067585a6dec614285 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md
@@ -8,8 +8,7 @@ memory or in very fast NVMe memory.
 The former HPC system Taurus is partly switched-off and partly split up into separate clusters
 until the end of 2023. One such upcoming separate cluster is what you have known as partition
-`julia` so far. Since February 2024, `Julia` is now a
-stand-alone cluster with
+`julia` so far. Since February 2024, `Julia` has been a stand-alone cluster with

 * homogenous hardware resources available at `julia.hpc.tu-dresden.de`,
 * and own Slurm batch system.

@@ -19,6 +18,12 @@ stand-alone cluster with
 The hardware specification is documented on the page
 [HPC Resources](hardware_overview.md#julia).

+!!! note
+
+    `Julia` was partitioned at the end of October 2024. A quarter of the hardware resources
+    (CPUs and memory) are now in exclusive operation for the
+    [DZA](https://www.deutscheszentrumastrophysik.de/).
+
 ## Local Temporary on NVMe Storage

 There are 370 TB of NVMe devices installed. For immediate access for all projects, a volume of 87 TB
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index abab47ea226b5aade67b3902bf24780efbc01946..9e7e34bbd99eaf528b0a2f5f574802d2d916f6d5 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -37,7 +37,6 @@ nav:
     - Working Filesystems: data_lifecycle/working.md
     - Lustre: data_lifecycle/lustre.md
    - BeeGFS: data_lifecycle/beegfs.md
-    - Warm Archive: data_lifecycle/warm_archive.md
    - Intermediate Archive: data_lifecycle/intermediate_archive.md
    - Workspaces: data_lifecycle/workspaces.md
    - Long-Term Preservation of Research Data: data_lifecycle/longterm_preservation.md