diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md index 61cde983f6bd1a0305d53c6f3e1395177c3bcd89..6214609ae7fb647edae6d9a2ba460eda459389a2 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md @@ -2,8 +2,8 @@ Storage systems differ in terms of capacity, streaming bandwidth, IOPS rate, etc. Price and efficiency don't allow to have it all in one. That is why fast parallel filesystems at ZIH have -restrictions with regards to **age of files** and [quota](permanent.md#quotas). The mechanism of -workspaces enables you to better manage your HPC data. It is common and used at a large number +restrictions with regards to **lifetime** and volume **[quota](permanent.md#quotas)**. The mechanism of +using _workspaces_ enables you to better manage your HPC data. It is common and used at a large number of HPC centers. !!! note @@ -25,11 +25,11 @@ times. ## Workspace Management -### Settings for Workspaces +### Workspace Lifetimes -Since the workspace filesystems are differ in the intended use, the settings for workspace settings -are not equal across all filesystems. The following table provides the settings for the different -filesystems. +Since the workspace filesystems are intended for different use cases and thus differ in +performance, their granted timespans differ accordingly. The maximum lifetime and number of +renewals are provided in the following table. | Filesystem (use with parameter `--filesystem=<filesystem>`) | Max. Duration in Days | Extensions | Keeptime | [Filesystem Feature](../jobs_and_resources/slurm.md#filesystem-features) | |:------------------------------------------------------------|---------------:|-----------:|---------:|:-------------------------------------------------------------------------| @@ -39,7 +39,12 @@ filesystems. 
| `beegfs` | 30 | 2 | 30 | `fs_beegfs` | {: summary="Settings for Workspace Filesystems."} -!!! warning "End-of-life filesystems" +!!! note + + Currently, not all filesystems are available on all of our five clusters. The page + [Working Filesystems](working.md) provides the necessary information. + +??? warning "End-of-life filesystems" The filesystems `warm_archive`, `ssd` and `scratch` will be switched off end of 2023. Do not use them anymore! @@ -50,11 +55,6 @@ filesystems. | `ssd` | 30 | 2 | `fs_lustre_ssd` | High-IOPS filesystem (`/lustre/ssd`, symbolic link: `/ssd`) on SSDs. | | `warm_archive` | 365 | 2 | 30 | `fs_warm_archive_ws` | Capacity filesystem based on spinning disks | -!!! note - - Currently, not all filesystems are available on all of our five clusters. The page - [Working Filesystems](working.md) provides the necessary information. - ### List Available Filesystems To list all available filesystems for using workspaces, you can either invoke `ws_list -l` or @@ -71,25 +71,25 @@ provides information which filesystem is available on which cluster. walrus ``` -=== "Taurus" +=== "Alpha Centauri" ```console - marie@login.taurus$ ws_list -l - scratch (default) - warm_archive + marie@login.alpha$ ws_list -l + available filesystems: ssd beegfs_global0 - beegfs + beegfs (default) ``` -=== "Alpha Centauri" +=== "Taurus (deprecated)" ```console - marie@login.alpha$ ws_list -l - available filesystems: + marie@login.taurus$ ws_list -l + scratch (default) + warm_archive ssd beegfs_global0 - beegfs (default) + beegfs ``` !!! 
note "Default filesystem" @@ -107,12 +107,12 @@ The command `ws_list` lists all your currently active (,i.e, not expired) worksp ```console marie@login$ ws_list id: test-workspace - workspace directory : /scratch/ws/0/marie-test-workspace - remaining time : 89 days 23 hours - creation time : Thu Jul 29 10:30:04 2021 - expiration date : Wed Oct 27 10:30:04 2021 - filesystem name : scratch - available extensions : 10 + workspace directory : /data/horse/ws/marie-test-workspace + remaining time : 89 days 23 hours + creation time : Wed Dec 6 14:46:12 2023 + expiration date : Tue Mar 5 14:46:12 2024 + filesystem name : horse + available extensions : 10 ``` The output of `ws_list` can be customized via several options. The following switch tab provides a @@ -121,14 +121,13 @@ overview of some of these options. All available options can be queried by `ws_l === "Certain filesystem" ``` - marie@login$ ws_list --filesystem=scratch_fast - id: numbercrunch - workspace directory : /lustre/ssd/ws/marie-numbercrunch - remaining time : 2 days 23 hours - creation time : Thu Mar 2 14:15:33 2023 - expiration date : Sun Mar 5 14:15:33 2023 - filesystem name : ssd - available extensions : 2 + id: marie-numbercrunch + workspace directory : /data/walrus/ws/marie-numbercrunch + remaining time : 89 days 23 hours + creation time : Wed Dec 6 14:49:55 2023 + expiration date : Tue Mar 5 14:49:55 2024 + filesystem name : walrus + available extensions : 2 ``` === "Verbose output" @@ -136,15 +135,15 @@ overview of some of these options. 
All available options can be queried by `ws_l ``` marie@login$ ws_list -v id: test-workspace - workspace directory : /scratch/ws/0/marie-test-workspace - remaining time : 89 days 23 hours - creation time : Thu Jul 29 10:30:04 2021 - expiration date : Wed Oct 27 10:30:04 2021 - filesystem name : scratch - available extensions : 10 - acctcode : p_numbercrunch - reminder : Sat Oct 20 10:30:04 2021 - mailaddress : marie@tu-dresden.de + workspace directory : /data/horse/ws/0/marie-test-workspace + remaining time : 89 days 23 hours + creation time : Wed Dec 6 14:46:12 2023 + expiration date : Tue Mar 5 14:46:12 2024 + filesystem name : horse + available extensions : 10 + acctcode : p_numbercrunch + reminder : Tue Feb 27 14:46:12 2024 + mailaddress : marie@tu-dresden.de ``` === "Terse output" @@ -152,13 +151,13 @@ overview of some of these options. All available options can be queried by `ws_l ``` marie@login$ ws_list -t id: test-workspace - workspace directory : /scratch/ws/0/marie-test-workspace - remaining time : 89 days 23 hours - available extensions : 10 - id: foo - workspace directory : /scratch/ws/0/marie-foo - remaining time : 3 days 22 hours - available extensions : 10 + workspace directory : /data/horse/ws/marie-test-workspace + remaining time : 89 days 23 hours + available extensions : 10 + id: numbercrunch + workspace directory : /data/walrus/ws/marie-numbercrunch + remaining time : 89 days 23 hours + available extensions : 2 ``` === "Show only names" @@ -166,7 +165,7 @@ overview of some of these options. All available options can be queried by `ws_l ``` marie@login$ ws_list -s test-workspace - foo + numbercrunch ``` === "Sort by remaining time" @@ -177,13 +176,13 @@ overview of some of these options. 
All available options can be queried by `ws_l ``` marie@login$ ws_list -R -t id: test-workspace - workspace directory : /scratch/ws/0/marie-test-workspace + workspace directory : /data/horse/ws/0/marie-test-workspace remaining time : 89 days 23 hours available extensions : 10 - id: foo - workspace directory : /scratch/ws/0/marie-foof - remaining time : 3 days 22 hours - available extensions : 10 + id: marie-numbercrunch + workspace directory : /data/walrus/ws/marie-numbercrunch + remaining time : 89 days 23 hours + available extensions : 2 ``` ### Allocate a Workspace @@ -208,7 +207,13 @@ Options: -c [ --comment ] arg comment ``` -!!! example "Simple workspace allocation" +!!! Note "Name of a workspace" + + The workspace name should help you to remember the experiment and data stored here. It has to + be unique on a certain filesystem. On the other hand it is possible to use the very same name + for workspaces on different filesystems. + +=== "Simple allocation" The simple way to allocate a workspace is calling `ws_allocate` command with two arguments, where the first specifies the workspace name and the second the duration. This allocates a @@ -217,44 +222,39 @@ Options: ```console marie@login$ ws_allocate test-workspace 90 Info: creating workspace. - /scratch/ws/marie-test-workspace + /data/horse/ws/marie-test-workspace remaining extensions : 10 remaining time in days: 90 ``` -!!! example "Workspace allocation on specific filesystem" +=== "Specific filesystem" In order to allocate a workspace on a non-default filesystem, the option `--filesystem=<filesystem>` is required. ```console - marie@login$ ws_allocate --filesystem=scratch_fast test-workspace 3 + marie@login$ ws_allocate --filesystem=walrus test-workspace 99 Info: creating workspace. /lustre/ssd/ws/marie-test-workspace remaining extensions : 2 - remaining time in days: 3 + remaining time in days: 99 ``` -!!! 
example "Workspace allocation with e-mail reminder" +=== "With e-mail reminder" - This command will create a workspace with the name `test-workspace` on the `/scratch` filesystem - with a duration of 90 days and send an e-mail reminder. The e-mail reminder will be sent every + This command will create a workspace with the name `test-workspace` on the `/horse` filesystem + (default) + with a duration of 99 days and send an e-mail reminder. The e-mail reminder will be sent every day starting 7 days prior to expiration. We strongly recommend setting this e-mail reminder. ```console - marie@login$ ws_allocate --reminder=7 --mailaddress=marie.testuser@tu-dresden.de test-workspace 90 + marie@login$ ws_allocate --reminder=7 --mailaddress=marie.testuser@tu-dresden.de test-workspace 99 Info: creating workspace. - /scratch/ws/marie-test-workspace + /data/horse/ws/marie-test-workspace remaining extensions : 10 - remaining time in days: 90 + remaining time in days: 99 ``` -!!! Note "Name of a workspace" - - The workspace name should help you to remember the experiment and data stored here. It has to - be unique on a certain filesystem. On the other hand it is possible to use the very same name - for workspaces on different filesystems. - Please refer to the [section Cooperative Usage](#cooperative-usage-group-workspaces) for group workspaces. @@ -263,14 +263,14 @@ group workspaces. The lifetime of a workspace is finite and different filesystems (storage systems) have different maximum durations. The life time of a workspace can be adjusted multiple times, depending on the filesystem. You can find the concrete values in the -[section settings for workspaces](#settings-for-workspaces). +[section Workspace Lifetimes](#workspace-lifetimes). -Use the command `ws_extend` to extend your workspace: +Use the command `ws_extend [-F filesystem] workspace days` to extend your workspace: ```console marie@login$ ws_extend -F scratch test-workspace 100 Info: extending workspace. 
-/scratch/ws/marie-test-workspace +/data/horse/ws/marie-test-workspace remaining extensions : 1 remaining time in days: 100 ``` @@ -286,10 +286,10 @@ workspace, too. This means when you extend a workspace that expires in 90 days with the command ```console -marie@login$ ws_extend -F scratch my-workspace 40 +marie@login$ ws_extend -F scratch test-workspace 40 ``` -it will now expire in 40 days **not** 130 days. +it will now expire in 40 days, **not** in 130 days! ### Send Reminder for Workspace Expiration Date @@ -310,8 +310,10 @@ See the [example above](#allocate-a-workspace) for reference. If you missed setting an e-mail reminder at workspace allocation, you can add a reminder later, e.g. ``` +# initial allocation marie@login$ ws_allocate --name=FancyExp --duration=17 [...] +# add e-mail reminder marie@login$ ws_allocate --name=FancyExp --duration=17 --reminder=7 --mailaddress=marie@dlr.de --extension ``` @@ -326,7 +328,7 @@ The command `ws_send_ical` sends you an ical event on the expiration date of a s as follows: ```console - ws_send_ical --filesystem=<filesystem> --mail=<e-mail-address> --workspace=<workspace name> + ws_send_ical [--filesystem <filesystem>] --mail <e-mail-address> --workspace <workspace name> ``` ### Deletion of a Workspace @@ -335,29 +337,24 @@ To delete a workspace use the `ws_release` command. It is mandatory to specify t workspace and the filesystem in which it is located: ```console -marie@login$ ws_release --filesystem=scratch --name=my-workspace +marie@login$ ws_release --filesystem=horse --name=test-workspace ``` You can list your already released or expired workspaces using the `ws_restore --list` command. 
```console marie@login$ ws_restore --list -warm_archive: -scratch: -marie-my-workspace-1665014486 - unavailable since Thu Oct 6 02:01:26 2022 -marie-foo-647085320 - unavailable since Sat Mar 12 12:42:00 2022 -ssd: -marie-bar-1654074660 - unavailable since Wen Jun 1 11:11:00 2022 -beegfs_global0: -beegfs: +horse: +marie-test-workspace-1701873807 + unavailable since Wed Dec 6 15:43:27 2023 +walrus: +marie-numbercrunch-1701873907 + unavailable since Wed Dec 6 15:45:07 2023 ``` -In this example, the user `marie` has three inactive, i.e., expired, workspaces namely -`my-workspace` in `scratch`, as well as `foo` and `bar` in `ssd` filesystem. The command -`ws_restore --list` lists the name of the workspace and the expiration date. As you can see, the +In this example, the user `marie` has two inactive, i.e., expired, workspaces, namely +`test-workspace` in `horse`, as well as `numbercrunch` in the `walrus` filesystem. The command +`ws_restore --list` lists the name of the workspace and its expiration date. As you can see, the expiration date is added to the workspace name as Unix timestamp. !!! hint "Deleting data in in an expired workspace" @@ -367,44 +364,47 @@ expiration date is added to the workspace name as Unix timestamp. rights remain unchanged. I.e., you can delete the data inside the workspace directory but you must not delete the workspace directory itself! -#### Expirer Process +#### Expire Process The clean up process of expired workspaces is automatically handled by a so-called expirer process. It performs the following steps once per day and filesystem: - Check for remaining life time of all workspaces. - - If the workspaces expired, move it to a hidden directory so that it becomes inactive. + - If a workspace expired, move it to a hidden directory so that it becomes inactive. - Send reminder e-mails to users if the reminder functionality was configured for their particular workspaces. - Scan through all workspaces in grace period. 
- - If a workspace exceeded the grace period, the workspace and its data are deleted. + - If a workspace exceeded the grace period, the workspace and its data are permanently deleted. ### Restoring Expired Workspaces -At expiration time your workspace will be moved to a special, hidden directory. For a month (in -warm_archive: 2 months), you can still restore your data **into an existing workspace**. +At expiration time your workspace will be moved to a special, hidden directory. For a month, +you can still restore your data **into an existing workspace**. !!! warning When you release a workspace **by hand**, it will not receive a grace period and be **permanently deleted** the **next day**. The advantage of this design is that you can create - and release workspaces inside jobs and not swamp the filesystem with data no one needs anymore + and release workspaces inside jobs and not flood the filesystem with data no one needs anymore in the hidden directories (when workspaces are in the grace period). Use ```console -marie@login$ ws_restore --list --filesystem=scratch -scratch: -marie-my-workspace-1665014486 - unavailable since Thu Oct 6 02:01:26 2022 +marie@login$ ws_restore --list --filesystem=horse +horse: +marie-test-workspace-1701873807 + unavailable since Wed Dec 6 15:43:27 2023 +walrus: +marie-numbercrunch-1701873907 + unavailable since Wed Dec 6 15:45:07 2023 ``` to get a list of your expired workspaces, and then restore them like that into an existing, active workspace 'new_ws': ```console -marie@login$ ws_restore --filesystem=scratch marie-my-workspace-1665014486 new_ws +marie@login$ ws_restore --filesystem=horse marie-test-workspace-1701873807 new_ws ``` The expired workspace has to be specified by its full name as listed by `ws_restore --list`, @@ -447,16 +447,15 @@ the following example (which works [for the program g16](../software/nanoscale_s it to your needs and workflow, e.g. 
* adopt Slurm options for ressource specification, - * inserting the path to your input file, - * what software you want to [load](../software/modules.md), - * and calling the actual software to do your computation. + * insert the path to your input file, + * specify what software you want to [load](../software/modules.md), + * and call the actual software to do your computation. !!! example "Using temporary workspaces for I/O intensive tasks" ```bash #!/bin/bash - #SBATCH --partition=haswell #SBATCH --time=48:00:00 #SBATCH --nodes=1 #SBATCH --ntasks=1 @@ -480,7 +479,7 @@ the following example (which works [for the program g16](../software/nanoscale_s # Allocate workspace for this job. Adjust time span to time limit of the job (-d <N>). WSNAME=computation_$SLURM_JOB_ID - export WSDDIR=$(ws_allocate --filesystem=ssd --name=${WSNAME} --duration=2) + export WSDIR=$(ws_allocate --filesystem=horse --name=${WSNAME} --duration=2) echo ${WSDIR} # Check allocation @@ -520,26 +519,26 @@ the following example (which works [for the program g16](../software/nanoscale_s ### Data for a Campaign For a series of jobs or calculations that work on the same data, you should allocate a workspace -once, e.g., in `scratch` for 100 days: +once, e.g., in `horse` for 100 days: ```console -marie@login$ ws_allocate --filesystem=scratch my_scratchdata 100 +marie@login$ ws_allocate --filesystem=horse my_scratchdata 100 Info: creating workspace. -/scratch/ws/marie-my_scratchdata -remaining extensions : 2 +/data/horse/ws/marie-my_scratchdata +remaining extensions : 10 remaining time in days: 99 ``` You can grant your project group access rights: ``` -chmod g+wrx /scratch/ws/marie-my_scratchdata +chmod g+wrx /data/horse/ws/marie-my_scratchdata ``` And verify it with: ```console -marie@login$ ls -la /scratch/ws/marie-my_scratchdata +marie@login$ ls -la /data/horse/ws/marie-my_scratchdata total 8 drwxrwx--- 2 marie hpcsupport 4096 Jul 10 09:03 . drwxr-xr-x 5 operator adm 4096 Jul 10 09:01 .. 
@@ -547,39 +546,44 @@ drwxr-xr-x 5 operator adm 4096 Jul 10 09:01 .. ### Mid-Term Storage -For data that seldom changes but consumes a lot of space, the warm archive can be used. Note that -this is mounted read-only on the compute nodes, so you cannot use it as a work directory for your -jobs! +<!-- TODO: to be confirmed - is walrus really intended for this purpose? --> +For data that rarely changes but consumes a lot of space, the `walrus` filesystem can be used. Note +that this is mounted read-only on the compute nodes, so you cannot use it as a work directory for +your jobs! ```console -marie@login$ ws_allocate --filesystem=warm_archive my_inputdata 365 -/warm_archive/ws/marie-my_inputdata +marie@login$ ws_allocate --filesystem=walrus my_inputdata 100 +/data/walrus/ws/marie-my_inputdata remaining extensions : 2 -remaining time in days: 365 +remaining time in days: 100 ``` +<!-- TODO to be confirmed for walrus / warm_archive replacement !!!Attention The warm archive is not built for billions of files. There is a quota for 100.000 files per group. Please archive data. +--> +<!-- TODO command not found - not available yet for walrus?! To see your active quota use ```console -marie@login$ qinfo quota /warm_archive/ws/ +marie@login$ qinfo quota /data/walrus/ws/ ``` Note that the workspaces reside under the mountpoint `/warm_archive/ws/` and not `/warm_archive` anymore. +--> ## Cooperative Usage (Group Workspaces) When a workspace is created with the option `-g, --group`, it gets a group workspace that is visible to others (if in the same group) via `ws_list -g`. -!!! hint "Chose group" +!!! hint "Choose group" - If you are member of multiple groups, than the group workspace is visible for your primary + If you are a member of multiple groups, then the group workspace is visible for your primary group. You can list all groups you belong to via `groups`, and the first entry is your primary group. @@ -600,7 +604,7 @@ to others (if in the same group) via `ws_list -g`. 
```console marie@login$ ws_allocate --group --name=numbercrunch --duration=30 Info: creating workspace. - /scratch/ws/0/marie-numbercrunch + /data/horse/ws/0/marie-numbercrunch remaining extensions : 10 remaining time in days: 30 ``` @@ -608,8 +612,8 @@ to others (if in the same group) via `ws_list -g`. This workspace directory is readable for the group, e.g., ```console - marie@login$ ls -ld /scratch/ws/0/marie-numbercrunch - drwxr-x--- 2 marie p_number_crunch 4096 Mar 2 15:24 /scratch/ws/0/marie-numbercrunch + marie@login$ ls -ld /data/horse/ws/0/marie-numbercrunch + drwxr-x--- 2 marie p_number_crunch 4096 Mar 2 15:24 /data/horse/ws/0/marie-numbercrunch ``` All members of the project group `p_number_crunch` can now list this workspace using @@ -618,7 +622,7 @@ to others (if in the same group) via `ws_list -g`. ```console martin@login$ ws_list -g -t id: numbercrunch - workspace directory : /scratch/ws/0/marie-numbercrunch + workspace directory : /data/horse/ws/0/marie-numbercrunch remaining time : 29 days 23 hours available extensions : 10 ``` @@ -651,10 +655,11 @@ workspace. **A**: The workspace you want to restore into is either not on the same filesystem or you used the wrong name. Use only the short name that is listed after `id:` when using `ws_list`. +See section [restoring expired workspaces](#restoring-expired-workspaces). ---- -**Q**: I forgot to specify an e-mail alert when allocating my workspace. How can I add the +**Q**: I forgot to specify an e-mail reminder when allocating my workspace. How can I add the e-mail alert functionality to an existing workspace? **A**: You can add the e-mail alert by "overwriting" the workspace settings via