diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub_teaching_example.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub_teaching_example.md
index da1a2e1506eeba1086784b749e2bdbd1f6138a02..0da7fd596f42e6e30ea0c9ca8088cfe03c6b8304 100644
--- a/doc.zih.tu-dresden.de/docs/access/jupyterhub_teaching_example.md
+++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub_teaching_example.md
@@ -50,7 +50,7 @@ folder and add the file to the repository.
 === "virtualenv"
     ```console
-    marie@compute$ git clone git@gitlab.hrz.tu-chemnitz.de:zih/projects/p_lv_jupyter_course/clone_marie/jupyterlab_course.git
+    marie@compute$ git clone git@gitlab.hrz.tu-chemnitz.de:zih/jupyterlab_course.git /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course
     [...]
     marie@compute$ cp requirements.txt /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course
     marie@compute$ cd /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course
@@ -61,7 +61,7 @@ folder and add the file to the repository.
     ```
 === "conda"
     ```console
-    marie@compute$ git clone git@gitlab.hrz.tu-chemnitz.de:zih/projects/p_lv_jupyter_course/clone_marie/jupyterlab_course.git
+    marie@compute$ git clone git@gitlab.hrz.tu-chemnitz.de:zih/jupyterlab_course.git /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course
     [...]
     marie@compute$ cp requirements.txt /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course
     marie@compute$ cd /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course
diff --git a/doc.zih.tu-dresden.de/docs/contrib/content_rules.md b/doc.zih.tu-dresden.de/docs/contrib/content_rules.md
index 7ded65dcd5782ae242607313c14feb8373c9cbb9..3442d3b86079f0c993e4140279ce9350323174d8 100644
--- a/doc.zih.tu-dresden.de/docs/contrib/content_rules.md
+++ b/doc.zih.tu-dresden.de/docs/contrib/content_rules.md
@@ -41,7 +41,7 @@ or via [e-mail](mailto:hpc-support@tu-dresden.de).
 * Use spaces (not tabs) both in Markdown files and in `mkdocs.yml`.
 * Respect the line length limit of 100 characters (exception: links).
 * Do not add large binary files or high-resolution images to the repository (cf.
-  [adding images and attachments](#graphics-and-attachments)).
+  [adding images and attachments](#graphics-and-videos)).
 * [Admonitions](#special-feature-admonitions) may be actively used for longer code examples,
   warnings, tips, important information, etc.
 * Respect the [writing style](#writing-style) and the rules for
@@ -50,6 +50,7 @@ or via [e-mail](mailto:hpc-support@tu-dresden.de).
 * Use [syntax highlighting and appropriate prompts](#code-blocks-and-command-prompts).
 * Respect [data privacy](#data-privacy-and-generic-names).
 * Stick to the [rules on optional and required arguments](#code-styling-rules).
+* Save attachments, graphics and videos within the respective `misc` subdirectory.

 ## Detailed Overview

@@ -131,39 +132,126 @@ Markdown dialects.
 Further tips can be found on this
 [cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

-#### Graphics and Attachments
+#### Attachments

-Please use images and graphics for illustration purposes and to improve comprehensibility.
+Of course, you can provide attachments in sections and pages.
+Such attachment documents may contain information that is more detailed and goes far beyond the
+scope of the compendium, e.g. user manuals for application-specific software.
+
+Save attachments within the `misc` subdirectory of the corresponding section.
+
+!!! note "Syntax for attachments"
+
+    The syntax for attachments is the very same as for links. As the attachment is within the `misc`
+    subdirectory, you can refer to it as a local file.
+
+    ```markdown
+    [<description>](misc/<attachment_file_name>)
+    ```
+
+    Since the `<description>` is rendered as link text, you should choose a clear and precise text:
+
+    ```markdown
+    [slides of HPC introduction](misc/HPC-Introduction.pdf)
+    ```
+
+#### Graphics and Videos
+
+Please use graphics and videos for illustration purposes and to improve comprehensibility.
 All graphics and attachments are saved within `misc` directory of the respective subdirectory in
 `docs`.

-For video attachments please use either webm or mp4 format.
+For video attachments please use either webm or mp4 format. We make use of the
+[mkdocs-video extension](https://github.com/soulless-viewer/mkdocs-video).

-!!! example "Syntax"
+!!! note "Syntax for graphics and videos"

-    The syntax to insert a graphic or attachment into a page is
+    The syntax to insert a **graphic** into a page is

     ```markdown
     
-    {: align="center"}
     ```

-    The syntax to insert a video attachment into a page is
+    The syntax to insert a **video** attachment into a page is

     ```html
     
     ```

-The attribute `align` is optional. By default, graphics are left-aligned. **Note:** It is crucial to
-have `{: align="center"}` on a new line.
 It is possible to add captions for tables and figures using `{: summary="This is a table caption"}`.
 The `summary` and `align` parameters can be combined as well:
 `{: summary="This is a table caption" align="top"}`.

+##### Resizing and Alignment of Graphics
+
+In general, graphics and images should be added to the repository with the desired size.
+
 !!! warning

     Do not add large binary files or high-resolution images to the repository. See this valuable
     document for [image optimization](https://web.dev/fast/#optimize-your-images).

+We recommend the well-known Linux package [ImageMagick](https://imagemagick.org/) for resizing
+graphics.
+
+!!! example "Resize image using ImageMagick"
+
+    The command
+
+    ```console
+    marie@local$ magick cluster.jpeg -resize 600 cluster_600.jpeg
+    ```
+
+    will resize the graphic `cluster.jpeg` to a width of 600 pixels keeping the aspect ratio.
+    Depending on the resolution of the original file, the resulting file can be way smaller in
+    terms of memory footprint.
+
+Nevertheless, you can explicitly specify the size of a graphic. The syntax is as follows:
+
+```markdown
+{: style="width:150px"}
+```
+
+By default, graphics are left-aligned.
+In most cases, this is not elegant and you probably wish to center-align your graphics.
+**Alignment** of graphics can be controlled via the `{: align=<value>}` attribute. Possible values
+are `left`, `right` and `center`. **Note:** It is crucial to have `{: align=center}` on a new line,
+with the value given without quotation marks.
+
+Resize and alignment specification can be combined as depicted in the following example.
+
+!!! example "Resize image to 150px height and specify alignment"
+
+    The three tabs show the Markdown syntax to resize the image of the beautiful
+    [cluster `Barnard`](../jobs_and_resources/hardware_overview.md#barnard) to a height of 150
+    pixels keeping the aspect ratio and left-, center- and right-align it, respectively.
+
+    === "Scale and default-align"
+
+        ```markdown
+        {: style="height:150px"}
+        ```
+
+        {: style="height:150px"}
+
+    === "Scale and center-align"
+
+        ```markdown
+        {: style="height:150px"}
+        {: align="center"}
+        ```
+
+        {: style="height:150px"}
+        {: align="center"}
+
+    === "Scale and right-align"
+
+        ```markdown
+        {: style="height:150px"}
+        {: align="right"}
+        ```
+
+        {: style="height:150px"}
+        {: align="right"}
+
 #### Special Feature: Admonitions

 [Admonitions](https://squidfunk.github.io/mkdocs-material/reference/admonitions/), also known as
@@ -411,15 +499,17 @@ This should help to avoid errors.
 | Localhost              | `marie@local$`         |
 | Login nodes            | `marie@login$`         |
 | Arbitrary compute node | `marie@compute$`       |
+| Compute node `Capella` | `marie@capella$`       |
+| Login node `Capella`   | `marie@login.capella$` |
 | Compute node `Barnard` | `marie@barnard$`       |
 | Login node `Barnard`   | `marie@login.barnard$` |
-| Compute node `Power9`  | `marie@power9$`        |
-| Login node `Power9`    | `marie@login.power9$`  |
 | Compute node `Alpha`   | `marie@alpha$`         |
 | Login node `Alpha`     | `marie@login.alpha$`   |
+| Node `Julia`           | `marie@julia$`         |
 | Compute node `Romeo`   | `marie@romeo$`         |
 | Login node `Romeo`     | `marie@login.romeo$`   |
-| Node `Julia`           | `marie@julia$`         |
+| Compute node `Power9`  | `marie@power9$`        |
+| Login node `Power9`    | `marie@login.power9$`  |
 | Partition `dcv`        | `marie@dcv$`           |

 * **Always use a prompt**, even if there is no output provided for the shown command.
diff --git a/doc.zih.tu-dresden.de/docs/contrib/misc/barnard.jpeg b/doc.zih.tu-dresden.de/docs/contrib/misc/barnard.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..1bae02e04d794267caadf346e35568f10dfee40d
Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/contrib/misc/barnard.jpeg differ
diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
index 40aeead13825e40a02047f0c696f87bc7569edd3..a5ad6c87fda1efc0f6415913398ecc244b343a17 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
@@ -31,6 +31,21 @@ submit jobs into the batch system. Running jobs are not affected.

 We have no feasible way to get the contribution of a single user to a project's disk usage.

+Some applications and frameworks are known to store cache or temporary data at places where quota
+applies. You can change the default places using environment variables. We suggest putting such
+data in `/tmp` or workspaces.
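As a sketch of how such a redirection could look in practice, the snippet below moves the cache locations named in this change (the environment variables `SINGULARITY_CACHEDIR`, `PIP_CACHE_DIR`, `HF_HOME` and `TORCH_EXTENSIONS_DIR`) into one base directory. The base path here is a placeholder falling back to `/tmp`; in practice you would substitute the path of an allocated workspace:

```shell
#!/bin/bash
# Sketch: redirect known cache locations into a workspace instead of quota-limited places.
# CACHE_BASE is a placeholder -- substitute the path of an allocated workspace (e.g. from ws_find).
CACHE_BASE="${CACHE_BASE:-/tmp/${USER}-caches}"

export SINGULARITY_CACHEDIR="${CACHE_BASE}/singularity"
export PIP_CACHE_DIR="${CACHE_BASE}/pip"
export HF_HOME="${CACHE_BASE}/huggingface"
export TORCH_EXTENSIONS_DIR="${CACHE_BASE}/torch_extensions"

# Create the target directories so the tools can use them right away.
mkdir -p "${SINGULARITY_CACHEDIR}" "${PIP_CACHE_DIR}" "${HF_HOME}" "${TORCH_EXTENSIONS_DIR}"
```

Putting these lines into a job script or shell profile keeps the caches out of the quota-limited home and project directories.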
+We cannot list all applications that do this, but some known ones are:
+
+| Application      | Environment variable               |
+|:-----------------|:-----------------------------------|
+| Singularity      | `SINGULARITY_CACHEDIR`             |
+| pip              | `PIP_CACHE_DIR`                    |
+| Hugging Face     | `HF_HOME` and `TRANSFORMERS_CACHE` |
+| Torch Extensions | `TORCH_EXTENSIONS_DIR`             |
+
+Python virtual environments and conda directories can grow quickly,
+so they should also be placed inside workspaces.
+
 ## Global /projects Filesystem

 For project data, we have a global project directory, that allows better collaboration between the
diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md
index 3de2beed08c8804998da85c0c78c87278bfda59f..ed0611d83b45298b0bfa4e42b0ba3e8c92fae9fa 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md
@@ -335,7 +335,12 @@ ws_send_ical --filesystem horse --mail <your.email>@tu-dresden.de --workspace te
 ### Deletion of a Workspace

-To delete a workspace use the `ws_release` command. It is mandatory to specify the name of the
+There is an [Expire process](#expire-process) for every workspace filesystem running on a daily
+basis. These processes check the lifetime of all workspaces and move expired workspaces into the
+grace period.
+
+In addition to this automatic process, you also have the option of **explicitly releasing
+workspaces** using the `ws_release` command. It is mandatory to specify the name of the
 workspace and the filesystem in which it is allocated:

 ```console
@@ -355,17 +360,24 @@ marie-numbercrunch-1701873907
 ```

 In this example, the user `marie` has two inactive, i.e., expired, workspaces namely
-`test-workspace` in `horse`, as well as numbercrunch in the `walrus` filesystem. The command
+`test-workspace` in `horse`, as well as `numbercrunch` in the `walrus` filesystem. The command
 `ws_restore --list` lists the name of the workspace and its expiration date. As you can see, the
-expiration date is added to the workspace name as Unix timestamp.
+expiration date in Unix timestamp format is added to the workspace name.

-!!! hint "Deleting data in in an expired workspace"
+!!! hint "Deleting data in an expired workspace"

     If you are short on quota, you might want to delete data in expired workspaces since it counts
     to your quota. Expired workspaces are moved to a hidden directory named `.removed`. The access
     rights remain unchanged. I.e., you can delete the data inside the workspace directory but you
     must not delete the workspace directory itself!

+!!! warning
+
+    When you release a workspace **manually**, it will not receive a grace period and will be
+    **permanently deleted** the **next day**. The advantage of this design is that you can create
+    and release workspaces inside jobs and not pollute the filesystem with data no one needs
+    anymore in the hidden directories (when workspaces are in the grace period).
+
 #### Expire Process

 The clean up process of expired workspaces is automatically handled by a so-called expirer process.
 It performs the following steps once per day and filesystem:
@@ -381,42 +393,62 @@

 ### Restoring Expired Workspaces

 At expiration time your workspace will be moved to a special, hidden directory. For a month,
-you can still restore your data **into an existing workspace**.
+you can still restore your data **into an existing workspace** using the command `ws_restore`.

-!!! warning
+The expired workspace has to be specified by its full name as listed by `ws_restore --list`,
+including username prefix and timestamp suffix. Otherwise, it cannot be uniquely identified. The
+target workspace, on the other hand, must be given with just its short name, as listed by
+`ws_list`, without the username prefix.
-    When you release a workspace **by hand**, it will not receive a grace period and be
-    **permanently deleted** the **next day**. The advantage of this design is that you can create
-    and release workspaces inside jobs and not flood the filesystem with data no one needs anymore
-    in the hidden directories (when workspaces are in the grace period).
+Both workspaces must be on the **very same filesystem**. The data from the old workspace will be
+moved into a directory in the new workspace with the name of the old one. This means a newly
+allocated workspace works as well as a workspace that already contains data.

-Use
+!!! note "Steps for restoring a workspace"

-```console
-marie@login$ ws_restore --list
-horse:
-marie-test-workspace-1701873807
-    unavailable since Wed Dec 6 15:43:27 2023
-walrus:
-marie-numbercrunch-1701873907
-    unavailable since Wed Dec 6 15:45:07 2023
-```
+    1. Use `ws_restore --list` to list all your expired workspaces and get the correct identifier
+       string for the expired workspace. The identifier strings of an expired and an active
+       workspace are different!
+    1. (Optional) Allocate a new workspace on the very same filesystem using `ws_allocate`.
+    1. Then, you can invoke `ws_restore <workspace_name> <target_name>` to restore the expired
+       workspace into the active workspace.

-to get a list of your expired workspaces, and then restore them like that into an existing, active
-workspace 'new_ws':
+??? example "Restore workspace `number_crunch` into new workspace `long_computations`"

-```console
-marie@login$ ws_restore --filesystem horse marie-test-workspace-1701873807 new_ws
-```
+    This example depicts the necessary steps to restore the expired workspace `number_crunch` into
+    a newly allocated workspace named `long_computations`.

-The expired workspace has to be specified by its full name as listed by `ws_restore --list`,
-including username prefix and timestamp suffix (otherwise, it cannot be uniquely identified). The
-target workspace, on the other hand, must be given with just its short name, as listed by `ws_list`,
-without the username prefix.
+    **First step:** List expired workspaces and retrieve the correct identifier for the expired
+    workspace. In this example, `marie` has two expired workspaces, namely `test-workspace` and
+    `number_crunch`, both in the `horse` filesystem. The identifier for the restore command is
+    `marie-number_crunch-1701873907`.
+
+    ```console
+    marie@login$ ws_restore --list
+    horse:
+    marie-test-workspace-1701873807
+        unavailable since Wed Dec 6 15:43:27 2023
+    marie-number_crunch-1701873907
+        unavailable since Wed Dec 6 15:45:07 2023
+    walrus:
+    ```
+
+    **Second step:** Allocate the new workspace `long_computations` on the very same filesystem.
+    Please refer to the documentation of the [`ws_allocate` command](#allocate-a-workspace) for
+    additional useful options.
-Both workspaces must be on the same filesystem. The data from the old workspace will be moved into
-a directory in the new workspace with the name of the old one. This means a fresh workspace works as
-well as a workspace that already contains data.
+
+    ```console
+    marie@login$ ws_allocate --filesystem horse --name long_computations --duration 60
+    ```
+
+    **Third step:** Invoke the command `ws_restore`.
+
+    ```console
+    marie@login$ ws_restore --filesystem horse marie-number_crunch-1701873907 long_computations
+    to verify that you are human, please type 'menunesarowo': menunesarowo
+    you are human
+    Info: restore successful, database entry removed.
+    ```

 ## Linking Workspaces in HOME
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
index 021daf68c7f1757f28e9e287b48ecb7b509cf36b..12c32aaec57dab6dd85ceb57444d83271e5a5c1f 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
@@ -24,7 +24,7 @@ permanent filesystems on the page [Filesystems](../data_lifecycle/file_systems.m

 
 {: align=center}

-HPC resources at ZIH comprise a total of the **six systems**:
+HPC resources at ZIH comprise a total of **six systems**:

 | Name | Description | Year of Installation | DNS |
 | ----------------------------------- | ----------------------| -------------------- | --- |
@@ -32,7 +32,7 @@ HPC resources at ZIH comprise a total of the **six systems**:
 | [`Barnard`](#barnard) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
 | [`Alpha Centauri`](#alpha-centauri) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
 | [`Julia`](#julia) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` |
-| [`Romeo`](#romeo) | CPU cluster | 2020 | `i[8001-8190].romeo.hpc.tu-dresden.de` |
+| [`Romeo`](#romeo) | CPU cluster | 2020 | `i[7001-7186].romeo.hpc.tu-dresden.de` |
 | [`Power9`](#power9) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` |

 All clusters will run with their own [Slurm batch system](slurm.md) and job submission is possible
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md
index e1eb5a71987bf67a7a3117c4dc45d741970f5e03..338ce78b8dc3f7347f4f9a68c9f6e61cc45fb048 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md
@@ -76,12 +76,12 @@ following table depics the resource limits for [all our HPC systems](hardware_ov
 | HPC System | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per (SMT) Core [in MB] | GPUs per Node | Cores per GPU | Job Max Time [in days] |
 |:-----------|:------|--------:|---------------:|-----------------:|------------------------:|------------------------------:|--------------:|--------------:|-------------:|
-| [`Barnard`](hardware_overview.md#barnard) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | 7 |
-| [`Capella`](hardware_overview.md#capella) | `c[1-144].capella` |144 | 64 | 1 | 768,000 | 13,438 | 4 | 14 | 7 |
-| [`Power9`](hardware_overview.md#power9) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | 7 |
-| [`Romeo`](hardware_overview.md#romeo) | `i[8001-8190].romeo` | 190 | 128 | 2 | 505,000 | 1,972 | - | - | 7 |
-| [`Julia`](hardware_overview.md#julia) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | 7 |
-| [`Alpha Centauri`](hardware_overview.md#alpha-centauri) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | 7 |
+| [`Capella`](hardware_overview.md#capella) | `c[1-144].capella` | 144 | 64 | 1 | 768,000 | 13,438 | 4 | 14 | 7 |
+| [`Barnard`](hardware_overview.md#barnard) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | unlimited |
+| [`Alpha Centauri`](hardware_overview.md#alpha-centauri) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | unlimited |
+| [`Julia`](hardware_overview.md#julia) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | unlimited |
+| [`Romeo`](hardware_overview.md#romeo) | `i[7001-7186].romeo` | 186 | 128 | 2 | 505,000 | 1,972 | - | - | unlimited |
+| [`Power9`](hardware_overview.md#power9) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | unlimited |
 {: summary="Slurm resource limits table" align="bottom"}

 All HPC systems have Simultaneous Multithreading (SMT) enabled. You request for this
diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell
index fa60a2901bf4870d833c87449fab236b634591f4..23b96778659d78de2d4a1f6045959718a7f26f71 100644
--- a/doc.zih.tu-dresden.de/wordlist.aspell
+++ b/doc.zih.tu-dresden.de/wordlist.aspell
@@ -175,6 +175,7 @@ icc
 icpc
 iDataPlex
 ifort
+ImageMagick
 ImageNet
 img
 InfiniBand