Commit 6853fbe0 authored by Gitlab Bot

Merge branch 'preview' into merge-preview-in-main

parents 127090f8 a557bdcd
2 merge requests: !739 Main, !729 Automated merge from preview to main
@@ -112,7 +112,7 @@ You can combine both features in a single link:
https://taurus.hrsk.tu-dresden.de/jupyter/hub/user-redirect/git-pull?repo=https://github.com/jdwittenauer/ipython-notebooks&urlpath=/tree/ipython-notebooks/notebooks/language/Intro.ipynb#/~(partition~'interactive~environment~'test)
```
![URL with quickstart parameters](misc/url-git-pull-and-quick-start.png)
{: align="center"}
## Open a Notebook Automatically with a Single Link
......
doc.zih.tu-dresden.de/docs/access/misc/url-git-pull-and-quick-start.png

50.8 KiB

@@ -46,11 +46,9 @@ All steps for an application are documented in detail below.
Since January 2021, ZIH, TU Dresden is an NHR-center (Nationales Hochleistungsrechnen).
More details can be found at [https://tu-dresden.de/zih/hochleistungsrechnen/nhr-center](https://tu-dresden.de/zih/hochleistungsrechnen/nhr-center).
At ZIH, TU Dresden, you can apply for HPC resources on NHR here:

- [NHR](https://projects.hpc.tu-dresden.de/application/login.php?appkind=nhr)
![HPC Project Application][37]
......
@@ -46,6 +46,12 @@ Lustre offers a number of commands that are suited to its architecture.
| `ls -l <filename>` | `ls -l` |
| `ls` | `ls --color` |
In case commands such as `du` are needed, for example to identify large directories, apply them
to as little data as possible. Do not simply query the top-level directory; instead, work in the
subdirectories first. The deeper in the directory structure, the better.
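For example, a minimal sketch (the workspace path is hypothetical; substitute a specific subdirectory you actually want to inspect):

```console
marie@login$ du -sh /scratch/ws/0/marie-number-crunch/output    # query one subdirectory, not the whole filesystem
```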
## Useful Commands for Lustre
These commands work for Lustre filesystems `/scratch` and `/ssd`.
......
@@ -158,6 +158,47 @@ workspace and the filesystem in which it is located:
marie@login$ ws_release -F scratch my-workspace
```
You can list your already released or expired workspaces using the `ws_restore -l` command.
```console
marie@login$ ws_restore -l
warm_archive:
scratch:
marie-my-workspace-1665014486
unavailable since Thu Oct 6 02:01:26 2022
marie-foo-647085320
unavailable since Sat Mar 12 12:42:00 2022
ssd:
marie-bar-1654074660
unavailable since Wed Jun 1 11:11:00 2022
beegfs_global0:
beegfs:
```
In this example, the user `marie` has three inactive, i.e. expired, workspaces: `my-workspace` and
`foo` in the `scratch` filesystem, as well as `bar` in the `ssd` filesystem. The command
`ws_restore -l` lists the name of the workspace and the expiration date. As you can see, the
expiration date is appended to the workspace name as a Unix timestamp.
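To convert such a timestamp back into a human-readable date, a small sketch (the exact output depends on your locale and timezone):

```console
marie@login$ date -d @1665014486
Thu Oct  6 02:01:26 CEST 2022
```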
!!! hint "Deleting data in in an expired workspace"
If you are short on quota, you might want to delete data in expired workspaces since it counts
to your quota. Expired workspaces are moved to a hidden directory named `.removed`. The access
rights remain unchanged. I.e., you can delete the data inside the workspace directory but you
must not delete the workspace directory itself!
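    A hedged sketch; the `.removed` path below is hypothetical and depends on the filesystem
    layout, so verify the actual location before deleting anything:

    ```console
    marie@login$ rm -rf /scratch/ws/.removed/marie-my-workspace-1665014486/*   # delete the contents only, keep the directory itself
    ```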
#### Expirer Process
The clean-up of expired workspaces is handled automatically by a so-called expirer process.
It performs the following steps once per day for each filesystem:

- Check the remaining lifetime of all workspaces.
- If a workspace has expired, move it to a hidden directory so that it becomes inactive.
- Send reminder emails to users if the reminder functionality was configured for their particular
  workspaces.
- Scan through all workspaces in the grace period.
- If a workspace has exceeded the grace period, delete the workspace and its data.
### Restoring Expired Workspaces
At expiration time your workspace will be moved to a special, hidden directory. For a month (in
@@ -174,13 +215,16 @@ Use
```console
marie@login$ ws_restore -l -F scratch
scratch:
marie-my-workspace-1665014486
unavailable since Thu Oct 6 02:01:26 2022
```
to get a list of your expired workspaces, and then restore them into an existing, active
workspace 'new_ws' as follows:
```console
marie@login$ ws_restore -F scratch marie-my-workspace-1665014486 new_ws
```
The expired workspace has to be specified by its full name as listed by `ws_restore -l`, including
......
@@ -277,8 +277,10 @@ in Linux.
## Software Environment
The [software](../software/overview.md) on the ZIH HPC system is not installed system-wide,
but is provided within so-called [modules](../software/modules.md).
In order to use specific software, you need to "load" the respective module.
This modifies the current environment (i.e., only for the current user in the current session)
such that the software becomes available.
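A minimal sketch of that workflow; the module name and version are illustrative and may differ on the current system, so check the available versions first:

```console
marie@login$ module avail GCC          # list the versions that are currently installed
marie@login$ module load GCC/10.2.0    # make this compiler available in the current session
marie@login$ gcc --version             # the loaded software is now on the PATH
```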
!!! note
@@ -380,18 +382,31 @@ For additional information refer to the detailed documentation on [modules](../s
!!! hint "Special hints on different software" !!! hint "Special hints on different software"
Special hints on different software can be in the section "Environment and Software", e.g. See also the section "Applications and Software" for more information on e.g.
for [Python](../software/data_analytics_with_python.md), [R](../software/data_analytics_with_r.md), [Python](../software/data_analytics_with_python.md),
[R](../software/data_analytics_with_r.md),
[Mathematica/MatLab](../software/mathematics.md), etc. [Mathematica/MatLab](../software/mathematics.md), etc.
!!! hint "Hint on Python packages" !!! hint "Tip for Python packages"
The usage of virtual environments and, therefore, the usage of workspaces is recommended, The use of [Virtual Environments](../software/python_virtual_environments.md)
especially for Python. Please check out the module system, even for specific Python packages, (best in [workspaces](../data_lifecycle/workspaces.md)) is recommended.
e.g. `tqdm`, `torchvision`, `tensorboard`, etc. to get a better idea of what is available.
The Python (and other) package ecosystem is very heterogeneous and dynamic, with daily updates. Please check the module system, even for specific Python packages,
The central update cycle for software on the ZIH HPC system occurs approximately every six e.g. `numpy`, `tensorflow` or `pytorch`.
months. Those modules may provide much better performance than the packages found on PyPi
(installed via `pip`) which have to work on any system while our installation is optimized for
the ZIH system to make the best use of the specific CPUs and GPUs found here.
However the Python package ecosystem (like others) is very heterogeneous and dynamic,
with daily updates.
The central update cycle for software on the ZIH HPC system is approximately every six months.
So the software installed as modules might be a bit older.
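    For example, a hedged sketch (the module name is illustrative; names and versions vary, so
    check what is currently provided before installing anything via `pip`):

    ```console
    marie@login$ module avail TensorFlow   # check whether an optimized module is already provided
    marie@login$ module load TensorFlow    # prefer the module over a plain `pip install tensorflow`
    ```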
!!! warning

    When explicitly loading multiple modules, you need to make sure that they are compatible.
    So try to stick to modules using the same toolchain.
    See the [Toolchains section](../software/modules.md#toolchains) for more information.
## Running a Program/Job
......
@@ -227,6 +227,96 @@ In some cases a desired software is available as an extension of a module.
2.4.1
```
## Toolchains
A program or library may break in various ways
(e.g. not starting, crashing or producing wrong results)
when it is used with software of a different version than it expects.
Therefore, each module specifies the exact other modules it depends on.
They get loaded automatically when the dependent module is loaded.

Loading a single module is easy, as there can't be any conflicts between dependencies.
However, when loading multiple modules, they can require different versions of the same software.
This conflict is currently handled as follows: loading the same software in a different version
automatically unloads the earlier loaded module.
As the dependents of that module are **not** automatically unloaded, they now have a
wrong dependency (version), which can be a problem (see above).

To avoid this, there are (versioned) toolchains, and for each toolchain there is (usually) at most
one version of each software.
A "toolchain" is a set of modules used to build the software for other modules.
The most common one is the `foss`-toolchain comprising of `GCC`, `OpenMPI`, `OpenBLAS` & `FFTW`.
!!! info

    Modules are named like `<Softwarename>/<Version>-<Toolchain>` so `Python/3.6.6-foss-2019a`
    uses the `foss-2019a` toolchain.
This toolchain can be broken down into a sub-toolchain called `gompi`, consisting of only
`GCC` & `OpenMPI`, or further to `GCC` (the compiler and linker),
and even further to `GCCcore`, which contains only the runtime libraries required to run programs
built with the GCC standard library.
!!! hint

    As toolchains are regular modules, you can display their parts via `module show foss/2019a`.
In this way, the toolchains form a hierarchy, and a toolchain that adds more modules is "higher"
than another.
Examples:
| Toolchain | Components |
| --------- | ---------- |
| `foss` | `GCC` `OpenMPI` `OpenBLAS` `FFTW` |
| `gompi` | `GCC` `OpenMPI` |
| `GCC` | `GCCcore` `binutils` |
| `GCCcore` | none |
| `intel` | `intel-compilers` `impi` `imkl` |
| `iimpi` | `intel-compilers` `impi` |
| `intel-compilers` | `GCCcore` `binutils` |
As you can see, `GCC` and `intel-compilers` are on the same level, as are `gompi` and `iimpi`,
although the latter two are one level higher than the former.
You can load and use modules from a lower toolchain together with modules from
one of its parent toolchains.
For example, `Python/3.6.6-foss-2019a` can be used with `Boost/1.70.0-gompi-2019a`.
But you cannot combine different toolchains or toolchain versions.
So `QuantumESPRESSO/6.5-intel-2019a` and `OpenFOAM/8-foss-2020a`
are both incompatible with `Python/3.6.6-foss-2019a`.
However, `LLVM/7.0.1-GCCcore-8.2.0` can be used with either
`QuantumESPRESSO/6.5-intel-2019a` or `Python/3.6.6-foss-2019a`,
because `GCCcore-8.2.0` is a sub-toolchain of both `intel-2019a` and `foss-2019a`.
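To illustrate, a hedged sketch using the module names from the examples above (whether these exact versions are still installed depends on the current module environment):

```console
marie@login$ ml Python/3.6.6-foss-2019a Boost/1.70.0-gompi-2019a   # gompi-2019a is a sub-toolchain of foss-2019a, so this is compatible
marie@login$ ml                                                    # list the loaded modules to verify nothing was reloaded in a different version
```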
For [modenv/hiera](#modenvhiera) it is much easier to avoid loading incompatible
modules, as modules from other toolchains cannot be loaded directly
and don't show up in `module av`.
So the concept of hierarchical toolchains is already built into this module environment.

In the other module environments it is up to you to make sure the modules you load are compatible.
So watch the output when you load another module: a message will be shown when loading a module
causes other modules to be reloaded in a different version:
??? example "Module reload"
```console
marie@login$ ml OpenFOAM/8-foss-2020a
Module OpenFOAM/8-foss-2020a and 72 dependencies loaded.
marie@login$ ml Biopython/1.78-foss-2020b
The following have been reloaded with a version change:
1) FFTW/3.3.8-gompi-2020a => FFTW/3.3.8-gompi-2020b 15) binutils/2.34-GCCcore-9.3.0 => binutils/2.35-GCCcore-10.2.0
2) GCC/9.3.0 => GCC/10.2.0 16) bzip2/1.0.8-GCCcore-9.3.0 => bzip2/1.0.8-GCCcore-10.2.0
3) GCCcore/9.3.0 => GCCcore/10.2.0 17) foss/2020a => foss/2020b
[...]
```
!!! info

    The higher toolchains have a year and letter as their version, corresponding to their release.
    So `2019a` and `2020b` refer to the first half of 2019 and the second half of 2020,
    respectively.
## Per-Architecture Builds
Since we have a heterogeneous cluster, we do individual builds of some of the software for each
......