Commit 40ef16f6 authored by Martin Schroschk

Merge remote-tracking branch 'origin/preview' into gupa977e--tu-dresden.de-preview-patch-23104

parents 2c9315ff a08aae03
2 merge requests: !820 Automated merge from preview to main, !771 Update binding_and_distribution_of_tasks.md
Showing changed files with 258 additions and 78 deletions
@@ -6,7 +6,7 @@ SHELL ["/bin/bash", "-c"]
# Base #
########
-RUN pip install mkdocs>=1.1.2 mkdocs-material>=7.1.0 mkdocs-htmlproofer-plugin==0.8.0 mkdocs-video==1.3.0
RUN pip install mkdocs>=1.1.2 mkdocs-material==8.5.11 mkdocs-htmlproofer-plugin==0.8.0 mkdocs-video==1.3.0
##########
# Linter #
@@ -14,7 +14,7 @@ RUN pip install mkdocs>=1.1.2 mkdocs-material>=7.1.0 mkdocs-htmlproofer-plugin==
RUN apt-get update && apt-get install -y nodejs npm aspell git git-lfs
-RUN npm install -g markdownlint-cli markdown-link-check
RUN npm install -g markdownlint-cli@0.32.2 markdown-link-check
###########################################
# prepare git for automatic merging in CI #
@@ -38,6 +38,9 @@ RUN echo 'test \! -e /docs/tud_theme/javascripts/mermaid.min.js && test -x /docs
RUN echo 'exec "$@"' >> /entrypoint.sh
RUN chmod u+x /entrypoint.sh
# Workaround https://gitlab.com/gitlab-org/gitlab-runner/-/issues/29022
RUN git config --global --add safe.directory /docs
WORKDIR /docs
CMD ["mkdocs", "build", "--verbose", "--strict"]
...
-# Creating and Using a Custom Environment for JupyterHub
# Custom Environments for JupyterHub
!!! info
...
-# Connecting via terminal
# Connecting via Terminal (Linux, Mac, Windows)
Connecting via terminal works on every operating system. For Linux and Mac operating systems
no additional software is required. For users of a Windows OS a recent version of Windows is
...
-# Connecting from Windows with MobaXterm
# Connecting with MobaXterm (Windows)
-MobaXterm is an enhanced terminal for Windows with an X11 server, a tabbed SSH client, network
-tools and more.
-Visit its homepage for more information (https://mobaxterm.mobatek.net).
[MobaXterm](https://mobaxterm.mobatek.net) is an enhanced terminal for Windows with an X11 server,
a tabbed SSH client, network tools and more.
## Download and install
-To download go to [MobaXterm homepage](https://mobaxterm.mobatek.net/download-home-edition.html)
To download go to [MobaXterm download page](https://mobaxterm.mobatek.net/download-home-edition.html)
and download a free home edition.
![Downloading MobaXterm](misc/mobaxterm1_download.png)
...
-# Connecting from Windows with PuTTY
# Connecting with PuTTY (Windows)
PuTTY is a free and open-source terminal emulator, serial console and network file transfer
application, supports several network protocols, including SCP, SSH. Visit the
...
-# Acknowledgment
# Acknowledgement
To provide you with modern and powerful HPC systems in future as well, we have to show that these
systems help to advance research. For that purpose we rely on your help. In most cases, the results
...
-# Terms Of Use / Nutzungsbedingungen
# Terms of Use
!!! attention
...
-# BeeGFS Filesystem (Outdated)
# BeeGFS Filesystem on Demand (Outdated)
!!! warning
...
-# Jobs without Infiniband (Outdated)
# Jobs without InfiniBand (Outdated)
!!! warning
...
-# Hardware (Outdated)
# Switched-Off Systems (Outdated)
HPC at ZIH has a quite long history and several systems have been installed and operated.
Documentation on former systems for future reference can be found on the following pages:
...
@@ -108,7 +108,7 @@ line mode within this documentation.
!!! hint "Filesystem vs. Path"
If you provide a path to the lfs commands instead of a filesystem, the lfs option is applied to
-the filesystem this path is in. Thus, the provied information refers to the whole filesystem,
the filesystem this path is in. Thus, the passed information refers to the whole filesystem,
not the path.
You can retrieve a complete list of available options:
...
@@ -28,7 +28,8 @@ times.
### List Available Filesystems
-To list all available filesystems for using workspaces, use:
To list all available filesystems for using workspaces, you can either invoke `ws_list -l` or
`ws_find -l`, e.g.,
```console
marie@login$ ws_find -l
@@ -42,12 +43,13 @@ beegfs
!!! note "Default is `scratch`"
-The default filesystem is `scratch`. If you prefer another filesystem, provide the option `-F
-<fs>` to the workspace commands.
The default filesystem is `scratch`. If you prefer another filesystem (cf. section
[List Available Filesystems](#list-available-filesystems)), you have to explicitly
provide the option `-F <fs>` to the workspace commands.
### List Current Workspaces
-To list all workspaces you currently own, use:
The command `ws_list` lists all your currently active (i.e., not expired) workspaces, e.g.,
```console
marie@login$ ws_list
@@ -60,13 +62,84 @@ id: test-workspace
available extensions  : 10
```
The output of `ws_list` can be customized via several options. The following switch tabs provide an
overview of some of these options. All available options can be queried by `ws_list --help`.
=== "Certain filesystem"
```
marie@login$ ws_list --filesystem scratch_fast
id: numbercrunch
workspace directory : /lustre/ssd/ws/marie-numbercrunch
remaining time : 2 days 23 hours
creation time : Thu Mar 2 14:15:33 2023
expiration date : Sun Mar 5 14:15:33 2023
filesystem name : ssd
available extensions : 2
```
=== "Verbose output"
```
marie@login$ ws_list -v
id: test-workspace
workspace directory : /scratch/ws/0/marie-test-workspace
remaining time : 89 days 23 hours
creation time : Thu Jul 29 10:30:04 2021
expiration date : Wed Oct 27 10:30:04 2021
filesystem name : scratch
available extensions : 10
acctcode : p_numbercrunch
reminder : Sat Oct 20 10:30:04 2021
mailaddress : marie@tu-dresden.de
```
=== "Terse output"
```
marie@login$ ws_list -t
id: test-workspace
workspace directory : /scratch/ws/0/marie-test-workspace
remaining time : 89 days 23 hours
available extensions : 10
id: foo
workspace directory : /scratch/ws/0/marie-foo
remaining time : 3 days 22 hours
available extensions : 10
```
=== "Show only names"
```
marie@login$ ws_list -s
test-workspace
foo
```
=== "Sort by remaining time"
You can list your currently allocated workspaces sorted by remaining time. This is especially useful
for housekeeping tasks, such as extending soon-expiring workspaces if necessary.
```
marie@login$ ws_list -R -t
id: test-workspace
workspace directory : /scratch/ws/0/marie-test-workspace
remaining time : 89 days 23 hours
available extensions : 10
id: foo
workspace directory : /scratch/ws/0/marie-foo
remaining time : 3 days 22 hours
available extensions : 10
```
### Allocate a Workspace
-To create a workspace in one of the listed filesystems, use `ws_allocate`. It is necessary to
To allocate a workspace in one of the listed filesystems, use `ws_allocate`. It is necessary to
specify a unique name and the duration of the workspace.
```console
-marie@login$ ws_allocate: [options] workspace_name duration
ws_allocate: [options] workspace_name duration
Options:
  -h [ --help] produce help message
@@ -95,10 +168,19 @@ Options:
This will create a workspace with the name `test-workspace` on the `/scratch` filesystem for 90
days with an email reminder for 7 days before the expiration.
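Such an allocation might look like the following sketch. This is only an illustration: the short options `-F` (filesystem), `-r` (reminder) and `-m` (mail address) are assumed here and should be double-checked against `ws_allocate --help`.

```console
marie@login$ ws_allocate -F scratch -r 7 -m marie@tu-dresden.de test-workspace 90
```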
-!!! Note
!!! Note "Email reminder"
Setting the reminder to `7` means you will get a reminder email on every day starting `7` days
-prior to expiration date.
prior to the expiration date. We strongly recommend setting this email reminder.
!!! Note "Name of a workspace"
The workspace name should help you to remember the experiment and data stored here. It has to
be unique on a certain filesystem. On the other hand it is possible to use the very same name
for workspaces on different filesystems.
Please refer to the [section Cooperative Usage](#cooperative-usage-group-workspaces) for
group workspaces.
### Extension of a Workspace
@@ -202,7 +284,7 @@ It performs the following steps once per day and filesystem:
### Restoring Expired Workspaces
At expiration time your workspace will be moved to a special, hidden directory. For a month (in
-warm_archive: 2 months), you can still restore your data into an existing workspace.
warm_archive: 2 months), you can still restore your data **into an existing workspace**.
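As a rough sketch of the restore workflow (the exact options and the full name of the expired workspace should be taken from `ws_restore --help` and `ws_restore -l`; the names below are made up for illustration):

```console
marie@login$ ws_restore -l -F scratch                  # list your restorable, expired workspaces
marie@login$ ws_allocate -F scratch restore-target 30  # allocate a target workspace
marie@login$ ws_restore -F scratch marie-test-workspace-1642060800 restore-target
```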
!!! warning
@@ -256,7 +338,7 @@ There are three typical options for the use of workspaces:
### Per-Job Storage
-The idea of a "workspace per-job storage" adresses the need of a batch job for a directory for
The idea of a "workspace per-job storage" addresses the need of a batch job for a directory for
temporary data which can be deleted afterwards. To help you to write your own
[(Slurm) job file](../jobs_and_resources/slurm.md#job-files), suited to your needs, we came up with
the following example (which works [for the program g16](../software/nanoscale_simulations.md)).
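Independent of g16, a generic, hypothetical sketch of this pattern could look as follows; the application name, the workspace lifetime and the copy-back step are assumptions to be adapted to your workflow:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# allocate a per-job workspace for temporary data (1 day lifetime assumed)
WSNAME=tmp-${SLURM_JOB_ID}
WSDIR=$(ws_allocate -F scratch ${WSNAME} 1)
cd ${WSDIR}

# run the application inside the workspace
srun ./my_application

# copy results back to a permanent location, then release the workspace
cp -r ${WSDIR}/results ${HOME}/results-${SLURM_JOB_ID}
ws_release -F scratch ${WSNAME}
```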
@@ -392,6 +474,57 @@ marie@login$ qinfo quota /warm_archive/ws/
Note that the workspaces reside under the mountpoint `/warm_archive/ws/` and not `/warm_archive`
anymore.
## Cooperative Usage (Group Workspaces)
When a workspace is created with the option `-g, --group`, it becomes a group workspace that is
visible to others in the same group via `ws_list -g`.
!!! hint "Chose group"
If you are member of multiple groups, than the group workspace is visible for your primary
group. You can list all groups you belong to via `groups`, and the first entry is your
primary group.
Nevertheless, you can create a group workspace for any of your groups following these two
steps:
1. Change to the desired group using `newgrp <other-group>`.
1. Create the group workspace as usual, i.e., `ws_allocate --group [...]`
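Put together as a short sketch (using `p_number_crunch`, the example group from below, as the secondary group):

```console
marie@login$ newgrp p_number_crunch
marie@login$ ws_allocate --group --name numbercrunch --duration 30
```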
The [page on Sharing Data](data_sharing.md) provides
information on how to grant access to certain colleagues and whole project groups.
!!! Example "Allocate and list group workspaces"
If Marie wants to share results and scripts in a workspace with all of her colleagues
in the project `p_number_crunch`, she can allocate a so-called group workspace.
```console
marie@login$ ws_allocate --group --name numbercrunch --duration 30
Info: creating workspace.
/scratch/ws/0/marie-numbercrunch
remaining extensions : 10
remaining time in days: 30
```
This workspace directory is readable for the group, e.g.,
```console
marie@login$ ls -ld /scratch/ws/0/marie-numbercrunch
drwxr-x--- 2 marie p_number_crunch 4096 Mar 2 15:24 /scratch/ws/0/marie-numbercrunch
```
All members of the project group `p_number_crunch` can now list this workspace using
`ws_list -g` and access the data (read-only).
```console
martin@login$ ws_list -g -t
id: numbercrunch
workspace directory : /scratch/ws/0/marie-numbercrunch
remaining time : 29 days 23 hours
available extensions : 10
```
## FAQ and Troubleshooting
**Q**: I am getting the error `Error: could not create workspace directory!`
...
-# Datamover - Data Transfer Inside ZIH Systems
# Transfer Data Inside ZIH Systems with Datamover
With the **Datamover**, we provide a special data transfer machine for transferring data with best
transfer speed between the filesystems of ZIH systems. The Datamover machine is not accessible
...
-# Export Nodes - Data Transfer to/from ZIH Systems
# Transfer Data to/from ZIH Systems via Export Nodes
To copy large data to/from ZIH systems, the so-called **export nodes** should be used. While it is
possible to transfer small files directly via the login nodes, they are not intended to be used that
...
@@ -31,7 +31,7 @@ users and the ZIH.
- 3.5 TB local memory on NVMe device at `/tmp`
- Hostnames: `taurusi[8001-8034]`
- Slurm partition: `alpha`
-- Further information on the usage is documented on the site [AMD Rome Nodes](rome_nodes.md)
- Further information on the usage is documented on the site [Alpha Centauri Nodes](alpha_centauri.md)
## Island 7 - AMD Rome CPUs
...
-# Known MPI-Usage Issues
# Known Issues when Using MPI
This page holds known issues observed with MPI and concrete MPI implementations.
...
-# Island 7 - AMD Rome Nodes
# AMD Rome Nodes
The hardware specification is documented on the page
[HPC Resources](hardware_overview.md#island-7-amd-rome-cpus).
...
@@ -328,8 +328,8 @@ specifications for each component of the heterogeneous job should be separated w
Running a job step on a specific component is supported by the option `--het-group`.
```console
-marie@login$ salloc --ntasks 1 --cpus-per-task 4 --partition <partition> --mem=200G : \
-             --ntasks 8 --cpus-per-task 1 --gres=gpu:8 --mem=80G --partition <partition>
marie@login$ salloc --ntasks=1 --cpus-per-task=4 --partition <partition> --mem=200G : \
             --ntasks=8 --cpus-per-task=1 --gres=gpu:8 --mem=80G --partition <partition>
[...]
marie@login$ srun ./my_application <args for master tasks> : ./my_application <args for worker tasks>
```
@@ -340,16 +340,16 @@ components by a line containing the directive `#SBATCH hetjob`.
```bash
#!/bin/bash
-#SBATCH --ntasks 1
-#SBATCH --cpus-per-task 4
-#SBATCH --partition <partition>
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=<partition>
#SBATCH --mem=200G
#SBATCH hetjob # required to separate groups
-#SBATCH --ntasks 8
-#SBATCH --cpus-per-task 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:8
#SBATCH --mem=80G
-#SBATCH --partition <partition>
#SBATCH --partition=<partition>
srun ./my_application <args for master tasks> : ./my_application <args for worker tasks>
@@ -474,7 +474,7 @@ at no extra cost.
??? example "Show all jobs since the beginning of year 2021"
```console
-marie@login$ sacct -S 2021-01-01 [-E now]
marie@login$ sacct --starttime 2021-01-01 [--endtime now]
```
## Jobs at Reservations
@@ -501,24 +501,21 @@ as user to specify the requirements. These features should be thought of as chan
(e.g., a filesystem gets stuck on a certain node).
A feature can be used with the Slurm option `-C, --constraint=<ARG>` like
-`srun --constraint=fs_lustre_scratch2 ...` with `srun` or `sbatch`. Combinations like
-`--constraint="fs_beegfs_global0"` are allowed. For a detailed description of the possible
-constraints, please refer to the [Slurm documentation](https://slurm.schedmd.com/srun.html).
`srun --constraint="fs_lustre_scratch2" [...]` with `srun` or `sbatch`.
Multiple features can also be combined using AND, OR, matching OR, resource count etc.
E.g., `--constraint="fs_beegfs|fs_lustre_ssd"` requests nodes that have at least one of the
features `fs_beegfs` and `fs_lustre_ssd`. For a detailed description of the possible
constraints, please refer to the [Slurm documentation](https://slurm.schedmd.com/srun.html#OPT_constraint).
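As a small illustration (the feature names are the ones mentioned above and the partition is a generic placeholder), two features can be required at the same time with the AND operator `&`:

```console
marie@login$ srun --constraint="fs_lustre_scratch2&fs_beegfs" --ntasks=1 --partition=<partition> hostname
```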
!!! hint
A feature is checked only for scheduling. Running jobs are not affected by changing features.
-### Available Features
### Filesystem Features
-| Feature | Description |
-|:--------|:-------------------------------------------------------------------------|
-| DA | Subset of Haswell nodes with a high bandwidth to NVMe storage (island 6) |
-#### Filesystem Features
A feature `fs_*` is active if a certain filesystem is mounted and available on a node. Access to
-these filesystems are tested every few minutes on each node and the Slurm features set accordingly.
these filesystems is tested every few minutes on each node and the Slurm features are set accordingly.
| Feature | Description | [Workspace Name](../data_lifecycle/workspaces.md#extension-of-a-workspace) |
|:---------------------|:-------------------------------------------------------------------|:---------------------------------------------------------------------------|
...
@@ -186,7 +186,7 @@ When `srun` is used within a submission script, it inherits parameters from `sba
`--ntasks=1`, `--cpus-per-task=4`, etc. So we actually implicitly run the following
```bash
-srun --ntasks=1 --cpus-per-task=4 ... --partition=ml some-gpu-application
srun --ntasks=1 --cpus-per-task=4 [...] --partition=ml <some-gpu-application>
```
Now, our goal is to run four instances of this program concurrently in a single batch script. Of
@@ -237,7 +237,7 @@ inherited from the surrounding `sbatch` context. The following line would be suf
job in this example:
```bash
-srun --exclusive --gres=gpu:1 --ntasks=1 some-gpu-application &
srun --exclusive --gres=gpu:1 --ntasks=1 <some-gpu-application> &
```
Yet, it adds some extra safety to leave them in, enabling the Slurm batch system to complain if not
@@ -278,7 +278,8 @@ use up all resources in the nodes:
#SBATCH --exclusive # ensure that nobody spoils my measurement on 2 x 2 x 8 cores
#SBATCH --time=00:10:00
#SBATCH --job-name=Benchmark
-#SBATCH --mail-user=your.name@tu-dresden.de
#SBATCH --mail-type=end
#SBATCH --mail-user=<your.email>@tu-dresden.de
srun ./my_benchmark
```
@@ -313,14 +314,14 @@ name specific to the job:
```Bash
#!/bin/bash
-#SBATCH --array 0-9
#SBATCH --array=0-9
#SBATCH --output=arraytest-%A_%a.out
#SBATCH --error=arraytest-%A_%a.err
#SBATCH --ntasks=864
#SBATCH --time=08:00:00
#SBATCH --job-name=Science1
#SBATCH --mail-type=end
-#SBATCH --mail-user=your.name@tu-dresden.de
#SBATCH --mail-user=<your.email>@tu-dresden.de
echo "Hi, I am step $SLURM_ARRAY_TASK_ID in this array job $SLURM_ARRAY_JOB_ID"
```
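Purely as an illustration (the job file name `arrayjob.sh` is made up), the array range can also be set at submit time, and a `%` suffix limits how many array tasks may run simultaneously:

```console
marie@login$ sbatch --array=0-9%2 arrayjob.sh
```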
@@ -338,36 +339,84 @@ Please read the Slurm documentation at https://slurm.schedmd.com/sbatch.html for
## Chain Jobs
-You can use chain jobs to create dependencies between jobs. This is often the case if a job relies
-on the result of one or more preceding jobs. Chain jobs can also be used if the runtime limit of the
-batch queues is not sufficient for your job. Slurm has an option
You can use chain jobs to **create dependencies between jobs**. This is often useful if a job
relies on the result of one or more preceding jobs. Chain jobs can also be used to split a long
running job exceeding the batch queue limits into parts and chain these parts. Slurm has an option
`-d, --dependency=<dependency_list>` that allows to specify that a job is only allowed to start if
another job finished.
-Here is an example of how a chain job can look like, the example submits 4 jobs (described in a job
-file) that will be executed one after each other with different CPU numbers:
In the following we provide two examples for scripts that submit chain jobs.
!!! example "Script to submit jobs with dependencies" ??? example "Scaling experiment using chain jobs"
```Bash This scripts submits the very same job file `myjob.sh` four times, which will be executed one
after each other. The number of tasks is increased from job to job making this submit script a
good starting point for (strong) scaling experiments.
```Bash title="submit_scaling.sh"
#!/bin/bash #!/bin/bash
TASK_NUMBERS="1 2 4 8"
DEPENDENCY="" task_numbers="1 2 4 8"
JOB_FILE="myjob.slurm" dependency=""
job_file="myjob.sh"
for TASKS in $TASK_NUMBERS ; do
JOB_CMD="sbatch --ntasks=$TASKS" for tasks in ${task_numbers} ; do
if [ -n "$DEPENDENCY" ] ; then job_cmd="sbatch --ntasks=${tasks}"
JOB_CMD="$JOB_CMD --dependency afterany:$DEPENDENCY" if [ -n "${dependency}" ] ; then
job_cmd="${job_cmd} --dependency=afterany:${dependency}"
fi fi
JOB_CMD="$JOB_CMD $JOB_FILE" job_cmd="${job_cmd} ${job_file}"
echo -n "Running command: $JOB_CMD " echo -n "Running command: ${job_cmd} "
OUT=`$JOB_CMD` out="$(${job_cmd})"
echo "Result: $OUT" echo "Result: ${out}"
DEPENDENCY=`echo $OUT | awk '{print $4}'` dependency=$(echo "${out}" | awk '{print $4}')
done
```
The output looks like:
```console
marie@login$ sh submit_scaling.sh
Running command: sbatch --ntasks=1 myjob.sh Result: Submitted batch job 2963822
Running command: sbatch --ntasks=2 --dependency=afterany:2963822 myjob.sh Result: Submitted batch job 2963823
Running command: sbatch --ntasks=4 --dependency=afterany:2963823 myjob.sh Result: Submitted batch job 2963824
Running command: sbatch --ntasks=8 --dependency=afterany:2963824 myjob.sh Result: Submitted batch job 2963825
```
??? example "Example to submit job chain via script"
This script submits three different job files, which will be executed one after each other. Of
course, the dependency type (e.g., `afterany`) can be adapted.
```bash title="submit_job_chain.sh"
#!/bin/bash
declare -a job_names=("jobfile_a.sh" "jobfile_b.sh" "jobfile_c.sh")
dependency=""
arraylength=${#job_names[@]}
for (( i=0; i<arraylength; i++ )) ; do
job_nr=$((i + 1))
echo "Job ${job_nr}/${arraylength}: ${job_names[$i]}"
if [ -n "${dependency}" ] ; then
echo "Dependency: after job ${dependency}"
dependency="--dependency=afterany:${dependency}"
fi
job="sbatch ${dependency} ${job_names[$i]}"
out=$(${job})
dependency=$(echo "${out}" | awk '{print $4}')
done
```
The output looks like:
```console
marie@login$ sh submit_job_chain.sh
Job 1/3: jobfile_a.sh
Job 2/3: jobfile_b.sh
Dependency: after job 2963708
Job 3/3: jobfile_c.sh
Dependency: after job 2963709
```
## Array-Job with Afterok-Dependency and Datamover Usage
In this example scenario, imagine you need to move data before starting the main job.
...
-# Software Install with EasyBuild
# Software Installation with EasyBuild
Sometimes the [modules](modules.md) installed in the cluster are not enough for your purposes and
you need some other software or a different version of a software.
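As a very rough, generic sketch of what an EasyBuild session can look like (the module name `EasyBuild` and the software placeholders are assumptions; the ZIH-specific workflow described on this page may involve additional steps):

```console
marie@login$ module load EasyBuild
marie@login$ eb --search <software-name>
marie@login$ eb <software-name>-<version>.eb --robot
```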
...