Commit a1da0ebf authored by Martin Schroschk

Merge branch 'issue142' into 'preview'

Review: Fix typos, fix markdown, fix links

Closes #142

See merge request !261
# Platform LSF
!!! warning

    This page is deprecated!
    The current batch system on ZIH systems is [Slurm](../jobs_and_resources/slurm.md).
The HRSK-I systems *Mars* and *Atlas* are operated with the batch system LSF.
The job submission can be done with the command `bsub`.
Some options of `bsub` are shown in the following table:
| Bsub Option | Description |
|:-------------------|:------------|
| `-n <N>` | set the number of processors (cores) to N (default: 1) |
| `-W <hh:mm>` | set the maximum wall clock time to `<hh:mm>` |
| `-J <name>` | assign the specified name to the job |
| `-eo <errfile>` | write the standard error output of the job to the specified file (overwriting) |
| `-o <outfile>` | append the standard output of the job to the specified file |
| `-R span[hosts=1]` | use only one SMP node (automatically set by the batch system) |
| `-R span[ptile=2]` | run 2 tasks per node |
| `-x` | prevent other jobs from sharing the node (*Atlas*) |
| `-m` | specify hosts to run on ([see below](#host-list)) |
| `-M <M>` | specify the per-process (per-core) memory limit (in MB); the job's memory limit is derived from that number (N proc * M MB); see the examples below |
| `-P <project>` | specify the project |
You can use the `%J` macro to merge the job ID into file names.
It might be more convenient to put the options directly in a job file
which you can submit using
```console
bsub <my_jobfile>
```
The following example job file shows how you can make use of it:

```bash
#!/bin/bash
#BSUB -n 4 # number of processors
#BSUB -M 500 # 500MB per core memory limit
#BSUB -o out.%J # output file
#BSUB -u name@tu-dresden.de  # email address; works ONLY with @tu-dresden.de
echo Starting Program
cd $HOME/work
a.out                        # e.g. an OpenMP program
echo Finished Program
```
**Understanding memory limits:** The option `-M` to `bsub` defines how much
memory may be consumed by a single process of the job. The job memory
limit is computed taking this value times the number of processes
requested (`-n`). Therefore, having `-M 600` and `-n 4` results in a job
memory limit of 2400 MB. If any one of your processes consumes more than
600 MB memory OR if all processes belonging to this job consume more
than 2400 MB of memory in sum, then the job will be killed by LSF.
- For serial programs, the given limit is the same for the process and the whole job,
  e.g. 500 MB: `bsub -W 1:00 -n 1 -M 500 myprog`
- For MPI-parallel programs, the job memory limit is N processes times the per-process limit,
  e.g. 32 * 800 MB = 25600 MB: `bsub -W 8:00 -n 32 -M 800 mympiprog`
- For OpenMP-parallel programs, the same applies as for MPI-parallel programs,
  e.g. 8 * 2000 MB = 16000 MB: `bsub -W 4:00 -n 8 -M 2000 myompprog`
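The arithmetic behind these limits can be checked with a small shell sketch (the numbers mirror the MPI example above; `N` and `M` here are illustrative variables, not `bsub` options):

```bash
#!/bin/bash
# Sketch: LSF derives the job memory limit from the number of
# processes (-n) and the per-process memory limit in MB (-M).
N=32    # processes, as requested with -n
M=800   # MB per process, as set with -M
echo "job memory limit: $((N * M)) MB"
```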
LSF sets the user environment according to the environment at the time of submission.
The priority of a job placement in a queue is therefore the ratio between used and granted CPU time
within a certain period.
**Attention:** If you do not give the maximum runtime of your program, the
default runtime for the specified queue is taken.
**Attention 2:** Some systems enforce a limit on how much memory each process and your job as a
whole may allocate. If your job or any of its processes exceeds this limit (N proc. * limit for the
job), your job will be killed. If memory limiting is in place, there also exists a default limit
which will be applied to your job if you do not specify one. Please find the limits along with the
description of the machines' [queues](#job-queues) below.
### Interactive Jobs
Interactive jobs are intended for testing only, not for
extensive production runs!
Use the `bsub` option `-Is` for an interactive job and, additionally on
*Atlas*, `-XF` for an X11 job, like:
```console
bsub -Is -XF matlab
```
or for an interactive job with a bash use
```console
bsub -Is -n 2 -W <hh:mm> -P <project> bash
```
You can check the current usage of the system with the command `bhosts`
to estimate the time to schedule.
### Parallel Jobs
For submitting parallel jobs, a few rules have to be understood and followed. In general they depend
on the type of parallelization and the architecture.
#### OpenMP Jobs
An SMP-parallel job can only run within a node (or a partition), so it is necessary to include the
option `-R span[hosts=1]`. The maximum number of processors for an SMP-parallel program is 506 on a
large Altix partition, and 64 on *Atlas*.
#### MPI Jobs
There are major differences for submitting MPI-parallel jobs on the systems at ZIH. Please refer to
the respective HPC system's section. It is essential to use the same modules at compile time and at
runtime.
### Array Jobs
Array jobs can be used to create a sequence of jobs that share the same executable and resource
requirements, but have different input files, to be submitted, controlled, and monitored as a single
unit.
After the job array is submitted, LSF independently schedules and dispatches the individual jobs.
Each job submitted from a job array shares the same job ID as the job array and is uniquely
referenced using an array index. The dimension and structure of a job array are defined when the
job array is created.
Here is an example of what an array job can look like:
```bash
#!/bin/bash
#BSUB -W 00:10
#BSUB -n 1
#BSUB -J "myTask[1-100:2]" # create job array with 50 tasks
#BSUB -o logs/out.%J.%I # appends the standard output of the job to the specified file that
# contains the job information (%J) and the task information (%I)
#BSUB -e logs/err.%J.%I    # appends the error output of the job to the specified file that
# contains the job information (%J) and the task information (%I)
echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX"
```
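As a quick cross-check of the "50 tasks" comment above, the number of tasks in an index range `[start-end:step]` is `(end - start) / step + 1` (integer division); a small sketch:

```bash
#!/bin/bash
# Tasks in the LSF array spec [1-100:2]: indices 1, 3, ..., 99.
START=1; END=100; STEP=2
echo $(( (END - START) / STEP + 1 ))
```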
Alternatively, you can use the following single command line to submit an array job:
```console
bsub -n 1 -W 00:10 -J "myTask[1-100:2]" -o "logs/out.%J.%I" -e "logs/err.%J.%I" "echo Hello Job \$LSB_JOBID Task \$LSB_JOBINDEX"
```
For further details please read the LSF manual.
### Chain Jobs
You can use chain jobs to create dependencies between jobs. This is often the case if a job relies
on the result of one or more preceding jobs. Chain jobs can also be used if the runtime limit of the
batch queues is not sufficient for your job.
To create dependencies between jobs you have to use the option `-w`. Since `-w` relies on the job
ID or the job name, it is advisable to use the option `-J` to assign a user-specified name to a
single job. For detailed information see the man page of `bsub` (`man bsub`).
Here is an example of what a chain job can look like:
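A minimal sketch of the idea, using `-J` to name jobs and `-w` dependency expressions (the job names, scripts, and resource values here are illustrative assumptions, not taken from the original listing):

```bash
#!/bin/bash
# Submit the first job under a name that later jobs can reference.
bsub -J "preprocess" -W 00:30 -n 1 ./prepare_data.sh

# Start only after "preprocess" has finished successfully.
bsub -J "compute" -w "done(preprocess)" -W 04:00 -n 16 ./run_simulation.sh

# Run after "compute" has ended, regardless of its exit status.
bsub -J "collect" -w "ended(compute)" -W 00:30 -n 1 ./collect_results.sh
```

This example requires an LSF installation to run; `done()` and `ended()` are standard LSF dependency conditions.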
## Job Queues
With the command `bqueues [-l <queue name>]` you can get information about available queues. With
`bqueues -l` you get a detailed listing of the queue properties.
**Attention:** The queue `interactive` is the only one to accept interactive jobs!
## Job Monitoring
You can check the current usage of the system with the command `bhosts` to estimate the time to
schedule. To get an overview on *Atlas*, `lsfview` shows the current usage of the system.
The command `bhosts` shows the load on the hosts.
For a more convenient overview the command `lsfshowjobs` displays information on the LSF status like
this:
```console
You have 1 running job using 64 cores
You have 1 pending job
```
and the command `lsfnodestat` displays the node and core status of the machine like this:
```console
# -------------------------------------------
nodes available: 714/714 nodes damaged: 0
# -------------------------------------------
jobs running: 1797 | cores closed (exclusive jobs):   94
jobs wait:    3361 | cores closed by ADMIN:          129
jobs suspend:    0 | cores working:                 2068
jobs damaged:    0 |
# -------------------------------------------
normal working cores: 2556 cores free for jobs: 265
```
The command `bjobs` allows you to monitor your running jobs. It has the following options:
| Bjobs Option | Description |
|:--------------|:----------------------------------------------------------------------------------------------------------------------------------|
| `-r` | Displays running jobs. |
| `-s` | Displays suspended jobs, together with the suspending reason that caused each job to become suspended. |
| `-a` | Displays information on jobs in all states, including finished jobs that finished recently. |
| `-l [job_id]` | Displays detailed information for each job or for a particular job. |
## Checking the Progress of Your Jobs
If you run code that regularly emits status or progress messages, using the command
```console
watch -n10 tail -n2 '*out'
```
in your `$HOME/.lsbatch` directory is a very handy way to keep yourself informed. Note that this
only works if you did not use the `-o` option of `bsub`. If you used `-o`, replace `*out` with the
list of file names you passed to this very option.
## Host List
The `bsub` option `-m` can be used to specify a list of hosts for execution. This is especially
useful for memory-intensive computations.
### Altix
*Jupiter*, *Saturn*, and *Uranus* have 4 GB RAM per core, *Mars* only 1 GB. So it makes sense to
specify `-m "jupiter saturn uranus"`.