---
search:
  boost: 2.0
---

# Batch System Slurm

ZIH uses the batch system Slurm for resource management and job scheduling. Compute nodes are not accessed directly, but addressed through Slurm. You specify the needed resources (cores, memory, GPU, time, ...) and Slurm will schedule your job for execution.

When logging in to ZIH systems, you are placed on a login node. There, you can manage your data life cycle, set up experiments, and edit and prepare jobs. The login nodes are not suited for computational work! From the login nodes, you can interact with the batch system, e.g., submit and monitor your jobs.
??? note "Batch System"
The batch system is the central organ of every HPC system users interact with its compute
resources. The batch system finds an adequate compute system (partition) for your compute jobs.
It organizes the queueing and messaging, if all resources are in use. If resources are available
for your job, the batch system allocates and connects to these resources, transfers runtime
environment, and starts the job.

    A workflow could look like this:

    ```mermaid
    sequenceDiagram
        user ->>+ login node: run program
        login node ->> login node: kill after 5 min
        login node ->>- user: Killed!
        user ->> login node: salloc [...]
        login node ->> Slurm: Request resources
        Slurm ->> user: resources
        user ->>+ allocated resources: srun [options] [command]
        allocated resources ->> allocated resources: run command (on allocated nodes)
        allocated resources ->>- user: program finished
        user ->>+ allocated resources: srun [options] [further_command]
        allocated resources ->> allocated resources: run further command
        allocated resources ->>- user: program finished
        user ->>+ allocated resources: srun [options] [further_command]
        allocated resources ->> allocated resources: run further command
        Slurm ->> allocated resources: Job limit reached/exceeded
        allocated resources ->>- user: Job limit reached
    ```
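
As a rough sketch, such an interactive workflow could look like this on the command line; the
resource values and the executable `./my_program` are placeholders, not recommendations:

```bash
# Request an interactive allocation (placeholder resource values)
salloc --nodes=1 --ntasks=4 --time=00:30:00

# Within the allocation, launch work on the allocated resources
srun ./my_program

# Further srun calls reuse the same allocation until the time limit is reached
srun ./my_program --other-options

# Release the allocation when finished
exit
```
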
??? note "Batch Job"
At HPC systems, computational work and resource requirements are encapsulated into so-called
jobs. In order to allow the batch system an efficient job placement it needs these
specifications:
* requirements: number of nodes and cores, memory per core, additional resources (GPU)
* maximum run-time
* HPC project for accounting
* who gets an email on which occasion
Moreover, the [runtime environment](../software/overview.md) as well as the executable and
certain command-line arguments have to be specified to run the computational work.
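
As a sketch, such job specifications (nodes, cores, memory, run-time, project, email
notifications) typically map to `#SBATCH` directives in a job file; the project name, email
address, resource values, module, and executable below are placeholders:

```bash
#!/bin/bash
#SBATCH --nodes=1                        # number of nodes
#SBATCH --ntasks=4                       # number of tasks (cores)
#SBATCH --mem-per-cpu=2000M              # memory per core
#SBATCH --gres=gpu:1                     # additional resources, e.g., one GPU
#SBATCH --time=01:00:00                  # maximum run-time (hh:mm:ss)
#SBATCH --account=p_number_crunch        # HPC project for accounting (placeholder)
#SBATCH --mail-type=END,FAIL             # occasions for email notification
#SBATCH --mail-user=marie@tu-dresden.de  # placeholder email address

# Load the runtime environment and start the executable (placeholders)
module load my_software
srun ./my_executable --some-argument
```
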
This page provides a brief overview of

- Slurm options to specify resource requirements,
- how to submit interactive and batch jobs,
- how to write job files,
- how to manage and control your jobs.

If you are already familiar with Slurm, you might be more interested in our collection of job examples. There is also a ton of external resources regarding Slurm. We recommend these links for detailed information:

- [slurm.schedmd.com](https://slurm.schedmd.com) provides the official documentation comprising manual pages, tutorials, examples, etc.
- Comparison with other batch systems

## Job Submission

There are three basic Slurm commands for job submission and execution:

* `srun`: Run a parallel application (and, if necessary, allocate resources first).
* `sbatch`: Submit a batch script to Slurm for later execution.
* `salloc`: Obtain a Slurm job allocation (i.e., resources like CPUs, nodes and GPUs) for
  interactive use. Release the allocation when finished.

Executing a program with `srun` directly on the shell will be blocking and launch an
interactive job. Apart from short test runs, it is recommended to submit your jobs to Slurm
for later execution by using batch jobs. For that, you can conveniently put the parameters in
a job file, which you can submit using `sbatch [options] <job file>`.

After submission, your job gets a unique job ID, which is stored in the environment variable
`SLURM_JOB_ID` at job runtime. The command `sbatch` outputs the job ID to stderr. Furthermore,
you can find it via `squeue --me`. The job ID allows you to manage and control your jobs.
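
For illustration, a minimal submission sequence could look like this; the job file name
`my_job_file.sh` is a placeholder:

```bash
# Submit the job file; Slurm reports the assigned job ID
sbatch my_job_file.sh

# List your own jobs together with their job IDs and states
squeue --me

# Inside the job script, the job ID is available at runtime
echo "Running as job ${SLURM_JOB_ID}"
```
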
!!! warning "srun vs. mpirun"

    On ZIH systems, `srun` is used to run your parallel application. The use of `mpirun` is known
    to be broken on the clusters `Power9` and `Alpha` for jobs requiring more than one node.
    Especially when using code from GitHub projects, double-check its configuration by looking for
    a line like `submit command mpirun -n $ranks ./app` and replace it with `srun ./app`.
    Otherwise, this may lead to wrong resource distribution and thus job failure, or tremendous
    slowdowns of your application.
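
For example, if a job file or a third-party submit template contains an `mpirun` line, it can be
adapted as sketched below; the executable `./app` is a placeholder:

```bash
# As often found in third-party submit templates (do not use on ZIH systems):
# mpirun -n $ranks ./app

# Use srun instead; it takes task count and placement from the Slurm allocation:
srun ./app
```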