  1. Jun 04, 2013
  2. Jun 03, 2013
    • Start NEWS for v2.5.8 · c795724d
      Morris Jette authored
    • restore max_nodes of desc to NO_VAL when checkpointing job · f82e0fb8
      Hongjia Cao authored
      We're having some trouble getting our Slurm jobs to successfully
      restart after a checkpoint.  For this test, I'm using sbatch and a
      simple, single-threaded executable.  Slurm is 2.5.4, BLCR is 0.8.5.
      I'm submitting the job using sbatch:
      
      $ sbatch -n 1 -t 12:00:00 bin/bowtie-ex.sh
      
      I am able to create the checkpoint and vacate the node:
      
      $ scontrol checkpoint create 137
      .... time passes ....
      $ scontrol vacate 137
      
      At that point, I see the checkpoint file from BLCR in the current
      directory and the checkpoint file from Slurm
      in /var/spool/slurm-llnl/checkpoint.  However, when I attempt to
      restart the job:
      
      $ scontrol checkpoint restart 137
      scontrol_checkpoint error: Node count specification invalid
      
      In slurmctld's log (at level 7) I see:
      
      [2013-05-29T12:41:08-07:00] debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=*****
      [2013-05-29T12:41:08-07:00] debug3: Version string in job_ckpt header is JOB_CKPT_002
      [2013-05-29T12:41:08-07:00] _job_create: max_nodes == 0
      [2013-05-29T12:41:08-07:00] _slurm_rpc_checkpoint restart 137: Node count specification invalid
  3. May 31, 2013
  4. May 30, 2013
  5. May 29, 2013
  6. May 24, 2013
  7. May 23, 2013
  8. May 22, 2013
  9. May 21, 2013
  10. May 18, 2013
  11. May 16, 2013
  12. May 14, 2013
  13. May 13, 2013
  14. May 11, 2013
    • Added MaxCPUsPerNode partition configuration parameter. · e33c5d57
      Morris Jette authored
      This can be especially useful for scheduling GPUs. For example, a node can be
      associated with two Slurm partitions (e.g. "cpu" and "gpu"), and the
      partition/queue "cpu" could be limited to only a subset of the node's CPUs,
      ensuring that one or more CPUs would be available to jobs in the "gpu"
      partition/queue.
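
      A minimal slurm.conf illustration of that setup (the node name, CPU count,
      and GPU count below are made up for the example): a 16-CPU node with GPUs
      is placed in both partitions, and the "cpu" partition is capped so that two
      CPUs always stay free for "gpu" jobs:

      NodeName=tux01 CPUs=16 Gres=gpu:2
      PartitionName=cpu Nodes=tux01 MaxCPUsPerNode=14
      PartitionName=gpu Nodes=tux01

      Jobs in the "cpu" partition can then never allocate all 16 CPUs of tux01, so
      at least two CPUs remain available to jobs submitted to "gpu".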
  15. May 10, 2013
  16. May 08, 2013
  17. May 02, 2013
  18. May 01, 2013
  19. Apr 30, 2013
    • Change maximum delay for state save from 2 secs to 5 secs. · 5a2a76ff
      Morris Jette authored
      Make timeout configurable at build time by defining SAVE_MAX_WAIT.
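
      A build-time override might look like the following (a sketch, assuming the
      usual autoconf-style build and that the in-source default can be overridden
      from the compiler command line; the value 10 is only an example):

      $ ./configure CFLAGS="-DSAVE_MAX_WAIT=10"
      $ make && make install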
    • added script to help manage native and symmetric MPI runs within SLURM · fdf56162
      Olli-Pekka Lehto authored
      Dear all,
      
      As a quick fix, I have put together this script to help manage native and symmetric MPI runs within SLURM. It's a bit bare-bones at the moment, but I needed to get it working quickly :)
      
      It does not provide tight integration between the scheduler and the MPI daemons, and it requires a slot on the host even when running fully on the MIC, so it's far from an optimal solution, but it could serve as a stopgap.
      
      It's inspired by the TACC Stampede documentation. They seem to have a similar script in place.
      
      It's fairly simple: you provide the name of the MIC binary (with -m) and the host binary (with -c). The host MPI/OpenMP parameters are given as usual, and the Xeon Phi side parameters are given as environment variables (MIC_PPN, MIC_OMP_NUM_THREADS). Currently it supports only one card per host, but extending it should be simple enough.
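
      A rough usage sketch (the wrapper's file name, the application binary names,
      and the launch line below are illustrative guesses rather than taken from the
      script itself; only -m, -c, MIC_PPN and MIC_OMP_NUM_THREADS come from the
      description above):

      $ export MIC_PPN=4                # MPI ranks per Xeon Phi card
      $ export MIC_OMP_NUM_THREADS=60   # OpenMP threads per rank on the MIC side
      $ sbatch -N 2 --ntasks-per-node=8 run-sym-mpi.sh -c app.host -m app.mic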
      
      Here are a couple of links to documentation:
      
      Our prototype cluster documentation:
      https://confluence.csc.fi/display/HPCproto/HPC+Prototypes#HPCPrototypes-XeonPhiDevelopment
      Presentation at the PRACE Spring School in Umeå earlier this week:
      https://www.hpc2n.umu.se/sites/default/files/1.03%20CSC%20Cluster%20Introduction.pdf
      
      Feel free to include this in the contribs directory. It might need a bit of cleanup, though, and I don't know when I'll have the time to do that.
      
      I have also added support for the TotalView debugger (provided it's installed and configured properly for Xeon Phi usage).
      
      Future ideas:
      
      For the native MIC client, I've been testing it out a bit and looking at ways to minimize the changes needed for support. The two major challenges seem to be in scheduling and affinity.
      
      I think it might be necessary to put it into a specific topology plugin, like the one for BG/Q, but it looks like a lot of work to do that.
      
      Best regards,
      Olli-Pekka
    • Accounting - make average by task not cpu. · 81ccec93
      Danny Auble authored
  20. Apr 29, 2013