- Oct 11, 2014
-
-
Morris Jette authored
The power up/down request only takes effect after the ResumeTimeout or SuspendTimeout is reached in order to avoid a race condition.
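The deferral described above can be pictured with a minimal sketch. It is illustrative only and is not Slurm's implementation: the function name, its arguments, and the choice to measure the window from the node's last power transition are all assumptions made for this example.

```python
def request_may_take_effect(last_transition: float, timeout_s: float, now: float) -> bool:
    """Hypothetical helper: True once the timeout window has fully elapsed.

    last_transition: assumed timestamp (seconds) of the node's last power transition.
    timeout_s: stands in for ResumeTimeout or SuspendTimeout, in seconds.
    now: current timestamp (seconds).
    """
    # Holding the request until the window closes keeps it from racing
    # with a power transition that may still be in progress.
    return (now - last_transition) >= timeout_s
```

For example, with a 60-second timeout and a transition recorded at t=100, a request checked at t=159 would still be held, while one checked at t=160 would go through.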
-
- Oct 10, 2014
-
-
Danny Auble authored
-
Brian Christiansen authored
Bug #1143
-
Danny Auble authored
-
Dorian Krause authored
This commit fixes a bug we observed when combining select/linear with gres. If an allocation was requested with a --gres argument, an srun execution within that allocation would stall indefinitely:

    -bash-4.1$ salloc -N 1 --gres=gpfs:100
    salloc: Granted job allocation 384049
    bash-4.1$ srun -w j3c017 -n 1 hostname
    srun: Job step creation temporarily disabled, retrying

The slurmctld log showed:

    debug3: StepDesc: user_id=10034 job_id=384049 node_count=1-1 cpu_count=1
    debug3: cpu_freq=4294967294 num_tasks=1 relative=65534 task_dist=1 node_list=j3c017
    debug3: host=j3l02 port=33608 name=hostname network=(null) exclusive=0
    debug3: checkpoint-dir=/home/user checkpoint_int=0
    debug3: mem_per_node=62720 resv_port_cnt=65534 immediate=0 no_kill=0
    debug3: overcommit=0 time_limit=0 gres=(null) constraints=(null)
    debug: Configuration for job 384049 complete
    _pick_step_nodes: some requested nodes j3c017 still have memory used by other steps
    _slurm_rpc_job_step_create for job 384049: Requested nodes are busy

If srun --exclusive had been used instead, everything would have worked fine. The reason is that in exclusive mode the code properly checks whether memory is a reserved resource in the _pick_step_nodes() function. This commit modifies the alternate code path to do the same.
-
Morris Jette authored
-
Brian Christiansen authored
-
Danny Auble authored
(i.e. ArchiveJobs PurgeJobs). This is only a cosmetic change.
-
Nicolas Joly authored
on slurmdbd startup.
-
Danny Auble authored
-
Danny Auble authored
lots of jobs.
-
Danny Auble authored
-
Danny Auble authored
-
- Oct 09, 2014
-
-
Danny Auble authored
did the ALPS reservation. Bug 1115
-
Morris Jette authored
-
Morris Jette authored
Take more job options into consideration to estimate its node count.
-
- Oct 08, 2014
-
-
Danny Auble authored
-
inodb authored
At work in Sweden we often fika (coffee, buns, and what have you) at 3 PM. I sometimes accidentally give a start time of 'teatime', so when I return from 'fika' I see my job is just getting started. This fix should make life even easier for the Swedes.
-
Nicolas Joly authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Nicolas Joly authored
-
Morris Jette authored
-
Morris Jette authored
-
- Oct 07, 2014
-
-
Morris Jette authored
This is a minor change to commit 4b8cdd4c from yesterday and is needed to work with launch/poe (which was broken by the commit).
-
Danny Auble authored
a reservation.
-
Danny Auble authored
which they have access to (rather than preventing them from seeing ANY reservation). Backport from 14.11 commit 77c2bd25.
-
Danny Auble authored
arbitrary layouts (test1.59).
-
Brian Christiansen authored
-
Brian Christiansen authored
option (since it isn't).
-
- Oct 04, 2014
-
-
Morris Jette authored
Do not cause it to be rebooted (powered up).
-
Morris Jette authored
This permits a sys admin to power down a node that should already be powered down, but avoids setting the NO_RESPOND bit in the node state. Doing so under some conditions prevented the node from being scheduled. The downside is that the node could possibly be allocated when it really isn't ready for use.
-
- Oct 03, 2014
-
-
Morris Jette authored
When a node's state is set to power_down, then execute SuspendProgram even if previously executed for that node.
-
Danny Auble authored
which protects against race conditions with the reservations.
-
Morris Jette authored
Fix logic determining when job configuration (i.e. running node power up logic) is complete. (Will look at a better solution for v14.11.)
-