- Sep 10, 2013
  - Morris Jette authored
  - David Bigagli authored:
    that ended in the OverTimeLimit interval.
  - David Bigagli authored
- Sep 09, 2013
  - Danny Auble authored
  - Danny Auble authored
- Sep 06, 2013
  - Danny Auble authored
  - Morris Jette authored:
    Caused by allocating a single adapter per node of a specific adapter type.
- Sep 04, 2013
  - David Bigagli authored
  - Morris Jette authored:
    Previous logic would pick CPUs, then reject jobs that could not match GRES to the allocated CPUs. The new logic first filters out CPUs that cannot use the GRES, then picks CPUs for the job, and finally picks the GRES that best match those CPUs. bug 410
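The reordered selection above can be sketched with CPU bitmasks. This is a hypothetical illustration, not Slurm's actual select/cons_res code; the function names and the single-word mask representation are invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each bit of a mask is one CPU; gres_ok marks the
 * CPUs that can reach the requested GRES.  The old order picked CPUs from
 * the full availability mask and could land on CPUs with no usable GRES;
 * the new order intersects with gres_ok first, then picks. */
static uint32_t pick_cpus(uint32_t avail, int ncpus)
{
	uint32_t picked = 0;

	for (int i = 0; i < 32 && ncpus > 0; i++) {
		if (avail & (1u << i)) {
			picked |= 1u << i;
			ncpus--;
		}
	}
	return picked;
}

/* New order: filter out GRES-incompatible CPUs before picking any. */
static uint32_t pick_cpus_gres_aware(uint32_t avail, uint32_t gres_ok,
				     int ncpus)
{
	return pick_cpus(avail & gres_ok, ncpus);
}
```

For example, with avail 0xFF and gres_ok 0xF0, the old order picks 0x03 (two CPUs with no path to the GRES), while the GRES-aware order picks 0x30.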
- Aug 30, 2013
  - Morris Jette authored:
    Report anything that is world writable.
- Aug 29, 2013
  - Danny Auble authored:
    /* Current code (<= 2.1) has it so we start the new
     * job with the next step id. This could be used
     * when restarting to figure out which step the
     * previous run of this job stopped on. */
  - Danny Auble authored
  - Morris Jette authored:
    The Cray version of those libraries must be used. bug 407
  - Morris Jette authored:
    switch/generic - propagate switch information from srun down to slurmd and slurmstepd. Previously the information was not going past the srun command.
  - Morris Jette authored
- Aug 28, 2013
  - Morris Jette authored:
    due to multiple free calls caused by job arrays submitted to multiple partitions. The root cause is the job priority array of the original job being re-used by subsequent job array entries. A similar problem, which could be induced by the user specifying a job accounting frequency when submitting a job array, is also fixed. bug 401
  - Danny Auble authored:
    sacctmgr.
- Aug 27, 2013
  - Morris Jette authored:
    If a reservation create request included a CoreCnt value and more nodes are required than configured, the logic in select/cons_res could go off the end of the core_cnt array. This patch adds a check for a zero value in the core_cnt array, which terminates the user-specified array. Back-port from master of commit 211c224b
  - Morris Jette authored
  - Morris Jette authored:
    If a reservation create request included a CoreCnt value and more nodes are required than configured, the logic in select/cons_res could go off the end of the core_cnt array. This patch adds a check for a zero value in the core_cnt array, which terminates the user-specified array.
  - Danny Auble authored:
    the node health check script for steps and allocations respectively.
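The zero-terminator check described in the reservation fix above can be sketched as follows; the function name and the accounting are hypothetical, simplified from what select/cons_res actually does:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the user-specified core_cnt array is terminated by
 * a zero entry.  Stopping at that terminator, instead of indexing one
 * slot per required node, prevents reading past the end of the array when
 * the reservation needs more nodes than the user listed counts for. */
static uint32_t total_reserved_cores(const uint32_t *core_cnt,
				     int nodes_needed)
{
	uint32_t total = 0;

	for (int i = 0; i < nodes_needed; i++) {
		if (core_cnt[i] == 0)	/* zero terminates the list */
			break;
		total += core_cnt[i];
	}
	return total;
}
```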
- Aug 26, 2013
  - Morris Jette authored:
    Used job terminations due to failure to boot its allocated nodes or BlueGene block. bug 213
- Aug 24, 2013
  - Danny Auble authored
- Aug 23, 2013
  - Morris Jette authored:
    This is a correction of a bug introduced in commit https://github.com/SchedMD/slurm/commit/ac44db862c8d1f460e55ad09017d058942ff6499 That commit eliminated the need for squeue to read the node state information, for performance reasons (mostly for large parallel systems in which the Prolog ran squeue, generating many simultaneous RPCs and slowing down the job launch process). It also assumed one CPU per node, so if a pending job specified a node count of 1 and a task count larger than one, squeue reported the job's node count as equal to its task count. This patch moves that same calculation of a pending job's minimum node count into slurmctld, so squeue still does not need to read the node information but can report the correct node count for pending jobs with minimal overhead.
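The minimum-node-count calculation moved into slurmctld can be approximated as a ceiling division; this sketch and its names are hypothetical, assuming a known per-node task capacity rather than the old one-CPU-per-node assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: a pending job needs at least its requested minimum
 * node count, and at least enough nodes to hold all of its tasks given
 * how many tasks fit on one node.  The old squeue-side code effectively
 * used tasks_per_node == 1, so a 1-node, N-task job was shown as N nodes. */
static uint32_t pending_min_nodes(uint32_t min_nodes, uint32_t ntasks,
				  uint32_t tasks_per_node)
{
	/* ceiling division: nodes needed to place every task */
	uint32_t need = (ntasks + tasks_per_node - 1) / tasks_per_node;

	return (need > min_nodes) ? need : min_nodes;
}
```

With 16 tasks and 16 tasks fitting per node, one node suffices; under the old assumption the same job was reported as needing 16 nodes.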
- Aug 22, 2013
  - Danny Auble authored:
    to avoid it thinking we don't have a cluster name.
  - Nathan Yee authored
  - Danny Auble authored:
    to avoid it thinking we don't have a cluster name.
  - Nathan Yee authored:
    %o and %Z respectively
  - Danny Auble authored
- Aug 21, 2013
  - Hongjia Cao authored:
    If there are completing jobs, a reconfigure will set the wrong job/node state: all nodes of the completing job will be set allocated, and the job will not be removed even when the completing nodes are released. The state can only be restored by restarting slurmctld after the completing nodes are released.
- Aug 20, 2013
  - Danny Auble authored
  - Morris Jette authored:
    Added boards_per_node, sockets_per_board, ntasks_per_node, ntasks_per_board, ntasks_per_socket, ntasks_per_core, and nice.
- Aug 19, 2013
  - Chris Read authored
  - David Bigagli authored
  - Morris Jette authored
  - Morris Jette authored
  - Morris Jette authored
- Aug 17, 2013
  - Morris Jette authored
- Aug 16, 2013
  - Morris Jette authored:
    This makes it consistent with the value of default_queue_depth. The backfill scheduler should be able to handle this value easily (or a much higher one, for pretty much any configuration).