Commits · 50968b84964fbbe9e455e556725a1f8f4a14abd3 · tud-zih-energy / Slurm

Jan 18, 2012

Correction to --switch option implementation · 8f1d9b57

Morris Jette authored 13 years ago

Fix bug in --switch option with topology resulting in bad switch count use.
Patch from Alejandro Lucero Palau (Barcelona Supercomputing Center).

8f1d9b57

Jan 13, 2012
- Fix for sacct printing CPUTime(RAW) where the value is greater than a 32 bit · adf582b0
  Danny Auble authored 13 years ago
  
  number.
  adf582b0
- minor updates for latest commit · 08854a56
  Morris Jette authored 13 years ago
  
  08854a56
- Let operators see reservation data even if private · 4c24fd7d
  Morris Jette authored 13 years ago
  
  Let operators see reservation data even if "PrivateData=reservations" flag is set in slurm.conf. Patch from Don Albert, Bull.
  4c24fd7d
Jan 09, 2012

Fix bug in srun --multi-prog configuration file · f59f6a27

Morris Jette authored 13 years ago

Fix bug in srun --multi-prog configuration file to avoid printing duplicate
record error when "*" is used at the end of the file for the task ID. It
means all task IDs not otherwise identified.

f59f6a27

Fix possible slurmd deadlock from sbcast command. · cb3b9fb5

Morris Jette authored 13 years ago

Fix race condition where sbcast command can result in deadlock of slurmd
daemon. Patch by Don Albert, Bull.

cb3b9fb5

Dec 28, 2011
- Permit gres count configuration of zero. · 0d779c41
  Morris Jette authored 13 years ago
  
  0d779c41
Dec 21, 2011
- Modify PAM module to use same libslurm as built with · d46b33f6
  Morris Jette authored 13 years ago
  
  d46b33f6
Dec 19, 2011
- Fix bug in sview layout if node count less than configured grid_x_width. · be1f9868
  Morris Jette authored 13 years ago
  
  be1f9868
Dec 17, 2011
- Note recent code changes · f455c48a
  Morris Jette authored 13 years ago
  
  f455c48a
Dec 15, 2011

Prevent resetting a held job's priority · fa477448

Morris Jette authored 13 years ago

Prevent resetting a held job's priority when updating other job parameters.
Patch from Alejandro Lucero Palau, BSC.

fa477448

Dec 14, 2011
- Handle numeric suffix of "T" for terabyte units · f58a563f
  Morris Jette authored 13 years ago
  
  Patch from John Thiltges, University of Nebraska-Lincoln.
  f58a563f
Dec 09, 2011
- Add slashes in front of derived exit code when modifying a job. · fca0660c
  Danny Auble authored 13 years ago
  
  fca0660c
- Fixed issue with comment field being used in a job finishing before it · a178318f
  Danny Auble authored 13 years ago
  
  starts in accounting.
  a178318f
- Fixed issue with QOS preemption when adding new QOS. · 614cd5fb
  Danny Auble authored 13 years ago
  
  614cd5fb
- sacct search for jobs using filtering was ignoring wckey filter. · 66d68934
  Morris Jette authored 13 years ago
  
  66d68934
Dec 08, 2011
- BLUEGENE - Fixed preemption issue. · bcc3c6a9
  Danny Auble authored 13 years ago
  
  bcc3c6a9
Dec 06, 2011

Permit pending job to exceed partition limit with QOS flag change. · 0e1abeda

Morris Jette authored 13 years ago

One of our testers discovered a regression in version 2.3.1.  If a job is
pending due to PartitionNodeLimit and the limit is relieved with a
'sacctmgr modify qos name=<qos name> set flags=partitionmaxnodes', new jobs
exceeding the partition limit (but not the QOS limit) are allowed to run.
However, the pending job is never allowed to run.  Attached is a patch to
address this problem.  FYI, this problem doesn't exist in version 2.4.
Patch from Bill Brophy, Bull.

0e1abeda

Dec 05, 2011
- Fix task/cgroup plugin error when used with GRES · 6443e89f
  Morris Jette authored 13 years ago
  
  Patch by Alexander Bersenev (Institute of Mathematics and Mechanics, Russia).
  6443e89f
- Update NEWS for start of v2.3.2 work · 75bd6efe
  Morris Jette authored 13 years ago
  
  75bd6efe
Dec 02, 2011

BLUEGENE - Fixed issue with handling HTC modes and rebooting. · 04bb8ebc

Danny Auble authored 13 years ago

There was also some bad code that would reset the conn_type of a block
to SMALL no matter what type of SMALL it was.

04bb8ebc

Dec 01, 2011

Fix for "fatal: cons_res: sync loop not progressing" · d70a9ac4

jette authored 13 years ago

This was due to a bug in select/cons_res with some configuration
options and job options, especially if there is more than one
thread per core and the job option includes "--threads-per-core=1".
Fixes problem reported by CSCS.

d70a9ac4

Nov 30, 2011
- Fixed if not enforcing associations but want QOS support for a default · 391d8e05
  Danny Auble authored 13 years ago
  
  qos on the cluster to fill that in correctly.
  391d8e05
- Fix issue in accounting where normalized shares could be updated · 2ac2662f
  Danny Auble authored 13 years ago
  
  incorrectly when getting fairshare from the parent.
  2ac2662f
Nov 23, 2011

Fixed race condition when using the DBD in accounting where if a job · a3bb2409

Danny Auble authored 13 years ago

wasn't started at the time the eligible message was sent but started
before the db_index was returned information like start time would be lost.

a3bb2409

Nov 22, 2011
- Fix for fatal error managing GRES. Patch by Carles Fenoy, BSC. · 837be271
  Morris Jette authored 13 years ago
  
  837be271
- Set SLURM_CPUS_PER_TASK=1 when user specifies --cpus-per-task=1 · 85a5a8d0
  Morris Jette authored 13 years ago
  
  85a5a8d0
Nov 21, 2011
- Run autogen.sh after Lua link patch · 8f222137
  Morris Jette authored 13 years ago
  
  8f222137
Nov 08, 2011

Avoid orphan job step if slurmctld is down when a job step completes · 9e71fd08

Morris Jette authored 13 years ago

Note this is an old bug. The new code keeps slurmstepd alive and it
keeps trying to send step completion message to slurmctld.

9e71fd08

Nov 07, 2011
- GRES allocation ignoring some job parameters · 90862249
  Morris Jette authored 13 years ago
  
  This makes the same patch to select/linear as Carles Fenoy's patch to the select/cons_res plugin.
  90862249
Nov 04, 2011
- Updated set_oomadj.c, replacing deprecated oom_adj reference with oom_score_adj. · 9820986e
  Morris Jette authored 13 years ago
  
  Patch 4f68cde5bd6b4fcf839f6694457373c81d9548ba from chaos/slurm by Don Lipari, LLNL
  9820986e
Nov 02, 2011
- Cray - Remove the "family" specification from the GPU reservation request. · ccb8b419
  Morris Jette authored 13 years ago
  
  ccb8b419
Oct 28, 2011

Add backfill scheduler resolution parameter · b86bc225

Morris Jette authored 13 years ago

Backfill scheduling - Add SchedulerParameters configuration parameter of
"bf_res" to control the resolution in the backfill scheduler's data about
when jobs begin and end. Default value is 60 seconds (used to be 1 second).

b86bc225

Don't drain node if job has UID not found · a183f2ed

Morris Jette authored 13 years ago

Do not drain the compute or front-end node when trying to start a job
for which the UID is not found.

a183f2ed

Oct 27, 2011

Add configure option of "--without-rpath" · 52ab5b44

Morris Jette authored 13 years ago

Add configure option of "--without-rpath" which builds SLURM tools without
the rpath option, which will work if Munge and BlueGene libraries are in
the default library search path and make system updates easier.

52ab5b44

Oct 24, 2011
- Update NEWS for start of v2.3.2 · 48773c2e
  Morris Jette authored 13 years ago
  
  48773c2e
- Do not run HealthCheckProgram on powered down nodes · 7b336a56
  Morris Jette authored 13 years ago
  
  Do not attempt to run HealthCheckProgram on powered down nodes. Patch from Ramiro Alba, Centre Tecnològic de Transferència de Calor, Spain.
  7b336a56
Oct 21, 2011
- BGQ - allow steps to be run. · 4f001d30
  Danny Auble authored 13 years ago
  
  4f001d30
Oct 20, 2011
- Added some missing calls to allow older versions of SLURM to talk to newer. · 606c4d44
  Danny Auble authored 13 years ago
  
  606c4d44
- BLUEGENE - Fixed issues with running on a sub-midplane system. · 6c335d6f
  Danny Auble authored 13 years ago
  
  6c335d6f