Commits · bf9943fea5a45cfb7b04961439e95aa57d8b0e5e · tud-zih-energy / Slurm

Dec 06, 2011
- Fix leaked iterator in slurmdb_pack_job_cond. · bf9943fe
  Yuri D'Elia authored 13 years ago
  
  bf9943fe
Dec 05, 2011
- Fix task/cgroup plugin error when used with GRES · 6443e89f
  Morris Jette authored 13 years ago
  
  Patch by Alexander Bersenev (Institute of Mathematics and Mechanics, Russia).
  6443e89f
Dec 02, 2011

BLUEGENE - Fixed issue with handling HTC modes and rebooting. · 04bb8ebc

Danny Auble authored 13 years ago

There was also some bad code that would reset the conn_type of a block
to SMALL no matter what type of SMALL it was.

04bb8ebc

Dec 01, 2011

Fix for "fatal: cons_res: sync loop not progressing" · d70a9ac4

jette authored 13 years ago

This was due to a bug in select/cons_res with some configuration
options and job options, especially if there is more than one
thread per core and the job option includes "--threads-per-core=1".
Fixes problem reported by CSCS.

d70a9ac4

Nov 30, 2011
- Revert write lock back to a read lock for performance reasons. A write · 1cb9dabd
  Danny Auble authored 13 years ago
  
  lock was deemed not necessary because the information (db_index) was only internal and was only modified in the same function later which is protected by the write lock.
  1cb9dabd
- Fixed case where associations are not enforced but QOS support is wanted, so that a default · 391d8e05
  Danny Auble authored 13 years ago
  
  QOS on the cluster is filled in correctly.
  391d8e05
- Fix issue in accounting where normalized shares could be updated · 2ac2662f
  Danny Auble authored 13 years ago
  
  incorrectly when getting fairshare from the parent.
  2ac2662f
Nov 23, 2011

Fixed race condition when using the DBD in accounting where if a job · a3bb2409

Danny Auble authored 13 years ago

wasn't started at the time the eligible message was sent but started
before the db_index was returned, information like start time would be lost.

a3bb2409

Nov 22, 2011
- Fix for fatal error managing GRES. Patch by Carles Fenoy, BSC. · 837be271
  Morris Jette authored 13 years ago
  
  837be271
- Set SLURM_CPUS_PER_TASK=1 when user specifies --cpus-per-task=1 · 85a5a8d0
  Morris Jette authored 13 years ago
  
  85a5a8d0
Nov 18, 2011
- Fix a couple of pgsql bugs · f2e9ef1a
  jette authored 13 years ago
  
  Patch from Yuri D'Elia
  f2e9ef1a
Nov 08, 2011

Avoid orphan job step if slurmctld is down when a job step completes · 9e71fd08

Morris Jette authored 13 years ago

Note this is an old bug. The new code keeps slurmstepd alive, and it
keeps trying to send the step completion message to slurmctld.
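
The retry behaviour described above can be sketched roughly as follows. This is only an illustration of "stay alive and keep trying", not the slurmstepd code: the function name `send_until_delivered`, the `send` callback, and the fixed delay are all assumptions made for the example.

```python
import time


def send_until_delivered(send, delay_seconds=1.0, sleep=time.sleep):
    """Call send() until it reports success and return the attempt count.

    Hypothetical helper: send is any callable returning True once the
    controller has accepted the message. The sleep function is injectable
    so the loop can be exercised without real waiting.
    """
    attempts = 0
    while True:
        attempts += 1
        if send():
            return attempts
        # Controller unreachable: wait, then try again instead of exiting.
        sleep(delay_seconds)
```

A real daemon would likely also log each failure and might back off between attempts; both are omitted here.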

9e71fd08

Nov 07, 2011
- Add missing bracket, bug introduced in fd83389434b · d6376385
  Morris Jette authored 13 years ago
  
  d6376385
- Cache current time in squeue for improved performance · 5515f1bd
  Morris Jette authored 13 years ago
  
  5515f1bd
- GRES allocation ignoring some job parameters · 90862249
  Morris Jette authored 13 years ago
  
  This makes the same change to select/linear as Carles Fenoy's patch to the select/cons_res plugin.
  90862249
- Added gres_cpus test. Without this test it could lead to the error "fatal:... · fd838943
  Carles Fenoy authored 13 years ago
  
  Added gres_cpus test. Without this test, it could lead to the error "fatal: cons_res: sync loop not progressing". With this patch, a job will be rejected if it asks for an unavailable configuration.
  fd838943
Nov 04, 2011

Don't set CUDA_VISIBLE_DEVICES from gres/gpu if no files are defined · 26e93d97

Morris Jette authored 13 years ago

Print an error rather than setting the CUDA_VISIBLE_DEVICES environment
variable to "NoDevFiles" if no device files are defined.
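
A minimal sketch of the behaviour this message describes, under stated assumptions: the helper name `gpu_environment` and its arguments are hypothetical, and the comma-separated index format is the usual CUDA convention rather than something taken from this commit.

```python
import logging


def gpu_environment(device_files, allocated_indexes):
    """Build the GPU-related environment for a job (hypothetical helper).

    device_files: device file paths configured for the gres/gpu plugin.
    allocated_indexes: GPU indexes allocated to the job.
    """
    env = {}
    if not device_files:
        # Report the problem and leave the variable unset, rather than
        # exporting a placeholder value such as "NoDevFiles".
        logging.error("gres/gpu: no device files defined; "
                      "CUDA_VISIBLE_DEVICES not set")
        return env
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in allocated_indexes)
    return env
```

Leaving the variable unset means an application sees the node's default GPU visibility instead of a string it cannot parse.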

26e93d97

Updated set_oomadj.c, replacing deprecated oom_adj reference with oom_score_adj. · 9820986e

Morris Jette authored 13 years ago

Patch 4f68cde5bd6b4fcf839f6694457373c81d9548ba from chaos/slurm by Don Lipari, LLNL

9820986e
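
The log does not show which values set_oomadj.c writes. For background only: the legacy oom_adj file accepts -17 to 15 (with -17 disabling OOM killing), while oom_score_adj accepts -1000 to 1000. The sketch below scales a legacy value onto the newer range in the way the Linux kernel is understood to do for legacy writes. The function name is hypothetical, and this is not the Slurm patch itself.

```python
OOM_DISABLE = -17         # legacy oom_adj value that disables OOM killing
OOM_ADJUST_MAX = 15       # largest legacy oom_adj value
OOM_SCORE_ADJ_MAX = 1000  # largest oom_score_adj value


def oom_adj_to_score_adj(oom_adj):
    """Scale a legacy oom_adj value onto the oom_score_adj range."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be between -17 and 15")
    if oom_adj == OOM_ADJUST_MAX:
        # Special case so that the maximum stays attainable.
        return OOM_SCORE_ADJ_MAX
    # int() truncates toward zero, matching C integer division.
    return int(oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE)
```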

Partial revert of commit · e76a0c9b

Morris Jette authored 13 years ago

The change in function call order of commit e60abe43
resulted in slurmd daemons on front-end systems not registering with the
proper node name.

e76a0c9b

Nov 02, 2011
- Cray - Remove the "family" specification from the GPU reservation request. · ccb8b419
  Morris Jette authored 13 years ago
  
  ccb8b419
Oct 31, 2011
- Do not look for the script file for completed jobs · 8e6ee500
  Morris Jette authored 13 years ago
  
  8e6ee500
- Add QOS to the information logged when a job is submitted. · a42a0dda
  Morris Jette authored 13 years ago
  
  a42a0dda
Oct 28, 2011

Add backfill scheduler resolution parameter · b86bc225

Morris Jette authored 13 years ago

Backfill scheduling - Add SchedulerParameters configuration parameter of
"bf_res" to control the resolution in the backfill scheduler's data about
when jobs begin and end. Default value is 60 seconds (used to be 1 second).
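
To illustrate what a coarser resolution means, here is a minimal sketch that rounds a time up to the next multiple of the resolution. Rounding up, and the function name `round_up_to_resolution`, are assumptions for the example. The commit message does not say how the backfill scheduler applies the value.

```python
def round_up_to_resolution(timestamp, resolution=60):
    """Round a time in seconds up to the next multiple of resolution.

    Hypothetical helper: the default of 60 matches the new default
    mentioned in the commit message; a resolution of 1 leaves times
    unchanged, like the old default.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    remainder = timestamp % resolution
    if remainder == 0:
        return timestamp
    return timestamp + (resolution - remainder)
```

With a 60-second resolution, jobs expected to end at 125 and 170 seconds both map to 180, so the scheduler would track one time boundary instead of two. That is the kind of saving a coarser resolution can give, at the cost of precision.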

b86bc225

cosmetic mods · 1397892c
Morris Jette authored 13 years ago

1397892c

Don't drain node if job has UID not found · a183f2ed

Morris Jette authored 13 years ago

Do not drain the compute or front-end node when trying to start a job
for which the UID is not found.

a183f2ed

Release locks on cray system after inventory and before backfill scheduling loop · 850f6ee0

Morris Jette authored 13 years ago

Release locks on a Cray system after inventory and before the backfill scheduling loop,
in order to not process more jobs and to avoid blocking pending RPCs for so long.

850f6ee0

Remove misleading logging messages · c9572925
Morris Jette authored 13 years ago

c9572925

Oct 27, 2011
- Makefile.in updated after autogen.sh run · 7f8f6af1
  Morris Jette authored 13 years ago
  
  7f8f6af1
- spelling errors and manual pages correction · 6880e0d9
  Morris Jette authored 13 years ago
  
  This patch contains corrections for spelling errors in the code and improvements for some man pages. Patch from Gennaro Oliva.
  6880e0d9
Oct 25, 2011
- Add functions to get a job's GRES allocation value by name · 655afec5
  Morris Jette authored 13 years ago
  
  Patch by Stephen Trofinoff, CSCS.
  655afec5
Oct 24, 2011

Changing logging of step create failure while SlurmctldProlog running · ba27a28e

Morris Jette authored 13 years ago

Change the logging of a job step create failure due to SlurmctldProlog
running from info() to debug() since this can be due to a race condition.

ba27a28e

Do not run HealthCheckProgram on powered down nodes · 7b336a56

Morris Jette authored 13 years ago

Do not attempt to run HealthCheckProgram on powered down nodes. Patch from
Ramiro Alba, Centre Tecnològic de Transferència de Calor, Spain.

7b336a56

Oct 21, 2011
- BGQ - allow steps to be run. · 4f001d30
  Danny Auble authored 13 years ago
  
  4f001d30
Oct 20, 2011
- BGQ - Fixes issue with sub-midplane systems in creating new blocks · 66675c51
  Danny Auble authored 13 years ago
  
  66675c51
- Added some missing calls to allow older versions of SLURM to talk to newer. · 606c4d44
  Danny Auble authored 13 years ago
  
  606c4d44
- Re-calculate rather than preserve a job's new prio if original prio <=1 · 3d9033f3
  Morris Jette authored 13 years ago
  
  3d9033f3
- BLUEGENE - Fixed issues with running on a sub-midplane system. · 6c335d6f
  Danny Auble authored 13 years ago
  
  6c335d6f
Oct 19, 2011
- Correct reason for pending job · d1727590
  Morris Jette authored 13 years ago
  
  If needed nodes are DOWN, DRAINED, or NOT_RESPONDING, report the job "Reason" as "Resources" rather than "PartitionNodeLimit".
  d1727590
- Move slurm_select_init to proper place to avoid loading multiple select · e60abe43
  Danny Auble authored 13 years ago
  
  plugins in the slurmd.
  e60abe43
- Disable some SelectTypeParameters for select/linear that aren't compatible. · 78caf6c3
  Danny Auble authored 13 years ago
  
  78caf6c3