Commits · 828e4d2de96a4c36e7ead510164f4f9773a13ccc · tud-zih-energy / Slurm

May 13, 2015
- Fix small memory leak in backup controller. · c77aa354
  Brian Christiansen authored 9 years ago
  
  c77aa354
May 12, 2015
- Load libtinfo as needed with ncurses tools · 584f4d68
  Morris Jette authored 9 years ago
  
  584f4d68
May 11, 2015

Purge old step data on job requeue · beecc7b0

Morris Jette authored 9 years ago

Make sure that old step data is purged when a job is requeued.
Without this logic, if a job terminates abnormally then old step
data may be left in slurmctld. If the job is then requeued and
started on a different node, referencing that old job step data
can result in abnormal events. One specific failure mode: if
the job is requeued on a node with a different number of cores
and the step terminated RPC arrives later, the job and step
bitmaps of allocated cores can differ in size, generating an
abort.
bug 1660

beecc7b0
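
The failure mode described in beecc7b0 can be illustrated with a minimal sketch. It is not Slurm's code: the `Job` and `Step` classes, the list-of-booleans bitmaps, and the `requeue` and `step_terminated` helpers are all hypothetical names chosen for illustration. It only shows why purging step records at requeue time keeps a late step-terminated message from comparing bitmaps of different sizes.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: int
    core_bitmap: list  # hypothetical: one bool per core on the allocated node


@dataclass
class Job:
    job_id: int
    core_bitmap: list
    steps: list = field(default_factory=list)


def requeue(job, new_core_count):
    """Requeue a job onto a node with new_core_count cores.

    Old step records are purged first, which is the behaviour the
    commit message describes.
    """
    job.steps.clear()
    job.core_bitmap = [False] * new_core_count


def step_terminated(job, step_id):
    """Handle a late step-terminated message.

    Returns True if a matching step record was released, False if none
    exists. Raises RuntimeError when a stale record has a bitmap whose
    size differs from the job's (standing in for the abort).
    """
    for step in job.steps:
        if step.step_id == step_id:
            if len(step.core_bitmap) != len(job.core_bitmap):
                raise RuntimeError("job and step core bitmaps differ in size")
            job.steps.remove(step)
            return True
    return False
```

With the purge in place, a late message for an old step finds no record and is ignored instead of hitting the size mismatch.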

May 08, 2015
- Make sure each job has a wckey if that is something that is tracked. · 0896ca06
  Danny Auble authored 9 years ago
  
  0896ca06
- Ensure that SLURM_JOB_NAME is in a job's allocation. · c4e0bd9d
  Brian Christiansen authored 9 years ago
  
  Bug 1618
  c4e0bd9d
- Preserve errno on execve() failure in task plugin · bf81e826
  Jonathon Nelson authored 9 years ago
  
  bf81e826
May 07, 2015
- Make full node reservations display correctly the core count instead of cpu count. · 00099596
  Danny Auble authored 9 years ago
  
  00099596
May 06, 2015
- BLUEGENE - Set DB2NOEXITLIST when starting the slurmctld daemon to avoid random crashing in db2 when the slurmctld is exiting. · bce4b80f
  Danny Auble authored 9 years ago
  
  Signed-off-by: Danny Auble <da@schedmd.com>
  bce4b80f
May 05, 2015
- Cray: Add plugstack.conf.template sample SPANK config · 22a0e5a5
  Morris Jette authored 9 years ago
  
  22a0e5a5
May 01, 2015
- Fix sshare --Users: Initialize options, add usage info · b0b5bf3e
  Jens Svalgaard Kohrt authored 9 years ago
  
  b0b5bf3e
Apr 30, 2015

Change slurmctld agent timeout · 98e08216

Morris Jette authored 9 years ago

In slurmctld communication agent, make the thread timeout be the configured
value of MessageTimeout (or 30 seconds, whichever is larger) rather than
30 seconds.

98e08216
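
The rule stated in 98e08216 is a simple floor on the timeout. A minimal sketch, assuming the value is expressed in seconds; the function and constant names are hypothetical and not taken from the Slurm source:

```python
MIN_AGENT_TIMEOUT = 30  # seconds; the floor named in the commit message


def agent_thread_timeout(message_timeout):
    """Return the agent thread timeout: MessageTimeout or 30 s, whichever is larger."""
    return max(message_timeout, MIN_AGENT_TIMEOUT)
```

A site with a MessageTimeout of 10 would still get 30 seconds, while a site with 60 would get 60.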

Fix scancel step cancel bug · 5cb067fc

Morris Jette authored 9 years ago

Fix scancel bug which could return an error on attempt to signal a job step.
A simple "scancel 12.3" to signal a specific job step would fail. Adding
another option (say "-i", "--partition=", etc.) would fix this.

5cb067fc

Initialize variables to prevent core dump. · 613afa0b
David Bigagli authored 9 years ago

613afa0b

Apr 29, 2015
- ALPS - Have the slurmstepd running a batch job wait for an ALPS release before ending the job. · 2eefdbd6
  Danny Auble authored 9 years ago
  
  2eefdbd6
- ALPS - Add new cray.conf variable NoAPIDSignalOnKill. · d4d64877
  Danny Auble authored 9 years ago
  
  When set to yes, the slurmctld will not signal the apids in a batch job. Instead it relies on the RPC coming from the slurmctld to kill the job to end things correctly.
  d4d64877
Apr 28, 2015

scancel logic changes · 225a1dea

Morris Jette authored 9 years ago

Refactor scancel so that all pending jobs are cancelled before starting
cancellation of running jobs. Otherwise they happen in parallel and the
pending jobs can be scheduled on resources as the running jobs are being
cancelled.

225a1dea
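
The ordering described in 225a1dea can be sketched as a two-phase pass over the selected jobs. This is an illustration only: the `(job_id, state)` tuples and the state strings are assumptions made for the example, not scancel's actual data structures.

```python
def cancellation_order(jobs):
    """Return job ids ordered so that every pending job precedes every other job.

    jobs is a list of (job_id, state) tuples. Cancelling pending jobs
    first means they cannot be scheduled onto resources freed while the
    running jobs are being cancelled.
    """
    pending = [job_id for job_id, state in jobs if state == "PENDING"]
    others = [job_id for job_id, state in jobs if state != "PENDING"]
    return pending + others
```

The relative order inside each group is preserved, so only the pending-before-running guarantee changes.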

Change default SchedulerParameters=max_sched_time from 4 seconds to 2. · 01da71b8
Danny Auble authored 9 years ago

01da71b8

Add SchedulerParameters option sched_min_interval · 26624602

jette authored 9 years ago

Add SchedulerParameters option of sched_min_interval that controls the
minimum time interval between any job scheduling action. The default value
is zero (disabled).
bug 1623

26624602
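
The effect of sched_min_interval can be pictured as a throttle in front of the scheduler. The sketch below is an assumption about how such a throttle might look, not the slurmctld implementation; the class name and the caller-supplied timestamps are hypothetical, and the interval is taken to be in the same unit as those timestamps.

```python
class SchedThrottle:
    """Allow a scheduling action only if min_interval has elapsed since the last one."""

    def __init__(self, min_interval=0):
        self.min_interval = min_interval  # zero disables the throttle (the default)
        self.last_run = None

    def may_run(self, now):
        """Return True and record the run if scheduling is allowed at time now."""
        too_soon = (
            self.min_interval > 0
            and self.last_run is not None
            and now - self.last_run < self.min_interval
        )
        if too_soon:
            return False
        self.last_run = now
        return True
```

With the default of zero every request is allowed, which matches the "disabled" behaviour the commit message gives for the default value.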

Apr 24, 2015
- variable initialization for srun --no-alloc · fe3f21fb
  Morris Jette authored 9 years ago
  
  Initialize some variables used with the srun --no-alloc option; left uninitialized, they may cause random failures.
  fe3f21fb
- Start v14.11.7 NEWS · 3180e4c7
  Morris Jette authored 9 years ago
  
  3180e4c7
Apr 22, 2015
- ALPS - Don't run a release on a reservation on the slurmctld for a batch job. · e1031b0c
  Danny Auble authored 9 years ago
  
  This is already handled on the stepd when the script finishes.
  e1031b0c
- ALPS - Added new SchedulerParameters=inventory_interval to specify how often an inventory request is handled. · 652ddab3
  Danny Auble authored 9 years ago
  
  652ddab3
- Fix for data shift when loading job archives. · c7b91668
  Brian Christiansen authored 9 years ago
  
  c7b91668
- Improve database interaction from controller. · 59b9c909
  Brian Christiansen authored 9 years ago
  
  59b9c909
Apr 21, 2015
- Remove xmalloc_nz from unpack functions. · 94a6a65a
  Danny Auble authored 9 years ago
  
  If the unpack ever failed, the free afterwards would not have zeroed out memory on the variables that didn't get unpacked.
  94a6a65a
- Fix sbatch to set SLURM_JOB_NAME based on the command line option. · 1efc2011
  David Bigagli authored 9 years ago
  
  1efc2011
- Fix sbatch script parsing · 4e682555
  Morris Jette authored 9 years ago
  
  Make sbatch stop parsing the script for "#SBATCH" directives after the first command. It keeps parsing as long as lines contain only white space or comments (first non-white space character is '#').
  4e682555
- Fixed compiler warnings generated by gcc version >= 4.6. · 87650241
  David Bigagli authored 9 years ago
  
  87650241
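
The parsing rule from 4e682555 above (stop at the first command, keep going over blank lines and comments) can be sketched as follows. This is a simplified illustration, not sbatch's parser: the function name is hypothetical, and the sketch ignores details such as quoting and exactly where "#SBATCH" must appear on the line.

```python
def sbatch_directives(script_text):
    """Collect "#SBATCH" option strings that appear before the first command."""
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # white space only: keep parsing
        if not stripped.startswith("#"):
            break  # first command reached: stop parsing
        if stripped.startswith("#SBATCH"):
            directives.append(stripped[len("#SBATCH"):].strip())
    return directives
```

A directive placed after the first command is therefore left as an ordinary shell comment and has no effect on the submission.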
Apr 20, 2015
- Add SchedulerParameters option sched_max_job_start · c0eb47c2
  Morris Jette authored 9 years ago
  
  Add SchedulerParameters option of "sched_max_job_start=" to limit the number of jobs that can be started in any single execution of the main scheduling logic.
  c0eb47c2
- ALPS - Move basil_inventory to less confusing function. · d041e9fb
  Danny Auble authored 9 years ago
  
  d041e9fb
- Fix NEWS file · 4b01baad
  Danny Auble authored 9 years ago
  
  4b01baad
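
The limit added in c0eb47c2 (sched_max_job_start) can be pictured as a cap on one pass of the scheduling loop. The sketch below is hypothetical: the function, the `can_start` callback, and the treatment of zero as "no limit" are assumptions for illustration and are not stated in the commit message.

```python
def schedule_pass(pending_jobs, can_start, max_job_start=0):
    """Run one scheduling pass and return the jobs started.

    max_job_start caps how many jobs this single pass may start;
    zero is assumed here to mean no cap.
    """
    started = []
    for job in pending_jobs:
        if max_job_start > 0 and len(started) >= max_job_start:
            break  # cap reached; remaining jobs wait for the next pass
        if can_start(job):
            started.append(job)
    return started
```

Jobs left over when the cap is reached stay pending and are considered again on the next execution of the scheduling logic.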
Apr 18, 2015
- Update NEWS · 4794d574
  David Bigagli authored 9 years ago
  
  4794d574
Apr 17, 2015
- Fix for array jobs submitted to multiple partitions not starting. · 2c6542d1
  Brian Christiansen authored 9 years ago
  
  Bug 1601
  2c6542d1
- Update NEWS · a77d2fbb
  David Bigagli authored 9 years ago
  
  a77d2fbb
- sreport - Fix Energy displays · 1f2ada02
  Danny Auble authored 9 years ago
  
  Bug 1603
  1f2ada02
Apr 16, 2015
- MySQL - Various memory leak fixes. · 2ac14686
  Danny Auble authored 9 years ago
  
  Includes one missed in fc67b70c; also updates NEWS.
  2ac14686
Apr 15, 2015
- MySQL - Fix issue when using TrackSlurmctldDown and nodes are down at the same time: don't double bill the down time. · 5d13e495
  Danny Auble authored 9 years ago
  
  5d13e495
- FRONTEND - If doing a clean start make sure the nodes are brought up in the database. · fa35f611
  Danny Auble authored 9 years ago
  
  fa35f611
- sview - When right clicking on a tab make sure we don't display the page list, but only the column list. · a44e57ea
  Danny Auble authored 9 years ago
  
  a44e57ea
- Fix slurmdbd rollup bug · 3a44ecec
  Morris Jette authored 9 years ago
  
  Prevent slurmdbd error if cluster added or removed while rollup in progress. Removing a cluster can cause slurmdbd to abort. Adding a cluster can cause the slurmdbd rollup to hang.
  3a44ecec