- Apr 09, 2013
  Danny Auble authored
  Morris Jette authored
    Fix for bug 258
- Apr 02, 2013
  Danny Auble authored
    never looked at to determine eligibility of backfillable job.
  Morris Jette authored
  Danny Auble authored
    and when reading in state from DB2 we find a block that can't be created. You can now do a clean start to get rid of the bad block.
  Danny Auble authored
    the slurmctld there were software errors on some nodes.
  Danny Auble authored
    without it still existing there. This is extremely rare.
  Danny Auble authored
    a pending job on it we don't kill the job.
  Danny Auble authored
    while it was free cnodes would go into software error and kill the job.
- Apr 01, 2013
  Morris Jette authored
    Fix for bug 224
- Mar 29, 2013
  Danny Auble authored
  Danny Auble authored
- Mar 27, 2013
  Jason Bacon authored
  Morris Jette authored
    Without this patch, when the slurmd cold starts or slurmstepd terminates abnormally, the job script file can be left around. Bug 243
  Morris Jette authored
    Previously such a job submitted to a DOWN partition would be queued. Bug 187
- Mar 26, 2013
  Danny Auble authored
  Danny Auble authored
    a reservation when it has the "Ignore_Jobs" flag set. Since jobs could run outside of the reservation on its nodes, without this you could have double time.
- Mar 25, 2013
  Morris Jette authored
    This is not applicable with launch/aprun
  Morris Jette authored
- Mar 22, 2013
  Morris Jette authored
    These changes are required so that select/cray can load select/linear, which is a bit more complex than the other select plugin structures. Export plugin_context_create and plugin_context_destroy symbols from libslurm.so. Correct a typo in the exported hostlist_sort symbol name. Define some functions in select/cray to avoid undefined symbols if the plugin is loaded via libslurm rather than from a slurm command (which has all of the required symbols).
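    The undefined-symbol failure is easy to reproduce outside Slurm: opening a shared object with RTLD_NOW forces every referenced symbol to resolve immediately. A minimal illustrative sketch (not the Slurm plugin loader; the plugin path is made up):

      /* build: cc -o symcheck symcheck.c -ldl */
      #include <dlfcn.h>
      #include <stdio.h>

      int main(void)
      {
          /* RTLD_NOW: fail right here if the plugin references a symbol that
           * only exists in a slurm command and not in libslurm.so */
          void *handle = dlopen("./select_cray.so", RTLD_NOW | RTLD_GLOBAL);

          if (!handle) {
              fprintf(stderr, "dlopen failed: %s\n", dlerror());
              return 1;
          }
          printf("all plugin symbols resolved\n");
          dlclose(handle);
          return 0;
      }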
- Mar 20, 2013
  Hongjia Cao authored
  Danny Auble authored
    cluster.
- Mar 19, 2013
  Morris Jette authored
  Morris Jette authored
  Morris Jette authored
- Mar 14, 2013
  Danny Auble authored
  Danny Auble authored
  Danny Auble authored
- Mar 13, 2013
  Morris Jette authored
    If a step requests more CPUs than are possible within the specified node count of the job allocation, return ESLURM_TOO_MANY_REQUESTED_CPUS rather than returning ESLURM_NODES_BUSY and retrying.
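    A minimal sketch of the intended check, assuming the step's node count and the allocation's CPUs per node are already known. The function and parameter names are hypothetical; only the error codes come from the commit message (they are defined in <slurm/slurm_errno.h>):

      #include <stdint.h>
      #include <slurm/slurm_errno.h>   /* SLURM_SUCCESS, ESLURM_* codes */

      /* Hypothetical helper, not the actual slurmctld function. */
      static int validate_step_cpu_count(uint32_t step_cpus, uint32_t step_nodes,
                                         uint32_t alloc_cpus_per_node)
      {
          uint32_t max_cpus = step_nodes * alloc_cpus_per_node;

          if (step_cpus > max_cpus)      /* request can never be satisfied   */
              return ESLURM_TOO_MANY_REQUESTED_CPUS;  /* permanent, no retry */

          return SLURM_SUCCESS;          /* otherwise schedule (or wait)     */
      }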
- Mar 12, 2013
  Morris Jette authored
- Mar 11, 2013
  Nathan Yee authored
    Without this change, when the sbatch --export option is used, many Slurm environment variables are not set unless explicitly exported.
  Danny Auble authored
- Mar 08, 2013
  Morris Jette authored
  Danny Auble authored
    success
- Mar 07, 2013
  jette authored
    This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate, and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR which should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
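    To see why OR over-reports, consider two GRES types each tied to a different set of CPUs. A stand-in example using plain integer masks instead of Slurm's bitstr_t bitmaps (illustration only, not the actual gres plugin code):

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
          /* One bit per CPU, one mask per GRES type. */
          uint64_t gres_cpu_mask[2] = {
              0x0F,   /* GRES A usable on CPUs 0-3 */
              0xF0,   /* GRES B usable on CPUs 4-7 */
          };
          uint64_t or_result  = 0;            /* old, buggy combination */
          uint64_t and_result = ~(uint64_t)0; /* corrected combination  */

          for (int i = 0; i < 2; i++) {
              or_result  |= gres_cpu_mask[i];
              and_result &= gres_cpu_mask[i];
          }

          /* OR claims CPUs 0-7 are usable, yet no single CPU serves both
           * GRES; AND (0x00 here) reports that correctly. */
          printf("OR=0x%02llx AND=0x%02llx\n",
                 (unsigned long long)or_result, (unsigned long long)and_result);
          return 0;
      }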
- Mar 06, 2013
  Danny Auble authored
    options in srun, and push that logic to salloc and sbatch. Bug 201
  Danny Auble authored
    and timeout in the runjob_mux trying to send in this situation. Bug 223
- Mar 04, 2013
  Morris Jette authored
    The original reservation data structure is deleted and its backup added to the reservation list, but jobs can retain a pointer to the original (now invalid) reservation data structure. Bug 250
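    The hazard is a classic dangling pointer. A rough sketch with hypothetical structure and field names (not the real slurmctld data structures):

      #include <stdlib.h>
      #include <string.h>

      struct resv { char name[64]; /* ... */ };
      struct job  { struct resv *resv_ptr; /* cached pointer into resv list */ };

      /* Buggy pattern: free the original and insert the backup copy, leaving
       * every job's resv_ptr pointing at freed memory. */
      static void update_resv_buggy(struct resv **list_slot, struct resv *backup)
      {
          free(*list_slot);          /* jobs still point here: dangling */
          *list_slot = backup;
      }

      /* Safer pattern: copy the backup's contents into the original record so
       * the jobs' cached pointers stay valid. */
      static void update_resv_fixed(struct resv *original, const struct resv *backup)
      {
          memcpy(original, backup, sizeof(*original));
      }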
  Alejandro Lucero Palau authored
- Mar 01, 2013
  Danny Auble authored