- Aug 02, 2012
Morris Jette authored
This patch adds logic to work around advanced reservations better.
-
- Aug 01, 2012
Morris Jette authored
Change node_req field in struct job_resources from 8 to 32 bits so we can run more than 256 jobs per node.
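A minimal sketch of why the width matters (the `node_req` field name comes from the commit message; the surrounding structs are illustrative stand-ins, not the real `struct job_resources`):

```c
#include <stdint.h>

/* With an 8-bit counter, the 257th job on a node wraps the count back
 * toward zero; a 32-bit counter does not. */
struct job_resources_old { uint8_t  node_req; };  /* before: 8 bits  */
struct job_resources_new { uint32_t node_req; };  /* after: 32 bits  */

uint32_t count_jobs_old(int jobs)
{
    struct job_resources_old r = {0};
    for (int i = 0; i < jobs; i++)
        r.node_req++;                 /* silently wraps at 256 */
    return r.node_req;
}

uint32_t count_jobs_new(int jobs)
{
    struct job_resources_new r = {0};
    for (int i = 0; i < jobs; i++)
        r.node_req++;                 /* no wrap for realistic counts */
    return r.node_req;
}
```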
-
Danny Auble authored
-
Morris Jette authored
-
Danny Auble authored
correctly as of IBM driver V1R1M1 efix 008.
-
Morris Jette authored
-
Danny Auble authored
configured correctly.
-
- Jul 31, 2012
Danny Auble authored
current or in the past.
-
Mark Nelson authored
from Mark Nelson
-
Janne Blomqvist authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
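A hedged sketch of the approach described above: call `mount(2)`/`umount(2)` directly instead of fork+exec of `/bin/(u)mount`, and report failures with `strerror()` (standing in here for `slurm_strerror()`, which maps SLURM-specific error codes). The function names, paths, and mount flags are illustrative, not the actual SLURM cgroup-plugin symbols; like cgroups, this is Linux-only.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Mount a cgroup hierarchy at `path` without spawning /bin/mount. */
int xcgroup_mount(const char *path)
{
    if (mount("cgroup", path, "cgroup",
              MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) < 0) {
        /* errno gives a precise reason; system("mount ...") would not. */
        fprintf(stderr, "unable to mount %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}

/* Unmount the hierarchy at `path` without spawning /bin/umount. */
int xcgroup_umount(const char *path)
{
    if (umount(path) < 0) {
        fprintf(stderr, "unable to umount %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```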
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
the current plugin has been loaded when using runjob_mux_refresh_config
-
- Jul 30, 2012
Morris Jette authored
-
- Jul 27, 2012
Morris Jette authored
I would like to make two changes to this:
1) Since the reservation name can easily exceed 9 characters, the field should be however large it needs to be, without truncating the name. I did this by scanning the names and then setting the field width to the longest one.
2) The other headers are in capitals, so I changed "ResName State StartTime EndTime Duration Nodelist" to "RESV_NAME STATE START_TIME END_TIME DURATION NODELIST".
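The two changes can be sketched as follows (helper names are illustrative, not the actual sinfo/scontrol symbols):

```c
#include <stdio.h>
#include <string.h>

/* Change 1): size the RESV_NAME column to the longest reservation
 * name so names are never truncated, but never narrower than the
 * header itself. */
int resv_name_width(const char **names, int n)
{
    int width = (int) strlen("RESV_NAME");
    for (int i = 0; i < n; i++) {
        int len = (int) strlen(names[i]);
        if (len > width)
            width = len;
    }
    return width;
}

/* Change 2): print the headers in capitals, matching the other
 * report headers; '*' takes the computed width at run time. */
void print_resv_header(int width)
{
    printf("%-*s %-10s %-19s %-19s %-9s %s\n", width,
           "RESV_NAME", "STATE", "START_TIME", "END_TIME",
           "DURATION", "NODELIST");
}
```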
-
- Jul 26, 2012
Morris Jette authored
-
Morris Jette authored
Correct parsing of srun/sbatch input/output/error file names so that only the name "none" is mapped to /dev/null and not any file name starting with "none" (e.g. "none.o"). This fixes bug #98.
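A minimal sketch of the fix described above (the function name is illustrative, not the actual srun/sbatch symbol):

```c
#include <stdbool.h>
#include <string.h>

/* Map a file name to /dev/null only when it is exactly "none".
 * A prefix test such as strncmp(name, "none", 4) == 0 would wrongly
 * match names like "none.o"; an exact comparison does not. */
bool maps_to_dev_null(const char *name)
{
    return name != NULL && strcmp(name, "none") == 0;
}
```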
-
- Jul 24, 2012
Morris Jette authored
Gres: If a gres has a count of one and an associated file, then on reconfiguration the node's bitmap was not cleared, resulting in an underflow upon job termination or removal from the scheduling matrix by the backfill scheduler.
-
- Jul 23, 2012
Morris Jette authored
Cray and BlueGene - Do not treat lack of usable front-end nodes when the slurmctld daemon starts as a fatal error. Also preserve the correct front-end node for jobs when there is more than one front-end node and the slurmctld daemon restarts.
-
- Jul 19, 2012
Danny Auble authored
while it is attempting to free underlying hardware is marked in error making small blocks overlapping with the freeing block. This only applies to dynamic layout mode.
-
Alejandro Lucero Palau authored
-
- Jul 16, 2012
Morris Jette authored
-
- Jul 13, 2012
Danny Auble authored
is always set when sending or receiving a message.
-
Tim Wickberg authored
-
- Jul 12, 2012
Danny Auble authored
than 1 midplane but not the entire allocation.
-
Danny Auble authored
multi midplane block allocation.
-
Danny Auble authored
-
Danny Auble authored
where other blocks on an overlapping midplane are running jobs.
-
- Jul 11, 2012
Danny Auble authored
hardware is marked bad remove the larger block and create a block over just the bad hardware making the other hardware available to run on.
-
Danny Auble authored
allocation.
-
Danny Auble authored
for a job to finish on it the number of unused cpus wasn't updated correctly.
-
- Jul 10, 2012
Morris Jette authored
When using the jobcomp/script interface, we have noticed the NODECNT environment variable is off by one when logging completed jobs in the NODE_FAIL state (though the NODELIST is correct). This appears to be because in many places job_completion_logger() is called after deallocate_nodes(), which appears to decrement job->node_cnt for DOWN nodes.

If job_completion_logger() only called the job completion plugin, then I would guess that it might be safe to move this call ahead of deallocate_nodes(). However, it seems like job_completion_logger() also does a bunch of accounting stuff (?), so perhaps that would need to be split out first.

Also, there is the possibility that this is working as designed, though if so a well placed comment in the code might be appreciated. If the decreased node count is intended, though, should the DOWN nodes also be removed from the job's NODELIST? - Mark Grondona
-
- Jul 09, 2012
Martin Perry authored
See Bugzilla #73 for a more complete description of the problem. Patch by Martin Perry, Bull.
-
- Jul 06, 2012
Carles Fenoy authored
If a job is submitted to more than one partition, its partition pointer can be set to an invalid value. This can result in the count of CPUs allocated on a node being bad, resulting in over- or under-allocation of its CPUs. Patch by Carles Fenoy, BSC.

Hi all,

After a tough day I've finally found the problem and a solution for 2.4.1. I was able to reproduce the explained behavior by submitting jobs to 2 partitions. This makes the job be allocated in one partition, but in the schedule function the partition of the job is changed to the NON-allocated one. This means the resources cannot be freed at the end of the job.

I've solved this by changing the IS_PENDING test some lines above in the schedule function (job_scheduler.c). This is the code from the git HEAD (line 801). As this file has changed a lot from 2.4.x I have not done a patch, but I'm commenting the solution here. I've moved the if (!IS_JOB_PENDING) after the 2nd line (part_ptr...). This prevents the partition of the job from being changed if it is already starting in another partition.

    job_ptr = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    job_ptr->part_ptr = part_ptr;
    xfree(job_queue_rec);
    if (!IS_JOB_PENDING(job_ptr))
        continue;  /* started in other partition */

Hope this is enough information to solve it. I've just realized (while writing this mail) that my solution has a memory leak, as job_queue_rec is not freed. Regards, Carles Fenoy
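A minimal model of the fix described above, with the queue record freed on the skip path to avoid the leak the author mentions. The types are simplified stand-ins for the real SLURM structures, and the loop body is condensed into a single function for illustration:

```c
#include <stdbool.h>
#include <stdlib.h>

struct part_record { int id; };
struct job_record  { bool pending; struct part_record *part_ptr; };
struct job_queue_rec {
    struct job_record  *job_ptr;
    struct part_record *part_ptr;
};

/* Test IS_JOB_PENDING *before* overwriting job_ptr->part_ptr, so a
 * job already started in another partition keeps the partition it was
 * actually allocated in. Takes ownership of (and frees) the record. */
void schedule_one(struct job_queue_rec *job_queue_rec)
{
    struct job_record  *job_ptr  = job_queue_rec->job_ptr;
    struct part_record *part_ptr = job_queue_rec->part_ptr;

    if (!job_ptr->pending) {      /* started in other partition */
        free(job_queue_rec);      /* do not leak the queue record */
        return;
    }
    job_ptr->part_ptr = part_ptr; /* safe: job is still pending */
    free(job_queue_rec);
}
```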
-
- Jul 03, 2012
Danny Auble authored
there are jobs running on that hardware.
-
Morris Jette authored
-
Alejandro Lucero Palau authored
Add support for advanced reservation for specific cores rather than whole nodes. Current limitations: homogeneous cluster, nodes idle when the reservation is created, and no more than one reservation per node. Code is still under development. Work by Alejandro Lucero Palau, et al., BSC.
-
- Jul 02, 2012
Carles Fenoy authored
correctly when transitioning. This also applies for 2.4.0 -> 2.4.1, no state will be lost. (Thanks to Carles Fenoy)
-
- Jun 29, 2012
Bill Brophy authored
Add reservation flag of Part_Nodes to allocate all nodes in a partition to a reservation and automatically change the reservation as nodes are added to or removed from the partition. Based upon work by Bill Brophy, Bull.
-