- Oct 11, 2011
Matthieu Hautreux authored
With release_agent notified at the step cgroup level, the step cgroup can be removed while slurmstepd has not yet finished its internal epilog mechanisms. Inhibiting the release agent at the step level and ensuring the proper removal of the step cgroup helps guarantee that the node only becomes eligible for job execution once its resources are completely available (no longer used by the job or the epilogs).
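As a rough illustration for a cgroup v1 hierarchy, the sketch below does two things. It disables release notification on a step cgroup by writing 0 to its `notify_on_release` control file. It then removes the cgroup explicitly with rmdir(2) once the step is done. The function names `cgroup_set_notify_on_release()` and `cgroup_remove_step()` are hypothetical and are not taken from the SLURM source.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch (cgroup v1): enable or disable the kernel's release
 * notification on a step cgroup, so the release_agent does not remove it
 * while slurmstepd is still running its epilogs. Returns 0 or -1.
 */
static int cgroup_set_notify_on_release(const char *cgroup_dir, int enable)
{
	char path[4096];
	int len = snprintf(path, sizeof(path), "%s/notify_on_release", cgroup_dir);
	if (len < 0 || (size_t) len >= sizeof(path))
		return -1; /* path too long */

	FILE *fp = fopen(path, "w");
	if (fp == NULL)
		return -1;
	int ok = fprintf(fp, "%d\n", enable ? 1 : 0) > 0;
	/* fclose() flushes the buffered write; errors surface here too. */
	if (fclose(fp) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical sketch: once the step and its epilogs have finished, remove
 * the step cgroup explicitly. rmdir(2) fails while tasks remain in it.
 */
static int cgroup_remove_step(const char *cgroup_dir)
{
	return rmdir(cgroup_dir);
}
```

Doing the removal from slurmstepd, instead of relying on the kernel's release_agent, ties the cgroup's lifetime to the end of the step's own cleanup.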
-
Matthieu Hautreux authored
A delay occurs between a task's creation and its addition to a cgroup other than the inherited one. In the meantime, the process can disappear, resulting in an ESRCH error when it is added to the second cgroup. This event is now reported as a warning instead of an error.
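A minimal sketch of this ESRCH handling, assuming the pid is attached by writing it to a cgroup v1 `tasks` file. `cgroup_attach_pid()` is a hypothetical name, not SLURM's API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move `pid` into a cgroup by writing it to the
 * cgroup's tasks file. If the process already exited (ESRCH), log a
 * warning and report success; any other failure is an error.
 */
static int cgroup_attach_pid(const char *tasks_path, pid_t pid)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%ld\n", (long) pid);

	int fd = open(tasks_path, O_WRONLY);
	if (fd < 0) {
		fprintf(stderr, "error: open(%s): %s\n", tasks_path, strerror(errno));
		return -1;
	}
	ssize_t n = write(fd, buf, (size_t) len);
	int err = (n < 0) ? errno : 0;
	close(fd);

	if (n == len)
		return 0;
	if (err == ESRCH) {
		/* The task vanished between fork and attach: not fatal. */
		fprintf(stderr, "warning: pid %ld exited before being added to %s\n",
			(long) pid, tasks_path);
		return 0;
	}
	fprintf(stderr, "error: write(%s): %s\n", tasks_path,
		n < 0 ? strerror(err) : "short write");
	return -1;
}
```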
-
Mark A. Grondona authored
Move the code that waits for the parent's signal before exec(2) out of exec_task() and directly into fork_all_tasks(). This consolidates all of the fork-and-wait handling in slurmstepd/mgr.c and allows the exec_wait_child_wait_for_parent() function to be used in place of an explicit read().
-
Mark A. Grondona authored
tty setup needs to occur before child tasks block waiting for a signal from the parent, so move this code out of exec_task() into fork_all_tasks(). This allows the wait-for-signal-from-parent code to also move out of exec_task() later.
-
Mark A. Grondona authored
As reported by Sam Lang on slurm-dev, task_epilog scripts are not held before exec. This creates a race between the launch of the task_epilog and slurmstepd's call to slurm_container_add(). During that window the task_epilog script could either run to completion or launch other processes that escape any job container defined by the configuration. Use the new "exec_wait" API to have the child wait before exec, just as is done in fork_all_tasks(). Based on an original idea by Sam Lang <samlang@gmail.com>.
-
Mark A. Grondona authored
Remove the explicitly coded fork-and-wait-before-exec code from slurmstepd fork_all_tasks() and replace it with the "exec_wait" API. This change should be functionally identical to the previous code.
-
Mark A. Grondona authored
Abstract the code in slurmstepd fork_all_tasks that allows the parent to signal children before they call exec into an "exec_wait_info" interface. This will allow the code to be easily reused in other parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
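One common way to build such an interface is a pipe shared across fork(2): the child blocks in read(2) until the parent writes a byte. The sketch below assumes that design and is not taken from the actual slurmstepd code. `struct exec_wait_info` and `exec_wait_child_wait_for_parent()` borrow their names from these commit messages, but their layout and signatures here are guesses. `exec_wait_fork()` and `exec_wait_signal_child()` are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h> /* waitpid(), used by callers to reap the child */
#include <unistd.h>

/* Hypothetical layout; SLURM's real struct exec_wait_info may differ. */
struct exec_wait_info {
	pid_t pid;    /* 0 in the child, the child's pid in the parent */
	int parentfd; /* write end of the pipe, used by the parent */
	int childfd;  /* read end of the pipe, used by the child */
};

/* Create the pipe and fork. Returns 0 in both processes on success,
 * -1 if pipe(2) or fork(2) fails. */
static int exec_wait_fork(struct exec_wait_info *e)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	e->childfd = fds[0];
	e->parentfd = fds[1];

	e->pid = fork();
	if (e->pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (e->pid == 0)
		close(e->parentfd); /* the child only reads */
	else
		close(e->childfd);  /* the parent only writes */
	return 0;
}

/* Child side: block until the parent writes one byte. Returns 0 when
 * released, -1 if the pipe was closed without a signal or read failed. */
static int exec_wait_child_wait_for_parent(struct exec_wait_info *e)
{
	char c;
	ssize_t n;

	do {
		n = read(e->childfd, &c, 1);
	} while (n < 0 && errno == EINTR);
	close(e->childfd);
	return (n == 1) ? 0 : -1;
}

/* Parent side: release the child so it may proceed to exec(2). */
static int exec_wait_signal_child(struct exec_wait_info *e)
{
	char c = 0;
	ssize_t n;

	do {
		n = write(e->parentfd, &c, 1);
	} while (n < 0 && errno == EINTR);
	close(e->parentfd);
	return (n == 1) ? 0 : -1;
}
```

A caller would fork with exec_wait_fork(). The child calls exec_wait_child_wait_for_parent() and then exec(2). Meanwhile the parent does its setup, such as adding the child to a job container, before calling exec_wait_signal_child(). If the parent closes the pipe without writing, the child's read returns 0. The child can then exit instead of exec'ing outside the container.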
-
jette authored
When an operator or account coordinator holds his own job, apply a User Hold by default rather than an Administrator Hold.
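The resulting default can be pictured as a small decision function. This is an illustrative sketch only: `enum hold_type` and `default_hold_type()` are hypothetical. It also assumes the caller has already checked that the requester is allowed to hold the job.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical hold types; SLURM's own representation differs. */
enum hold_type { HOLD_USER, HOLD_ADMIN };

/*
 * Default hold type for an already-authorized hold request. A privileged
 * requester (operator or account coordinator) holding a job they own gets
 * a user hold; holding another user's job yields an administrator hold.
 */
static enum hold_type default_hold_type(uid_t requester, uid_t job_owner,
					bool privileged)
{
	if (!privileged || requester == job_owner)
		return HOLD_USER;
	return HOLD_ADMIN;
}
```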
-
- Oct 08, 2011
Mark A. Grondona authored
Move the code that waits for the parent's signal before exec(2) out of exec_task() and directly into fork_all_tasks(). This consolidates all of the fork-and-wait handling in slurmstepd/mgr.c and allows the exec_wait_child_wait_for_parent() function to be used in place of an explicit read().
-
Mark A. Grondona authored
tty setup needs to occur before child tasks block waiting for a signal from the parent, so move this code out of exec_task() into fork_all_tasks(). This allows the wait-for-signal-from-parent code to also move out of exec_task() later.
-
Mark A. Grondona authored
As reported by Sam Lang on slurm-dev, task_epilog scripts are not held before exec. This creates a race between the launch of the task_epilog and slurmstepd's call to slurm_container_add(). During that window the task_epilog script could either run to completion or launch other processes that escape any job container defined by the configuration. Use the new "exec_wait" API to have the child wait before exec, just as is done in fork_all_tasks(). Based on an original idea by Sam Lang <samlang@gmail.com>.
-
Mark A. Grondona authored
Remove the explicitly coded fork-and-wait-before-exec code from slurmstepd fork_all_tasks() and replace it with the "exec_wait" API. This change should be functionally identical to the previous code.
-
Mark A. Grondona authored
Abstract the code in slurmstepd fork_all_tasks that allows the parent to signal children before they call exec into an "exec_wait_info" interface. This will allow the code to be easily reused in other parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
-
- Oct 07, 2011
Morris Jette authored
Prevent slurmctld from crashing with a divide-by-zero error when configured with MaxMemPerCPU=0.
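A minimal sketch of the kind of guard involved. It assumes two things: that MaxMemPerCPU=0 means no per-CPU memory limit, and that the limit is used to raise a job's CPU count until the job's memory request is covered. `min_cpus_for_mem()` is a hypothetical helper, not the slurmctld code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest CPU count (at least `cpus`) such that
 * mem_mb / count does not exceed max_mem_per_cpu. A limit of 0 is treated
 * as "no per-CPU limit" and returns before any division.
 */
static uint32_t min_cpus_for_mem(uint64_t mem_mb, uint64_t max_mem_per_cpu,
				 uint32_t cpus)
{
	if (max_mem_per_cpu == 0)
		return cpus; /* unlimited: nothing to adjust, and no divide by zero */

	/* Ceiling division: 4097 MB at 1024 MB/CPU needs 5 CPUs. */
	uint64_t needed = (mem_mb + max_mem_per_cpu - 1) / max_mem_per_cpu;
	return (needed > cpus) ? (uint32_t) needed : cpus;
}
```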
-
- Oct 05, 2011
Danny Auble authored
-
Danny Auble authored
block happens correctly now.
-
- Oct 04, 2011
Morris Jette authored
-
Morris Jette authored
Major re-write of the CPU Management User and Administrator Guide (web page) by Martin Perry, Bull.
-
Morris Jette authored
-
- Oct 03, 2011
Danny Auble authored
-
- Sep 30, 2011
Mark A. Grondona authored
PluginDir is a path, so having duplicate plugins in it should not be an error. The error was also unhelpful because it did not specify which path was not being loaded. Therefore, remove the error and load the first matching plugin in the path, as expected.
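The lookup behavior described here can be sketched as follows: the first match in a colon-separated search path wins, and later duplicates are ignored. `find_plugin()` and `MAX_PLUGIN_PATH` are hypothetical names used for illustration, not SLURM's plugin loader.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PLUGIN_PATH 4096 /* hypothetical, illustrative buffer size */

/*
 * Search a colon-separated, PluginDir-style path for `name` and return a
 * malloc'd copy of the first readable match, or NULL if none is found.
 * Copies of the same plugin in later directories are silently ignored.
 */
static char *find_plugin(const char *search_path, const char *name)
{
	const char *dir = search_path;

	while (*dir != '\0') {
		const char *end = strchr(dir, ':');
		size_t dir_len = end ? (size_t) (end - dir) : strlen(dir);

		if (dir_len > 0) { /* skip empty entries such as "a::b" */
			char candidate[MAX_PLUGIN_PATH];
			int len = snprintf(candidate, sizeof(candidate), "%.*s/%s",
					   (int) dir_len, dir, name);
			if (len > 0 && (size_t) len < sizeof(candidate) &&
			    access(candidate, R_OK) == 0) {
				char *result = malloc((size_t) len + 1);
				if (result != NULL)
					memcpy(result, candidate, (size_t) len + 1);
				return result; /* first match wins */
			}
		}
		if (end == NULL)
			break;
		dir = end + 1;
	}
	return NULL;
}
```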
-
Morris Jette authored
Fix bugs in sched/backfill with respect to QOS reservation support and job time limits. Patch from Alejandro Lucero Palau (Barcelona Supercomputer Center).
-
Morris Jette authored
-
Morris Jette authored
Fix to GRES allocation logic when resources are associated with specific CPUs on a node. Patch from Steve Trofinoff, CSCS.
-
- Sep 29, 2011
Danny Auble authored
(i.e. 1-9,0 instead of 0-9). The bug would cause 'sacct -N nodename' to not give correct results on these systems.
-
Danny Auble authored
is in an error state, won't deny jobs.
-
Danny Auble authored
-
Danny Auble authored
restarts of the slurmctld.
-
Danny Auble authored
-
Danny Auble authored
admin sets the state to error.
-
- Sep 28, 2011
Morris Jette authored
Advise use of the logrotate tool to keep SLURM log files from growing too large. Patch from Rod Shultz, Bull.
-
Morris Jette authored
Do not treat the absence of a gres.conf file as a fatal error on systems configured with GRES; instead, set the GRES counts to zero. These counts can be altered by node_config_load() in the gres plugin.
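A minimal sketch of treating a missing configuration file as "no GRES" rather than as a fatal error. `load_gres_conf()` and the flat `counts` array are hypothetical simplifications. Parsing of an existing file is omitted.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical sketch: open a GRES configuration file. A missing file is
 * not fatal; every GRES count is zeroed so later code (node_config_load()
 * in the gres plugin, per the commit) can adjust it. Other open failures,
 * such as permission errors, are still reported. Returns 0 or -1.
 */
static int load_gres_conf(const char *path, unsigned long *counts, size_t n_gres)
{
	FILE *fp = fopen(path, "r");

	if (fp == NULL) {
		if (errno == ENOENT) {
			for (size_t i = 0; i < n_gres; i++)
				counts[i] = 0; /* no gres.conf: assume no GRES present */
			return 0;
		}
		return -1;
	}
	/* Parsing of an existing file would go here (omitted in this sketch). */
	fclose(fp);
	return 0;
}
```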
-
Danny Auble authored
-
Danny Auble authored
-
- Sep 27, 2011
Mark A. Grondona authored
The slurmctld code that processes job notify messages unnecessarily restricted these messages to the slurm user or root. This patch allows users to send notifications to their own jobs.
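The relaxed check amounts to one extra condition. In this hypothetical sketch, `job_notify_allowed()` permits root, the SlurmUser, or the job's owner. The function name and argument layout are illustrative only.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical permission check for a job notify request: root (uid 0),
 * the SlurmUser, or the owner of the job may send the notification.
 */
static bool job_notify_allowed(uid_t requester, uid_t job_owner, uid_t slurm_user)
{
	return requester == 0 || requester == slurm_user || requester == job_owner;
}
```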
-
- Sep 26, 2011
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Many cosmetic modifications to eliminate warning messages from the GCC 4.6 compiler, mostly due to unused variables.
-