- Sep 17, 2011
-
-
Danny Auble authored
jobs happen to be running on blocks not in the new config.
-
- Sep 16, 2011
-
-
Morris Jette authored
salloc/mpirun does not play well together with task affinity socket binding. The following example illustrates the problem.

    [sulu] (slurm) mnp> salloc -p bones-only -N1-1 -n3 --cpu_bind=socket mpirun cat /proc/self/status | grep Cpus_allowed_list
    salloc: Granted job allocation 387
    --------------------------------------------------------------------------
    An invalid physical processor id was returned ...

The problem is that with mpirun jobs Slurm launches only a single task, regardless of the value of -n. This confuses the socket binding logic in task affinity. The result is that task affinity binds the task to only a single CPU, instead of all the allocated CPUs on the socket. When MPI attempts to bind to any of the other allocated CPUs on the socket, it gets the "invalid physical processor id" error.

Note that the problem may occur even if socket binding is not explicitly requested by the user. If task/affinity is configured and the allocated CPUs are a whole number of sockets, Slurm will use "implicit auto binding" to sockets, triggering the problem. Patch from Martin Perry (Bull).
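
The intended outcome of socket binding described above can be illustrated with a small sketch. This is not Slurm's task/affinity code: the function names, and the assumption that CPUs are numbered contiguously within each socket, are hypothetical and chosen only for illustration. It contrasts a mask holding a single CPU with a mask holding every allocated CPU on the same socket.

```python
def socket_of(cpu, cpus_per_socket):
    # Hypothetical layout: CPUs 0..(cpus_per_socket - 1) sit on socket 0, and so on.
    return cpu // cpus_per_socket


def socket_binding_mask(allocated_cpus, cpus_per_socket, first_cpu):
    # Return every allocated CPU that shares a socket with first_cpu.
    target = socket_of(first_cpu, cpus_per_socket)
    return sorted(c for c in allocated_cpus
                  if socket_of(c, cpus_per_socket) == target)


if __name__ == "__main__":
    allocated = [0, 1, 2]            # e.g. three CPUs granted for -n3
    too_narrow = [allocated[0]]      # the single-CPU mask the commit describes
    whole_socket = socket_binding_mask(allocated, 4, allocated[0])
    print("single-CPU mask:", too_narrow)
    print("socket mask:    ", whole_socket)
```

With the wider mask, a launcher that later binds its own processes to any of the allocated CPUs stays inside the set the task is allowed to use.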
-
Morris Jette authored
Update reservation web page to describe mechanism to reserve CPUs rather than whole nodes and provide an example.
-
- Sep 15, 2011
-
-
Morris Jette authored
Avoid clearing a job's reason from JobHeldAdmin or JobHeldUser when it is otherwise updated using scontrol or sview commands. Patch based upon work by Phil Eckert (LLNL).
-
Morris Jette authored
Do not remove the backup slurmctld's pid file when it assumes control, only when it actually shuts down. Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Danny Auble authored
-
- Sep 14, 2011
-
-
Danny Auble authored
-
Danny Auble authored
variable wasn't initialized in the job structure making it so that job wouldn't run.
-
Danny Auble authored
-
- Sep 13, 2011
-
-
Danny Auble authored
-
- Sep 12, 2011
-
-
Danny Auble authored
on a L/P system or Q.
-
Danny Auble authored
-
Danny Auble authored
switching from a bluegene to a regular system with sview.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
when using multi-cluster mode in sview
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
type set it that way right off the start.
-
Danny Auble authored
conn_types.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
This should eliminate an abort reported by Sam Lang when a message unpack from the slurmdbd is invalid. Based upon a patch from Sam Lang.
-
Morris Jette authored
was reporting -1 if unknown; new code avoids printing any uid if not known. Patch by Don Albert, Bull.
-
Morris Jette authored
The "+" sign only appears to indicate that there is more information than the current field width can hold. For example "CANCELLED+" can indicate that the state is actually "CANCELLED by nnn". Increasing the field width with the %NUMBER format modifier will show this. Patch by Don Albert, Bull.
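
The truncation marker described above can be sketched as follows. This is an illustrative re-implementation, not the actual accounting command's code; the function name `fit_field` and the uid `1000` are hypothetical.

```python
def fit_field(value, width):
    # Pad values that fit; otherwise keep width - 1 characters and
    # append "+" to signal that more information is available.
    if width < 1:
        raise ValueError("width must be at least 1")
    if len(value) <= width:
        return value.ljust(width)
    return value[:width - 1] + "+"


if __name__ == "__main__":
    state = "CANCELLED by 1000"   # hypothetical uid, for illustration only
    print(fit_field(state, 10))
    print(fit_field(state, 20))
```

A width of 10 cuts the state down to the marker form, while a width of 20 leaves room for the full string, which mirrors the effect of widening the field with the %NUMBER modifier.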
-
- Sep 10, 2011
-
-
Danny Auble authored
-
Danny Auble authored
conn_types in a block definition.
-
Danny Auble authored
allow them on Q systems.
-
Danny Auble authored
allocation with jobs running on blocks that don't exist in the static setup.
-
- Sep 09, 2011
-
-
Morris Jette authored
This modification improves the performance of SLURM's preemption logic by reducing the execution time of the scheduling logic and doing a better job of minimizing the number of jobs preempted to initiate a new job. Based largely upon work by Phil Eckert, LLNL.
-
Morris Jette authored
-
Morris Jette authored
When a user changes a job's configuration (e.g. size), then recalculate its priority. Based upon a patch from Phil Eckert, LLNL.
-
- Sep 08, 2011
-
-
Morris Jette authored
If there is no SchedLogfile defined and 'scontrol schedloglevel 1' is issued by an administrator, slurmctld will segfault at the next "sched: " log message due to a NULL log file pointer. There are obviously multiple ways to fix this issue, but in this patch the RPC simply returns an "Operation Disabled" error immediately if the sched log file is NULL. Other options include opening a new logfile with a default name, sending sched log messages to stderr, or enhancing the scontrol interface to allow specifying a logfile name for the schedlog. There are other cases in the schedlog code that could cause problems for the slurmctld, but since the sched log stuff is tied in strangely with the rest of the logging code, I didn't want to try modifying anything in log.c, for fear of breaking the normal logging functions. Patch from Mark Grondona, LLNL.
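
The guard chosen in this patch, rejecting the request early when no scheduler log file exists, follows a common pattern. The sketch below is a simplified model under stated assumptions, not slurmctld code: the `SchedLog` class, the `set_sched_log_level` function, and the string return values are all hypothetical.

```python
import io


class SchedLog:
    # Hypothetical stand-in for the scheduler log state.
    def __init__(self, logfile=None):
        self.logfile = logfile   # None models an undefined log file
        self.level = 0


def set_sched_log_level(log, level):
    # Refuse the change up front, so later log writes never
    # touch a missing file handle.
    if log.logfile is None:
        return "Operation Disabled"
    log.level = level
    return "OK"


if __name__ == "__main__":
    print(set_sched_log_level(SchedLog(), 1))
    print(set_sched_log_level(SchedLog(io.StringIO()), 1))
```

Returning early keeps the change local to the request handler, which matches the commit's stated wish to avoid touching the shared logging code.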
-
Morris Jette authored
Add State=ACTIVE or State=INACTIVE to "scontrol show reservation" output. Patch from Phil Eckert, LLNL.
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
and vice versa the node->base partition lists will be displayed if set up in your .slurm/sviewrc file.
-
- Sep 07, 2011
-
-
Morris Jette authored
-
Morris Jette authored
This removes a duplicate call to _log_msg() when a user's access is denied or the "no_sys_info" command line option is NOT set. The duplicate log message was a result of commit 12ba7f70 on January 29, 2010. Patch from Mark Grondona, LLNL.
-