- Sep 17, 2003
-
-
jwindley authored
-
Moe Jette authored
-
Moe Jette authored
1=required nodes DOWN/DRAINED.
-
Moe Jette authored
unavailable states).
-
Moe Jette authored
right away (before running scheduling function).
-
Moe Jette authored
returned to service. The priority is changed from 1 to the value that would be set for the job if it were submitted at that time. (gnats:279)
-
Moe Jette authored
nodes that are not available (DOWN or DRAIN). This prevents them from blocking other jobs from using the nodes that are available (i.e., it overrides FIFO scheduling). (gnats:279)
-
Moe Jette authored
Without doing so, its internal record of jobs from its last period of activity is resurrected.
-
Moe Jette authored
-
- Sep 16, 2003
-
-
Moe Jette authored
MAX_SERVER_THREADS is exceeded. The thread counter, mutex, and condition-variable logic have all moved into new allocate/deallocate server thread functions.
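The entry above describes gathering the thread counter, mutex, and condition variable into a pair of allocate/deallocate functions. The following is a minimal sketch of that pattern, not the actual slurmctld code. The function names, the variable names, and the value of `MAX_SERVER_THREADS` are assumptions for illustration. The entry is truncated, so what happens when the limit is exceeded is not known; this sketch assumes the caller waits for a free slot.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_SERVER_THREADS 8   /* assumed value, for this sketch only */

static pthread_mutex_t thread_count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  thread_count_cond = PTHREAD_COND_INITIALIZER;
static int server_thread_count = 0;

/* Hypothetical name: reserve a thread slot, waiting while the limit is reached. */
void allocate_server_thread(void)
{
    pthread_mutex_lock(&thread_count_lock);
    while (server_thread_count >= MAX_SERVER_THREADS)
        pthread_cond_wait(&thread_count_cond, &thread_count_lock);
    server_thread_count++;
    pthread_mutex_unlock(&thread_count_lock);
}

/* Hypothetical name: release a slot and wake one waiter, if any. */
void deallocate_server_thread(void)
{
    pthread_mutex_lock(&thread_count_lock);
    assert(server_thread_count > 0);
    server_thread_count--;
    pthread_cond_signal(&thread_count_cond);
    pthread_mutex_unlock(&thread_count_lock);
}

/* Read the current count under the lock. */
int active_server_threads(void)
{
    pthread_mutex_lock(&thread_count_lock);
    int count = server_thread_count;
    pthread_mutex_unlock(&thread_count_lock);
    return count;
}
```

Keeping all three pieces of state behind two functions means no caller touches the counter without holding the lock.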
-
Moe Jette authored
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
-
Moe Jette authored
assumes control. It previously captured state only when the backup controller daemon was initiated.
-
Moe Jette authored
This was not happening for the backup slurmctld.
-
- Sep 15, 2003
-
-
Moe Jette authored
-
Moe Jette authored
-
Mark Grondona authored
-
Mark Grondona authored
-
Mark Grondona authored
setting SLURM_NODELIST in the environment)
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
in slurmd killing itself if the KILL_JOB RPC arrived before the job began execution (the pid in the data structure was still zero).
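The failure described above follows from POSIX semantics: `kill()` with a pid of zero signals every process in the caller's own process group, so a daemon that signals a not-yet-recorded pid signals itself. The guard below is a minimal sketch of one way to avoid that. The helper name `signal_job_process` is hypothetical, and the actual fix in slurmd may differ.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send sig to a job's process only when a real pid
 * has been recorded. A pid of 0 (job not yet started) or a negative pid
 * would address a process group, so both are rejected.
 */
int signal_job_process(pid_t pid, int sig)
{
    if (pid <= 0) {
        errno = ESRCH;   /* report "no such process" instead of signalling */
        return -1;
    }
    return kill(pid, sig);
}
```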
-
- Sep 13, 2003
-
-
Moe Jette authored
cases. Exit code is now 0 only if all commands execute without error. Exit code is 1 if any failure occurs for any command executed. (gnats:278)
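The exit-code rule in the entry above (0 only if every command succeeds, 1 if any command fails) can be sketched as follows. The function name and the array of per-command statuses are assumptions for illustration; the command's real implementation is not shown in this log.

```c
#include <stddef.h>

/*
 * Hypothetical helper: statuses[i] is 0 when command i succeeded and
 * nonzero when it failed. Returns the overall process exit code.
 */
int overall_exit_code(const int *statuses, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (statuses[i] != 0)
            return 1;    /* any single failure makes the run fail */
    }
    return 0;            /* every command executed without error */
}
```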
-
- Sep 12, 2003
-
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
when the job does not exist).
-
Moe Jette authored
it is a duplicate record.
-
Mark Grondona authored
-
Mark Grondona authored
o check for a job step state of STARTED before issuing kill_job rpc
-
Moe Jette authored
was only going to 65500 for the job_id and the step_id was always zero. This change does not eliminate the possibility of an error, but reduces its probability by a factor of about 65000. (gnats:276)
-
Moe Jette authored
to job_kill request and slurmctld leaves node and job in COMPLETING state until the slurmd issues an EPILOG_COMPLETE RPC on each node. This permits better support for non-killable processes and/or long-running epilog scripts. Several minor changes in node registration handling and slurmctld agent logic to better address a flood of incoming RPCs (typically when the system restarts). (gnats:268)
-
- Sep 11, 2003
-
-
Moe Jette authored
-
- Sep 10, 2003
- Sep 09, 2003
-
-
Mark Grondona authored
-
Mark Grondona authored
-
Mark Grondona authored
may result in multiple executions of system epilog for a single job (gnats:267)
-