- Aug 04, 2003
-
-
Moe Jette authored
batch_job_launch RPC, then deallocate those resources and requeue the job. If a node registers and fails to show a batch job that should have a script running there (node zero of allocation), then consider the job complete.
-
- Jul 25, 2003
-
-
Moe Jette authored
-
- Jul 18, 2003
-
-
Moe Jette authored
-
- Jul 15, 2003
- Jul 14, 2003
-
-
Moe Jette authored
while fail/if too many then fatal/sleep and retry
-
- Jun 28, 2003
-
-
Moe Jette authored
-
- Jun 27, 2003
-
-
Moe Jette authored
but don't wait for any reply. We remove the DOWN node from the job's bitmap. As soon as the other nodes complete the KILL_JOB RPC, the job transitions to some COMPLETED state.
-
- Jun 25, 2003
-
-
Moe Jette authored
If a node is down and not responding, don't bother to send a KILL_JOB RPC to it. If that is the only node associated with a job, don't have that job go through a COMPLETING state. It goes directly to a COMPLETED state. Also preserve the NO_RESPOND flag associated with a node if its state is changed via user request (e.g. scontrol).
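The rule in this commit message can be sketched as a small decision function. This is only an illustration in Python, not the slurmctld code: `plan_job_termination`, `JobState`, and the node names are hypothetical, and only the KILL_JOB, COMPLETING, and COMPLETED names come from the message above.

```python
from enum import Enum


class JobState(Enum):
    # State names taken from the commit message; the enum itself is hypothetical.
    COMPLETING = "COMPLETING"
    COMPLETED = "COMPLETED"


def plan_job_termination(job_nodes, unresponsive_down_nodes):
    """Return (nodes_to_signal, job_state) for a job that is ending.

    job_nodes: names of the nodes allocated to the job.
    unresponsive_down_nodes: set of nodes that are DOWN and not responding.
    """
    # Nodes that are down and not responding get no KILL_JOB RPC.
    nodes_to_signal = [n for n in job_nodes if n not in unresponsive_down_nodes]
    # If no node is left to confirm the kill, skip COMPLETING entirely.
    state = JobState.COMPLETING if nodes_to_signal else JobState.COMPLETED
    return nodes_to_signal, state
```

With the hypothetical nodes `n1` and `n2`, where `n2` is down, a two-node job would signal only `n1` and enter COMPLETING, while a job running only on `n2` would go straight to COMPLETED.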
-
- May 30, 2003
-
-
Moe Jette authored
Previously slurmctld was not filtering out nodes failing to satisfy the job's mincpus, mem, or tmp constraints.
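A minimal sketch of the kind of filtering this fix describes, written in Python for brevity. The `Node` record, its field names, and `filter_nodes` are assumptions made for illustration; the actual slurmctld data structures are not shown in this log.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node record; field names are illustrative only.
    name: str
    cpus: int
    real_memory: int  # megabytes
    tmp_disk: int     # megabytes


def filter_nodes(nodes, min_cpus=0, min_memory=0, min_tmp_disk=0):
    """Keep only nodes that satisfy every per-node minimum of the job."""
    return [
        n for n in nodes
        if n.cpus >= min_cpus
        and n.real_memory >= min_memory
        and n.tmp_disk >= min_tmp_disk
    ]
```

A node that fails any one of the three minimums is dropped before node selection, so it can never be allocated to the job.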
-
- Apr 02, 2003
-
-
Moe Jette authored
count (*not* node count) and exceeds the partition's node limit for a job.
-
- Mar 13, 2003
-
-
Moe Jette authored
-
- Mar 10, 2003
- Mar 07, 2003
- Mar 05, 2003
- Feb 26, 2003
-
-
Moe Jette authored
-
- Feb 25, 2003
-
-
Moe Jette authored
its nodes are put in state COMPLETING (used to be IDLE) and a revoke credential RPC is issued to all slurmd's. When the slurmd's respond (after running the epilog and confirming that all tasks are completed), the node's state is changed to IDLE and other jobs can run.
-
- Feb 14, 2003
-
-
Moe Jette authored
Make output formatting match config file (UNLIMITED instead of INFINITE or huge number). Only print job end_time if meaningful.
-
- Jan 29, 2003
-
-
Moe Jette authored
All references to header files changed to new pathname.
-
- Jan 27, 2003
-
-
Moe Jette authored
-
- Jan 23, 2003
-
-
Moe Jette authored
-
- Jan 17, 2003
-
-
Moe Jette authored
-
- Jan 16, 2003
-
-
Mark Grondona authored
prolog is run as root and failure is reported as a non-zero return code from the launch tasks or launch batch job messages
epilog is run as a result of the revoke credential message
-
- Jan 09, 2003
-
-
Moe Jette authored
Slurmctld: add node list to job step (leave bitmap too, faster) (Jette)
Slurmctld: fix node state when job keeps running and node fails (Jette)
Slurmctld: job requests with min-max resource requirement pair (Jette)
Slurmctld: job requests with excluded node list (--exclude) (Jette)
-
- Dec 30, 2002
-
-
Moe Jette authored
Also add exclude=node_list to allocate RPCs. slurmctld accepts new RPC arguments, but does nothing with them (yet).
-
- Dec 28, 2002
-
-
Moe Jette authored
-
- Dec 16, 2002
-
-
Moe Jette authored
Slurmctld priority of zero treated like job hold (not scheduled). Preserve priority order even if incoming job could use idle resources.
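One possible reading of this behavior, sketched in Python: jobs with priority zero are never considered, and scheduling stops at the first job that does not fit, so a lower-priority job cannot jump ahead even when idle nodes would accommodate it. The function `pick_jobs_to_start` and the tuple layout are hypothetical, and counting idle nodes is a simplification of real resource selection.

```python
def pick_jobs_to_start(pending, idle_nodes):
    """Return the ids of jobs to start, in strict priority order.

    pending: list of (job_id, priority, nodes_needed) tuples.
    idle_nodes: number of idle nodes currently available.
    """
    started = []
    # A priority of zero acts as a hold: the job is never scheduled.
    runnable = [job for job in pending if job[1] > 0]
    # Highest priority first; sorted() is stable, so ties keep submit order.
    for job_id, _priority, nodes_needed in sorted(runnable, key=lambda j: -j[1]):
        if nodes_needed > idle_nodes:
            # Stop here so lower-priority jobs do not overtake this one.
            break
        idle_nodes -= nodes_needed
        started.append(job_id)
    return started
```

The early `break` is the design choice that preserves priority order; replacing it with `continue` would let smaller, lower-priority jobs use the idle nodes instead.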
-
- Dec 02, 2002
-
-
Moe Jette authored
initiation. Establish limit of MAX_TASKS_PER_NODE and add matching error code.
-
- Dec 01, 2002
-
-
Moe Jette authored
-
- Nov 22, 2002
-
-
Moe Jette authored
-
- Nov 06, 2002
-
-
Moe Jette authored
-
- Oct 25, 2002
-
-
Moe Jette authored
to use.
-
- Oct 17, 2002
- Oct 14, 2002
-
-
Moe Jette authored
-
- Sep 25, 2002
-
-
Moe Jette authored
-