- Sep 23, 2003
-
-
Moe Jette authored
scalability. An arbitrary number of requests may be queued, and they are processed one per second until the queue is empty or every pending request was last attempted recently (the minimum retry interval is configured as 60 seconds).
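The queueing behavior described in this entry can be sketched roughly as follows. This is a minimal Python illustration, not the slurmctld code (which is written in C). The names `Request`, `next_ready`, `run`, and the `send` callback are all hypothetical, and the clock is simulated so the example runs instantly. Only the one-per-second pacing and the 60-second minimum retry interval come from the commit message.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

MIN_RETRY_INTERVAL = 60.0  # seconds; value taken from the commit message


@dataclass
class Request:
    """Hypothetical queued request; not a real SLURM structure."""
    name: str
    last_attempt: Optional[float] = None  # None means never attempted


def next_ready(queue, now, min_retry=MIN_RETRY_INTERVAL):
    """Pop and return the first request whose last attempt is at least
    min_retry seconds old (or that was never attempted), else None."""
    for _ in range(len(queue)):
        req = queue.popleft()
        if req.last_attempt is None or now - req.last_attempt >= min_retry:
            return req
        queue.append(req)  # attempted too recently; rotate to the back
    return None


def run(queue, send, start=0.0, max_ticks=1000):
    """Process one request per simulated second. Stop when the queue is
    empty or every pending request was attempted too recently.
    Returns the simulated time at which processing stopped."""
    now = start
    for _ in range(max_ticks):
        req = next_ready(queue, now)
        if req is None:
            break
        req.last_attempt = now
        if not send(req):      # assumed callback: True on success
            queue.append(req)  # keep failed requests for a later retry
        now += 1.0             # one request per second
    return now
```

A caller would invoke `run` periodically. A request that failed stays queued and is skipped until at least 60 simulated seconds have passed since its last attempt.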
-
- Sep 17, 2003
-
-
Moe Jette authored
-
- Sep 12, 2003
-
-
Moe Jette authored
when the job does not exist).
-
Moe Jette authored
to job_kill request and slurmctld leaves node and job in COMPLETING state until the slurmd issues an EPILOG_COMPLETE RPC on each node. This permits better support for non-killable processes and/or long-running epilog scripts. Several minor changes in node registration handling and slurmctld agent logic to better address a flood of incoming RPCs (typically when the system restarts). (gnats:268)
-
- Aug 04, 2003
-
-
Moe Jette authored
batch_job_launch RPC, then deallocate those resources and requeue the job. If a node registers and fails to show a batch job that should have a script running there (node zero of allocation), then consider the job complete.
-
- Aug 02, 2003
-
-
Moe Jette authored
"Can't connect to node" with every ping failure.
-
Moe Jette authored
-
Moe Jette authored
Changed the logging level of a few other messages.
-
Moe Jette authored
until the previous one completes. This avoids having too many cycles active (and a bunch of threads too). Ping_nodes control functions moved to a new module.
-
- Jul 23, 2003
-
-
Moe Jette authored
-
- Jul 15, 2003
-
-
Moe Jette authored
Perform some general code clean-up in those areas.
-
- Jul 14, 2003
-
-
Moe Jette authored
while fail/if too many then fatal/sleep and retry
-
- Jul 08, 2003
-
-
Moe Jette authored
-
- Jun 27, 2003
-
-
Moe Jette authored
-
- Jun 23, 2003
-
-
Moe Jette authored
out message was sent (e.g., slurmd down, msg sent to slurmd, slurmd up and registers, msg previously sent to slurmd times out).
-
- Jun 18, 2003
-
-
Moe Jette authored
rather than letting agent go off the end of an array.
-
- Jun 13, 2003
-
-
Mark Grondona authored
failed (presumably due to unkillable processes)
o retry failed JOB_KILL RPCs
-
- Jun 12, 2003
- Jun 11, 2003
-
-
Mark Grondona authored
instead of just kill_wait seconds because slurmd sleeps for kill_wait seconds, so slurmctld would never receive a reply.
-
- May 28, 2003
-
-
Mark Grondona authored
slurm_send_recv_node_msg(), slurm_send_recv_rc_msg(), etc.
o Fixed fd leak in agent.c using slurm_send_recv_rc_msg() w/ timeout.
-
- Mar 26, 2003
-
-
Moe Jette authored
reference if a reconfigure RPC was active at the same time.
-
- Mar 14, 2003
- Mar 13, 2003
-
-
Moe Jette authored
-
- Mar 05, 2003
-
-
Moe Jette authored
-
Moe Jette authored
slurmctld sends REQUEST_KILL_TIMELIMIT when job reaches its time limit, change job state to TIMEOUT | COMPLETING and node state to COMPLETING.
-
Moe Jette authored
Add COMPLETING node and job states. Major restructuring of slurmctld node and job state transition code.
-
- Feb 25, 2003
-
-
Moe Jette authored
its nodes are put in state COMPLETING (used to be IDLE) and a revoke credential RPC is issued to all slurmd's. When the slurmd's respond (after running the epilog and confirming that all tasks are completed), the node's state is changed to IDLE and other jobs can run.
-
- Feb 14, 2003
-
-
Moe Jette authored
Initialize/free/set alloc_sid and alloc_node in API functions. Pack/unpack/free new elements in job descriptor RPCs. Load/dump/pack new elements into job table records.
-
- Jan 27, 2003
-
-
Moe Jette authored
-
- Jan 24, 2003
- Jan 23, 2003
-
-
Moe Jette authored
-
- Jan 13, 2003
-
-
Moe Jette authored
-
- Jan 10, 2003
-
-
Moe Jette authored
Add support in slurmctld for SlurmctldDebug on startup and reconfig RPC.
-
- Jan 09, 2003
-
-
Moe Jette authored
Slurmctld add node list to job step (leave bitmap too, faster) Jette
Slurmctld fix node state when job keeps running and node fails Jette
Slurmctld job requests with min-max resource requirement pair Jette
Slurmctld job requests with excluded node list (--exclude) Jette
-
- Dec 27, 2002
-
-
Moe Jette authored
job time limit. Defined pack/unpack/free functions for RPC. Slurmctld modified to broadcast REQUEST_UPDATE_JOB_TIME RPCs.
-
- Dec 18, 2002
-
-
Moe Jette authored
-