- Sep 21, 2003
Moe Jette authored
control (it needs to complete all pending RPCs and save state before the primary reads state and takes over).
- Sep 20, 2003
Moe Jette authored
EPILOG_COMPLETE_MESSAGE. At this point the job is COMPLETED and all associated nodes are available.
- Sep 17, 2003
Moe Jette authored
returned to service. The priority is changed from 1 to the value that would be set for the job if it were submitted at that time. (gnats:279)
- Sep 12, 2003
Moe Jette authored
to a job_kill request, and slurmctld leaves the node and job in the COMPLETING state until slurmd issues an EPILOG_COMPLETE RPC on each node. This permits better support for non-killable processes and/or long-running epilog scripts. Several minor changes were made in node registration handling and slurmctld agent logic to better address a flood of incoming RPCs (typically when the system restarts). (gnats:268)
- Sep 04, 2003
Moe Jette authored
This prevents an orphan job if srun dies after sending the request, or if the network or the authentication mechanism fails.
- Aug 08, 2003
Moe Jette authored
Add new logic to prevent some node state transitions via update_node RPC (e.g. IDLE to ALLOCATED).
- Aug 02, 2003
- Jul 31, 2003
Moe Jette authored
Set the "reason" field when a node is set DOWN because of a slurmd error.
Moe Jette authored
Moe Jette authored
the backup controller and proc_req.c is the code to process incoming RPCs. No changes in controller logic were made for this. job_mgr.c was also modified to better handle bad job records during data recovery on controller restart.
- Jul 29, 2003
- Jul 24, 2003
- Jul 23, 2003
- Jul 15, 2003
- Jul 07, 2003
Moe Jette authored
A message to the controller is retried after the SlurmctldTimeout period if the message to the primary controller fails and the backup controller returns an error indicating that it is in backup mode.
- Jul 04, 2003
Moe Jette authored
WaitTime sets srun's default value for --wait. MaxJobCount sets the maximum job count for slurmctld (replacing #define MAX_JOB_CNT). MinJobAge sets the minimum job purge age for slurmctld (replacing #define MIN_JOB_AGE).
- Jun 09, 2003
Moe Jette authored
recover saved qsw state (if any, and if the "-c" option is not used) and use it as the argument to qsw_init(). If there is no state to be preserved, call qsw_init(NULL) to initialize the data structures.
- May 28, 2003
Mark Grondona authored
slurm_send_recv_node_msg(), slurm_send_recv_rc_msg(), etc.
o Fixed fd leak in agent.c when using slurm_send_recv_rc_msg() with a timeout.
- Apr 18, 2003
Moe Jette authored
(it was being treated like a shutdown request and shutting down the slurmd daemons too).
- Apr 11, 2003
Moe Jette authored
Mark Grondona authored
to have the correct value for the SlurmUser ID.
- Apr 03, 2003
Moe Jette authored
which is meaningless in these cases.
- Apr 01, 2003
Moe Jette authored
immediate flag when partition configuration prevents immediate initiation.
- Mar 27, 2003
Moe Jette authored
- Mar 26, 2003
- Mar 25, 2003
Moe Jette authored
- Mar 24, 2003
- Mar 21, 2003
Moe Jette authored
This puts the core file in the desired location in any case.
- Mar 20, 2003
Moe Jette authored
Shutdown any backup controller before restoring state.
- Mar 19, 2003
Moe Jette authored
don't need to worry about the backup controller freeing it then trying to use it again when it takes over a second time.