- Jan 16, 2004
-
-
Moe Jette authored
-
- Jan 14, 2004
-
-
Moe Jette authored
was being ignored by default when manually initiated, which prevents it from terminating gracefully.
-
- Jan 13, 2004
-
-
Moe Jette authored
when a node becomes DOWN for not responding. This is because, with a large number of non-responsive nodes, the ping agent can take a long time to complete (one second per non-responsive node, or a 10-second timeout per node with 10 parallel tasks). This should mark nodes as DOWN more reliably.
-
Moe Jette authored
-
Moe Jette authored
then don't treat as fatal error.
-
- Dec 31, 2003
-
-
Moe Jette authored
modifications were relatively minor - mostly changes in function names or arguments.
-
- Dec 24, 2003
-
-
Moe Jette authored
(it was inappropriately going to DRAINING state).
-
- Dec 23, 2003
-
-
Moe Jette authored
see memory leaks.
-
Moe Jette authored
(it grows anyway). Fix state read logic to better handle error conditions.
-
Moe Jette authored
Fix update node RPC to handle reason field change without state change. State was being handled as type int instead of uint16_t so NO_VAL check was not working properly.
-
Moe Jette authored
-
Moe Jette authored
"Scontrol abort" works. It was leaving a hung pthread due to a recent change. Fix a couple of potential memory leaks. "switch_type" has been added to the config data structure, un/pack, etc., but is not yet reported to the user or documented. The plugins now use function calls to get their type and plugin directory from a common data structure rather than individually reading and parsing the configuration file.
-
- Dec 22, 2003
-
-
Moe Jette authored
-
- Dec 19, 2003
-
-
Mark Grondona authored
-
Mark Grondona authored
(as well as the reverse) in all qsw code.
  - move elanhosts.[ch] to common
  - initialize elanhost_config on demand in qsw.c
  - remove calls to elanhosts in slurmd/elan_interconnect.c
  - merge libhostlist into libcommon since elanhosts needs it.
-
- Dec 11, 2003
-
-
Moe Jette authored
sharing via node record of job count (0 | 1) and bitmap of nodes which permit sharing. Previous logic could permit a job accepting shared nodes to be scheduled on a node that already had a running job not accepting shared nodes.
-
Moe Jette authored
failure, log with fatal() and exit.
-
- Dec 10, 2003
- Dec 08, 2003
-
-
Moe Jette authored
-
- Dec 06, 2003
-
-
jwindley authored
Avoid sending TASKS=0 to Wiki; Maui silently rejects the job (insofar as Maui silently does anything)
-
- Dec 05, 2003
-
-
Moe Jette authored
gracefully kill all jobs allocated resources on those nodes, gracefully kill all pending jobs that require those nodes, leave pending jobs that exclude those nodes but ignore those nodes. Added "best_effort" argument to node_name2bitmap() function. Fix potential memory leak when the Maui scheduler interface resets the required nodes. (gnats:342)
-
- Dec 03, 2003
- Nov 25, 2003
-
-
Moe Jette authored
Otherwise signal all steps associated with the job (unless individual job steps are identified).
-
- Nov 22, 2003
- Nov 21, 2003
- Nov 20, 2003
- Nov 19, 2003