- Sep 29, 2004
-
-
Moe Jette authored
-
- Sep 27, 2004
-
-
Moe Jette authored
-
- Sep 24, 2004
-
-
Moe Jette authored
Move pretty much all BGL-specific logic into that module and the associated plugin and make use of an opaque data object for maintaining the information.
-
- Sep 21, 2004
-
-
Moe Jette authored
-
- Sep 20, 2004
-
-
Moe Jette authored
reported as desired. Add new function to drain node (for use by select/bluegene node monitoring thread).
-
- Sep 17, 2004
-
-
Moe Jette authored
-
- Sep 15, 2004
- Aug 20, 2004
-
-
Moe Jette authored
naming (3-digit number with distinct maximum value for each digit).
-
- Aug 19, 2004
-
-
Moe Jette authored
-
- Aug 17, 2004
-
-
Moe Jette authored
Map all nodes in cluster to a single front-end node. Don't repeat ping/register/kill/etc. RPCs to all pseudo nodes, just the front-end. Treat single message for some RPCs as representing all nodes in the cluster: register, ping response, epilog complete, etc.
-
- Aug 10, 2004
-
-
Moe Jette authored
-
- Aug 06, 2004
-
-
Moe Jette authored
-
- Aug 04, 2004
-
-
Moe Jette authored
-
- Jul 26, 2004
-
-
Moe Jette authored
-
- Jul 23, 2004
-
-
Moe Jette authored
scontrol options. For now only the NULL plugin is available, but this is required for ASC Purple.
-
- Jul 09, 2004
-
-
Moe Jette authored
with "Shared=yes" configuration failed to function as desired. (gnats:459).
-
- Jun 02, 2004
-
-
Moe Jette authored
option. Use SHOW_ALL as flag.
-
- Apr 30, 2004
-
-
Moe Jette authored
commands modified. Info filtering added to slurmctld.
-
- Mar 20, 2004
-
-
Moe Jette authored
order to interrupt accept(), because this can fail if the authentication plugin is bad or there are other communications problems. Use interrupt instead.
-
- Mar 16, 2004
-
-
Moe Jette authored
MAX_SERVER_THREADS and increase that value from 50 to 60
-
Moe Jette authored
flag the required state saves. Perform the saves using synchronous I/O from just one thread. Under heavy loads this results in much faster responsiveness and lowers slurmctld's memory and CPU overhead considerably.
-
- Mar 11, 2004
-
-
Moe Jette authored
slurmctld startup. Create StateSaveLocation directory if changed via slurmctld reconfig.
-
- Jan 26, 2004
-
-
Moe Jette authored
clusters with FastSchedule configured off
* Only return DOWN nodes to service if the reason for them being in that state is non-responsiveness and ReturnToService configured on
* Some general code clean-up
-
- Dec 31, 2003
-
-
Moe Jette authored
modifications were relatively minor - mostly changes in function names or arguments.
-
- Dec 11, 2003
-
-
Moe Jette authored
sharing via node record of job count (0 | 1) and bitmap of nodes which permit sharing. Previous logic could permit a job accepting shared nodes to be scheduled on a node that already had a running job not accepting shared nodes.
-
- Dec 05, 2003
-
-
Moe Jette authored
gracefully kill all jobs allocated resources on those nodes, gracefully kill all pending jobs that require those nodes, leave pending jobs that exclude those nodes but ignore those nodes. Added "best_effort" argument to node_name2bitmap() function. Fix potential memory leak when maui scheduler interface resets the required nodes. (gnats:342)
-
- Nov 25, 2003
-
-
Moe Jette authored
Otherwise signal all steps associated with the job (unless individual job steps are identified).
-
- Nov 18, 2003
-
-
Moe Jette authored
-
- Nov 13, 2003
- Nov 07, 2003
-
-
Moe Jette authored
Some of this was formerly in slurmctld/read_config.c.
-
- Oct 29, 2003
-
-
Moe Jette authored
and/or job step(s) will have their resources de-allocated and be killed. A resource allocation will not be released unless no job steps are active for at least InactiveLimit seconds. DPCS jobs will be subject to this forced de-allocation if they remain inactive for an extended period of time, which can get SLURM and DPCS back in sync if DPCS does a cold-start.
-
- Oct 24, 2003
-
-
Moe Jette authored
avoid highly fragmented resource allocations. Add list of excluded nodes to job info dumped and reported. Fix how mismatched RPC version numbers are handled. Let error code get back to the API function. Dump job state information upon each job's termination via plugin. Re-issue incomplete write requests in job/partition state save. Make slurmctld continue proper operation without any default partition (gnats:317). Add command/RPC to delete a partition. Retry socket connection for slurmd/io.c as needed (gnats:253).
-
jwindley authored
-
- Oct 11, 2003
-
-
Mark Grondona authored
- changed defs of HAVE_LIBELAN3 to HAVE_ELAN
-
- Oct 03, 2003
-
-
Moe Jette authored
loss of an EPILOG_COMPLETE message.
-
- Sep 29, 2003
-
-
Moe Jette authored
-
- Sep 23, 2003
-
-
Moe Jette authored
-
- Sep 21, 2003
-
-
Moe Jette authored
control (it needs to complete all pending RPCs and save state before the primary reads state and takes over).
-