- Aug 18, 2004
-
-
Moe Jette authored
-
- Aug 17, 2004
-
-
Moe Jette authored
Map all nodes in the cluster to a single front-end node. Don't repeat ping/register/kill/etc. RPCs to all pseudo nodes; send them only to the front-end. Treat a single message for some RPCs as representing all nodes in the cluster: register, ping response, epilog complete, etc.
-
- Jun 22, 2004
-
-
Moe Jette authored
request occurs prior to initial node registration. Node that should have been DRAINED was being set to state DRAINING.
-
- Jun 09, 2004
-
-
Moe Jette authored
-
- Jun 04, 2004
-
-
Moe Jette authored
switch (was setting DRAINED nodes in state DRAINING).
-
- Jun 02, 2004
-
-
Moe Jette authored
option. Use SHOW_ALL as flag.
-
- Apr 30, 2004
-
-
Moe Jette authored
commands modified. Info filtering added to slurmctld.
-
- Apr 23, 2004
-
-
Moe Jette authored
* Memory leak in slurm_cred.c, added EVP_MD_CTX_cleanup().
* Pthread stack size too small on AIX, resulting in stack corruption and ugly failure modes. Added slurm_attr_init to macros.h to explicitly set the stack size for all pthreads.
* /dev/urandom not present on AIX, use rand() as needed instead in constructing a credential. Used in "srun --join".
* getsockopt(Socket, Level, SO_ERROR, &err, OptionLength) sometimes returns an error code of -1. This causes an assert failure in slurmd/io.c:_update_error_state().
* Function aliasing is not working on AIX. It is being turned off via a variable in config.h and "#if" logic in macros.h and slurm_xlator.h.
* dlopen failing if plugins reference any functions not present in caller. This may be fixed with the LDFLAG "-Wl,-bgcbypass=1000" being added for the slurm commands (avoid garbage collection of unused functions).
* read() sometimes generates an EAGAIN error, which was not handled in some places.
* vsnprintf() for string NULL is printing "" instead of "(null)" as produced by snprintf(). More format printing was added to log.c to produce more consistent log messages.
* poll() takes a timeout of -1 for unlimited rather than any negative number. Modify logic that was always multiplying by 1000 to convert usec to msec.
* getopt_long keyword table was not NULL terminated, resulting in segfault with invalid command-line argument in most commands.
* xmalloc module assert failures were not generating a core file. Changed "fatal();abort();" to "error();abort();".
* Change msg timeout from 3 sec to 5 sec. Running everything on single AIX node was very slow.
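The poll() item in this entry can be illustrated with a small sketch. Since only -1 is treated as "wait forever", a negative timeout must not be scaled before it is handed to poll(). The helper name `poll_timeout_ms` and the choice of seconds as the input unit are assumptions made for illustration; this is not the code from the SLURM tree.

```c
#include <limits.h>

/* Hypothetical helper: convert a timeout in seconds into the millisecond
 * value passed to poll(). Any negative input means "no timeout" and maps
 * to exactly -1 instead of being multiplied. */
static int poll_timeout_ms(int timeout_sec)
{
	if (timeout_sec < 0)
		return -1;                /* unlimited wait */
	if (timeout_sec > INT_MAX / 1000)
		return INT_MAX;           /* clamp rather than overflow */
	return timeout_sec * 1000;
}
```

A caller would pass the result straight through, for example `poll(fds, nfds, poll_timeout_ms(timeout))`.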
-
- Apr 05, 2004
-
-
Moe Jette authored
than old local functions.
-
- Mar 16, 2004
-
-
Moe Jette authored
flag the required state saves. Perform the saves using synchronous I/O from just one thread. Under heavy loads this results in much faster responsiveness and lowers slurmctld's memory and CPU overhead considerably.
-
- Mar 04, 2004
-
-
Moe Jette authored
data structure. This eliminates risks associated with re-reading slurm.conf.
-
- Feb 26, 2004
-
-
Moe Jette authored
state recovery.
-
- Jan 26, 2004
-
-
Moe Jette authored
clusters with FastSchedule configured off
* Only return DOWN nodes to service if the reason for them being in that state is non-responsiveness and ReturnToService configured on
* Some general code clean-up
-
- Jan 23, 2004
-
-
Moe Jette authored
fast_schedule == 0 (i.e. the node's value is used directly rather than the partition's configuration value).
-
- Dec 31, 2003
-
-
Moe Jette authored
modifications were relatively minor - mostly changes in function names or arguments.
-
- Dec 24, 2003
-
-
Moe Jette authored
(it was inappropriately going to DRAINING state).
-
- Dec 23, 2003
-
-
Moe Jette authored
Fix update node RPC to handle reason field change without state change. State was being handled as type int instead of uint16_t so NO_VAL check was not working properly.
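One way a width mismatch of this kind can defeat a sentinel check is sketched below. The `NO_VAL` value and both helper names are assumptions for illustration; the entry does not show the actual comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel for "field not set"; the value is illustrative. */
#define NO_VAL 0xfffffffeU

/* Buggy shape: the 16-bit field has been widened to int, so an unset
 * field arrives as 65534 and never equals the 32-bit sentinel. */
static bool state_unset_as_int(int state)
{
	return (unsigned int) state == NO_VAL;
}

/* Fixed shape: keep the field as uint16_t and compare it against the
 * sentinel truncated to the same width. */
static bool state_unset_as_u16(uint16_t state)
{
	return state == (uint16_t) NO_VAL;
}
```

With the 16-bit form, a request that changes only the reason field can leave the state at the sentinel and have it recognized as "no change".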
-
- Dec 22, 2003
-
-
Moe Jette authored
-
- Dec 11, 2003
-
-
Moe Jette authored
sharing via node record of job count (0 | 1) and bitmap of nodes which permit sharing. Previous logic could permit a job accepting shared nodes to be scheduled on a node that already had a running job not accepting shared nodes.
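The invariant described here (a sharing job must never land on a node held by a non-sharing job) might be checked along the following lines. This is only one reading of the entry: the struct, its field names, and `node_usable` are hypothetical, and the per-node `share_ok` flag stands in for the node's bit in the sharing bitmap.

```c
#include <stdbool.h>

/* Hypothetical per-node bookkeeping; field names are illustrative. */
struct node_use {
	int  run_job_cnt;       /* jobs currently running on the node */
	int  no_share_job_cnt;  /* 0 or 1: a running job that refuses sharing */
	bool share_ok;          /* node's bit in the "permits sharing" bitmap */
};

/* Return true if a new job may be placed on the node. */
static bool node_usable(const struct node_use *n, bool job_shares)
{
	if (n->no_share_job_cnt > 0)
		return false;   /* a non-sharing job already holds the node */
	if (n->run_job_cnt == 0)
		return true;    /* idle node: any job may use it */
	return job_shares && n->share_ok;
}
```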
-
- Dec 05, 2003
-
-
Moe Jette authored
gracefully kill all jobs allocated resources on those nodes, gracefully kill all pending jobs that require those nodes, leave pending jobs that exclude those nodes but ignore those nodes. Added "best_effort" argument to node_name2bitmap() function. Fix potential memory leak when maui scheduler interface resets the required nodes. (gnats:342)
-
- Nov 21, 2003
-
-
Moe Jette authored
-
- Nov 19, 2003
- Nov 18, 2003
-
-
Moe Jette authored
be changed whenever a node responded to a "ping" and other insignificant events, which resulted in the backfill scheduling running more frequently than required.
-
- Nov 07, 2003
-
-
Moe Jette authored
purge the request and job if/when the node changes to state DOWN.
-
- Nov 06, 2003
- Oct 30, 2003
-
-
Moe Jette authored
-
- Oct 24, 2003
-
-
Moe Jette authored
and quitting.
-
- Oct 11, 2003
-
-
Mark Grondona authored
- changed defs of HAVE_LIBELAN3 to HAVE_ELAN
-
- Oct 10, 2003
-
-
Moe Jette authored
-
- Oct 08, 2003
-
-
Moe Jette authored
debug3().
-
- Sep 29, 2003
-
-
Moe Jette authored
Fatal() no longer calls abort(), but terminates job using exit(1).
-
- Sep 25, 2003
- Sep 23, 2003
-
-
Moe Jette authored
scalability. An arbitrary number of requests may be queued and they are processed one per second until the queue is empty or pending requests were last attempted recently (configuration parameters set to 60 seconds as a minimum retry interval).
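The retry policy described above can be sketched as a selection function that a caller invokes once per second. The struct, the function name, and the use of `0` to mean "never attempted" are assumptions for illustration; only the 60-second minimum retry interval comes from the entry.

```c
#include <stddef.h>
#include <time.h>

#define MIN_RETRY_SEC 60   /* minimum retry interval named in the entry */

/* Hypothetical queued request; only the fields the sketch needs. */
struct retry_req {
	int    id;
	time_t last_attempt;   /* 0 if never attempted */
};

/* Pick the request to process on this one-second tick. Returns its index,
 * or -1 if the queue is empty or every pending request was attempted
 * within the last MIN_RETRY_SEC seconds. */
static int next_retry(const struct retry_req *q, size_t n, time_t now)
{
	for (size_t i = 0; i < n; i++) {
		if (q[i].last_attempt == 0 ||
		    difftime(now, q[i].last_attempt) >= MIN_RETRY_SEC)
			return (int) i;
	}
	return -1;
}
```

Processing at most one request per tick keeps the load bounded however many requests are queued, which is the scalability point the entry makes.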
-
- Sep 21, 2003
- Sep 19, 2003
-
-
Moe Jette authored
-