- Dec 05, 2003
-
-
Moe Jette authored
sets new environment variable SLURM_CPUS_ON_NODE for use by LAM/MPI. Also fixed a bug in srun's task distribution logic for block distribution in a heterogeneous cluster.
-
- Nov 26, 2003
-
-
Moe Jette authored
-
- Nov 25, 2003
- Nov 21, 2003
-
-
Mark Grondona authored
  - fixes to help ensure slurmd uses the same key for shared memory on a restart (to avoid losing track of jobs)
  - slurmd only runs one launch thread at a time
  - fix bug in slurmd where multiple threads used same address space for connecting client address.
  - srun always sends SIGKILL to job step before issuing complete request
  - Changed short string for draining nodes to drng from drain.
  - srun default launch message timeout increased to 5s.
-
- Nov 20, 2003
-
-
Moe Jette authored
src/api/free_msg.c.
-
- Nov 14, 2003
- Nov 13, 2003
-
-
Moe Jette authored
-
jwindley authored
-
Mark Grondona authored
o remove unneeded #include from interconnect.h
-
- Nov 10, 2003
-
-
Moe Jette authored
-
- Nov 07, 2003
- Nov 05, 2003
-
-
Moe Jette authored
between primary and backup. The request has a brief window in which it can abort, and we want to decrease the likelihood of that happening by retrying less frequently when we know control is transitioning.
-
Moe Jette authored
is still waiting to take control (rather than one long wait).
-
Moe Jette authored
(jobcomp/none) in the configuration data structure.
-
Moe Jette authored
data structure.
-
- Oct 29, 2003
-
-
Moe Jette authored
and/or job step(s) will have their resources de-allocated and be killed. A resource allocation will not be released unless no job steps are active for at least InactiveLimit seconds. DPCS jobs will be subject to this forced de-allocation if they remain inactive for an extended period of time, which can get SLURM and DPCS back in sync if DPCS does a cold-start.
-
- Oct 24, 2003
-
-
Moe Jette authored
-
Moe Jette authored
avoid highly fragmented resource allocations. Add list of excluded nodes to job info dumped and reported. Fix how mismatched RPC version numbers are handled. Let error code get back to the API function. Dump job state information upon each job's termination via plugin. Re-issue incomplete write requests in job/partition state save. Make slurmctld continue proper operation without any default partition (gnats:317). Add command/RPC to delete a partition. Retry socket connection for slurmd/io.c as needed (gnats:253).
-
Moe Jette authored
-
Moe Jette authored
-
jwindley authored
-
- Oct 23, 2003
- Oct 22, 2003
-
-
Moe Jette authored
is zero.
-
Mark Grondona authored
-
Mark Grondona authored
not correct.
-
Mark Grondona authored
o replace calls of pthread_kill() to IO thread with eio_handle_signal()
o Try to avoid having obj->ops == 0x0
-
- Oct 20, 2003
-
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
-
Moe Jette authored
-
- Oct 17, 2003
-
-
Mark Grondona authored
-
Moe Jette authored
-