- Dec 31, 2003
-
-
Moe Jette authored
(gnats:346)
-
- Dec 24, 2003
-
-
Moe Jette authored
to prevent things getting hosed if the table is re-read on SIGHUP.
-
- Dec 23, 2003
-
-
Moe Jette authored
"Scontrol abort" works. It was leaving a hung pthread due to a recent change. Fix a couple of potential memory leaks. "switch_type" has been added to the config data structure, un/pack, etc., but is not yet reported to the user or documented. The plugins now use function calls to get their type and plugin directory from a common data structure rather than individually reading and parsing the configuration file.
-
- Dec 19, 2003
-
-
Mark Grondona authored
(as well as the reverse) in all qsw code.
  - move elanhosts.[ch] to common
  - initialize elanhost_config on demand in qsw.c
  - remove calls to elanhosts in slurmd/elan_interconnect.c
  - merge libhostlist into libcommon since elanhosts needs it.
-
- Dec 17, 2003
-
-
jwindley authored
-
- Dec 15, 2003
- Dec 12, 2003
-
-
Mark Grondona authored
to srun or slurmctld. o new errno ESLURMD_UID_NOT_FOUND
-
- Dec 11, 2003
-
-
Moe Jette authored
Just log open failure and continue.
-
- Dec 10, 2003
-
-
Moe Jette authored
job logs, state save directories, etc.
-
- Dec 08, 2003
- Dec 05, 2003
-
-
Moe Jette authored
sets new environment variable SLURM_CPUS_ON_NODE for use by LAM/MPI. Also fixed bug in srun task distribution logic for block distribution in heterogeneous cluster.
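As an illustration of how a launched task might consume SLURM_CPUS_ON_NODE, here is a minimal sketch. The helper name `cpus_on_node`, its `default` fallback, and the `environ` parameter are hypothetical and not part of SLURM; the log entry does not say which component sets the variable or what value it takes when unset, so the fallback behavior is an assumption.

```python
import os


def cpus_on_node(default=1, environ=None):
    """Return the CPU count advertised in SLURM_CPUS_ON_NODE.

    Falls back to `default` when the variable is missing, not an
    integer, or not positive (assumed policy, not SLURM behavior).
    """
    env = os.environ if environ is None else environ
    raw = env.get("SLURM_CPUS_ON_NODE")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default
```

A wrapper script could call `cpus_on_node()` to decide how many local processes or threads to start on the node.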
-
- Nov 26, 2003
-
-
Moe Jette authored
-
- Nov 25, 2003
- Nov 21, 2003
-
-
Mark Grondona authored
  - fixes to help ensure slurmd uses the same key for shared memory on a restart (to avoid losing track of jobs)
  - slurmd only runs one launch thread at a time
  - fix bug in slurmd where multiple threads used same address space for connecting client address.
  - srun always sends SIGKILL to job step before issuing complete request
  - Changed short string for draining nodes to drng from drain.
  - srun default launch message timeout increased to 5s.
-
- Nov 20, 2003
-
-
Moe Jette authored
src/api/free_msg.c.
-
- Nov 14, 2003
- Nov 13, 2003
-
-
Moe Jette authored
-
jwindley authored
-
Mark Grondona authored
o remove unneeded #include from interconnect.h
-
- Nov 10, 2003
-
-
Moe Jette authored
-
- Nov 07, 2003
- Nov 05, 2003
-
-
Moe Jette authored
between primary and backup. The request has a brief window in which it can abort, and we want to decrease the likelihood of that happening by retrying less frequently when we know control is transitioning.
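The idea of retrying less frequently during a controller hand-off can be sketched roughly as follows. This is not the SLURM implementation: `retry_request`, the `send` and `in_transition` callables, and the delay values are all hypothetical names and numbers chosen for illustration.

```python
import time


def retry_request(send, in_transition, attempts=5,
                  normal_delay=1.0, transition_delay=5.0,
                  sleep=time.sleep):
    """Call send() until it succeeds or attempts run out.

    Between failed attempts, wait longer when in_transition() reports
    that control is moving between primary and backup, so fewer
    requests land in the window where they could abort.
    """
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(transition_delay if in_transition() else normal_delay)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting.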
-
Moe Jette authored
is still waiting to take control (rather than one long wait).
-
Moe Jette authored
(jobcomp/none) in the configuration data structure.
-
Moe Jette authored
data structure.
-
- Oct 29, 2003
-
-
Moe Jette authored
and/or job step(s) will have their resources de-allocated and be killed. A resource allocation will not be released unless no job steps are active for at least InactiveLimit seconds. DPCS jobs will be subject to this forced de-allocation if they remain inactive for an extended period of time, which can get SLURM and DPCS back in sync if DPCS does a cold-start.
-
- Oct 24, 2003
-
-
Moe Jette authored
-
Moe Jette authored
avoid highly fragmented resource allocations. Add list of excluded nodes to job info dumped and reported. Fix how mis-matched RPC version numbers are handled. Let error code get back to the API function. Dump job state information upon each job's termination via plugin. Re-issue incomplete write requests in job/partition state save. Make slurmctld continue proper operation without any default partition (gnats:317). Add command/RPC to delete a partition. Retry socket connection for slurmd/io.c as needed (gnats:253).
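The "re-issue incomplete write requests" item describes a common pattern: a single write call may store only part of a buffer, so the caller loops until everything is written. A minimal sketch of that pattern, assuming a raw file descriptor; `write_all` is a hypothetical helper and not the function used in the SLURM state-save code:

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd, re-issuing after short writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write may accept fewer bytes than requested.
        written = os.write(fd, view[total:])
        if written == 0:
            raise OSError("write made no progress")
        total += written
    return total
```

The return value is the total number of bytes written, which equals `len(data)` on success.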
-
Moe Jette authored
-