- Dec 08, 2003
-
-
Moe Jette authored
-
- Dec 05, 2003
-
-
Moe Jette authored
sets new environment variable SLURM_CPUS_ON_NODE for use by LAM/MPI. Also fixed bug in srun task distribution logic for block distribution in heterogeneous cluster.
-
- Dec 02, 2003
-
-
Moe Jette authored
-
- Nov 26, 2003
-
-
Moe Jette authored
-
- Nov 25, 2003
-
-
Moe Jette authored
Otherwise signal all steps associated with the job (unless individual job steps are identified).
-
- Nov 24, 2003
-
-
Moe Jette authored
signals all job steps, but not the job script itself).
-
- Nov 21, 2003
-
-
Mark Grondona authored
  - fixes to help ensure slurmd uses the same key for shared memory on a restart (to avoid losing track of jobs)
  - slurmd only runs one launch thread at a time
  - fix bug in slurmd where multiple threads used same address space for connecting client address.
  - srun always sends SIGKILL to job step before issuing complete request
  - Changed short string for draining nodes to drng from drain.
  - srun default launch message timeout increased to 5s.
-
- Nov 20, 2003
-
-
Mark Grondona authored
o Modify srun(1) man page to reflect that, really, slurmd debug level is only allowed to be set up to 4.
-
- Nov 18, 2003
-
-
Moe Jette authored
-
- Nov 17, 2003
-
-
jwindley authored
-
- Nov 14, 2003
-
-
Moe Jette authored
-
- Nov 10, 2003
-
-
Moe Jette authored
-
- Nov 07, 2003
- Nov 05, 2003
-
-
Moe Jette authored
to take effect.
-
- Nov 03, 2003
-
-
Moe Jette authored
-
- Oct 31, 2003
-
-
Moe Jette authored
scontrol command to use it.
-
- Oct 29, 2003
-
-
Moe Jette authored
and/or job step(s) will have their resources de-allocated and be killed. A resource allocation will not be released unless no job steps are active for at least InactiveLimit seconds. DPCS jobs will be subject to this forced de-allocation if they remain inactive for an extended period of time, which can get SLURM and DPCS back in sync if DPCS does a cold-start.
-
- Oct 24, 2003
-
-
Moe Jette authored
report an error message.
-
Moe Jette authored
-
Moe Jette authored
avoid highly fragmented resource allocations. Add list of excluded nodes to job info dumped and reported. Fix how mis-matched RPC version numbers are handled. Let error code get back to the API function. Dump job state information upon each job's termination via plugin. Re-issue incomplete write requests in job/partition state save. Make slurmctld continue proper operation without any default partition (gnats:317). Add command/RPC to delete a partition. Retry socket connection for slurmd/io.c as needed (gnats:253).
-
jwindley authored
-
- Oct 15, 2003
-
-
Moe Jette authored
equivalents. For consistency, make them return integer values (like before) that are not valid characters (e.g. 0x100). This avoids the use of single-character values that we might want to preserve.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
- Oct 14, 2003
-
-
Moe Jette authored
-
- Oct 08, 2003
-
-
Moe Jette authored
-
- Oct 06, 2003
-
-
Moe Jette authored
-
- Oct 03, 2003
-
-
Moe Jette authored
-
- Oct 01, 2003
-
-
Moe Jette authored
-
- Sep 30, 2003
-
-
Mark Grondona authored
-
- Sep 29, 2003
-
-
Moe Jette authored
-
- Sep 24, 2003
- Sep 23, 2003
-
-
Mark Grondona authored
-
- Sep 22, 2003
-
-
Moe Jette authored
and --nodes options.
-
- Sep 20, 2003
-
-
Mark Grondona authored
on nodes relative to the current allocation.
o srun no longer sends SIGKILL to job if one task is killed, except if --no-allocate is used (the job will otherwise be killed by the controller anyway).
-