- Aug 13, 2011
Danny Auble authored
- Aug 12, 2011
Danny Auble authored
next parallel step is run on a sub-block, SLURM won't oversubscribe cnodes.
Danny Auble authored
Morris Jette authored
Morris Jette authored
Morris Jette authored
Morris Jette authored
This reverts commit c5d63854 from 8/11/2011. The memory copy is not a leak, but is required to avoid memory corruption.
Morris Jette authored
Make sure that a job has a step_list before creating an iterator for it.
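The fix is a simple defensive check. Below is a minimal sketch, not SLURM's actual code, of the pattern: a job with no steps may have a NULL step_list, so verify it before iterating. The job_record and step_node types here are hypothetical stand-ins for SLURM's internal structures.

```c
#include <stdio.h>

struct step_node {
	int step_id;
	struct step_node *next;
};

struct job_record {
	int job_id;
	struct step_node *step_list;	/* NULL when the job has no steps */
};

static void print_job_steps(struct job_record *job)
{
	/* Guard: creating an iterator over (or walking) a NULL list
	 * would dereference a NULL pointer and crash the daemon. */
	if (!job || !job->step_list)
		return;

	for (struct step_node *it = job->step_list; it; it = it->next)
		printf("job %d step %d\n", job->job_id, it->step_id);
}

int main(void)
{
	struct job_record job = { 1234, NULL };	/* job with no steps yet */
	print_job_steps(&job);			/* safe: check short-circuits */
	return 0;
}
```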
Morris Jette authored
Improve logging messages and readability of some code
Morris Jette authored
This prevents bad node index values in a job step completion record from crashing slurmctld, as can occur if srun has bad configuration information about a job step or suffers some other failure.
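In essence this is input validation. Here is a hedged sketch of the kind of range check involved, with invented names and an assumed node count; the real slurmctld code differs:

```c
#include <stdbool.h>
#include <stdio.h>

#define NODE_CNT 512	/* assumed cluster size for this sketch */

/* Reject out-of-range node indexes from a step completion record
 * instead of using them to index internal arrays. */
static bool node_range_valid(int first, int last, int node_cnt)
{
	if ((first < 0) || (first > last) || (last >= node_cnt)) {
		fprintf(stderr, "error: bad node index range %d-%d "
			"(node_cnt %d), record ignored\n",
			first, last, node_cnt);
		return false;
	}
	return true;
}

int main(void)
{
	/* e.g. a record from an srun with stale configuration */
	if (!node_range_valid(100, 9999, NODE_CNT))
		return 1;	/* drop the record rather than crash */
	return 0;
}
```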
- Aug 11, 2011
Morris Jette authored
Add a basic test of Bluegene/Q job step allocations within an existing job allocation.
Morris Jette authored
On a Bluegene/Q system, when srun's --test-only option is used within an existing allocation, launch the job directly with the slurmd daemon rather than IBM's "runjob" command. Useful for testing.
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Morris Jette authored
Morris Jette authored
BLUEGENE - Modify "scontrol show step" to show I/O nodes (BGL and BGP) or c-nodes (BGQ) allocated to each step. Change field name from "Nodes=" to "BP_List=".
- Aug 10, 2011
Morris Jette authored
The test is now more generic to support all Bluegene system types
Danny Auble authored
cannot fit into the available shape.
Morris Jette authored
Modify existing tests so they all run as desired on an emulated Bluegene/Q system
Morris Jette authored
Previous code would fail when trying to launch more than 4096 tasks, which is a problem on BGQ systems where SLURM actually launches job steps.
Morris Jette authored
The SLURM_JOB_CPUS_PER_NODE and SLURM_TASKS_PER_NODE environment variables were being improperly set for IBM Bluegene systems
Danny Auble authored
or not.
- Aug 09, 2011
Morris Jette authored
This change applies only to Cray systems and only when the srun wrapper for aprun is used. Map --exclusive to -F exclusive and --share to -F share. Note this does not consider the partition's Shared configuration, so it is an imperfect mapping of options.
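A small illustrative sketch of that translation; the option strings come from the commit message, while the wrapper function itself is invented for the example:

```c
#include <stdbool.h>
#include <stdio.h>

/* Map srun's sharing options onto aprun's "-F" flag value.
 * Returns NULL when neither option was given.  As noted above, the
 * partition's Shared configuration is deliberately not consulted,
 * so the mapping is imperfect. */
static const char *map_share_option(bool exclusive, bool share)
{
	if (exclusive)
		return "exclusive";
	if (share)
		return "share";
	return NULL;
}

int main(void)
{
	const char *fval = map_share_option(true, false); /* srun --exclusive */
	if (fval)
		printf("aprun -F %s ...\n", fval);  /* aprun -F exclusive ... */
	return 0;
}
```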
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Morris Jette authored
On Cray systems only, the value of avail_node_bitmap was not being properly set for non-responsive nodes.
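For illustration, a toy version of that bookkeeping with invented helpers (SLURM's real bitstr_t API is different): clear the node's bit when it stops responding so the scheduler will not select it, and set the bit again when it recovers.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy 64-node availability bitmap; bit i set means node i is usable. */
static uint64_t avail_node_bitmap = ~0ULL;

static void node_not_responding(int node_inx)
{
	avail_node_bitmap &= ~(1ULL << node_inx);  /* clear: unavailable */
	printf("node %d removed from avail_node_bitmap\n", node_inx);
}

static void node_responding(int node_inx)
{
	avail_node_bitmap |= (1ULL << node_inx);   /* set: available again */
}

int main(void)
{
	node_not_responding(5);
	node_responding(5);
	return 0;
}
```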
Morris Jette authored
A node DOWN to ALPS will be marked DOWN to SLURM only after reaching SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This makes SLURM's handling of the node DOWN state more consistent with ALPS. This change affects only Cray systems.
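A sketch of the timing rule, with invented names and an assumed SlurmdTimeout value; the state names match the commit message:

```c
#include <stdio.h>
#include <time.h>

enum node_state { NODE_UP, NODE_NO_RESPOND, NODE_DOWN };

/* For a node that ALPS reports as down: hold it in NO_RESPOND until
 * SlurmdTimeout seconds have elapsed, then mark it DOWN. */
static enum node_state cray_down_state(time_t last_response, time_t now,
				       int slurmd_timeout)
{
	if ((now - last_response) < slurmd_timeout)
		return NODE_NO_RESPOND;
	return NODE_DOWN;
}

int main(void)
{
	time_t now = time(NULL);
	int slurmd_timeout = 300;	/* assumed SlurmdTimeout of 300 s */

	/* 60 s of silence: still NO_RESPOND (prints 1) */
	printf("%d\n", cray_down_state(now - 60, now, slurmd_timeout));
	/* 600 s of silence: now DOWN (prints 2) */
	printf("%d\n", cray_down_state(now - 600, now, slurmd_timeout));
	return 0;
}
```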
Morris Jette authored
Morris Jette authored
Morris Jette authored
Fix the node state accounting to be consistent with the node state set by ALPS.
- Aug 08, 2011
Morris Jette authored
Split set_node_down() into two functions: set_node_down() continues to accept a node name as an argument, while the new set_node_down_ptr() accepts a node pointer as an argument and will be faster.
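A minimal sketch of that refactoring with hypothetical types: the name-based function keeps its interface but now just resolves the name and delegates, while callers that already hold a node pointer can skip the lookup entirely.

```c
#include <stdio.h>
#include <string.h>

struct node_record {
	const char *name;
	int down;		/* 1 = node marked DOWN in this sketch */
};

static struct node_record node_table[] = { { "tux0", 0 }, { "tux1", 0 } };
#define NODE_CNT (sizeof(node_table) / sizeof(node_table[0]))

/* New, faster entry point: the caller already has the node pointer. */
static void set_node_down_ptr(struct node_record *node, const char *reason)
{
	node->down = 1;
	printf("node %s set DOWN: %s\n", node->name, reason);
}

/* Original interface preserved: look the node up by name, then delegate. */
static void set_node_down(const char *name, const char *reason)
{
	for (size_t i = 0; i < NODE_CNT; i++) {
		if (!strcmp(node_table[i].name, name)) {
			set_node_down_ptr(&node_table[i], reason);
			return;
		}
	}
	fprintf(stderr, "set_node_down: node %s not found\n", name);
}

int main(void)
{
	set_node_down("tux1", "not responding");	/* name-lookup path */
	set_node_down_ptr(&node_table[0], "admin");	/* direct-pointer path */
	return 0;
}
```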
Morris Jette authored
Test4.5 was failing due to a failure to parse node counts with a "K" suffix and a change in the case of node state names.