- Oct 20, 2011
-
-
Morris Jette authored
-
Danny Auble authored
-
- Oct 19, 2011
-
-
Danny Auble authored
-
- Oct 18, 2011
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Matthieu Hautreux authored
-
- Oct 14, 2011
-
-
Danny Auble authored
-
Danny Auble authored
if any message comes through, and since we have states of cnodes in the status threads we don't need to keep retrying to send the message of cnodes in error if the slurmctld is down.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
parameter to the bluegene.conf file, MaxBlockInError, which is a percentage of a block that can be in an error state before future jobs are disallowed to run on the block.
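As a rough illustration of the threshold described in this entry, the check might look like the sketch below. The function name, its arguments, and the choice to treat a block as unusable only once the error percentage exceeds MaxBlockInError are all assumptions made for illustration; the log does not show the actual comparison.

```python
def block_accepts_new_jobs(cnodes_in_error, cnodes_in_block, max_block_in_error_pct):
    """Return True if new jobs may still be scheduled on the block.

    Hypothetical helper: the names and the strict '>' comparison are
    assumptions, not taken from the plugin source.
    """
    if cnodes_in_block <= 0:
        raise ValueError("a block must contain at least one cnode")
    pct_in_error = 100.0 * cnodes_in_error / cnodes_in_block
    # Assumption: the block is closed to new jobs only when the share of
    # cnodes in error exceeds the configured percentage.
    return pct_in_error <= max_block_in_error_pct
```

With a 512-cnode block and a threshold of 10, 51 cnodes in error (about 9.96%) would still allow new jobs under this reading, while 52 (about 10.16%) would not.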
-
Danny Auble authored
in the system. cnode_err_bitmap is used to tell if there are cnodes in an error state in the usable portion of the midplane. cnode_usable_bitmap is used to tell what part of the midplane is usable, typically used for small blocks which only use a portion of a midplane.
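One plausible reading of how the two bitmaps relate is sketched below, using plain Python integers as bitmaps. Only the names cnode_err_bitmap and cnode_usable_bitmap come from the entry; the helper, the bit layout, and the idea of masking one bitmap with the other are assumptions for illustration.

```python
def usable_cnodes_in_error(cnode_err_bitmap, cnode_usable_bitmap):
    """Return the bits for cnodes that are both usable and in error.

    Assumption: bit i in each integer stands for cnode i of the midplane.
    """
    return cnode_err_bitmap & cnode_usable_bitmap


def has_error_in_usable_portion(cnode_err_bitmap, cnode_usable_bitmap):
    """True if at least one usable cnode is in an error state."""
    return usable_cnodes_in_error(cnode_err_bitmap, cnode_usable_bitmap) != 0
```

For a toy 8-cnode midplane where a small block uses only the low four cnodes (`0b00001111`), an error on cnode 5 (`0b00100000`) would not affect the block, while an error on cnode 2 (`0b00000100`) would.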
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
picking cnodes for a step.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
midplane that isn't the first midplane in the block.
-
Danny Auble authored
for blocks and all other hardware, so we can sync correctly.
-
Danny Auble authored
-
Danny Auble authored
can access it.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
hasty.
-
- Oct 13, 2011
-
-
Matthieu Hautreux authored
The addition of the default slurm cg with the cpuset subsystem was incomplete, which prevented a working solution. The contents of cpuset.cpus and cpuset.mems were not replicated from the parent, resulting in "No space left on device" errors when trying to add tasks to the step cg.
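The fix described here amounts to copying the parent's cpuset settings into a newly created child before any task is attached. A minimal sketch of that idea follows; it operates on ordinary directories so it can be run anywhere, and the helper names are hypothetical rather than taken from the slurm sources.

```python
import os
import tempfile

CPUSET_FILES = ("cpuset.cpus", "cpuset.mems")


def inherit_cpuset(parent_dir, child_dir):
    """Copy cpuset.cpus and cpuset.mems from parent_dir into child_dir."""
    for name in CPUSET_FILES:
        with open(os.path.join(parent_dir, name)) as src:
            value = src.read().strip()
        with open(os.path.join(child_dir, name), "w") as dst:
            dst.write(value)


def demo():
    """Simulate a parent/child cgroup pair with plain files (not a real cgroupfs)."""
    with tempfile.TemporaryDirectory() as root:
        parent = os.path.join(root, "slurm")
        child = os.path.join(parent, "step_0")
        os.makedirs(child)
        # Example values only; a real hierarchy would already hold these.
        for name, value in (("cpuset.cpus", "0-3"), ("cpuset.mems", "0")):
            with open(os.path.join(parent, name), "w") as f:
                f.write(value + "\n")
        inherit_cpuset(parent, child)
        result = {}
        for name in CPUSET_FILES:
            with open(os.path.join(child, name)) as f:
                result[name] = f.read()
        return result
```

On a real cgroup v1 cpuset hierarchy, a child whose cpuset.cpus or cpuset.mems is empty rejects attached tasks with ENOSPC, which is the "No space left on device" error mentioned in the entry.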
-
Matthieu Hautreux authored
In order to distinguish between slurm-related and system-related cgroups, ensure that all slurm-related cgroup directories are created under a single directory. This directory is named slurm, or slurm_nodename when multiple slurmd daemons are in use.
-
- Oct 12, 2011
-
-
Mark A. Grondona authored
Add the amount of memory allocated by slurm to the job or step to the debug message in memcg_initialize(). Also, change the message from debug to info, so that a user can see the information by using --slurmd-debug=1.
-
Mark A. Grondona authored
For debugging purposes, add a debug level message with some values of interest just after task_cgroup_memory has initialized.
-
Mark A. Grondona authored
Add a new configuration parameter, MinRAMSpace, which sets a lower bound on memory.limit_in_bytes and memory.memsw.limit_in_bytes. This is required in case an administrator or user sets an absurdly low value for the memory limit, potentially causing the slurmstepd to be terminated by the OOM killer. MinRAMSpace is set in MB of RAM and is 30 by default (an arbitrarily chosen value).
-
Mark A. Grondona authored
The use of whole percent values for cgroup.conf parameters such as AllowedRAMSpace, MaxRAMPercent, AllowedSwapSpace and MaxSwapPercent may be too coarse-grained on systems with large amounts of memory (e.g., 1% of 64G is over 650MB). This patch allows these percentage values to be arbitrary floating point numbers, allowing finer-grained tuning of these limits and parameters.
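The granularity argument can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not code from the patch, and it assumes "64G" means 64 GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def percent_of_ram(total_bytes, percent):
    """Return `percent` percent of total_bytes, truncated to whole bytes."""
    return int(total_bytes * (percent / 100.0))


total = 64 * GIB
whole_step_mib = percent_of_ram(total, 1) / MIB    # one whole-percent step
tenth_step_mib = percent_of_ram(total, 0.1) / MIB  # a tenth-of-a-percent step
```

Under that assumption, a whole-percent step is roughly 655 MiB, while a step of 0.1 is roughly 65.5 MiB, which is the kind of finer tuning that fractional values make possible.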
-
Mark A. Grondona authored
Treat a 0 byte memory limit from SLURM as unlimited and instead use MaxRAMPercent and MaxSwapPercent as RAM and Swap limits for the job/job step. This avoids creating a memory cgroup with limit_in_bytes = 0, which would end up causing the cgroup to OOM before slurmstepd could even be started. This also allows systems in which SLURM isn't explicitly allocating memory to use the task/cgroup plugin with ConstrainRAMSpace=yes.
-
Mark A. Grondona authored
Calculate the upper bound RAM in bytes and Swap in bytes that may be used by any one cgroup and apply this limit in the task/cgroup code.
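Read together with the neighbouring memory-limit entries (a 0 byte limit treated as unlimited, a MaxRAMPercent ceiling, a MinRAMSpace floor), the RAM limit computation could be sketched roughly as follows. The function name, argument names, and the order in which the ceiling and floor are applied are assumptions; only the configuration parameter names come from the log.

```python
MIB = 1024 * 1024


def mem_limit_in_bytes(job_mem_mb, total_ram_bytes,
                       allowed_ram_pct=100.0, max_ram_pct=100.0, min_ram_mb=30):
    """Sketch of a per-cgroup RAM limit, in bytes.

    job_mem_mb      memory allocated to the job or step, in MB (0 = none set)
    allowed_ram_pct stands in for AllowedRAMSpace (assumed to scale the allocation)
    max_ram_pct     stands in for MaxRAMPercent (ceiling as a share of total RAM)
    min_ram_mb      stands in for MinRAMSpace (floor, in MB)
    """
    max_bytes = int(total_ram_bytes * (max_ram_pct / 100.0))
    min_bytes = min_ram_mb * MIB
    if job_mem_mb == 0:
        # A 0 limit is treated as unlimited, so fall back to the ceiling.
        limit = max_bytes
    else:
        limit = int(job_mem_mb * MIB * (allowed_ram_pct / 100.0))
    limit = min(limit, max_bytes)  # never exceed the upper bound
    limit = max(limit, min_bytes)  # assumption: the floor wins if the two conflict
    return limit
```

On a node with 16 GiB of RAM and a ceiling of 50%, a job with no memory allocation would get an 8 GiB limit under this sketch, and a job that requested 1 MB would be raised to the 30 MB floor.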
-
Mark A. Grondona authored
There was some duplicated code in task_cgroup_memory_create. In order to facilitate extending this code in the future, refactor it into a common function memcg_initialize().
-
Mark A. Grondona authored
The cgroups code currently assumes cgroup subsystems will be mounted under /cgroup, which is not the ideal location in many situations. Add a new cgroup.conf parameter to redefine the mount point to an arbitrary location (for example, some systems may already have cgroupfs mounted under /dev/cgroup or /sys/fs/cgroup).
-
- Oct 11, 2011
-
-
Morris Jette authored
Cray: Add support for job reservations with node IDs that are not in numeric order. Fix for Bugzilla #5.
-
Matthieu Hautreux authored
With release_agent notified at the step cgroup level, the step cgroup can be removed while slurmstepd has not yet finished its internal epilog mechanisms. Inhibiting the release agent at the step level and ensuring its proper removal helps to guarantee that the node is only eligible for job execution once the resources are completely available (no longer used by the job or the epilogs).
-
- Oct 05, 2011
-
-
Danny Auble authored
-
Morris Jette authored
-
Danny Auble authored
nested loops.
-