- Aug 09, 2023
Skyler Malinowski authored
Bug 17176
-
Skyler Malinowski authored
Bug 17274
-
Dominik Bartkiewicz authored
Triggered when a node registers with a configured CpuSpecList while the slurmctld configuration defines the node without a CpuSpecList. The patch protects against a NULL dereference in bit_equal(). Bug 17322
-
- Aug 08, 2023
Ben Roberts authored
Extra 'a' in "...and not a reinitialize the..." Bug 17371
-
Albert Gil authored
Bug 16731
-
- Aug 07, 2023
David Gloe authored
If a backup slurmctld is in use and the primary goes down temporarily, slurmstepd uses the slurmctld timeout to decide when to retry contacting the primary. Send the configured value to slurmstepd so it waits the correct amount of time, rather than NO_VAL16/2. The slurmctld timeout is initialized to NO_VAL16 in init_slurm_conf(), and the sleep of slurmctld_timeout / 2 happens in slurm_send_recv_controller_msg(). Bug 17232
-
Ben Roberts authored
Bug 17360
-
Marshall Garey authored
This partially reverts commit dfb07974, which removed code that limits avail_cpus on a node. This commit restores that code only for jobs that do not request gres; for jobs that request gres, avail_cpus is limited later by gres_select_filter_sock_core. Between commit dfb07974 and this commit, avail_cpus for jobs that do not request gres was not limited by the maximum number of tasks that can run on the node, so a job could be assigned fewer nodes than its tasks require. For example, on nodes with 16 CPUs, the following job needs 4 nodes (only one 12-CPU task fits per node), but after commit dfb07974 and before this commit it would be assigned 3 nodes:
srun --mem=0 -n4 --cpus-per-task=12 --exclusive hostname
Three nodes have enough CPUs in total (48) for the job, but that allocation does not satisfy the --cpus-per-task request. Bug 17185
Signed-off-by: Marcin Stolarek <cinek@schedmd.com>
-
Tim Wickberg authored
-
- Aug 04, 2023
Ben Roberts authored
Bug 15867
-
Ben Roberts authored
Also clarify that PriorityJobFactor is what's used for the PriorityWeightPartition calculation. Bug 17236
Signed-off-by: Nathan Rini <nate@schedmd.com>
-
- Aug 03, 2023
Marcin Stolarek authored
With CR_CPU_MEMORY, can_job_run_on_node() can set the number of CPUs available to the job, based on --mem-per-cpu, to a value that is not a multiple of ThreadsPerCore. In that case we cannot assume that removing a core always removes ThreadsPerCore CPUs. Instead of relying on that assumption, reset the value to the number of available cores * ThreadsPerCore. Bug 17229
-
Benjamin Witham authored
This is an update of commit 49bcc0d4. The message was originally logged at debug(), then changed to verbose(), and is now raised to info(). Bugs 16664, 17341.
-
- Aug 02, 2023
Ethan Simmons authored
Adjust the failing condition so that debug messages do not cause the test to fail. Bug 14351
-
Ethan Simmons authored
Add a subtest to check that namespaces do not interfere with scheduling or with the job's UID. Bug 15441
-
Jonathan de Gaston authored
Update test_123_1.py to work with new slurm-23.11 changes. The output from reservation errors was changed slightly. Bug 17203
-
Tom Johns authored
-
Jonathan de Gaston authored
Update test_116_18.py to work with Slurm 23.11. Specify SelectType=select/linear in slurm.conf. Bug 16860
-
Marshall Garey authored
Bug 17113
-
Marshall Garey authored
slurmctld only considers sockets when allocating resources for a job. NUMA nodes are not considered sockets unless numa_node_as_socket is configured. Bug 17113
-
Marshall Garey authored
In Slurm 23.02, options['uid'] in cli_filter.lua does not return the UID of the user but the value passed to --uid. This change was made to allow Linux user namespaces to work. In addition, because a cli_filter is not secure, rejecting a job due to an invalid constraint is better done in a job_submit plugin. For these reasons, move the example code that checks the UID and interactive jobs to job_submit.lua.example and adapt it there. Bug 17075
-
Tim McMullan authored
Bug 16133
-
Ben Roberts authored
Bug 17330
-
Tim McMullan authored
Since container creation happens in the extern step, it is expected that _create_ns() will run only once per job per node. If the job namespace directory already exists, it is better to assume that BasePath is configured incorrectly and fail to launch the step than to continue and potentially break the container. Bug 16164
-
- Aug 01, 2023
Albert Gil authored
The previous logic only checked one week ahead for overlapping recurring reservations. Although in some cases those reservations may replace their nodes and not overlap, we cannot assume that. The new logic checks the relevant recurring cases by advancing both reservations through the necessary number of recurrence periods. It also handles weekday and weekend reservations properly. Although commit cd750f97 did not introduce this issue, it increased the chance of overlapping reservations in 22.05.0. Bug 16731
-
Nathan Rini authored
Bug 17192
-
Nathan Rini authored
Bug 17192
-
Nathan Rini authored
Bug 17192
-
Nathan Rini authored
Bug 17192
-
Nathan Rini authored
Bug 17192
-
Marcin Stolarek authored
-
Marshall Garey authored
As of 23.02, SLURM_NTASKS is only set as an output environment variable for salloc and sbatch when --ntasks is explicitly requested. SLURM_NTASKS is no longer set based on calculation of other options. See commit ef513023 for sbatch and commit 57ec3eb5 for salloc. Bug 17108
-
Tom Johns authored
-
Ethan Simmons authored
Modify test39.21 to use manual node selection instead of --gres, which does not work with select/linear, to ensure the required resources are present for the test. Bug 16845
-
Ethan Simmons authored
test_gpus_per_node uses --gpus-per-node, which isn't compatible with select/linear. Bug 16844
-
Ethan Simmons authored
When using select/linear, adjust the resource limits (CPUs) to reflect the actual node setup instead of minimal hard-coded limits. Bug 16836
-
Ethan Simmons authored
The test originally checked for exclusive node allocation via oversubscription. Extend these checks to also use select/linear to determine exclusive node allocation. Bug 16828
-
Ethan Simmons authored
This test measures performance with a large number of jobs. With select/linear, jobs cannot be scheduled on the same node, so the test's performance depends much more on the actual configuration. Bug 16805
-
- Jul 31, 2023
Tim Wickberg authored
Just call net_stream_listen_ports() instead. Bug 16161.
Signed-off-by: Nathan Rini <nate@schedmd.com>
-
Felip Moll authored
It is possible to swap between the cgroup and linux plugins. Also, the linux plugin has a performance penalty on systems with many tasks. Bug 17099
-