- Feb 21, 2013
Danny Auble authored

- Jan 16, 2013
Morris Jette authored
Without this change a high priority batch job may not start at submit time. In addition, a pending job submitted to multiple partitions could be cancelled when the scheduler runs if any one of its partitions could not be used by the job.
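
The corrected behavior is easiest to see as a predicate over all of a job's partitions rather than any single one. A minimal sketch in C, assuming a hypothetical job record and a job_runnable_in_part() test (neither is the actual Slurm source):

    #include <stdbool.h>
    #include <stddef.h>

    struct part_record;                  /* one partition (opaque here) */
    struct job_record {
        struct part_record **part_list;  /* NULL-terminated list of requested partitions */
    };

    /* Assumed predicate: could this job ever run in this partition? */
    extern bool job_runnable_in_part(struct job_record *job,
                                     struct part_record *part);

    /* Cancel only when *no* requested partition can ever run the job;
     * the bug was cancelling when any one partition was unusable. */
    static bool should_cancel(struct job_record *job)
    {
        for (size_t i = 0; job->part_list && job->part_list[i]; i++) {
            if (job_runnable_in_part(job, job->part_list[i]))
                return false;            /* one usable partition: keep it pending */
        }
        return true;
    }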

- Dec 12, 2012
Danny Auble authored
Morris Jette authored
Morris Jette authored

- Nov 22, 2012
Danny Auble authored

- Oct 25, 2012
Danny Auble authored

- Oct 24, 2012
Danny Auble authored
(removed some unused variables but many remain)

- Oct 23, 2012
Danny Auble authored

- Sep 25, 2012
Morris Jette authored
Fix some un/pack logic.
Fix test12.5 for the new sacct help format.
Address various compiler warnings.
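
Un/pack bugs of this kind come from the pack and unpack sides of an RPC drifting out of step. A self-contained sketch of the invariant, with a toy buffer type standing in for Slurm's real routines in src/common/pack.c:

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct { unsigned char data[64]; size_t off; } buf_t;  /* toy buffer */

    /* Every field packed must later be unpacked in the same order
     * and with the same width. */
    static void pack32(uint32_t val, buf_t *buf)
    {
        val = htonl(val);                /* wire format: network byte order */
        memcpy(buf->data + buf->off, &val, sizeof(val));
        buf->off += sizeof(val);
    }

    static void unpack32(uint32_t *val, buf_t *buf)
    {
        memcpy(val, buf->data + buf->off, sizeof(*val));
        *val = ntohl(*val);
        buf->off += sizeof(*val);
    }
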
Martin Perry authored
Attached is the energy accounting patch that Martin and Yiannis have been working on. The framework is there, but the functionality is currently not working. They are both on vacation this week and are then back a week before the conference. I thought it would be better to send it now, to get the framework and the structures in place for an official 2.5.0, instead of waiting. If you disagree, just let us know and we can send it again when the low-level functionality is working.

Here is a short summary of our test results:

1. jobacct_gather/none + energy_accounting/none: Looks OK. Did not find any errors.

2. jobacct_gather/linux or cgroup + energy_accounting/none: Looks OK. Did not find any errors.

3. jobacct_gather/linux or cgroup + energy_accounting/rapl: Slurmd aborts when you run a job that uses a node that does not support RAPL. This appears to be because of the error()/pexit() calls at lines 150/151 in energy_accounting_rapl.c. We need to change this code to just issue a debug message and return. For now, energy_accounting must not be configured if the cluster includes any nodes that do not support RAPL. The CPU frequency values reported by jobacct_gather are also not correct.

There are obviously still some problems, so if it would be better to wait for full functionality, just let us know. It may be three weeks before they are able to spend time on fixing these issues, which is why I thought you might prefer to have something with the correct data structures in sooner rather than later.
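
The fix proposed in item 3 is to fail soft on nodes without RAPL support instead of terminating slurmd. A hypothetical reconstruction of that change (the function name and MSR path are illustrative; Slurm's debug() and return codes are assumed from src/common/log.h and slurm/slurm_errno.h):

    #include <fcntl.h>
    #include <unistd.h>

    extern void debug(const char *fmt, ...);  /* assumed: Slurm logging */
    #define SLURM_SUCCESS  0
    #define SLURM_ERROR   -1

    /* RAPL counters are read through model-specific registers, so a
     * node without MSR access cannot support the plugin. */
    static int rapl_check_node_support(void)
    {
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        if (fd < 0) {
            /* Old behavior: error() then exit, killing slurmd on
             * non-RAPL nodes.  Suggested: log at debug level and return. */
            debug("energy_accounting/rapl: no MSR/RAPL support on this "
                  "node; energy accounting disabled");
            return SLURM_ERROR;
        }
        close(fd);
        return SLURM_SUCCESS;
    }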

- Aug 10, 2012
Matthieu Hautreux authored
Removes the previous 20 minute time limit on file transfers. The previous behavior would fail for large files 20 minutes into the transfer.

- Aug 09, 2012
Matthieu Hautreux authored
Removes the previous 20 minute time limit on file transfers. The previous behavior would fail for large files 20 minutes into the transfer.

- Jul 19, 2012
Alejandro Lucero Palau authored
alejluther authored
Add a reset level to reset_stats to control which values the reset algorithm clears.
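
A reset level lets the caller clear only a subset of the collected statistics. A minimal sketch of the idea, with hypothetical level names and counters (not the actual reset_stats signature):

    enum reset_level {
        RESET_LEVEL_SCHED = 1,           /* main scheduler counters only */
        RESET_LEVEL_BF    = 2,           /* backfill counters only */
        RESET_LEVEL_ALL   = 3,           /* everything */
    };

    struct sched_stats {
        long sched_cycles;
        long bf_cycles;
    };

    /* The level argument controls which values the reset touches. */
    static void reset_stats(struct sched_stats *s, enum reset_level level)
    {
        if (level == RESET_LEVEL_SCHED || level == RESET_LEVEL_ALL)
            s->sched_cycles = 0;
        if (level == RESET_LEVEL_BF || level == RESET_LEVEL_ALL)
            s->bf_cycles = 0;
    }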

- Jul 16, 2012
Morris Jette authored
This addresses trouble ticket 85

- Jul 03, 2012
Morris Jette authored
Alejandro Lucero Palau authored
Add support for advanced reservation of specific cores rather than whole nodes. Current limitations: homogeneous cluster, nodes must be idle when the reservation is created, and no more than one reservation per node. Code is still under development. Work by Alejandro Lucero Palau, et al., BSC.
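
Under the stated limitations (homogeneous cluster, at most one reservation per node), per-core reservations reduce to one core bitmap per node. An illustrative data-structure sketch only, not the Slurm implementation:

    #include <stdint.h>

    #define MAX_RESV_NODES 1024

    struct resv_cores {
        /* bit i set => core i of that node is reserved; one 64-bit
         * word per node assumes <= 64 cores, which the homogeneous-
         * cluster limitation makes easy */
        uint64_t core_bitmap[MAX_RESV_NODES];
    };

    /* With at most one reservation per node, "node already reserved"
     * is just a nonzero-bitmap test. */
    static int node_has_resv(const struct resv_cores *r, int node_inx)
    {
        return r->core_bitmap[node_inx] != 0;
    }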

- Jun 01, 2012
Morris Jette authored

- May 23, 2012
Morris Jette authored
Format is "name:used/total"
Morris Jette authored

- May 22, 2012
Danny Auble authored
Morris Jette authored

- May 11, 2012
Morris Jette authored
Morris Jette authored

- May 10, 2012
Morris Jette authored

- May 08, 2012
Morris Jette authored
Morris Jette authored
Morris Jette authored

- May 05, 2012
Morris Jette authored
Morris Jette authored
This eliminates one of four RPCs needed for each job.

- May 04, 2012
Morris Jette authored
Morris Jette authored

- Mar 20, 2012
Morris Jette authored

- Feb 22, 2012
Pär Andersson authored

- Jan 31, 2012
Didier GAZEN authored
Hi,

With slurm 2.3.2 (or 2.3.3), I encounter the following error when trying to launch as root a command attached to a running user's job, even when I use the --uid=<user> option:

    sila@suse112:~> squeue
      JOBID PARTITION     NAME     USER    STATE   TIME  TIMELIMIT  NODES  CPUS  NODELIST(REASON)
        551     debug mysleep.     sila  RUNNING   0:02  UNLIMITED      1     1  n1
    root@suse112:~ # srun --jobid=551 hostname
    srun: error: Unable to create job step: Access/permission denied   <-- normal behaviour
    root@suse112:~ # srun --jobid=551 --uid=sila hostname
    srun: error: Unable to create job step: Invalid user id            <-- problem

By increasing slurmctld verbosity, the log file displays the following errors:

    slurmctld: debug2: Processing RPC: REQUEST_JOB_ALLOCATION_INFO_LITE from uid=0
    slurmctld: debug: _slurm_rpc_job_alloc_info_lite JobId=551 NodeList=n1 usec=1442
    slurmctld: debug2: Processing RPC: REQUEST_JOB_STEP_CREATE from uid=0
    slurmctld: error: Security violation, JOB_STEP_CREATE RPC from uid=0 to run as uid 1001

The error comes from the function _slurm_rpc_job_step_create (src/slurmctld/proc_req.c).

Here's my patch to prevent the command from failing (but I'm not sure that there are no side effects):
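
The patch itself is not included above. As a hypothetical illustration only, the permission test being described, where root asking to run as the job's owner should pass rather than hit the "Security violation" path, might look like:

    #include <sys/types.h>

    extern uid_t slurm_user_id;          /* assumed: SlurmUser from slurm.conf */

    /* rpc_uid is the authenticated sender (0 in the report); job_uid
     * is the uid the step would run as (1001, i.e. sila). */
    static int step_create_allowed(uid_t rpc_uid, uid_t job_uid)
    {
        if (rpc_uid == job_uid)
            return 1;                    /* owner manages its own job */
        if (rpc_uid == 0 || rpc_uid == slurm_user_id)
            return 1;                    /* root/SlurmUser may act for the user */
        return 0;                        /* rejected: "Security violation" */
    }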

- Jan 27, 2012
Morris Jette authored

- Jan 19, 2012
Danny Auble authored
Danny Auble authored
Fix a bug where all jobs would be returned even if the flag was set. Patch from Bill Brophy, Bull.

- Dec 28, 2011
Morris Jette authored