  1. Feb 21, 2013
  2. Jan 16, 2013
    • Fix for scheduling batch jobs in multiple partitions · 04fbf26a
      Morris Jette authored
      Without this change, a high-priority batch job may not start at submit
      time. In addition, a pending job with multiple partitions may be
      cancelled when the scheduler runs if any one of its partitions cannot
      be used by the job.
  3. Dec 12, 2012
  4. Nov 22, 2012
  5. Oct 25, 2012
  6. Oct 24, 2012
  7. Oct 23, 2012
  8. Sep 25, 2012
    • Patches to energy consumption logic · b57eabfa
      Morris Jette authored
      Fix some pack/unpack logic
      Fix test12.5 for the new sacct help format
      Address various compiler warnings
    • Energy use collection logic · 86f60616
      Martin Perry authored
      Attached is the energy accounting patch that Martin and Yiannis have been working on. The framework is in place, but the functionality is currently not working. They are both on vacation this week and will be back a week before the conference. I thought it would be better to send it now, so that the framework and structures are in place for an official 2.5.0 release, instead of waiting. If you disagree, just let us know and we can resend it once the low-level functionality is working. Here is a short summary of our test results.
      
      1. jobacct_gather/none + energy_accounting/none
      
      Looks OK.  Did not find any errors.
      
      2. jobacct_gather/linux or cgroup + energy_accounting/none
      
      Looks OK.  Did not find any errors.
      
      3. jobacct_gather/linux or cgroup + energy_accounting/rapl
      
      Slurmd aborts when you run a job that uses a node that does not support RAPL. This appears to be caused by the error()/pexit() at lines 150/151 in energy_accounting_rapl.c. We need to change this code to just issue a debug message and return. For now, energy_accounting must not be configured if the cluster includes any nodes that do not support RAPL.
      
      The CPU frequency values reported by jobacct_gather are not correct.
      
      Again, there are obviously some problems, so if it would be better to wait for full functionality, just let us know. It may be three weeks before they are able to spend time fixing the problems, which is why I thought you might prefer to have the correct data structures in place sooner rather than later.
  9. Aug 10, 2012
  10. Aug 09, 2012
  11. Jul 19, 2012
  12. Jul 16, 2012
  13. Jul 03, 2012
  14. Jun 01, 2012
  15. May 23, 2012
  16. May 22, 2012
  17. May 11, 2012
  18. May 10, 2012
  19. May 08, 2012
  20. May 05, 2012
  21. May 04, 2012
  22. Mar 20, 2012
  23. Feb 22, 2012
  24. Jan 31, 2012
    • Problem when using srun --uid in conjunction with --jobid (patch included) · e2b39c14
      Didier GAZEN authored
      Hi,
      
      With slurm 2.3.2 (or 2.3.3), I encounter the following error when
      trying to launch, as root, a command attached to a running user's job,
      even if I use the --uid=<user> option:
      
      sila@suse112:~> squeue
         JOBID PARTITION     NAME     USER    STATE      TIME TIMELIMIT  NODES   CPUS NODELIST(REASON)
           551     debug mysleep.     sila  RUNNING      0:02 UNLIMITED      1      1 n1
      
      root@suse112:~ # srun --jobid=551 hostname
      srun: error: Unable to create job step: Access/permission denied   <-- normal behaviour

      root@suse112:~ # srun --jobid=551 --uid=sila hostname
      srun: error: Unable to create job step: Invalid user id   <-- problem
      
      By increasing slurmctld verbosity, the log file displays the following
      error:
      
      slurmctld: debug2: Processing RPC: REQUEST_JOB_ALLOCATION_INFO_LITE from uid=0
      slurmctld: debug:  _slurm_rpc_job_alloc_info_lite JobId=551 NodeList=n1 usec=1442
      slurmctld: debug2: Processing RPC: REQUEST_JOB_STEP_CREATE from uid=0
      slurmctld: error: Security violation, JOB_STEP_CREATE RPC from uid=0 to run as uid 1001
      
      which occurs in the function _slurm_rpc_job_step_create
      (src/slurmctld/proc_req.c).
      
      Here's my patch to prevent the command from failing (but I'm not sure
      that there are no side effects):
  25. Jan 27, 2012
  26. Jan 19, 2012
  27. Dec 28, 2011