- Nov 02, 2012
Morris Jette authored
Conflicts: META NEWS
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- Nov 01, 2012
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
or users in it
-
Morris Jette authored
The slurmd daemon previously leaked memory whenever it was reconfigured (on SIGHUP or when "scontrol reconfig" was executed).
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- Oct 31, 2012
Danny Auble authored
reservation. The assoc_list is used to split out idle time on the reservation. This was judged not to be a significant issue for 'not' associations.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The functionality should be unchanged, but the logic is faster.
-
Mark Nelson authored
launching that wouldn't be able to run to completion because of a GrpCPUMins limit.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
- Oct 30, 2012
Morris Jette authored
-
Morris Jette authored
Prior logic created a core_bitmap in the reservation, rather than leaving it NULL, when creating whole-node reservations with topology.
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
order as well.
-
- Oct 29, 2012
Morris Jette authored
-
Morris Jette authored
Now try to get the maximum node count rather than minimizing the number of leaf switches used. For example, if each leaf switch has 8 nodes, a request for -N4-16 previously allocated 8 nodes (one leaf switch) rather than 16 nodes over two leaf switches.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Anyhow, after applying the patch, I was still running into the same difficulty. On closer inspection, I saw that I was still receiving the ALPS backend error in the slurmctld.log file. When I examined the code pertaining to this and ran some SLURM-independent tests, I found that we were executing the do_basil_confirm function multiple times in the cases where it would fail. My independent tests show precisely the same behaviour: if you make a reservation request, successfully confirm it, and then attempt to confirm it again, you receive this error message. However, the "apstat -rvv" command shows that the ALPS reservation is fine, so I concluded that this particular ALPS/BASIL message is informational rather than a "show-stopper." In other words, I can consider the node ready at this point.

As a simple workaround, I inserted an if-block immediately after the call to "basil_confirm" in function "do_basil_confirm" in ".../src/plugins/select/cray/basil_interface.c." The if-statement checks for "BE_BACKEND"; if that is the result, it prints an informational message to slurmctld.log and sets rc=0 so that the node can be considered ready. This now allows my prolog scripts to run, and I can clearly see the SLURM message that I placed in that if-block.

However, I am not certain that we should simply let this error code pass through: it seems to be a fairly generic code, and it could have other causes for which we would not want to let it pass. I really only want to limit the number of calls to basil_confirm to one. Perhaps I could add a field to the job_record to mark whether the ALPS reservation has been confirmed.
-
Morris Jette authored
Replace copy/clear with move/alloc
-
- Oct 27, 2012
Danny Auble authored
-
- Oct 26, 2012
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-