Commits · cd663c213c5c9d607dae7a03bf581bd956b74911 · tud-zih-energy / Slurm

Feb 27, 2015

Merge branch 'slurm-14.11' · cd663c21
Morris Jette authored 10 years ago
```
Conflicts:
	src/slurmctld/job_mgr.c
```
cd663c21
Display job's estimated NodeCount based off of partition's configured... · ce32018a
Brian Christiansen authored 10 years ago
```
Display job's estimated NodeCount based off of partition's configured resources rather than the whole system's.

Bug 1478
```
ce32018a
Cosmetic mods, no change in logic · b99fee15
Morris Jette authored 10 years ago

b99fee15
power/cray - Log cluster-wide power totals · 9e567ecc
Morris Jette authored 10 years ago
```
This provides a better global view of what the limits and caps are.
```
9e567ecc

power/cray developments · 3ff19460

Morris Jette authored 10 years ago

Remove time from "capmc get_node_energy_counter" call. If no recent
  data is available, no data is being returned, so just get latest
  information.
Initialize a variable to avoid xfree of uninitialized variable.
Correct joule to watt calculation (">" changed to "<")
Read configuration once when slurmctld starts rather than twice
Compute a node's power consumption with more precision based upon
  time to the microsecond

3ff19460
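
The joule-to-watt item above is easier to follow with a small worked example. The following is a minimal sketch, not the plugin's code: it assumes two cumulative energy readings with microsecond timestamps, and the type `energy_sample_t` and function `avg_watts` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample type: a cumulative energy counter and when it was read. */
typedef struct {
	uint64_t joules;     /* cumulative energy counter reading */
	uint64_t time_usec;  /* time of the reading, in microseconds */
} energy_sample_t;

/*
 * Average power in watts between two samples.
 * Returns 0 if time did not advance or the counter went backwards.
 * Note: delta_joules * 1000000 can overflow for extremely large deltas;
 * this sketch does not guard against that.
 */
static uint32_t avg_watts(const energy_sample_t *prev,
			  const energy_sample_t *curr)
{
	uint64_t delta_joules, delta_usec;

	assert(prev && curr);
	if (curr->time_usec <= prev->time_usec || curr->joules < prev->joules)
		return 0;
	delta_joules = curr->joules - prev->joules;
	delta_usec = curr->time_usec - prev->time_usec;
	/* watts = joules / seconds = joules * 1e6 / microseconds */
	return (uint32_t) ((delta_joules * 1000000ULL) / delta_usec);
}
```

With whole-second timestamps, a 1.5 second interval would be truncated to 1 second and the computed power would be overstated by 50 percent; dividing by microseconds avoids that.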

Feb 26, 2015
- Fixes to JSON configure path logic · eb9f22ca
  Morris Jette authored 10 years ago
  
  eb9f22ca
- Update Cray build/install instructions · ea51d8b8
  Morris Jette authored 10 years ago
  
  Add links to burst buffer and power management pages. Add JSON-C build/installation instructions.
  ea51d8b8
- Account all CPUs to the batch steps. · cc8c2e3e
  David Bigagli authored 10 years ago
  
  cc8c2e3e
- Add link to LLNL SCR (Scalable Checkpoint/Restart) · 725a7cb0
  Morris Jette authored 10 years ago
  
  725a7cb0
- task/affinity - fix memory binding for cpusets · 701e5b33
  Morris Jette authored 10 years ago
  
  Previously, there was no binding of tasks to the appropriate NUMA node. Based upon work by Josko Plazonic <plazonic@princeton.edu>.
  701e5b33
- task/affinity clean up · 3eff4e87
  Morris Jette authored 10 years ago
  
  Improved logging and some code restructuring. No change in logic.
  3eff4e87
- Revert "Remove unused variable." · bdd3a651
  David Bigagli authored 10 years ago
  
  This reverts commit e24a418b.
  bdd3a651
- Remove unused variable. · edd5ad26
  David Bigagli authored 10 years ago
  
  edd5ad26
- task/affinity clean up · 7b313990
  Morris Jette authored 10 years ago
  
  Improved logging and some code restructuring. No change in logic.
  7b313990
Feb 25, 2015
- Apply email notifications to entire job arrays · 9fa4909d
  Morris Jette authored 10 years ago
  
  Mail notifications on job BEGIN, END and FAIL now apply to a job array as a whole rather than generating individual email messages for each task in the job array.
  9fa4909d
- Revert "Remove unused variable." · 663ec8f2
  David Bigagli authored 10 years ago
  
  This reverts commit e24a418b.
  663ec8f2
- Remove unused variable. · e24a418b
  David Bigagli authored 10 years ago
  
  e24a418b
- Merge branch 'slurm-14.11' · 7e046346
  Morris Jette authored 10 years ago
  
  7e046346
- Add job_submit build instructions · ee90e55a
  Morris Jette authored 10 years ago
  
  ee90e55a
- select/alps - Reverse .my.cnf search order · 96363d42
  Morris Jette authored 10 years ago
  
  This is a variation on commit 5391b8cc. Check $HOME/.my.cnf last rather than first to follow the more standard search order.
  96363d42
- topology/hypercube: Cosmetic changes for clean build, no change in logic · b626d838
  Morris Jette authored 10 years ago
  
  b626d838
Feb 24, 2015

Fix sprio showing wrong priority for job arrays until priority is recalculated. · 423029d8
Brian Christiansen authored 10 years ago
```
Bug 1469
```
423029d8
Add missing topology/hypercube plugin for SGI · fa6de30d
Michael A. Raymond authored 10 years ago

fa6de30d
Merge branch 'slurm-14.11' · 128148c1
Morris Jette authored 10 years ago

128148c1

cray/basil, read mysql creds from /root/.my.cnf · 5391b8cc

Nina Suvanphim authored 10 years ago

The /root/.my.cnf would typically contain the login credentials for
root.  If those are needed for Slurm, then it should be checking
that directory.

(In reply to Nina Suvanphim from comment #0)
...
> const char *default_conf_paths[] = {
> "/root/.my.cnf", <<<<<<<<<<<<<<<<<------- add this line
> "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
> "/etc/mysql/my.cnf", NULL };

I'll also note that typically the $HOME/.my.cnf file would be
checked last rather than first.

5391b8cc
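
The search-order point in the message above can be sketched in a few lines of C. This is an illustrative sketch only, not the code in select/alps: the helpers `home_my_cnf` and `first_readable` are hypothetical names, and the only system paths assumed are the ones quoted in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<home>/.my.cnf" in buf; returns 0 on success, -1 if it does not fit. */
static int home_my_cnf(const char *home, char *buf, size_t len)
{
	assert(home && buf);
	if (strlen(home) + strlen("/.my.cnf") + 1 > len)
		return -1;
	snprintf(buf, len, "%s/.my.cnf", home);
	return 0;
}

/*
 * Return the first path in a NULL-terminated list that is readable,
 * or NULL if none is. Order in the list is the search order.
 */
static const char *first_readable(const char *const *paths)
{
	assert(paths);
	for (size_t i = 0; paths[i]; i++) {
		if (access(paths[i], R_OK) == 0)
			return paths[i];
	}
	return NULL;
}
```

A caller following the suggestion would pass a list such as `{ "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf", "/etc/mysql/my.cnf", home_buf, NULL }`, with the per-user file built by `home_my_cnf` placed last rather than first.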

power/cray development · d32dac43
Morris Jette authored 10 years ago
```
Fix some logic related to power distribution across nodes
```
d32dac43
Merge branch 'slurm-14.11' · 5a133bcb
Morris Jette authored 10 years ago

5a133bcb
Add some SUG photo links · 3d67a89a
Morris Jette authored 10 years ago

3d67a89a
Fix code for Apple computers, where SOL_TCP is not defined · ac0343be
Danny Auble authored 10 years ago

ac0343be
Fix wrong variables used in the wrapper functions needed for systems that · 8d0c9901
Danny Auble authored 10 years ago
```
don't support strong_alias
```
8d0c9901

power/cray development · acdec1f5

Morris Jette authored 10 years ago

Update power management web page: Add notes about powering nodes down/up
Prevent underflow in power distribution logic
Add logic to identify nodes in "ready" state. Only ready nodes can have
  their power caps modified
Don't change power cap if node not in ready state
Various improvements to logging
Refactor code to eliminate duplicate/repeated building of full NID list
Plug some memory leaks

acdec1f5
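
For the "prevent underflow" item above, the general hazard is that subtracting unsigned wattage values can wrap around to a very large number. The following is a minimal sketch of a guarded subtraction, not the plugin's actual logic; the function name `lower_cap` and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower a power cap by "decrease" watts without wrapping around and
 * without going below min_watts. If the cap is already at or below
 * min_watts, the result is clamped to min_watts.
 */
static uint32_t lower_cap(uint32_t cap_watts, uint32_t decrease,
			  uint32_t min_watts)
{
	uint32_t result;

	if (cap_watts <= min_watts || decrease >= cap_watts - min_watts)
		result = min_watts;            /* subtraction would cross the floor */
	else
		result = cap_watts - decrease; /* safe: stays above min_watts */
	assert(result >= min_watts);
	return result;
}
```

The comparison is done before the subtraction, so the difference `cap_watts - min_watts` is only evaluated when it cannot be negative.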

Feb 23, 2015

Fix test for scontrol change · 9cb22140

Morris Jette authored 10 years ago

Modify test 12.7 so that it specifies a reason when setting a node DOWN.
A recent change to the Slurm code now requires a reason.

9cb22140

Feb 21, 2015
- power/cray: Read initial caps from capmc · 58da1582
  Morris Jette authored 10 years ago
  
  58da1582
Feb 20, 2015

scontrol: Require Reason when setting node DOWN · e7c61bdd
Morris Jette authored 10 years ago

e7c61bdd

power/cray work · 82de9635

Morris Jette authored 10 years ago

Correct capmc arguments to set power cap.
Convert "capmc get_node_energy_counter" to use a hostlist expression rather
   than listing every node in a comma-separated list.
Log commands and args run by the plugin via the power_run_script()
   function in src/plugins/power/common/power_common.c.
Use hostlist to build condensed nid list for power cap set/clear functions.

82de9635
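
To illustrate what a condensed nid list looks like compared with a comma-separated list of every node, here is a small standalone sketch. It does not use Slurm's hostlist functions; `condense_nids` is a hypothetical helper, and the exact output format ("1-3,5,7-8") is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a condensed range list such as "1-3,5,7-8" for a sorted,
 * duplicate-free array of node IDs.
 * Returns 0 on success, -1 if buf is too small.
 */
static int condense_nids(const int *nids, size_t count, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf && len > 0);
	assert(nids || count == 0);
	buf[0] = '\0';
	for (size_t i = 0; i < count; ) {
		size_t j = i;
		int n;

		/* Extend the run while the next ID is consecutive */
		while (j + 1 < count && nids[j + 1] == nids[j] + 1)
			j++;
		if (j > i)
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     used ? "," : "", nids[i], nids[j]);
		else
			n = snprintf(buf + used, len - used, "%s%d",
				     used ? "," : "", nids[i]);
		if (n < 0 || (size_t) n >= len - used)
			return -1;
		used += (size_t) n;
		i = j + 1;
	}
	return 0;
}
```

For a large allocation of mostly contiguous nodes, the condensed form keeps the command line short where a full comma-separated list would grow with every node.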

Merge branch 'slurm-14.11' · b8fbbf2b
Morris Jette authored 10 years ago

b8fbbf2b

Fix to GRES NoConsume logic · 33c48ac5

Dorian Krause authored 10 years ago

We came across the following error message in the slurmctld logs when
using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00
"potion", job_id=46,
    node_name=0x1987ab0 "node1") at gres.c:3980
(job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0,
job_id=46,
    node_name=0x1987ab0 "node1") at gres.c:4190
job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true)
    at select_linear.c:2091
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1,
    preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at
select_linear.c:3176
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
    preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
exc_core_bitmap=0x0) at select_linear.c:3390
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
    preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
exc_core_bitmap=0x0) at node_select.c:588
avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1,
exc_core_bitmap=0x0)
    at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not
duplicate the no_consume flag.
The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr()
which calls _node_state_dup().

Below is a simple patch to fix the problem. A "future-proof" alternative
might be to memcpy() from gres_ptr to new_gres and
only handle pointers separately.

33c48ac5
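
The "future-proof" alternative mentioned in the message above can be sketched as follows. This is an illustration under stated assumptions, not the code in gres.c: `node_gres_t` is a hypothetical, simplified stand-in for the per-node GRES state, and only the `no_consume` flag is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for per-node GRES state. */
typedef struct {
	uint64_t gres_cnt_avail;
	uint64_t gres_cnt_alloc;
	bool no_consume;          /* the flag the field-by-field copy missed */
	uint32_t topo_cnt;        /* number of entries in topo_gres_cnt */
	uint64_t *topo_gres_cnt;  /* owned array, needs a deep copy */
} node_gres_t;

/* Duplicate a record: memcpy() all scalars, then fix up owned pointers. */
static node_gres_t *node_gres_dup(const node_gres_t *src)
{
	node_gres_t *dst;

	assert(src);
	dst = malloc(sizeof(*dst));
	if (!dst)
		return NULL;
	/* Copies every scalar field, including flags added in the future */
	memcpy(dst, src, sizeof(*dst));
	/* Pointers are handled separately so the copy owns its own memory */
	dst->topo_gres_cnt = NULL;
	if (src->topo_gres_cnt && src->topo_cnt) {
		size_t bytes = src->topo_cnt * sizeof(uint64_t);
		dst->topo_gres_cnt = malloc(bytes);
		if (!dst->topo_gres_cnt) {
			free(dst);
			return NULL;
		}
		memcpy(dst->topo_gres_cnt, src->topo_gres_cnt, bytes);
	}
	return dst;
}

/* Release a record created by node_gres_dup(). */
static void node_gres_free(node_gres_t *gres)
{
	if (gres) {
		free(gres->topo_gres_cnt);
		free(gres);
	}
}
```

The trade-off is that every pointer member must be remembered in the fix-up step; a forgotten pointer would leave two records sharing one allocation, whereas a forgotten scalar in a field-by-field copy silently resets a flag, which is the failure described in this commit.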

power/cray - compute energy consumption via capmc · d500de54
Morris Jette authored 10 years ago

d500de54

Feb 19, 2015
- Load lua-5.2 library if using lua5.2 for lua job submit plugin. · 408c108e
  Brian Christiansen authored 10 years ago
  
  Bug 1471
  408c108e
- Merge branch 'slurm-14.11' · 6b5ae328
  Morris Jette authored 10 years ago
  
  6b5ae328