Commit 194d21b3 authored by Robert Dietrich

extended docu

parent e560290d
......@@ -42,4 +42,21 @@ Finally, a symbolic link that points on a *pika-VERSION.conf* file has to be cre
To create a new PIKA software package, copy a *pika-VERSION.conf* file with a new version number and change the variables *PIKA_VERSION*, *COLLECTD_VERSION*, *LIKWID_VERSION* and, if necessary, *LIKWID_VERSION_SHA*.
## How components are connected
![Flow Graph](./flow_graph.svg)
\ No newline at end of file
![Flow Graph](./flow_graph.svg)
### What is written/read or sent/received?
1) JOB_RECORD__WHOLE_NODE, JOB_RECORD__COMMENT, JOB_RECORD__CPU_CNT, JOB_RECORD__CPU_IDS, JOB_RECORD__NODE_NAMES, JOB_RECORD__TIME_LIMIT, !!JOB_NAME!!, !!JOB_ACCOUNT!!<br>
(The environment variables SLURM_JOB_ID, SLURM_NODELIST, SLURM_JOB_USER and SLURM_JOB_PARTITION are available within the prolog.)
2) SLURM_JOB_ID, SLURM_JOB_USER, JOB_ACCOUNT, STATUS='running', NUM_NODES, SLURM_NODELIST, CPULIST, JOB_RECORD__CPU_CNT, START=`date +%s`, JOB_NAME, JOB_RECORD__TIME_LIMIT, SLURM_JOB_PARTITION, JOB_RECORD__WHOLE_NODE, ARRAY_ID<br>
NUM_NODES ... via `$(nodeset -c ${SLURM_NODELIST})`<br>
CPULIST ... generated from JOB_RECORD__CPU_IDS and JOB_RECORD__NODE_NAMES<br>
ARRAY_ID ... for non-array jobs 0, otherwise ... (currently not available)
3) Update STATUS='completed|timeout', JOB_END=`date +%s` and PROPERTY_ID for the job identified by SLURM_JOB_ID and START<br>
PROPERTY_ID ... bit field that encodes several properties, e.g. monitoring was disabled or the Slurm data is incomplete<br>
(Delete jobs shorter than one minute.)
4) Chunks/batches of time-series data. For a complete list of metrics see [daemon/collectd](daemon/collectd).
5) TODO: Frank
\ No newline at end of file
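
Step 3 describes PROPERTY_ID as a bit field. The following Python sketch shows how such a field could be composed and queried. The flag names and bit positions are hypothetical and chosen only for illustration, since the actual bit layout is not given here.

```python
from enum import IntFlag


class JobProperty(IntFlag):
    """Hypothetical flags; the real PIKA bit assignments are not shown here."""
    NONE = 0
    MONITORING_DISABLED = 1    # assumed bit 0
    INCOMPLETE_SLURM_DATA = 2  # assumed bit 1


def encode_properties(monitoring_disabled: bool, incomplete_slurm_data: bool) -> int:
    """Pack boolean job properties into a single integer."""
    prop = JobProperty.NONE
    if monitoring_disabled:
        prop |= JobProperty.MONITORING_DISABLED
    if incomplete_slurm_data:
        prop |= JobProperty.INCOMPLETE_SLURM_DATA
    return int(prop)


def has_property(property_id: int, flag: JobProperty) -> bool:
    """Check whether one flag is set in a stored PROPERTY_ID value."""
    return (property_id & int(flag)) != 0
```

A bit field keeps the job table to a single integer column while still allowing several independent properties per job.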
......@@ -4,4 +4,24 @@
It is used by [pika_start_collectd.sh](../../job_control/slurm/taurus/pika_start_collectd.sh).
With PIKA 1.2, the read alignment options changed, and the template file changed accordingly to [pika-1.2_collectd_template.conf](pika-1.2_collectd_template.conf).
We added a new metric type for LIKWID. It is defined in [custom_types.db](collect_template.conf) and included with the collectd configuration.
\ No newline at end of file
We added a new metric type for LIKWID. It is defined in [custom_types.db](collect_template.conf) and included with the collectd configuration.
## List of collected metrics
| Metric | Proposed Name | Data Source | Acquired per |
| :-------------- | :-------------- | :-------------- | :------------ |
| **CPU** | | | |
| &emsp;Usage | cpu\_usage | /proc/stat | hardware thread |
| &emsp;Main memory utilization | mem\_used | /proc/meminfo | node |
| &emsp;IPC | ipc | LIKWID | hardware thread |
| &emsp;FLOPS (SP-normalized) | flops\_any | LIKWID | hardware thread |
| &emsp;Main memory bandwidth | mem\_bw | LIKWID | CPU/socket |
| &emsp;Power consumption | rapl\_power | LIKWID | CPU/socket |
| &emsp;Infiniband bandwidth | ib\_bw | /sys/class/infiniband/... | Infiniband device |
| **I/O bandwidth & metadata** | | | |
| &emsp;Local disk | read\_bw, write\_bw & read\_ops, write\_ops | /proc/diskstats | disk |
| &emsp;Lustre | read\_bw, write\_bw & open, close, create,<br>seek, fsync, read\_requests, write\_requests | /proc/fs/lustre/llite/\*/stats | Lustre instance |
| **GPU** | | | |
| &emsp;Usage | gpu\_used | NVML | GPU |
| &emsp;Memory utilization | gpu\_mem\_used | NVML | GPU |
| &emsp;Power consumption | gpu\_power | NVML | GPU |
| &emsp;Temperature | gpu\_temperature | NVML | GPU |
\ No newline at end of file
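
The table lists /proc/stat as the data source for cpu\_usage per hardware thread. As a rough illustration, the Python sketch below derives a usage fraction from two samples of the same `cpuN` line. It is not necessarily how the collectd plugin used by PIKA computes the metric, and the function names are made up for this example.

```python
def parse_cpu_line(line: str) -> tuple[str, list[int]]:
    """Split one 'cpuN ...' line from /proc/stat into its name and tick counters."""
    fields = line.split()
    return fields[0], [int(v) for v in fields[1:]]


def cpu_usage(prev: list[int], curr: list[int]) -> float:
    """Fraction of non-idle time between two samples of the same CPU line.

    Counter order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Only the first eight counters are
    summed, because guest time is already contained in user and nice.
    """
    deltas = [c - p for p, c in zip(prev, curr)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = sum(deltas[3:5])  # idle + iowait
    return 1.0 - idle / total
```

Because the counters are cumulative since boot, a single reading says nothing about current load. Only the difference between two readings is meaningful.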
......@@ -48,11 +48,16 @@
<glyph unicode="B" horiz-adv-x="1033" d="M 201,1462 L 614,1462 C 808,1462 948,1433 1035,1375 1122,1317 1165,1225 1165,1100 1165,1013 1141,942 1093,886 1044,829 974,793 881,776 L 881,766 C 1103,728 1214,611 1214,416 1214,285 1170,183 1082,110 993,37 870,0 711,0 L 201,0 201,1462 Z M 371,836 L 651,836 C 771,836 857,855 910,893 963,930 989,994 989,1083 989,1165 960,1224 901,1261 842,1297 749,1315 621,1315 L 371,1315 371,836 Z M 371,692 L 371,145 676,145 C 794,145 883,168 943,214 1002,259 1032,331 1032,428 1032,519 1002,585 941,628 880,671 787,692 662,692 L 371,692 Z"/>
<glyph unicode="9" horiz-adv-x="980" d="M 1061,838 C 1061,266 840,-20 397,-20 320,-20 258,-13 213,0 L 213,143 C 266,126 327,117 395,117 555,117 676,167 758,266 839,365 884,516 891,721 L 879,721 C 842,666 794,624 733,595 672,566 604,551 528,551 399,551 296,590 220,667 144,744 106,852 106,991 106,1143 149,1263 234,1351 319,1439 430,1483 569,1483 668,1483 755,1458 830,1407 904,1356 961,1281 1001,1184 1041,1086 1061,971 1061,838 Z M 569,1341 C 474,1341 400,1310 348,1249 296,1188 270,1102 270,993 270,897 294,822 342,767 390,712 463,684 561,684 622,684 678,696 729,721 780,746 820,779 849,822 878,865 893,909 893,956 893,1026 879,1091 852,1150 825,1209 787,1256 738,1290 689,1324 632,1341 569,1341 Z"/>
<glyph unicode="8" horiz-adv-x="1007" d="M 584,1483 C 717,1483 823,1452 901,1390 979,1328 1018,1242 1018,1133 1018,1061 996,995 951,936 906,877 835,823 737,774 856,717 940,658 990,596 1040,533 1065,461 1065,379 1065,258 1023,161 938,89 853,16 737,-20 590,-20 434,-20 314,14 230,83 146,151 104,248 104,373 104,540 206,671 410,764 318,816 252,872 212,933 172,993 152,1060 152,1135 152,1241 191,1326 270,1389 348,1452 453,1483 584,1483 Z M 268,369 C 268,289 296,227 352,182 407,137 485,115 586,115 685,115 763,138 818,185 873,232 901,296 901,377 901,442 875,499 823,550 771,600 680,649 551,696 452,653 380,606 335,555 290,503 268,441 268,369 Z M 582,1348 C 499,1348 433,1328 386,1288 339,1248 315,1195 315,1128 315,1067 335,1014 374,970 413,926 486,882 592,838 687,878 755,921 795,967 834,1013 854,1067 854,1128 854,1195 830,1249 782,1289 733,1328 667,1348 582,1348 Z"/>
<glyph unicode="5" horiz-adv-x="927" d="M 557,893 C 711,893 832,855 921,779 1009,702 1053,598 1053,465 1053,314 1005,195 909,109 812,23 679,-20 510,-20 345,-20 220,6 133,59 L 133,219 C 180,189 238,166 307,149 376,132 445,123 512,123 629,123 721,151 786,206 851,261 883,341 883,446 883,650 758,752 508,752 445,752 360,742 254,723 L 168,778 223,1462 950,1462 950,1309 365,1309 328,870 C 405,885 481,893 557,893 Z"/>
<glyph unicode="4" horiz-adv-x="1113" d="M 1130,336 L 913,336 913,0 754,0 754,336 43,336 43,481 737,1470 913,1470 913,487 1130,487 1130,336 Z M 754,487 L 754,973 C 754,1068 757,1176 764,1296 L 756,1296 C 724,1232 694,1179 666,1137 L 209,487 754,487 Z"/>
<glyph unicode="3" horiz-adv-x="980" d="M 1006,1118 C 1006,1025 980,948 928,889 875,830 801,790 705,770 L 705,762 C 822,747 909,710 966,650 1023,590 1051,511 1051,414 1051,275 1003,168 906,93 809,18 672,-20 494,-20 417,-20 346,-14 282,-3 217,9 155,30 94,59 L 94,217 C 157,186 225,162 297,146 368,129 436,121 500,121 753,121 879,220 879,418 879,595 740,684 461,684 L 317,684 317,827 463,827 C 577,827 667,852 734,903 801,953 834,1023 834,1112 834,1183 810,1239 761,1280 712,1321 645,1341 561,1341 497,1341 437,1332 380,1315 323,1298 259,1266 186,1219 L 102,1331 C 162,1378 231,1416 310,1443 388,1470 470,1483 557,1483 699,1483 809,1451 888,1386 967,1321 1006,1231 1006,1118 Z"/>
<glyph unicode="2" horiz-adv-x="980" d="M 1061,0 L 100,0 100,143 485,530 C 602,649 680,733 717,784 754,835 782,884 801,932 820,980 829,1032 829,1087 829,1165 805,1227 758,1273 711,1318 645,1341 561,1341 500,1341 443,1331 389,1311 334,1291 274,1255 207,1202 L 119,1315 C 254,1427 400,1483 559,1483 696,1483 804,1448 882,1378 960,1307 999,1213 999,1094 999,1001 973,910 921,819 869,728 772,614 629,475 L 309,162 309,154 1061,154 1061,0 Z"/>
<glyph unicode="1" horiz-adv-x="530" d="M 715,0 L 553,0 553,1042 C 553,1129 556,1211 561,1288 547,1274 531,1259 514,1244 497,1229 417,1164 276,1049 L 188,1163 575,1462 715,1462 715,0 Z"/>
<glyph unicode="0" horiz-adv-x="1007" d="M 1069,733 C 1069,480 1029,292 950,167 870,42 748,-20 584,-20 427,-20 307,44 225,172 143,299 102,486 102,733 102,988 142,1177 221,1300 300,1423 421,1485 584,1485 743,1485 863,1421 946,1292 1028,1163 1069,977 1069,733 Z M 270,733 C 270,520 295,366 345,269 395,172 475,123 584,123 695,123 775,172 825,271 874,369 899,523 899,733 899,943 874,1097 825,1195 775,1292 695,1341 584,1341 475,1341 395,1293 345,1197 295,1100 270,946 270,733 Z"/>
<glyph unicode="/" horiz-adv-x="742" d="M 731,1462 L 186,0 20,0 565,1462 731,1462 Z"/>
<glyph unicode="." horiz-adv-x="266" d="M 152,106 C 152,151 162,185 183,208 203,231 232,242 270,242 309,242 339,231 361,208 382,185 393,151 393,106 393,63 382,29 360,6 338,-17 308,-29 270,-29 236,-29 208,-18 186,3 163,24 152,58 152,106 Z"/>
<glyph unicode="," horiz-adv-x="318" d="M 350,238 L 365,215 C 348,148 323,71 290,-18 257,-106 223,-188 188,-264 L 63,-264 C 81,-195 101,-109 123,-7 144,95 159,177 168,238 L 350,238 Z"/>
<glyph unicode=")" horiz-adv-x="477" d="M 524,561 C 524,386 498,222 447,71 395,-80 320,-212 223,-324 L 63,-324 C 156,-199 227,-59 276,94 325,247 350,403 350,563 350,726 326,884 278,1038 229,1192 157,1333 61,1462 L 223,1462 C 321,1345 396,1210 447,1056 498,901 524,736 524,561 Z"/>
<glyph unicode="&amp;" horiz-adv-x="1403" d="M 414,1171 C 414,1125 426,1081 450,1040 474,998 515,948 573,889 659,939 719,985 753,1028 786,1070 803,1119 803,1174 803,1225 786,1267 752,1300 717,1332 671,1348 614,1348 555,1348 507,1332 470,1300 433,1268 414,1225 414,1171 Z M 569,129 C 730,129 863,180 969,283 L 532,707 C 458,662 406,624 375,595 344,565 322,533 307,499 292,465 285,426 285,383 285,305 311,243 363,198 414,152 483,129 569,129 Z M 113,379 C 113,466 136,542 183,609 229,676 312,743 432,811 375,874 337,922 317,955 296,988 280,1022 268,1057 256,1092 250,1129 250,1167 250,1267 283,1345 348,1401 413,1457 504,1485 621,1485 729,1485 814,1457 876,1402 938,1346 969,1268 969,1169 969,1098 946,1032 901,972 856,911 781,850 676,788 L 1083,397 C 1120,438 1150,487 1173,543 1195,598 1214,659 1229,725 L 1397,725 C 1352,534 1283,390 1192,291 L 1491,0 1262,0 1077,178 C 998,107 918,57 837,26 756,-5 665,-20 565,-20 422,-20 311,15 232,86 153,157 113,254 113,379 Z"/>
<glyph unicode=" " horiz-adv-x="529"/>
</font>
......@@ -102,7 +107,7 @@
</font>
</defs>
<defs class="TextShapeIndex">
<g ooo:slide="id1" ooo:id-list="id3 id4 id5 id6 id7 id8 id9 id10 id11 id12 id13 id14 id15 id16 id17 id18 id19 id20 id21 id22 id23 id24 id25 id26 id27 id28 id29 id30 id31 id32 id33 id34 id35 id36 id37 id38 id39 id40 id41 id42 id43 id44 id45 id46 id47 id48 id49 id50 id51 id52 id53 id54 id55"/>
<g ooo:slide="id1" ooo:id-list="id3 id4 id5 id6 id7 id8 id9 id10 id11 id12 id13 id14 id15 id16 id17 id18 id19 id20 id21 id22 id23 id24 id25 id26 id27 id28 id29 id30 id31 id32 id33 id34 id35 id36 id37 id38 id39 id40 id41 id42 id43 id44 id45 id46 id47 id48 id49 id50 id51 id52 id53 id54 id55 id56 id57 id58"/>
</defs>
<defs class="EmbeddedBulletChars">
<g id="bullet-char-template-57356" transform="scale(0.00048828125,-0.00048828125)">
......@@ -150,24 +155,24 @@
<g class="Page">
<g class="com.sun.star.drawing.CustomShape">
<g id="id3">
<rect class="BoundingBox" stroke="none" fill="none" x="4277" y="2004" width="2931" height="4514"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 5742,6499 L 4295,6499 4295,2022 7189,2022 7189,6499 5742,6499 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 5742,6499 L 4295,6499 4295,2022 7189,2022 7189,6499 5742,6499 Z"/>
<rect class="BoundingBox" stroke="none" fill="none" x="4375" y="2004" width="2833" height="4814"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 5791,6799 L 4393,6799 4393,2022 7189,2022 7189,6799 5791,6799 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 5791,6799 L 4393,6799 4393,2022 7189,2022 7189,6799 5791,6799 Z"/>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id4">
<rect class="BoundingBox" stroke="none" fill="none" x="3974" y="2138" width="3080" height="4780"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 5514,6899 L 3992,6899 3992,2156 7035,2156 7035,6899 5514,6899 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 5514,6899 L 3992,6899 3992,2156 7035,2156 7035,6899 5514,6899 Z"/>
<rect class="BoundingBox" stroke="none" fill="none" x="4062" y="2170" width="2957" height="4849"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 5540,7000 L 4080,7000 4080,2188 7000,2188 7000,7000 5540,7000 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 5540,7000 L 4080,7000 4080,2188 7000,2188 7000,7000 5540,7000 Z"/>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id5">
<rect class="BoundingBox" stroke="none" fill="none" x="3675" y="2284" width="3211" height="4934"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 5280,7199 L 3693,7199 3693,2302 6867,2302 6867,7199 5280,7199 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 5280,7199 L 3693,7199 3693,2302 6867,2302 6867,7199 5280,7199 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="4032" y="7005"><tspan fill="rgb(0,0,0)" stroke="none">Compute Node</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="3782" y="2382" width="3037" height="4837"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 5300,7200 L 3800,7200 3800,2400 6800,2400 6800,7200 5300,7200 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 5300,7200 L 3800,7200 3800,2400 6800,2400 6800,7200 5300,7200 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="4052" y="7006"><tspan fill="rgb(0,0,0)" stroke="none">Compute Node</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -188,14 +193,14 @@
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id8">
<rect class="BoundingBox" stroke="none" fill="none" x="4388" y="4188" width="212" height="695"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 4492,4855 L 4493,4411"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 4494,4215 L 4598,4425 4388,4425 4494,4215 Z"/>
<rect class="BoundingBox" stroke="none" fill="none" x="4388" y="4186" width="212" height="695"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 4492,4853 L 4493,4409"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 4494,4213 L 4598,4423 4388,4423 4494,4213 Z"/>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id9">
<rect class="BoundingBox" stroke="none" fill="none" x="4393" y="4224" width="3001" height="732"/>
<rect class="BoundingBox" stroke="none" fill="none" x="4393" y="4224" width="1808" height="732"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="4643" y="4654"><tspan fill="rgb(0,0,0)" stroke="none">start/stop</tspan></tspan></tspan></text>
</g>
</g>
......@@ -209,9 +214,9 @@
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id11">
<rect class="BoundingBox" stroke="none" fill="none" x="2670" y="5090" width="1349" height="212"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 3991,5197 L 2893,5195"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 2697,5195 L 2907,5090 2907,5300 2697,5195 Z"/>
<rect class="BoundingBox" stroke="none" fill="none" x="2668" y="5088" width="1349" height="212"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 3989,5195 L 2891,5193"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 2695,5193 L 2905,5088 2905,5298 2695,5193 Z"/>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -243,10 +248,10 @@
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id15">
<rect class="BoundingBox" stroke="none" fill="none" x="7687" y="1982" width="4015" height="2343"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 9694,4306 L 7705,4306 7705,2000 11683,2000 11683,4306 9694,4306 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 9694,4306 L 7705,4306 7705,2000 11683,2000 11683,4306 9694,4306 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="8031" y="4112"><tspan fill="rgb(0,0,0)" stroke="none">Service VM with SSD</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="7881" y="1982" width="3820" height="2343"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 9791,4306 L 7899,4306 7899,2000 11682,2000 11682,4306 9791,4306 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 9791,4306 L 7899,4306 7899,2000 11682,2000 11682,4306 9791,4306 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="8128" y="4112"><tspan fill="rgb(0,0,0)" stroke="none">Service VM with SSD</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -268,16 +273,16 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id18">
<rect class="BoundingBox" stroke="none" fill="none" x="7527" y="1459" width="2074" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="7777" y="1889"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4038</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="7479" y="1483" width="2074" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="7729" y="1913"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4038</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id19">
<rect class="BoundingBox" stroke="none" fill="none" x="7675" y="4892" width="4015" height="2326"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 9682,7199 L 7693,7199 7693,4910 11671,4910 11671,7199 9682,7199 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 9682,7199 L 7693,7199 7693,4910 11671,4910 11671,7199 9682,7199 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="8019" y="7005"><tspan fill="rgb(0,0,0)" stroke="none">Service VM with SSD</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="7881" y="4892" width="3808" height="2326"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 9785,7199 L 7899,7199 7899,4910 11670,4910 11670,7199 9785,7199 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 9785,7199 L 7899,7199 7899,4910 11670,4910 11670,7199 9785,7199 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="8122" y="7005"><tspan fill="rgb(0,0,0)" stroke="none">Service VM with SSD</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -299,8 +304,8 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id22">
<rect class="BoundingBox" stroke="none" fill="none" x="9708" y="4400" width="2074" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="9958" y="4830"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4039</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="9804" y="4400" width="2074" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="10054" y="4830"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4039</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -312,8 +317,8 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id24">
<rect class="BoundingBox" stroke="none" fill="none" x="2700" y="4669" width="1601" height="732"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="2950" y="5099"><tspan fill="rgb(0,0,0)" stroke="none">read</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="2696" y="4621" width="1298" height="732"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="2946" y="5071"><tspan fill="rgb(0,0,0)" stroke="none">read</tspan></tspan><tspan class="TextPosition" x="3551" y="4958"><tspan font-size="197px" fill="rgb(0,0,0)" stroke="none">1)</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -321,7 +326,7 @@
<rect class="BoundingBox" stroke="none" fill="none" x="12582" y="1982" width="4015" height="2337"/>
<path fill="rgb(217,217,217)" stroke="none" d="M 14589,4300 L 12600,4300 12600,2000 16578,2000 16578,4300 14589,4300 Z"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="35" stroke-linejoin="miter" d="M 14589,4300 L 12600,4300 12600,2000 16578,2000 16578,4300 14589,4300 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="12884" y="4106"><tspan fill="rgb(0,0,0)" stroke="none">access to big storage</tspan></tspan></tspan></text>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="353px" font-weight="400"><tspan class="TextPosition" x="12712" y="4101"><tspan fill="rgb(0,0,0)" stroke="none">... lots of storage space</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.CustomShape">
......@@ -337,8 +342,8 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id27">
<rect class="BoundingBox" stroke="none" fill="none" x="14678" y="1495" width="2101" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="14928" y="1925"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4039</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="14738" y="1495" width="2101" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="14988" y="1925"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4039</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
......@@ -349,9 +354,9 @@
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id29">
<rect class="BoundingBox" stroke="none" fill="none" x="9588" y="7170" width="212" height="625"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 9692,7767 L 9693,7393"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 9694,7197 L 9798,7407 9588,7407 9694,7197 Z"/>
<rect class="BoundingBox" stroke="none" fill="none" x="9588" y="7168" width="212" height="625"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 9692,7765 L 9693,7391"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 9694,7195 L 9798,7405 9588,7405 9694,7195 Z"/>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
......@@ -376,8 +381,8 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id33">
<rect class="BoundingBox" stroke="none" fill="none" x="11300" y="-41" width="2201" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="11550" y="389"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4039</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="11360" y="-1" width="2201" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="11610" y="429"><tspan fill="rgb(0,0,0)" stroke="none">taurusi4039</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
......@@ -388,7 +393,7 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id35">
<rect class="BoundingBox" stroke="none" fill="none" x="12616" y="1300" width="1601" height="732"/>
<rect class="BoundingBox" stroke="none" fill="none" x="12616" y="1300" width="1285" height="732"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="12866" y="1730"><tspan fill="rgb(0,0,0)" stroke="none">write</tspan></tspan></tspan></text>
</g>
</g>
......@@ -445,9 +450,9 @@
</g>
<g class="com.sun.star.drawing.CustomShape">
<g id="id42">
<rect class="BoundingBox" stroke="none" fill="none" x="6872" y="8391" width="1156" height="212"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 8000,8498 L 7095,8496"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 6899,8496 L 7109,8391 7109,8601 6899,8496 Z"/>
<rect class="BoundingBox" stroke="none" fill="none" x="6872" y="8389" width="1156" height="212"/>
<path fill="none" stroke="rgb(0,0,0)" stroke-width="53" stroke-linejoin="miter" d="M 8000,8496 L 7095,8494"/>
<path fill="rgb(0,0,0)" stroke="none" d="M 6899,8494 L 7109,8389 7109,8599 6899,8494 Z"/>
</g>
</g>
<g class="com.sun.star.drawing.LineShape">
......@@ -526,14 +531,35 @@
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id54">
<rect class="BoundingBox" stroke="none" fill="none" x="6972" y="7924" width="1601" height="701"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="7222" y="8354"><tspan fill="rgb(0,0,0)" stroke="none">read</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="13704" y="2076" width="1638" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="13954" y="2506"><tspan fill="rgb(0,0,0)" stroke="none">InfluxDB</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id55">
<rect class="BoundingBox" stroke="none" fill="none" x="13704" y="2076" width="1638" height="642"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="13954" y="2506"><tspan fill="rgb(0,0,0)" stroke="none">InfluxDB</tspan></tspan></tspan></text>
<rect class="BoundingBox" stroke="none" fill="none" x="7100" y="4973" width="901" height="412"/>
<path fill="rgb(255,255,255)" fill-opacity="0.698" stroke="rgb(255,255,255)" stroke-opacity="0.698" d="M 7550,5384 L 7100,5384 7100,4973 8000,4973 8000,5384 7550,5384 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="7110" y="5298"><tspan fill="rgb(0,0,0)" stroke="none">send</tspan></tspan><tspan class="TextPosition" x="7753" y="5185"><tspan font-size="197px" fill="rgb(0,0,0)" stroke="none">2)</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id56">
<rect class="BoundingBox" stroke="none" fill="none" x="6900" y="5676" width="1201" height="412"/>
<path fill="rgb(255,255,255)" fill-opacity="0.698" stroke="rgb(255,255,255)" stroke-opacity="0.698" d="M 7500,6087 L 6900,6087 6900,5676 8100,5676 8100,6087 7500,6087 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="6910" y="6001"><tspan fill="rgb(0,0,0)" stroke="none">update</tspan></tspan><tspan class="TextPosition" x="7850" y="5888"><tspan font-size="197px" fill="rgb(0,0,0)" stroke="none">3)</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id57">
<rect class="BoundingBox" stroke="none" fill="none" x="7237" y="2565" width="901" height="412"/>
<path fill="rgb(255,255,255)" fill-opacity="0.698" stroke="rgb(255,255,255)" stroke-opacity="0.698" d="M 7687,2976 L 7237,2976 7237,2565 8137,2565 8137,2976 7687,2976 Z"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="7247" y="2890"><tspan fill="rgb(0,0,0)" stroke="none">send</tspan></tspan><tspan class="TextPosition" x="7890" y="2777"><tspan font-size="197px" fill="rgb(0,0,0)" stroke="none">4)</tspan></tspan></tspan></text>
</g>
</g>
<g class="com.sun.star.drawing.TextShape">
<g id="id58">
<rect class="BoundingBox" stroke="none" fill="none" x="6903" y="7900" width="1298" height="732"/>
<text class="TextShape"><tspan class="TextParagraph" font-family="Open Sans, sans-serif" font-size="282px" font-weight="400"><tspan class="TextPosition" x="7153" y="8350"><tspan fill="rgb(0,0,0)" stroke="none">read</tspan></tspan><tspan class="TextPosition" x="7758" y="8237"><tspan font-size="197px" fill="rgb(0,0,0)" stroke="none">5)</tspan></tspan></tspan></text>
</g>
</g>
</g>
......
# SLURM Prolog and Epilog Scripts
Currently, PIKA uses a global prolog and epilog script for each system or partition. The subfolders contain platform-dependent prolog and epilog scripts. The prolog and epilog scripts are split into several parts. The main scripts, [prolog.sh](prolog.sh) and [epilog.sh](epilog.sh), are structured as follows.
**Adjust the path after the first *source* command in [prolog.sh](prolog.sh) and [epilog.sh](epilog.sh) according to your setup.**
## prolog.sh
1. Get PIKA environment variables <br>
--> [pika-current.conf](../../../pika-current.conf) <br>
&emsp;&emsp;--> [pika.conf](../../../pika.conf)
2. Check for active jobs and make this job visible to other prologs
3. Set up debugging
4. Install the PIKA package <br>
--> [pika_install_prolog_include.sh](pika_install_prolog_include.sh)
5. Check/set up logrotate and wait until PIKA Python is available
6. Determine master node
7. Set default values for the SLURM environment (in case Redis database is down) <br>
--> [pika_slurm_env.sh](pika_slurm_env.sh)
8. Get job metadata from Redis database <br>
--> [pika_get_metadata_prolog_include.sh](pika_get_metadata_prolog_include.sh) <br>
&emsp;&emsp;--> [pika_slurm_env_redis_new.py](pika_slurm_env_redis_new.py)
9. Start/stop collectd <br>
--> [pika_collectd_prolog_include.sh](pika_collectd_prolog_include.sh) <br>
&emsp;&emsp;--> [pika_start_collectd.sh](pika_start_collectd.sh)
10. Send job metadata to MariaDB <br>
--> [pika_save_metadata_prolog_include.sh](pika_save_metadata_prolog_include.sh) (uses [pika_utils.sh](pika_utils.sh))
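
Steps 7 and 8 set default values first and then try to read the job metadata from Redis, so the prolog can continue when the database is down. A minimal Python sketch of such a bounded retry is shown below. The helper name, the attempt count and the delay are assumptions for illustration, and the lookup itself is passed in as a callable so that the sketch does not depend on a running Redis server.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(fetch: Callable[[], Optional[bytes]],
                     attempts: int = 10,
                     delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Call fetch() until it returns a value or the attempts run out.

    Exceptions (e.g. timeouts) are treated like a missing value, so the
    caller can fall back to defaults when the database is unreachable.
    """
    for attempt in range(attempts):
        try:
            value = fetch()
        except Exception:
            value = None
        if value is not None:
            return value
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next attempt
    return None
```

With a redis-py client, the callable could be something like `lambda: connection.get("pika_" + str(job_id))`. A return value of `None` would then mean that the defaults from step 7 stay in effect.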
## epilog.sh
1. Get PIKA environment variables <br>
--> [pika-current.conf](../../../pika-current.conf) <br>
&emsp;&emsp;--> [pika.conf](../../../pika.conf)
2. Check if prolog was called and debug file is available
3. Read SLURM environment from file (created during prolog)
4. Determine master node
5. Update job metadata <br>
--> [pika_update_metadata_epilog_include.sh](pika_update_metadata_epilog_include.sh) (uses [pika_utils.sh](pika_utils.sh))
6. Set LIKWID counters if LIKWID is used with direct access mode
7. Clean up local data
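
Step 5 updates the job record at the end of a job. Assuming it follows the rules stated in the flow description (STATUS becomes 'completed' or 'timeout', JOB_END is set to the current time, and jobs shorter than one minute are deleted), the decision could be sketched in Python as follows. The function name, the returned dictionary layout and the `timed_out` parameter are hypothetical. How the epilog actually detects a timeout is not shown here.

```python
import time
from typing import Optional

MIN_JOB_SECONDS = 60  # jobs shorter than one minute are deleted


def epilog_update(start: int, timed_out: bool, end: Optional[int] = None) -> dict:
    """Return the action to take for one job (hypothetical record layout).

    start and end are Unix timestamps, as produced by `date +%s`.
    """
    job_end = int(time.time()) if end is None else end
    if job_end - start < MIN_JOB_SECONDS:
        return {"action": "delete"}
    return {
        "action": "update",
        "STATUS": "timeout" if timed_out else "completed",
        "JOB_END": job_end,
    }
```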
......@@ -12,38 +12,36 @@ import argparse
from time import sleep
from itertools import count, groupby
#from pprint import pprint
def list_to_ranges(L):
    G=(list(x) for _,x in groupby(L, lambda x,c=count(): next(c)-x))
    return ",".join("-".join(map(str,(g[0],g[-1])[:len(g)])) for g in G)
def main(job_id, debug_path, env_file, force):
redis_host = os.environ["REDIS_HOST"]
redis_port = os.environ["REDIS_PORT"]
redis_password = os.environ["REDIS_PASSWORD"]
connection = redis.StrictRedis(host=redis_host, port=redis_port, password=redis_password, socket_timeout=10, socket_connect_timeout=10)
debug_file = None
if debug_path:
debug_file_path = debug_path + "/memcache_" + str(job_id)
debug_file = open(debug_file_path,'w')
debug_file.write("debug before: {0} {1}\n".format(job_id, time.time()))
redis_host = os.environ.get('REDIS_HOST')
if redis_host is None:
if debug_file:
debug_file.write("REDIS_HOST is not set!\n")
sys.exit("REDIS_HOST is not set!")
connection = redis.StrictRedis(host=redis_host, port=6379, socket_timeout=10, socket_connect_timeout=10)
slurm_env_string = None
haveConnectionError = False
try:
slurm_env_string = connection.get("prope_" + str(job_id))
slurm_env_string = connection.get("pika_" + str(job_id))
except: # redis.exceptions.TimeoutError:
haveConnectionError = True
t = 0
while slurm_env_string == None and t < 10:
try:
slurm_env_string = connection.get("prope_" + str(job_id))
slurm_env_string = connection.get("pika_" + str(job_id))
except: # redis.exceptions.TimeoutError:
haveConnectionError = True
continue
......@@ -68,6 +66,8 @@ def main(job_id, debug_path, env_file, force):
except:
slurm_env = {}
#pprint(slurm_env)
if debug_path and debug_file:
if slurm_env:
debug_file.write(str(slurm_env))
......@@ -77,13 +77,13 @@ def main(job_id, debug_path, env_file, force):
# check if this job is exclusive and no monitoring is requested
monitoring_on = -1
if slurm_env:
nodes_shared = str(slurm_env['shared'])
nodes_exclusive = str(slurm_env['JOB_RECORD__WHOLE_NODE'])
if nodes_shared == "OK":
if nodes_exclusive == "0":
monitoring_on = 1
else:
#check no_monitoring comment
slurm_comment = str(slurm_env['comment'])
slurm_comment = str(slurm_env['JOB_RECORD__COMMENT'])
if 'no_monitoring' in slurm_comment:
monitoring_on = 0
else:
......@@ -96,115 +96,63 @@ def main(job_id, debug_path, env_file, force):
# print result as return value of the script
print(monitoring_on)
def save_job_env(env_file, slurm_env, connection, debug_file):
# set default values
start_time = 0
work_dir = "'n/a'"
exclusive = 0
partition_name = "'n/a'"
num_cpus = 0
cpu_allocated = "'n/a'"
job_name = "'n/a'"
walltime = 0
walltime_formatted = "'n/a'"
cpu_allocated = "'n/a'"
job_array_id = "None"
account = "'n/a'"
num_cores = 0
job_user = None
if "SLURM_JOB_USER" in os.environ:
job_user = os.environ["SLURM_JOB_USER"]
# get selected job information
if slurm_env:
start_time = str(slurm_env['start_time'])
partition_name = str(slurm_env['partition'])
nodes_shared = str(slurm_env['shared'])
work_dir = str(slurm_env['work_dir'])
if nodes_shared == "OK":
total_cpus_allocated = int(slurm_env['num_cpus'])
node_count = int(slurm_env['num_nodes'])
#print "Shared: " + str(total_cpus_allocated / node_count) + " cpus per node on partition " + partition_name
# get partition data
partition_data_string = connection.get(partition_name)
partition_data = cPickle.loads(partition_data_string)
#partition_data = ast.literal_eval(str(partition_data_string))
try:
cpus_avail = int(partition_data['max_cpus_per_node'])
#print(str(partition_data))
except:
cpus_avail = -1
#print(str(partition_data))
if (total_cpus_allocated / node_count) == cpus_avail:
#print "Exclusive with " + str(total_cpus_allocated / node_count) + " cpus per node on partition " + partition_name
exclusive = 1
else:
#print "Exclusive flag set by user"
exclusive = 1
exclusive = str(slurm_env['JOB_RECORD__WHOLE_NODE'])
#determine number of CPUs (hardware threads)
num_cpus = int(slurm_env['JOB_RECORD__CPU_CNT'])
cpu_allocated = ''
if exclusive == 0:
for key, value in slurm_env['cpus_alloc_layout'].items():
cpu_allocated += str(key) + str('[') + str(list_to_ranges(value)) + str('],')
for idx, val in enumerate(slurm_env['JOB_RECORD__CPU_IDS']):
cpu_allocated += slurm_env['JOB_RECORD__NODE_NAMES'][idx] + str('[') + str(list_to_ranges(val)) + str('],')
#remove last comma from string
cpu_allocated = cpu_allocated[:-1]
#remove last comma from string
cpu_allocated = cpu_allocated[:-1]
try:
job_name = str(slurm_env['name'])
except:
job_name = 'corrupt'
# try:
# job_name = str(slurm_env['name'])
# except:
# job_name = 'corrupt'
#determine job user
if not job_user:
job_user = pwd.getpwuid(slurm_env['user_id']).pw_name
try:
walltime = slurm_env['time_limit']
walltime = slurm_env['JOB_RECORD__TIME_LIMIT']
except:
walltime = 0
#convert walltime from minutes to seconds
walltime *= 60
walltime_formatted = slurm_env['time_limit_str']
job_array_id = slurm_env['array_job_id']
#determine account
account = str(slurm_env['account'])
#determine number of cores
num_cores = str(slurm_env['num_cpus'])
#account = str(slurm_env['account'])
if env_file:
f = open(env_file,'w')
print( "#!/bin/bash", file=f )
print( "export PIKA_JOB_START=" + str(start_time), file=f )
print( "export PIKA_JOB_USER=" + str(job_user), file=f )
print( "export PIKA_JOB_EXCLUSIVE=" + str(exclusive), file=f )
print( "export PIKA_JOB_PARTITION=" + partition_name, file=f )
print( "export PIKA_JOB_NUM_CORES=" + str(num_cpus), file=f )
print( "export PIKA_JOB_CPUS_ALLOCATED=" + str(cpu_allocated), file=f )
print( "export PIKA_JOB_NAME='" + job_name + "'", file=f )
print( "export PIKA_JOB_WALLTIME=" + str(walltime), file=f )
print( "export PIKA_JOB_WALLTIME_FORMATTED=" + str(walltime_formatted), file=f )
print( "export PIKA_JOB_CPUS_ALLOCATED=" + str(cpu_allocated), file=f )
print( "export PIKA_JOB_ARRAY_ID=" + str(job_array_id), file=f )
print( "export PIKA_JOB_ACCOUNT=" + str(account), file=f )
print( "export PIKA_JOB_NUM_CORES=" + str(num_cores), file=f )
print( "export PIKA_JOB_WORK_DIR=" + str(work_dir), file=f )
f.close()
if debug_file:
debug_file.write("\nReservation: " + str(exclusive))
debug_file.write("\nPartition: " + partition_name)
debug_file.write("\nJob Name: " + str(slurm_env['name']))
debug_file.write("\nWalltime: " + str(slurm_env['time_limit']))
#debug_file.write("\n" + str(cpu_allocated))
debug_file.write("\n\n")
debug_file.close()
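
The CPU list written to `PIKA_JOB_CPUS_ALLOCATED` pairs each node name with its CPU IDs compressed into ranges. The sketch below reproduces `list_to_ranges` from the diff and adds a small wrapper that joins the per-node entries; `build_cpu_list` and the node names `node1`/`node2` are hypothetical and used only for illustration:

```python
from itertools import count, groupby


def list_to_ranges(ids):
    """Collapse ascending integers into a range string: [0, 1, 2, 5] -> '0-2,5'."""
    # Consecutive values share the same (position - value) key, so groupby
    # splits the list exactly where a gap occurs.
    groups = (list(g) for _, g in groupby(ids, lambda x, c=count(): next(c) - x))
    # A single-element group yields one number, a longer group yields 'first-last'.
    return ",".join("-".join(map(str, (g[0], g[-1])[:len(g)])) for g in groups)


def build_cpu_list(node_names, cpu_ids):
    """Hypothetical helper: join 'node[ranges]' entries with commas."""
    return ",".join(
        "{0}[{1}]".format(name, list_to_ranges(ids))
        for name, ids in zip(node_names, cpu_ids)
    )


print(build_cpu_list(["node1", "node2"], [[0, 1, 2, 3], [0, 2]]))
```

Joining the entries with `",".join` avoids the trailing comma that the loop in the diff has to strip afterwards.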
......
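
The metadata lookup in `main` retries the Redis read up to ten times and records whether a connection error occurred. A minimal sketch of a bounded retry with the same intent follows. It is not the code from the commit: `fetch_with_retry` and `fake_get` are names invented here, and the getter is passed in so the example runs without a Redis server. In this sketch every attempt, failed or not, counts toward the limit; whether the original loop does the same is not visible in the hunk.

```python
from time import sleep


def fetch_with_retry(get, key, attempts=10, delay=1.0):
    """Call get(key) until it returns a value or the attempts run out.

    Returns (value, had_error); value is None if nothing was found.
    """
    had_error = False
    for _ in range(attempts):
        try:
            value = get(key)
        except Exception:  # e.g. a timeout raised by the client
            had_error = True
            value = None
        if value is not None:
            return value, had_error
        sleep(delay)
    return None, had_error


# Stand-in for connection.get: fails once, misses once, then succeeds.
calls = {"n": 0}


def fake_get(key):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated timeout")
    return None if calls["n"] == 2 else b"payload"


print(fetch_with_retry(fake_get, "pika_12345", delay=0))
```

With a real client the call would look like `fetch_with_retry(connection.get, "pika_" + str(job_id))`, using the key prefix introduced in this commit.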