Commit 627f8624 authored by Martin Schroschk

Merge branch 'update-pika' into 'preview'

update pika.md

See merge request zih/hpcsupport/hpc-compendium!1087
parents 85b9a558 c46d22bd
3 merge requests: !1103 Automated merge from preview to main, !1102 Automated merge from preview to main, !1087 update pika.md
doc.zih.tu-dresden.de/docs/software/misc/pika_download_jobdata.png

10.3 KiB

......@@ -58,14 +58,15 @@ PIKA provides the following runtime metrics:
|Metric| Hardware Unit| Sampling Frequency|
|---|---|---:|
|CPU Usage|CPU core|30s|
|IPC (instructions per cycle)|CPU core|60s|
|FLOPS (normalized to single precision) |CPU core|60s|
|CPU Usage|CPU core (average across hardware threads)|30s|
|IPC (instructions per cycle)|CPU core (sum over hardware threads)|60s|
|FLOPS (normalized to single precision) |CPU core (sum over hardware threads)|60s|
|Main Memory Bandwidth|CPU socket|60s|
|CPU Power|CPU socket|60s|
|Main Memory Utilization|node|30s|
|I/O Bandwidth (local, Lustre) |node|30s|
|I/O Metadata (local, Lustre) |node|30s|
|Network Bandwidth|node|30s|
|GPU Usage|GPU device|30s|
|GPU Memory Utilization|GPU device|30s|
|GPU Power Consumption|GPU device|30s|
......@@ -92,11 +93,12 @@ The sampling frequency cannot be changed by the user.
If the current partition supports simultaneous multithreading (SMT) the maximum number of hardware
threads per physical core is displayed in the SMT column. The Slurm configuration on ZIH systems
disables SMT by default. Therefore, in the example below, only a maximum CPU usage of 0.5 can be
achieved, since PIKA combines two hardware threads per physical core. If you want to use SMT, you
must set the Slurm environment variable `SLURM_HINT=multithread`. In this case, `srun` distributes
the tasks to all available hardware threads, thus a CPU usage of 1 can be reached. However, the SMT
configuration only refers to the `srun` command. For single node jobs without `srun` command the
tasks are automatically distributed to all available hardware threads.
achieved, as PIKA determines the average value over two hardware threads per physical core.
If you want to use SMT, you must set the Slurm environment variable `SLURM_HINT=multithread`.
In this case, `srun` distributes the tasks to all available hardware threads, thus a CPU usage of 1
can be reached. However, the SMT configuration only applies to the `srun` command. For single-node
jobs without the `srun` command, the tasks are automatically distributed to all available hardware
threads.
![SMT Mode](misc/pika_smt_2.png)
{: align="center"}
......@@ -105,7 +107,8 @@ tasks are automatically distributed to all available hardware threads.
To reduce the amount of recorded data, PIKA summarizes per hardware thread metrics to the
corresponding physical core. In terms of simultaneous multithreading (SMT), PIKA only provides
performance data per physical core.
performance data per physical core. For CPU usage, the average value per measurement point across
all hardware threads is calculated, while for IPC and FLOPS, the sum per measurement point is determined.
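
The aggregation described above can be illustrated with a minimal sketch. It is not PIKA's actual implementation; the function name `aggregate_per_core` and the sample values are hypothetical and only show how averaging (CPU usage) differs from summing (IPC, FLOPS) per measurement point.

```python
from statistics import mean


def aggregate_per_core(thread_samples, mode):
    """Combine per-hardware-thread series into one series for a physical core.

    thread_samples: list of series (one per hardware thread, equal length)
    mode: "mean" for CPU usage, "sum" for IPC and FLOPS
    """
    if mode not in ("mean", "sum"):
        raise ValueError("mode must be 'mean' or 'sum'")
    combine = mean if mode == "mean" else sum
    # zip(*...) groups the values of all threads at the same measurement point
    return [combine(point) for point in zip(*thread_samples)]


# Hypothetical samples: two hardware threads of one core, the second one idle
# (as is the case when SMT is not used).
cpu_usage = aggregate_per_core([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], "mean")
ipc = aggregate_per_core([[1.2, 1.4, 1.0], [0.0, 0.0, 0.0]], "sum")

print(cpu_usage)  # [0.5, 0.5, 0.5]
print(ipc)
```

With one of two hardware threads idle, the averaged CPU usage stays at 0.5, while the summed IPC equals the value of the busy thread.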
The following table explains different timeline visualization modes.
By default, each timeline shows the average value over all hardware units (HUs) per measured
......@@ -128,6 +131,34 @@ To identify imbalances between HUs over time, the visualization modes *Best* and
first indicator how much the HUs differ in terms of resource usage. The timelines *Best* and
*Lowest* show the recorded performance data of the best/lowest average HU over time.
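
As a rough illustration of how the *Best* and *Lowest* HUs could be determined, the following sketch selects the HU with the highest and the lowest average value over time. The function name `best_and_lowest`, the HU labels, and the sample values are assumptions made for this example and do not reflect PIKA's internal code.

```python
from statistics import mean


def best_and_lowest(series_by_hu):
    """Return the names of the HUs with the highest and lowest average value.

    series_by_hu: dict mapping an HU name to its recorded series
    """
    averages = {hu: mean(values) for hu, values in series_by_hu.items()}
    best = max(averages, key=averages.get)
    lowest = min(averages, key=averages.get)
    return best, lowest


# Hypothetical CPU usage series for three cores
series = {
    "core0": [0.9, 1.0, 0.95],
    "core1": [0.2, 0.1, 0.3],
    "core2": [0.5, 0.6, 0.4],
}

best, lowest = best_and_lowest(series)
print(best, lowest)  # core0 core1
```

The timelines of the two selected HUs are then the recorded series `series[best]` and `series[lowest]`.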
!!! note "More Details"
If you want to conduct further analysis, you can download the job data as JSON file(s) via the
button in the top right section:
![Download Jobdata](misc/pika_download_jobdata.png){ align=left}
The options are
- Metadata: Data shown in table (project, start, end, ...), jobscript, min/max/mean statistics
- Performance Data: Data records of all metrics of every distinct device (CPU cores, GPUs, ...)
- Cluster Data: Metadata of used partition
<br>
??? example "Example: Visualize every CPU core that was allocated for the job"
```python
# Run in JupyterLab/Jupyter Notebook; requires pandas and matplotlib.
# Download the "Performance Data" and save it as "/tmp/jobdata.json".
%pylab widget
from pandas import read_json

data = read_json('/tmp/jobdata.json', lines=True)
# Plot one timeline per allocated CPU core.
for cpu in data['cpu_used'][0]['core']['series']:
    plot(cpu['data'], lw=0.5)
```
## Footprint Visualization
Complementary to the timeline visualization of one specific job, statistics on metadata and
......@@ -160,13 +191,6 @@ flags in the job script:
**Note:** Disabling PIKA monitoring is possible only for exclusive jobs!
## Known Issues
The PIKA metric FLOPS is not supported by the Intel Haswell CPU architecture.
However, PIKA provides this metric to show the computational intensity.
**Do not rely on FLOPS on Haswell!** We use the event `AVX_INSTS_CALC` which counts the `insertf128`
instruction.
## Case Studies
### Idle CPUs
......