Commit 627f8624
authored 8 months ago by Martin Schroschk

Merge branch 'update-pika' into 'preview'

update pika.md

See merge request zih/hpcsupport/hpc-compendium!1087

Parents: 85b9a558, c46d22bd

Related merge requests: !1103 (Automated merge from preview to main), !1102 (Automated merge from preview to main), !1087 (update pika.md)
Showing 2 changed files with 40 additions and 16 deletions:

- doc.zih.tu-dresden.de/docs/software/misc/pika_download_jobdata.png (new file, mode 100644, 10.3 KiB, binary, +0 −0)
- doc.zih.tu-dresden.de/docs/software/pika.md (+40 −16)

doc.zih.tu-dresden.de/docs/software/pika.md
```diff
@@ -58,14 +58,15 @@ PIKA provides the following runtime metrics:
 |Metric|Hardware Unit|Sampling Frequency|
 |---|---|---:|
-|CPU Usage|CPU core|30s|
-|IPC (instructions per cycle)|CPU core|60s|
-|FLOPS (normalized to single precision)|CPU core|60s|
+|CPU Usage|CPU core (average across hardware threads)|30s|
+|IPC (instructions per cycle)|CPU core (sum over hardware threads)|60s|
+|FLOPS (normalized to single precision)|CPU core (sum over hardware threads)|60s|
 |Main Memory Bandwidth|CPU socket|60s|
 |CPU Power|CPU socket|60s|
 |Main Memory Utilization|node|30s|
 |I/O Bandwidth (local, Lustre)|node|30s|
 |I/O Metadata (local, Lustre)|node|30s|
 |Network Bandwidth|node|30s|
 |GPU Usage|GPU device|30s|
 |GPU Memory Utilization|GPU device|30s|
 |GPU Power Consumption|GPU device|30s|
```
```diff
@@ -92,11 +93,12 @@ The sampling frequency cannot be changed by the user.
 If the current partition supports simultaneous multithreading (SMT) the maximum number of hardware
 threads per physical core is displayed in the SMT column. The Slurm configuration on ZIH systems
 disables SMT by default. Therefore, in the example below, only a maximum CPU usage of 0.5 can be
-achieved, since PIKA combines two hardware threads per physical core. If you want to use SMT, you
-must set the Slurm environment variable `SLURM_HINT=multithread`. In this case, `srun` distributes
-the tasks to all available hardware threads, thus a CPU usage of 1 can be reached. However, the SMT
-configuration only refers to the `srun` command. For single node jobs without `srun` command the
-tasks are automatically distributed to all available hardware threads.
+achieved, as PIKA determines the average value over two hardware threads per physical core.
+If you want to use SMT, you must set the Slurm environment variable `SLURM_HINT=multithread`.
+In this case, `srun` distributes the tasks to all available hardware threads, thus a CPU usage of 1
+can be reached. However, the SMT configuration only refers to the `srun` command. For single node
+jobs without `srun` command the tasks are automatically distributed to all available hardware
+threads.
```

{: align="center"}
```diff
@@ -105,7 +107,8 @@ tasks are automatically distributed to all available hardware threads.
 
 To reduce the amount of recorded data, PIKA summarizes per hardware thread metrics to the
 corresponding physical core. In terms of simultaneous multithreading (SMT), PIKA only provides
-performance data per physical core.
+performance data per physical core. For CPU usage, the average value per measurement point across
+all hardware threads is calculated, while for IPC and FLOPS, the sum per measurement point is determined.
 
 The following table explains different timeline visualization modes.
 By default, each timeline shows the average value over all hardware units (HUs) per measured
```
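The aggregation rule added in this hunk (average per measurement point for CPU usage, sum for IPC and FLOPS) can be sketched as follows; the list-of-lists data layout and the sample values are purely illustrative, not PIKA's actual storage format:

```python
# Sketch of the per-core aggregation rule described above:
# CPU usage is averaged across hardware threads per measurement point,
# while IPC and FLOPS are summed. Data layout here is a made-up illustration.

def aggregate_core(samples_per_thread, how):
    """Combine per-hardware-thread sample series into one per-core series."""
    n = len(samples_per_thread)
    combined = []
    for point in zip(*samples_per_thread):  # one tuple per measurement point
        total = sum(point)
        combined.append(total / n if how == "mean" else total)
    return combined

# Two hardware threads on one physical core (hypothetical values):
cpu_used = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
flops    = [[2.0e9, 1.0e9, 0.0], [2.0e9, 1.0e9, 0.0]]

core_cpu   = aggregate_core(cpu_used, "mean")  # averaged: at most 1.0 per core
core_flops = aggregate_core(flops, "sum")      # summed over both threads
```

With SMT disabled (one active thread per core), the averaged CPU usage tops out at 0.5, which matches the behavior described in the SMT paragraph above.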
````diff
@@ -128,6 +131,34 @@ To identify imbalances between HUs over time, the visualization modes *Best* and
 first indicator how much the HUs differ in terms of resource usage. The timelines *Best* and
 *Lowest* show the recorded performance data of the best/lowest average HU over time.
+
+!!! note "More Details"
+
+    If you want to conduct further analysis, you can download the job data as json-file(s) via the
+    button in the top right section:
+
+    { align=left}
+
+    The options are
+
+    - Metadata: Data shown in table (project, start, end, ...), jobscript, min/max/mean statistics
+    - Performance Data: Data records of all metrics of every distinct device (CPU cores, GPUs, ...)
+    - Cluster Data: Metadata of used partition
+
+    <br>
+
+    ??? example "Example: Visualize every CPU core that was allocated for the Job"
+
+        ```python
+        # in JupyterLab/Jupyter Notebook, using pandas and matplotlib
+        # download the "Performance Data" and save as "jobdata.json"
+        %pylab widget
+        from pandas import read_json
+        data = read_json('/tmp/jobdata.json', lines=True)
+        for cpu in data['cpu_used'][0]['core']['series']:
+            plot(cpu['data'], lw=0.5)
+        ```
````
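Beyond plotting, the downloaded "Performance Data" can also be summarized numerically. The sketch below assumes the record layout inferred from the plotting example in the hunk above (a list of per-core records, each with a `data` series); that schema is an assumption taken from the example, not from official PIKA documentation:

```python
# Min/max/mean summary per core, assuming the record layout shown in the
# plotting example above (schema inferred from that example, not guaranteed).
from statistics import mean

def core_stats(series):
    """Min/max/mean of one per-core measurement series; None gaps are skipped."""
    values = [v for v in series if v is not None]
    return {"min": min(values), "max": max(values), "mean": mean(values)}

# Hypothetical two-core record mimicking data['cpu_used'][0]['core']['series']:
series = [
    {"data": [0.5, 0.5, 0.5]},
    {"data": [1.0, 0.0, None]},  # None marks a missing measurement point
]
stats = [core_stats(cpu["data"]) for cpu in series]
```

Per-core statistics like these can be compared against the min/max/mean values that the Metadata download already contains.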
## Footprint Visualization
Complementary to the timeline visualization of one specific job, statistics on metadata and
```diff
@@ -160,13 +191,6 @@ flags in the job script:
 
 **Note:** Disabling PIKA monitoring is possible only for exclusive jobs!
-
-## Known Issues
-
-The PIKA metric FLOPS is not supported by the Intel Haswell CPU architecture.
-However, PIKA provides this metric to show the computational intensity.
-**Do not rely on FLOPS on Haswell!**
-We use the event `AVX_INSTS_CALC` which counts the `insertf128` instruction.
 
 ## Case Studies
 
 ### Idle CPUs
```