diff --git a/doc.zih.tu-dresden.de/docs/software/misc/pika_download_jobdata.png b/doc.zih.tu-dresden.de/docs/software/misc/pika_download_jobdata.png
new file mode 100644
index 0000000000000000000000000000000000000000..cbfa3267eab512e2ffc008ba49028c76ff572586
Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/pika_download_jobdata.png differ
diff --git a/doc.zih.tu-dresden.de/docs/software/pika.md b/doc.zih.tu-dresden.de/docs/software/pika.md
index 117f8d00c635666c5938229630a8a31d5a75b310..38007fbf1a8149663a99f64e76bca1e03862773c 100644
--- a/doc.zih.tu-dresden.de/docs/software/pika.md
+++ b/doc.zih.tu-dresden.de/docs/software/pika.md
@@ -58,14 +58,15 @@ PIKA provides the following runtime metrics:
 
 |Metric| Hardware Unit| Sampling Frequency|
 |---|---|---:|
-|CPU Usage|CPU core|30s|
-|IPC (instructions per cycle)|CPU core|60s|
-|FLOPS (normalized to single precision) |CPU core|60s|
+|CPU Usage|CPU core (average across hardware threads)|30s|
+|IPC (instructions per cycle)|CPU core (sum over hardware threads)|60s|
+|FLOPS (normalized to single precision) |CPU core (sum over hardware threads)|60s|
 |Main Memory Bandwidth|CPU socket|60s|
 |CPU Power|CPU socket|60s|
 |Main Memory Utilization|node|30s|
 |I/O Bandwidth (local, Lustre) |node|30s|
 |I/O Metadata (local, Lustre) |node|30s|
+|Network Bandwidth|node|30s|
 |GPU Usage|GPU device|30s|
 |GPU Memory Utilization|GPU device|30s|
 |GPU Power Consumption|GPU device|30s|
@@ -92,11 +93,12 @@ The sampling frequency cannot be changed by the user.
 If the current partition supports simultaneous multithreading (SMT) the maximum number of hardware
 threads per physical core is displayed in the SMT column. The Slurm configuration on ZIH systems
 disables SMT by default. Therefore, in the example below, only a maximum CPU usage of 0.5 can be
-achieved, since PIKA combines two hardware threads per physical core. If you want to use SMT, you
-must set the Slurm environment variable `SLURM_HINT=multithread`. In this case, `srun` distributes
-the tasks to all available hardware threads, thus a CPU usage of 1 can be reached. However, the SMT
-configuration only refers to the `srun` command. For single node jobs without `srun` command the
-tasks are automatically distributed to all available hardware threads.
+achieved, as PIKA determines the average value over two hardware threads per physical core.
+If you want to use SMT, you must set the Slurm environment variable `SLURM_HINT=multithread`.
+In this case, `srun` distributes the tasks to all available hardware threads, thus a CPU usage of 1
+can be reached. However, the SMT configuration only refers to the `srun` command. For single-node
+jobs without the `srun` command, the tasks are automatically distributed to all available hardware
+threads.
 
 {: align="center"}
@@ -105,7 +107,8 @@ tasks are automatically distributed to all available hardware threads.
 
     To reduce the amount of recorded data, PIKA summarizes per hardware thread metrics to the
     corresponding physical core. In terms of simultaneous multithreading (SMT), PIKA only provides
-    performance data per physical core.
+    performance data per physical core. For CPU usage, the average value per measurement point across
+    all hardware threads is calculated, while for IPC and FLOPS, the sum per measurement point is determined.
 
 The following table explains different timeline visualization modes.
 By default, each timeline shows the average value over all hardware units (HUs) per measured
@@ -128,6 +131,34 @@ To identify imbalances between HUs over time, the visualization modes *Best* and
 first indicator how much the HUs differ in terms of resource usage. The timelines *Best* and
 *Lowest* show the recorded performance data of the best/lowest average HU over time.
 
+!!! note "More Details"
+
+    If you want to conduct further analysis, you can download the job data as JSON file(s) via the
+    button in the top right section:
+
+    { align=left}
+    The options are:
+
+    - Metadata: Data shown in table (project, start, end, ...), jobscript, min/max/mean statistics
+    - Performance Data: Data records of all metrics of every distinct device (CPU cores, GPUs, ...)
+    - Cluster Data: Metadata of used partition
+
+    <br>
+
+    ??? example "Example: Visualize every CPU core that was allocated for the Job"
+
+        ```python
+        # In JupyterLab/Jupyter Notebook, using pandas and matplotlib:
+        # download the "Performance Data" and save it as "/tmp/jobdata.json".
+
+        %pylab widget
+        from pandas import read_json
+
+        data = read_json('/tmp/jobdata.json', lines=True)
+        for cpu in data['cpu_used'][0]['core']['series']:
+            plot(cpu['data'], lw=0.5)
+        ```
+
 ## Footprint Visualization
 
 Complementary to the timeline visualization of one specific job, statistics on metadata and
@@ -160,13 +191,6 @@ flags in the job script:
 
 **Note:** Disabling PIKA monitoring is possible only for exclusive jobs!
 
-## Known Issues
-
-The PIKA metric FLOPS is not supported by the Intel Haswell CPU architecture.
-However, PIKA provides this metric to show the computational intensity.
-**Do not rely on FLOPS on Haswell!** We use the event `AVX_INSTS_CALC` which counts the `insertf128`
-instruction.
-
 ## Case Studies
 
 ### Idle CPUs