Performance engineering encompasses the techniques applied during a systems development life cycle
to ensure the non-functional requirements for performance (such as throughput, latency, or memory
usage) will be met.
Often, it is also referred to as systems performance engineering within systems engineering, and
software performance engineering or application performance engineering within software engineering
[[Wikipedia]](https://en.wikipedia.org/wiki/Performance_engineering).
interrupt-driven sampling during run time.
!!! note "During measurement, raw performance data is collected"
When an instrumented application is executed, the additional instructions introduced during the
instrumentation phase collect and record the data required to evaluate the performance properties
of the code.
Unfortunately, the measurement itself has a certain influence on the performance of the instrumented
code.
Whether the perturbations introduced have a significant effect on the behavior depends on the
specific structure of the code to be investigated.
In many cases, the perturbations will be rather small, so that the overall results can be considered
to be a realistic approximation of the corresponding properties of the non-instrumented code.
Yet, it is always advisable to compare the runtime of an instrumented application with that of its
original, non-instrumented counterpart.
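A quick way to do this is to time both binaries on the same input. A minimal sketch, where the
executable names `./my_app` and `./my_app_instrumented` are placeholders:

```bash
# Compare wall-clock runtimes of the original and the instrumented
# binary on identical input to estimate the measurement overhead.
# Both executable names are placeholders.
time ./my_app
time ./my_app_instrumented
```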
#### Profile
!!! hint "Performance profiles hold aggregated data (e.g. time spent in function `foo()`)"
!!! hint "Performance profiles hold aggregated data (e.g. total time spent in function `foo()`)"
A performance profile provides aggregated metrics like _time_ or _number of calls_ for a list of
functions, loops, or similar constructs, as depicted in the following table:
| Function | Total Time | Calls | Percentage |
|----------|-----------:|------:|-----------:|
| `main()` | 2 s | 1 | 1% |
| `foo()` | 80 s | 100 | 40% |
| `bar()` | 118 s | 9000 | 59% |
#### Trace
!!! hint "Traces consist of a sorted list of timed application events/samples (e.g. enter function `foo()`
at 0.11 s)"
!!! hint
Traces consist of a sorted list of timed application events/samples (e.g. enter function
`foo()` at 0.11 s).
In contrast to performance [profiles](#profile), performance traces consist of individual
application samples or events that are recorded with a timestamp.
A trace that corresponds to the profile recording above could look as follows:
| Timestamp | Data Type | Parameter |
|----------:|----------------|-----------------|
| 0.10 s | Enter Function | `main()` |
| 0.11 s | Enter Function | `foo()` |
| 0.12 s | Enter Function | `bar()` |
| 0.15 s | Exit Function | `bar()` |
| 0.16 s | Enter Function | `bar()` |
| 0.17 s | Exit Function | `bar()` |
| | _many more events..._ | |
| 200.00 s | Exit Function | `main()` |
!!! hint "Traces enable more sophisticated analysis at the cost of potentially very large amounts
of raw data"
Naturally, the size of a performance trace depends on the recorded time span, whereas the size of a
profile does not.
Likewise, a trace can tell you when a specific action in your application happened, whereas a profile
name it) to derive meaningful, well-defined performance metrics like data rates,
performance events of interest, etc.
This step is typically hidden from the user and taken care of automatically once the raw data has
been collected.
Some tools, however, provide an independent analysis front-end that allows specifying the type of
analysis to carry out on the raw data.
### Presentation
!!! note "Presenting performance metrics graphically fosters human intuition"
After processing the raw performance data, the resulting metrics are usually presented in the form
of a report that makes use of tables or charts known from programs like Excel.
In this step, the reduction of data complexity simplifies the evaluation of the data by software
developers.
Yet, data reductions have the potential to hide important facts or details.
restarted from the beginning.
At ZIH, the following performance engineering tools are installed and maintained:
### lo2s
!!! hint "Easy to use application and system performance trace recorder supporting Vampir"
See [lo2s](lo2s.md) for further details.
Once the data have been recorded, the tool [Vampir](vampir.md) needs to be invoked to study the data
graphically.
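For a first impression, a minimal sketch of a recording run, assuming a hypothetical executable
`./my_app` (see the lo2s page for the actual options):

```bash
# Record a performance trace of a single application run with lo2s.
# ./my_app is a placeholder for the executable under investigation.
lo2s -- ./my_app

# lo2s stores the recorded trace in a new directory, which can then
# be opened and studied graphically with Vampir.
```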
### MUST
!!! hint "Advanced communication error detection for applications using the Message Passing Interface
(MPI) standard"
[MUST](mpi_usage_error_detection.md) checks your application for communication errors if the MPI
library is used.
It does not require any [instrumentation](#instrumentation).
The checks of a given MPI application are done by simply replacing `srun` with `mustrun` when the
application is started.
The data analysis of the fixed metrics is fully integrated and does not require any user actions.
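For illustration, a sketch of such a launch, where the task count and the application name
`./my_mpi_app` are hypothetical:

```bash
# Normal launch of an MPI application with 4 tasks:
#   srun -n 4 ./my_mpi_app
# The same run with MUST's correctness checks enabled:
mustrun -n 4 ./my_mpi_app
```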
The correctness results are written to an HTML-formatted output file, which can be inspected with a
web browser.
See [MUST](mpi_usage_error_detection.md) for further details.
### PAPI
!!! hint "Portable reading of CPU performance metrics like FLOPS"
The [PAPI](papi.md) library allows software developers to read CPU performance counters in a
platform-independent way.
Native usage of the library requires manually [instrumenting](#instrumentation) the application
under investigation by adding library calls to its source code.
Data [measurement](#measurement) happens whenever the PAPI library is called.
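As a rough sketch of native usage, the available counters can be inspected with PAPI's utility
programs, and an instrumented code is linked against the library; compiler and file names below are
placeholders:

```bash
# List the hardware performance counters PAPI can read on this node.
papi_avail

# Build a manually instrumented application against the PAPI library.
# Compiler, source file, and binary name are placeholders.
gcc -o my_app my_app.c -lpapi
```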
Software developers have to process the data by themselves to obtain meaningful performance metrics.
Tools like [Score-P](#score-p) have built-in support for PAPI.
Therefore, native usage of the PAPI library is usually not needed.
### Perf Tools
!!! hint "Easy to use Linux-integrated performance data recording and analysis"
The [measurement](#measurement) of a given application is done by simply prefixing the
executable with `perf`.
Perf has two modes of operation (`perf stat`, `perf record`), which both record raw
[profile](#profile) data.
While the first mode is very basic, the second mode records more data.
Use `perf report` to analyze the raw output data of `perf record` and produce a performance report.
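A minimal sketch of both modes, assuming a hypothetical executable `./my_app`:

```bash
# Counting mode: count events and print summary statistics after the run.
perf stat ./my_app

# Recording mode: sample the run and write raw data to perf.data.
perf record ./my_app

# Analyze the recorded raw data and produce a performance report.
perf report
```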
See [Linux perf](perf_tools.md) for further details.
### PIKA
!!! hint "Very easy to use performance visualization of entire batch jobs"
actions.
Performance metrics are accessible via the
[PIKA web service](https://selfservice.tu-dresden.de/services/1663599/).
### Score-P
!!! hint "Complex and powerful performance data recording and analysis of parallel applications"
Many raw data sources are supported by Score-P.
It requires some time, training, and practice to fully benefit from the tool's features.
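As a first impression, a minimal sketch of the typical workflow; compiler and file names are
placeholders, and the supported options are described on the Score-P page:

```bash
# Instrument at build time by prefixing the compile/link command
# with the scorep wrapper.
scorep gcc -o my_app my_app.c

# Running the instrumented binary performs the measurement and writes
# the collected raw data into a scorep-* experiment directory.
./my_app
```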
See [Score-P](scorep.md) for further details.
### Slurm Profiler
!!! hint "Easy to use performance visualization of entire batch jobs"
The [Slurm Profiler](../jobs_and_resources/slurm_profiling.md) gathers performance data from every
task/node of a given [batch job](../jobs_and_resources/slurm.md).
It records a coarse-grained [trace](#trace) for subsequent analysis.
[Instrumentation](#instrumentation) of the applications under test is not needed.
The data analysis of the given set of system metrics needs to be initiated by the user with a
command-line interface.
The resulting performance metrics are accessible in a simple graphical front-end that provides
time/performance graphs.
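A sketch of the basic workflow, assuming profiling support is enabled in the Slurm installation; the
job script name and `<jobid>` are placeholders:

```bash
# Request coarse-grained task profiling for a batch job.
sbatch --profile=task my_jobscript.sh

# After the job has finished, merge the recorded per-node HDF5 data
# for inspection; <jobid> is the Slurm ID of the job.
sh5util -j <jobid>
```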
### Vampir
!!! hint "Complex and powerful performance data visualization of parallel applications"