Performance engineering encompasses the techniques applied during a systems development life cycle
to ensure the non-functional requirements for performance (such as throughput, latency, or memory
usage) will be met.
Often, it is also referred to as systems performance engineering within systems engineering, and
software performance engineering or application performance engineering within software engineering
[[Wikipedia]](https://en.wikipedia.org/wiki/Performance_engineering).
interrupt-driven sampling during run time.
!!! note "During measurement, raw performance data is collected"
When an instrumented application is executed, the additional instructions introduced during the
instrumentation phase collect and record the data required to evaluate the performance properties
of the code.
Unfortunately, the measurement itself has a certain influence on the performance of the instrumented
code.
Whether the perturbations introduced have a significant effect on the behavior depends on the
specific structure of the code to be investigated.
In many cases, the perturbations will be rather small, so that the overall results can be considered
to be a realistic approximation of the corresponding properties of the non-instrumented code.
Yet, it is always advisable to compare the runtime of an instrumented application with that of its
original, non-instrumented counterpart.
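A quick way to do this is to time both binaries on the same input. A minimal sketch, where the
executable names `./my_app` and `./my_app_instrumented` are placeholders:

```bash
# Compare wall-clock runtimes of the original and the instrumented
# binary on identical input to estimate the measurement overhead.
# Both executable names are placeholders.
time ./my_app
time ./my_app_instrumented
```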
#### Profile
!!! hint "Performance profiles hold aggregated data (e.g. time spent in function `foo()`)"
!!! hint "Performance profiles hold aggregated data (e.g. total time spent in function `foo()`)"
A performance profile provides aggregated metrics like _time_ or _number of calls_ for a list of
functions, loops, or similar constructs, as depicted in the following table:
| Function | Total Time | Calls | Percentage |
|----------|-----------:|------:|-----------:|
| `main()` | 2 s | 1 | 1% |
| `foo()` | 80 s | 100 | 40% |
| `bar()` | 118 s | 9000 | 59% |
#### Trace
!!! hint "Traces consist of a sorted list of timed application events/samples (e.g. enter function `foo()`
at 0.11 s)"
!!! hint
Traces consist of a sorted list of timed application events/samples (e.g. enter function
`foo()` at 0.11 s).
In contrast to performance [profiles](#profile), performance traces consist of individual
application samples or events that are recorded with a timestamp.
A trace that corresponds to the profile recording above could look as follows:
| Timestamp | Data Type | Parameter |
|----------:|----------------|-----------------|
| 0.10 s | Enter Function | `main()` |
| 0.11 s | Enter Function | `foo()` |
| 0.12 s | Enter Function | `bar()` |
| 0.15 s | Exit Function | `bar()` |
| 0.16 s | Enter Function | `bar()` |
| 0.17 s | Exit Function | `bar()` |
| | _many more events..._ | |
| 200.00 s | Exit Function | `main()` |
!!! hint "Traces enable more sophisticated analysis at the cost of potentially very large amounts
of raw data"
Naturally, the size of a performance trace depends on the recorded time span, whereas the size of a
profile does not.
Likewise, a trace can tell you when a specific action in your application happened, whereas a profile
name it) to derive meaningful, well-defined performance metrics like data rates,
performance events of interest, etc.
This step is typically hidden from the user and taken care of automatically once the raw data has
been collected.
Some tools, however, provide an independent analysis front-end that allows specifying the type of
analysis to carry out on the raw data.
### Presentation
!!! note "Presenting performance metrics graphically fosters human intuition"
After processing the raw performance data, the resulting metrics are usually presented in the form
of a report that makes use of tables or charts known from programs like Excel.
In this step, the reduction of data complexity simplifies the evaluation of the data by software
developers.
Yet, data reductions have the potential to hide important facts or details.
restarted from the beginning.
At ZIH, the following performance engineering tools are installed and maintained:
### lo2s
!!! hint "Easy to use application and system performance trace recorder supporting Vampir"
See [lo2s](lo2s.md) for further details.
Once the data have been recorded, the tool [Vampir](vampir.md) needs to be invoked to study the data
graphically.
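For a first impression, a minimal sketch of a recording run, assuming a hypothetical executable
`./my_app` (see the lo2s page for the actual options):

```bash
# Record a performance trace of a single application run with lo2s.
# ./my_app is a placeholder for the executable under investigation.
lo2s -- ./my_app

# lo2s stores the recorded trace in a new directory, which can then
# be opened and studied graphically with Vampir.
```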
### MUST
!!! hint "Advanced communication error detection for applications using the Message Passing Interface
(MPI) standard"
[MUST](mpi_usage_error_detection.md) checks your application for communication errors if the MPI
library is used.
It does not require any [instrumentation](#instrumentation).
The checks of a given MPI application are done by simply replacing `srun` with `mustrun` when the
application is started.
The data analysis of the fixed metrics is fully integrated and does not require any user actions.
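For illustration, a sketch of such a launch, where the task count and the application name
`./my_mpi_app` are hypothetical:

```bash
# Normal launch of an MPI application with 4 tasks:
#   srun -n 4 ./my_mpi_app
# The same run with MUST's correctness checks enabled:
mustrun -n 4 ./my_mpi_app
```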
The correctness results are written to an HTML-formatted output file, which can be inspected with a
web browser.
See [MUST](mpi_usage_error_detection.md) for further details.
### PAPI
!!! hint "Portable reading of CPU performance metrics like FLOPS"
The [PAPI](papi.md) library allows software developers to read CPU performance counters in a
platform-independent way.
Native usage of the library requires manually [instrumenting](#instrumentation) the application
under investigation by adding library calls to its source code.
Data [measurement](#measurement) happens whenever the PAPI library is called.
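As a rough sketch of native usage, the available counters can be inspected with PAPI's utility
programs, and an instrumented code is linked against the library; compiler and file names below are
placeholders:

```bash
# List the hardware performance counters PAPI can read on this node.
papi_avail

# Build a manually instrumented application against the PAPI library.
# Compiler, source file, and binary name are placeholders.
gcc -o my_app my_app.c -lpapi
```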
Software developers have to process the data by themselves to obtain meaningful performance metrics.
Tools like [Score-P](#score-p) have built-in support for PAPI.
Therefore, native usage of the PAPI library is usually not needed.
### Perf Tools
!!! hint "Easy to use Linux-integrated performance data recording and analysis"
The [measurement](#measurement) of a given application is done by simply prefixing the
executable with `perf`.
Perf has two modes of operation (`perf stat`, `perf record`), which both record raw
[profile](#profile) data.
While the first mode is very basic, the second mode records more data.
Use `perf report` to analyze the raw output data of `perf record` and produce a performance report.
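A minimal sketch of both modes, assuming a hypothetical executable `./my_app`:

```bash
# Counting mode: count events and print summary statistics after the run.
perf stat ./my_app

# Recording mode: sample the run and write raw data to perf.data.
perf record ./my_app

# Analyze the recorded raw data and produce a performance report.
perf report
```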
See [Linux perf](perf_tools.md) for further details.
### PIKA
!!! hint "Very easy to use performance visualization of entire batch jobs"
actions.
Performance metrics are accessible via the
[PIKA web service](https://selfservice.tu-dresden.de/services/1663599/).
### Score-P
!!! hint "Complex and powerful performance data recording and analysis of parallel applications"
Many raw data sources are supported by Score-P.
It requires some time, training, and practice to fully benefit from the tool's features.
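As a first impression, a minimal sketch of the typical workflow; compiler and file names are
placeholders, and the supported options are described on the Score-P page:

```bash
# Instrument at build time by prefixing the compile/link command
# with the scorep wrapper.
scorep gcc -o my_app my_app.c

# Running the instrumented binary performs the measurement and writes
# the collected raw data into a scorep-* experiment directory.
./my_app
```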
See [Score-P](scorep.md) for further details.
### Slurm Profiler
!!! hint "Easy to use performance visualization of entire batch jobs"
The [Slurm Profiler](../jobs_and_resources/slurm_profiling.md) gathers performance data from every
task/node of a given [batch job](../jobs_and_resources/slurm.md).
It records a coarse-grained [trace](#trace) for subsequent analysis.
[Instrumentation](#instrumentation) of the applications under test is not needed.
The data analysis of the given set of system metrics needs to be initiated by the user with a
command-line interface.
The resulting performance metrics are accessible in a simple graphical front-end that provides
time/performance graphs.
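A sketch of the basic workflow, assuming profiling support is enabled in the Slurm installation; the
job script name and `<jobid>` are placeholders:

```bash
# Request coarse-grained task profiling for a batch job.
sbatch --profile=task my_jobscript.sh

# After the job has finished, merge the recorded per-node HDF5 data
# for inspection; <jobid> is the Slurm ID of the job.
sh5util -j <jobid>
```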
### Vampir
!!! hint "Complex and powerful performance data visualization of parallel applications"