The Framework of Operators for OTF2 Modification (FROOM) helps when you have one or more trace archives and want to modify them rather than only view them in Vampir. There can be several reasons for this:
- the measurement infrastructure did not support a unified measurement, so creating one trace archive from many is required
- the trace archive should be made smaller because it should be shared with someone
- the trace archive contains irrelevant or private information which should be removed
- some external data should be incorporated into a trace
- some data should be extracted
Other tools provide only part of the functionality needed for the tasks above, whereas FROOM lets you express a complex pipeline of trace archive modifications that is then used to create a new trace archive. Because the output is again a trace archive, the same tools can be used to analyse and view it as before the use of FROOM. The DSL is easy to learn and apply to your set of trace archives.
```bash
git clone git@gitlab.hrz.tu-chemnitz.de:s2817051--tu-dresden.de/froom.git
```
Install the dependencies first:
```bash
sudo apt-get install flex bison libcsv-dev libjansson-dev
```
Then download the latest OTF2 library from https://perftools.pages.jsc.fz-juelich.de/cicd/otf2
For the installation of OTF2, you need the Python modules `six` and `jinja2`, which can be installed using:
```bash
sudo apt-get install python3-six python3-jinja2
```
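Before configuring OTF2, it can be useful to confirm that the prerequisites are actually visible in the current environment. The following is a minimal sketch of such a check; `missing_tools` and `missing_python_modules` are hypothetical helper names and are not part of FROOM or OTF2. The sketch only checks commands and Python modules, not the `libcsv-dev` and `libjansson-dev` libraries.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of FROOM or OTF2.

# Print every command from the argument list that is not found in PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Print every Python module from the argument list that python3 cannot import.
missing_python_modules() {
    local module
    for module in "$@"; do
        python3 -c "import $module" >/dev/null 2>&1 || echo "$module"
    done
}

# Usage (prints nothing if everything is available):
#   missing_tools flex bison python3
#   missing_python_modules six jinja2
```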
After that, configure the OTF2 installation. Use the option `--verbose` to see whether `Python for generator` support is enabled. If it is, you should see a line similar to this:
```
Python for generator: yes, using /usr/bin/python3 with 'jinja2' module in version 3.1.2
```
Proceed using `make` as usual.
Once OTF2 is installed, use the following commands to build the `froom-interpreter`:
```bash
make OTF2_BASE= # put the path to the root folder of the OTF2 installation here
```
From now on, it should be sufficient to update to the most recent version by using:
```bash
git pull
```
### Operator overview
The list of operators in alphabetical order:
| Name | Description | Link | Command line program |
| ---- | ----------- | ---- | -------------------- |
| ChromeTraceSource | Transform a trace collection that was recorded for viewing in Chrome web browser into OTF2 | [Chrome trace example](#transforming-a-trace-recorded-for-viewing-in-chrome-web-browser-into-otf2) | froom-from-chrome-trace |
| CSVMetricSource | Transform metric data contained in a CSV file into OTF2 trace data | [CSV example](#add-data-from-a-csv-file-to-a-trace) | froom-from-csv-metric |
| LocationRemover | Remove one or more locations based on the number of events they have | [Location removing example](#remove-locations-from-a-trace-archive-based-on-the-number-of-events) | froom-remove-location |
| MPICommAdaptor | Add message events based on enter events of a specific format | [Messaging example](#merging-artificially-generated-mpi-events) | froom-merge-messages |
| OTF2Source | Read trace data from a trace archive | (used by most examples) | (part of most command line programs) |
| OTF2Sink | Write trace data to a trace archive | (used by all examples) | (part of all command line programs) |
| RegionRemover | Remove regions based on their names | [Region removing example](#remove-regions-from-a-trace-archive-based-on-their-name) | froom-remove-region |
| Renamer | Change strings, e. g. region names | [Renaming example](#mergingunification-and-renaming-of-trace-data) | froom-rename |
| TimeSlicer | Creates a new trace archive containing only events from a particular time interval | [Time Slicing example](#cut-out-a-time-slice-from-an-existing-trace-archive) | froom-slice |
| Unifier | Merges/unifies traces of multiple archives into one | [Merging example](#mergingunification-and-renaming-of-trace-data) | froom-unify |
#### Merging/Unification (and Renaming) of Trace Data
This example assumes that one application with two parallel processes was running, but the measurement environment lacked the support to collect the performance data in one trace archive. Thus, two trace archives exist with a similar timestamp range. Without loss of generality, assume that the application used a master/worker approach, with one process being the master and the other being the worker. The goal is to create a single trace archive containing the performance data from both trace archives. To make the two measured processes clearly distinguishable, the "Master thread" of each process is renamed to "Master" or "Worker", respectively.
The task can be expressed in a file `unify-master-worker.froom`:
```
OTF2Source(master) -> Renamer("Master thread" -> "Master") -> newMaster;
OTF2Source(worker) -> Renamer("Master thread" -> "Worker") -> Unifier(newMaster) -> OTF2Sink(unified);
```
Now, the task can be applied to some files, e. g. `./traces/master/traces.otf2` and `./traces/worker/traces.otf2`, from the command line:
```bash
$ froom-interpreter unify-master-worker.froom master=traces/master/traces.otf2 \
worker=traces/worker/traces.otf2 unified=traces/unified
```
The first argument to the interpreter specifies the task to be solved. Further arguments of the form `name=value` resolve the variables used in the task description.
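To see which variables a script expects on the command line, a rough heuristic is to list the unquoted names passed to source and sink operators. The sketch below assumes that such unquoted names are exactly the variables to be resolved, which matches the examples in this README but is not a documented rule; `froom_variables` is a hypothetical helper and not part of FROOM.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of FROOM.
# Lists unquoted names used as arguments of OTF2Source, OTF2Sink or
# ChromeTraceSource in the given script file (heuristic, regex-based).
froom_variables() {
    grep -oE '(OTF2Source|OTF2Sink|ChromeTraceSource)[[:space:]]*\([[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\)' "$1" \
        | sed -E 's/^[^(]*\([[:space:]]*//; s/[[:space:]]*\)$//' \
        | LC_ALL=C sort -u
}

# Usage:
#   froom_variables unify-master-worker.froom
```

Quoted paths such as `OTF2Source("traces/input/traces.otf2")` and intermediate names such as `newMaster` are deliberately not reported.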
#### Cut out a Time Slice from an existing trace archive
The task to retain only the trace data in the interval from 2 seconds to 10.1 seconds is described in `timeslice.froom`:
```
OTF2Source(in) -> TimeSlicer(from=2, to=10.1) -> OTF2Sink(out);
```
The new trace archive can be created like this:
```bash
$ froom-interpreter timeslice.froom in=traces/to-be-cut/traces.otf2 out=traces/final
```
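If several intervals are needed, the script can be generated instead of edited by hand. The following sketch prints a script of the same shape as `timeslice.froom` for a given interval in seconds; `make_timeslice_script` is a hypothetical helper and not part of FROOM, and the numeric check is an assumption about what values are sensible, not a statement about what `TimeSlicer` accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of FROOM.
# Prints a FROOM script that keeps only the interval [from, to] (in seconds).
make_timeslice_script() {
    local from="$1" to="$2"
    local num='^[0-9]+([.][0-9]+)?$'
    # Reject anything that is not a plain non-negative decimal number.
    if ! [[ $from =~ $num && $to =~ $num ]]; then
        echo "from and to must be non-negative numbers" >&2
        return 1
    fi
    printf 'OTF2Source(in) -> TimeSlicer(from=%s, to=%s) -> OTF2Sink(out);\n' "$from" "$to"
}

# Usage:
#   make_timeslice_script 2 10.1 > timeslice.froom
#   froom-interpreter timeslice.froom in=traces/to-be-cut/traces.otf2 out=traces/final
```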
#### Add data from a CSV file to a trace
The task is described in `csv-adder.froom`:
```
OTF2Source("traces/input/traces.otf2") -> trace;
// The time column contains relative timestamps (offset from some starting point) and we only need one metric:
CSVMetricSource("metrics.csv", itemseparator="\n", propertyseparator=",", time=Column(1),
tickspersecond=10000, referencetime="2023-02-10T15:48:54.123", metrics=(value=Column(2),
unit="B")) -> metric;
// The time column contains ISO-formatted timestamps, and we need 2 metrics:
CSVMetricSource("metrics.csv", itemseparator="\n", propertyseparator=",", isotime=Column(1),
metrics=(value=Column(2),unit="B"),(value=Column(3),unit="ms")) -> metric;
metric -> Unifier(trace) -> OTF2Sink("traces/final");
```
This time, the paths are contained in the script, so the new trace archive can be created using:
```bash
$ froom-interpreter csv-adder.froom
```
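When the measured data has timestamps in seconds, it may have to be converted before it fits the first variant above. The sketch below assumes that the `time` column is expected in ticks, i.e. seconds multiplied by `tickspersecond`; this reading of the parameters is an assumption and should be checked against FROOM's behaviour. `seconds_to_ticks` is a hypothetical helper and not part of FROOM.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of FROOM.
# Reads comma-separated lines from stdin and rewrites column 1 from seconds
# to ticks, rounded to the nearest integer. All other columns are kept.
seconds_to_ticks() {
    local ticks_per_second="$1"
    awk -F, -v OFS=, -v tps="$ticks_per_second" '
        { $1 = sprintf("%.0f", $1 * tps); print }
    '
}

# Usage (tickspersecond=10000 as in csv-adder.froom):
#   seconds_to_ticks 10000 < metrics-seconds.csv > metrics.csv
```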
#### Remove locations from a trace archive based on the number of events
To remove locations that have only very few events and are therefore meaningless to the analysis, put a script similar to this into `remove-locations.froom`:
```
OTF2Source("input/traces.otf2") -> LocationRemover(
eventCount <= /* any number that you find suitable: */ 42
// the output archive name "final" is a placeholder:
) -> OTF2Sink("final");
```
The operation can be applied like this:
```bash
$ froom-interpreter remove-locations.froom
```
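To choose a suitable threshold, it can help to count the events per location first. The sketch below assumes that `otf2-print` prints one event per line with the event type, the location and the timestamp as the first three fields (for example `ENTER 42 1683289351077941 Region: ...`); header and separator lines are skipped because their second and third fields are not numbers. `count_events_per_location` is a hypothetical helper and not part of FROOM or OTF2.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of FROOM or OTF2.
# Reads otf2-print output from stdin and prints "<location> <event count>",
# sorted by location.
count_events_per_location() {
    awk '
        # An event line has a numeric location (field 2) and timestamp (field 3).
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { count[$2]++ }
        END { for (loc in count) print loc, count[loc] }
    ' | sort -n
}

# Usage:
#   otf2-print input/traces.otf2 | count_events_per_location
```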
#### Remove regions from a trace archive based on their name
To remove regions that share a common name pattern, put a script similar to this into `remove-regions.froom`. The pattern can be an extended regular expression:
```
OTF2Source("input/traces.otf2")
-> RegionRemover("^Log:.+broadcast") -> OTF2Sink("final");
```
The operation can be applied like this:
```bash
$ froom-interpreter remove-regions.froom
```
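Because removed regions are gone from the new archive, it may be worth previewing which names a pattern matches before running the script. The sketch below uses `grep -E`, which implements POSIX extended regular expressions; that FROOM uses the same dialect and searches within the name (rather than requiring a full match) is an assumption. `preview_region_matches` is a hypothetical helper and not part of FROOM.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of FROOM.
# Reads candidate region names from stdin (one per line) and prints those
# that match the given extended regular expression.
preview_region_matches() {
    grep -E -- "$1"
}

# Usage:
#   printf '%s\n' 'Log: start broadcast' 'compute' \
#       | preview_region_matches '^Log:.+broadcast'
```

Note that `.+` requires at least one character between `Log:` and `broadcast`, so a region named `Log:broadcast` would not match this pattern.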
#### Merging artificially generated MPI events
In some situations, communication between processes should be recorded, but the communication is not directly supported by OTF2. In that case, the communication can often be recorded as normal region enter and leave events and mapped to MPI events with FROOM in a second step. FROOM takes the enter events and transforms them into `MPI_SEND` or `MPI_RECV` events, while removing the corresponding leave events. The input trace archives for FROOM can use arbitrary numbers as identifiers for communication partners (e. g. hashes built using IP addresses and port numbers). The only condition that FROOM imposes on such input traces is that these identifiers stay constant between two communication partners. The region names must have the following format:
```
FAKE_<operation>;<sender>;<receiver>;<messageSize>;<tag>
```
where:
- `<operation>` is either `SEND` or `RECV`
- `<sender>`/`<receiver>` is a string not containing `;`
- `<messageSize>` is an integer representing the size of the message sent, in the range `[0, 18446744073709551615]` (`uint64_t`)
- `<tag>` is an integer representing the tag of the message, in the range `[0, 4294967295]` (`uint32_t`)
`otf2-print -A traces.otf2` should give lines such as the following:
```
ENTER 42 1683289351077941 Region: "FAKE_RECV;localhost:4711;myserver:7077;1407;9" <131>
```
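The format rules above can be checked mechanically before recording or converting a trace. The following is a minimal sketch of such a check; `is_fake_region_name` and `dec_le` are hypothetical helpers and not part of FROOM. It additionally assumes that `<sender>` and `<receiver>` are non-empty, which the format description does not state. The range checks compare decimal strings because `uint64_t` values do not fit into bash's signed 64-bit arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of FROOM.

# Succeeds if the decimal digit string $1 is numerically <= the digit string $2.
dec_le() {
    local a="$1" b="$2"
    # Strip leading zeros so that string lengths are comparable.
    while [[ ${#a} -gt 1 && $a == 0* ]]; do a="${a#0}"; done
    if (( ${#a} != ${#b} )); then
        (( ${#a} < ${#b} ))
    else
        # Equal length: lexicographic order equals numeric order.
        [[ "$a" < "$b" || "$a" == "$b" ]]
    fi
}

# Succeeds if $1 follows FAKE_<operation>;<sender>;<receiver>;<messageSize>;<tag>.
is_fake_region_name() {
    local name="$1"
    # Assumption: sender and receiver must be non-empty.
    local re='^FAKE_(SEND|RECV);([^;]+);([^;]+);([0-9]+);([0-9]+)$'
    [[ $name =~ $re ]] || return 1
    dec_le "${BASH_REMATCH[4]}" 18446744073709551615 || return 1  # uint64_t
    dec_le "${BASH_REMATCH[5]}" 4294967295                         # uint32_t
}

# Usage:
#   is_fake_region_name 'FAKE_RECV;localhost:4711;myserver:7077;1407;9' && echo valid
```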
The FROOM script to merge trace archives and modify MPI ranks is given in `mpi-adaptor.froom`:
```
OTF2Source("spark-org.apache.spark.deploy.master.Master/traces.otf2")
-> Renamer("Master thread" -> "Master") -> newmaster;
OTF2Source("spark-org.apache.spark.deploy.worker.Worker/traces.otf2")
-> Renamer("Master thread" -> "Worker") -> newworker;
OTF2Source("spark-org.apache.spark.executor.CoarseGrainedExecutorBackend/traces.otf2")
-> Renamer("Master thread" -> "Executor") -> newexecutor;
OTF2Source("spark-org.apache.spark.deploy.SparkSubmit/traces.otf2")
-> Renamer("Master thread" -> "Client") -> newclient;
newclient -> Unifier(newexecutor, newworker, newmaster)
//apply communication:
-> MPICommAdaptor -> OTF2Sink("unified-mpi");
```
The operation can be applied like this:
```bash
$ froom-interpreter mpi-adaptor.froom
```
#### Transforming a trace recorded for viewing in Chrome web browser into OTF2
If you have recorded a trace in Chrome trace format (e. g. using TensorFlow), you can transform it into OTF2 using the following `chrome.froom` script:
```
ChromeTraceSource(chrometrace) -> OTF2Sink ("transformed");
```
The operation can be applied like this:
```bash
$ froom-interpreter chrome.froom chrometrace=mychrometrace.json
```
This work was supported by the German Federal Ministry of Education and Research (BMBF, SCADS22B) and the Saxon State Ministry for Science, Culture and Tourism (SMWK) by funding the competence center for Big Data and AI "ScaDS.AI Dresden/Leipzig". The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.
If you use FROOM for your work, we would be happy if you cite the following paper:
Jan Frenzel, Apurv Deepak Kulkarni, Sebastian Döbel, Bert Wesarg, Maximilian Knespel, and Holger Brunst. 2023. FROOM: A Framework of Operators for OTF2 Modification. In Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023), November 12–17, 2023, Denver, CO, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3624062.3624209
The hope is that this README.md is self-explanatory. Please open issues if you feel that an improvement is necessary.
Contributions are very welcome! Feel free to share ideas by opening issues or contributing code.
This project is active, but the maintainers have a lot of other things to do. Thus, improvements (bug fixes, new features) only appear from time to time. So, please contribute!