# PIKA Collection Daemon, LIKWID Perfgroups and Log Rotation
PIKA uses the collection daemon collectd to read metrics and send them to an instance of InfluxDB. To capture all metrics relevant to PIKA, we have developed some additional plugins (see [collectd/collectd-plugins/](collectd/collectd-plugins/)).
Since collectd does not provide log rotation, we added this feature based on *logrotate*.
Furthermore, we use this mechanism to regularly check the log files for errors, summarize them into a shared directory and send a report once per day to registered email addresses (only if errors occurred).
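A minimal *logrotate* sketch for this setup could look as follows; the log path matches the collectd template below, while the shared directory, the error pattern and the rotation policy are illustrative assumptions (sending the daily report is not shown):
```
/tmp/pika_collectd.log {
    daily
    rotate 7
    missingok
    notifempty
    prerotate
        # illustrative: harvest errors into a shared directory before the
        # log is rotated away; /shared/pika/ is a placeholder path
        grep -i "error" /tmp/pika_collectd.log >> /shared/pika/collectd_errors.$(hostname).log || true
    endscript
}
```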
## PIKA Perfgroups for LIKWID
PIKA uses its own LIKWID perfgroups to avoid or minimize multiplexing of hardware counters.
The collected metrics are
* FLOPS,
* IPC,
* main memory bandwidth and
* power consumption.
PIKA perfgroup files are named *pika_metrics_#.txt*, where *#* is a number (see [likwid/perfgroups/](likwid/perfgroups/)*CPU_ARCH*).
### What should I do if no PIKA groups exist for my CPU architecture yet?
If no PIKA perfgroup files exist for a CPU architecture yet, they can be derived from the existing LIKWID perfgroups.
IPC, or its inverse CPI, is defined in almost every LIKWID group, because it is provided by fixed counter registers.
For FLOPS, there are usually several predefined groups for the different FLOPS types.
The underlying events are usually counted with programmable counter registers, which are available only in limited quantity.
Since PIKA summarizes all FLOPS types into a single value (*flops_any*), normalized to single precision, it might be possible to obtain this value with a single perfgroup.
If this is not possible, multiple PIKA perfgroup files have to define the individual FLOPS types, e.g. *flops_dp*, *flops_sp*, *flops_avx*, which are then summed up by our collectd LIKWID plugin.
Main memory bandwidth and power consumption are often counted per socket (at least on Intel CPUs); the corresponding predefined LIKWID perfgroups are called *MEM* and *ENERGY*.
For PIKA, we aggregate read and write bandwidth into a single value.
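As an illustration, a PIKA perfgroup for an Intel Skylake-like core could combine the fixed counters for IPC with programmable FP_ARITH_INST_RETIRED events. This is only a sketch: the event names and scaling factors below are architecture-specific assumptions and must be taken from the existing LIKWID groups (e.g. *FLOPS_SP*, *FLOPS_DP*) for your CPU; wider vector events (256/512 bit) are omitted here.
```
SHORT PIKA metrics: IPC and FLOPS normalized to single precision

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
PMC0  FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE
PMC1  FP_ARITH_INST_RETIRED_SCALAR_DOUBLE
PMC2  FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
PMC3  FP_ARITH_INST_RETIRED_SCALAR_SINGLE

METRICS
ipc FIXC0/FIXC1
flops_any (PMC0*2.0*2.0+PMC1*2.0+PMC2*4.0+PMC3)/time
```
Each double-precision flop counts twice in *flops_any* because the value is normalized to single precision; the packed events are additionally multiplied by the number of operations per instruction.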
## Per-Core or Per-Socket Metric?
To determine whether a metric is collected per core or per socket, a test run is the most reliable approach.
For example, use a benchmark from [LIKWID bench](https://github.com/RRZE-HPC/likwid/wiki/Likwid-Bench) and measure the counters with *likwid-perfctr*:
```
$ likwid-perfctr -g FLOPS_DP likwid-bench -t triad_avx -w S0:128KB -w S1:128KB
```
The example measures the LIKWID perfgroup *FLOPS_DP* and executes the AVX version of the triad benchmark on sockets *0* and *1*, and thus on all hardware threads of a dual-socket compute node.
If all but two threads report a value of *0*, the metric is collected per socket.
List such metrics (comma-separated) with the option *PerSocketMetrics* in the LIKWID plugin block of the [collectd configuration file](collectd/pika-1.2_collectd_template.conf) to avoid sending unwanted zero values to the database.
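In the collectd configuration, this corresponds to the following LIKWID plugin excerpt (values taken from the [template](collectd/pika-1.2_collectd_template.conf)):
```
<Plugin likwid>
  Groups "pika_metrics_1,pika_metrics_2"
  # mem_bw and rapl_power are only reported once per socket
  PerSocketMetrics "mem_bw,rapl_power"
</Plugin>
```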
Read plugins:
* memory, cpu, disk (collectd built-in)
* LIKWID (C)
* GPU NVIDIA (C)
* Lustre bandwidth (Python3)
* Infiniband bandwidth (Python3)

Write plugins:
* InfluxDB (Python3)
## collectd Configuration
To adjust the collectd configuration, there are global and per-plugin options in the configuration file. *Interval* and *AlignRead* can be set as global options for all plugins, but can also be overridden on a per-plugin basis; *AlignReadOffset* can only be set per plugin (see the example below the table).
| Option | Default | PIKA Default | Description |
|----|----|----|----|
| Interval | 10 | 30 | specifies a sampling interval in seconds, at which read callbacks are triggered |
| AlignRead | not set / false | true | aligns the calls to read functions to a multiple of the read interval, e.g. to obtain round timestamps or to collect metrics with identical timestamps on different systems |
| AlignReadOffset | not set / 0 | different for each plugin (e.g. 0.02, 0.04, ...) | specifies an offset (in seconds) by which the call to time-aligned read functions is postponed; automatically enables AlignRead for the plugin |
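For illustration, the following excerpt combines the global settings from the [template](collectd/pika-1.2_collectd_template.conf) with the per-plugin override used for the LIKWID plugin:
```
# global options: read every 30 seconds, time-aligned
Interval  30
AlignRead true

# per-plugin override: the LIKWID plugin reads every 60 seconds,
# postponed by 30.1 seconds (AlignReadOffset implies AlignRead)
<LoadPlugin likwid>
  Interval 60
  AlignReadOffset 30.1
</LoadPlugin>
```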
The PIKA plugins provide the following options within the plugin configuration block:
### LIKWID Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| NormalizeFlops | not set / false | *flops_any* | name of the normalized FLOPS metric |
| AccessMode | 0 | 0 | 0 for direct access (only as root or with perf), 1 to use the access daemon |
| Mtime | 10 | 15 | measurement time per LIKWID perfgroup (in seconds) |
| Groups | not set | "pika_metrics_1,pika_metrics_2" | comma-separated list of perfgroups |
| PerSocketMetrics | not set | "mem_bw,rapl_power" | comma-separated list of per-socket metrics |
| MaxValues | depends on the counter bit width (provided by LIKWID) | "ipc:10,flops:1e11,mem_bw:1e12" | comma-separated list of metrics with their maximum value (*metric:max*) |
| PerCore | not set / false | true | add up metrics per core (by default metrics are reported per hardware thread) |
| Verbose | 1 | 1 | LIKWID output verbosity level |
### GPU NVIDIA Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| InstanceByGPUIndex | true | true | add the GPU ID to the plugin instance |
| InstanceByGPUName | true | false | add the GPU name to the plugin instance |
### Lustre Bandwidth Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| path | not set | not set | comma-separated list of paths to Lustre file system instances |
| instances | not set | not set | comma-separated list of Lustre instances (*FileSystemName-MagicNumber*) |
| fsname_and_mount | use the root mount for each file system name | "*:/ws" (use the mount points that end with "ws" for all file system names) | specifies the file system name(s) and a mount point suffix, separated by a colon (the option can appear multiple times per plugin) |
| recheck_limit | not set / 0 | 360 | re-check the Lustre setup every *VALUE* reads |
### Infiniband Bandwidth Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| devices | not set | not set | comma-separated list of Infiniband device directory paths |
| directory | "/sys/class/infiniband" | not set | path to the directory of Infiniband devices |
| recheck_limit | not set / 0 | 1440 | re-check the Infiniband setup every *VALUE* reads |
### InfluxDB Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| host | localhost | ??? | name or IP of the InfluxDB host |
| port | 8086 | 8086 | port number |
| user | not set | ??? | user name |
| pwd | not set | ??? | password |
| db | not set | prope | name of the database |
| batch_size | 200 | depends on number of cores per host | number of values that are buffered until being sent to InfluxDB |
| cache_size | 2000 | depends on number of cores per host | number of values that are cached before dropping them |
| StoreRates | false | true | if enabled, only the differences between consecutive values are stored for collectd counter and derive types |
| PerCore | not set | "cpu:avg" | comma-separated list of plugins whose values shall be aggregated to per-core values; format: *plugin1:aggregate,plugin2:aggregate*, where *aggregate* is either *sum* or *avg* |
| ssl | false | false | specifies whether the communication with the database is SSL encrypted |
## Installation of Required Tools and C-Plugins
The plugins have been tested with collectd 5.10.0, but should also work with other versions. Make sure that Python3 is available before installing collectd. With an existing Python3 installation, it should be sufficient to install the *influxdb* package via pip3.
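For example, assuming Python3 and pip3 are already available:
```
pip3 install influxdb
```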
#
# Config file for collectd(1).
# Please read collectd.conf(5) for a list of options.
# http://collectd.org/
#
##############################################################################
# Global settings for the daemon. #
##############################################################################
#HostnameReplace
TypesDB "CD_INST_PATH/share/collectd/types.db"
TypesDB "CUSTOM_TYPES_DIR/custom_types.db"
Interval 30
AlignRead true
ReadThreads 2
WriteThreads 1
##############################################################################
# Logging #
#----------------------------------------------------------------------------#
# Plugins which provide logging functions should be loaded first, so log #
# messages generated when loading or configuring other plugins can be #
# accessed. #
##############################################################################
LoadPlugin logfile
<Plugin logfile>
LogLevel info
File "/tmp/pika_collectd.log" #STDOUT
Timestamp true
PrintSeverity false
</Plugin>
##############################################################################
# LoadPlugin section #
##############################################################################
# Note: plugin read functions appear to be executed in reverse order of loading
<LoadPlugin memory>
Interval 30
</LoadPlugin>
<Plugin memory>
ValuesAbsolute true
ValuesPercentage false
</Plugin>
<LoadPlugin cpu>
Interval 30
AlignReadOffset 0.02
</LoadPlugin>
<Plugin cpu>
ReportByCpu true
ReportByState false
ValuesPercentage true
ReportNumCpu false
ReportGuestState false
SubtractGuestState false
</Plugin>
<LoadPlugin disk>
Interval 30
AlignReadOffset 0.04
</LoadPlugin>
<Plugin disk>
Disk "sda"
IgnoreSelected false
</Plugin>
<LoadPlugin gpu_nvidia>
Interval 30
AlignReadOffset 0.06
</LoadPlugin>
<Plugin gpu_nvidia>
# InstanceByGPUIndex false
InstanceByGPUName false
</Plugin> #gpu_nvidia_end
<LoadPlugin likwid>
Interval 60
AlignReadOffset 30.1
</LoadPlugin>
<Plugin likwid>
NormalizeFlops flops_any
AccessMode 0 # 1 for accessdaemon, 0 for direct access (only as root or with perf)
Mtime 15
Groups "pika_metrics_1,pika_metrics_2"
# by default metrics are reported per hardware thread
PerSocketMetrics "mem_bw,rapl_power"
MaxValues "ipc:10,flops:1e11,mem_bw:1e12"
PerCore true
Verbose 1
</Plugin> #likwid_end
<LoadPlugin python>
Interval 30
AlignReadOffset 0.08
</LoadPlugin>
<Plugin python>
ModulePath "CD_PLUGINS_PYTHON"
LogTraces true
Interactive false
Import "influx_write"
<Module influx_write>
#INFLUXHOST
#INFLUXPORT
#INFLUXUSER
#INFLUXPWD
#INFLUXDBNAME
batch_size 500
cache_size 4000
StoreRates true
PerCore "cpu:avg" #"likwid_cpu:sum" #plugin1:aggregate,plugin2.aggregate
ssl false
</Module>
Import "ib_bw"
<Module ib_bw>
#devices "/sys/class/infiniband/mlx4_0"
#directory "/sys/class/infiniband"
recheck_limit 1440
</Module>
Import "lustre_bw"
<Module lustre_bw>
#path "Lustre instance paths (comma separated)"
fsname_and_mount "*:/ws" # for all file systems AND mount points that end with '/ws'
recheck_limit 360 # every 3h
</Module>
</Plugin>
LoadPlugin unixsock
<Plugin unixsock>
SocketFile "/tmp/pika_collectd_unixsock" #socket for notifications
SocketGroup "root"
SocketPerms "0770"
DeleteSocket true
</Plugin>
#LoadPlugin write_log
##############################################################################
# Filter configuration #
##############################################################################
# Load required matches:
LoadPlugin match_regex
LoadPlugin target_scale
LoadPlugin target_set
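# The pre-cache chain "pika" filters and renames metrics before they are
# dispatched to the write plugins: unwanted values are stopped, the
# remaining ones are scaled/renamed and explicitly written.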
PreCacheChain "pika"
<Chain "pika">
### ignore other than memory used
<Rule "mem_used_only">
<Match "regex">
Plugin "^memory$"
TypeInstance "^[f|s|c|b]"
</Match>
Target "stop"
</Rule>
# for the disk plugin, ignore other than disk_octets and disk_ops
<Rule "disk_o_only">
<Match "regex">
Plugin "^disk$"
Type "^(p|disk_[t|m|i])" #starts with p or disk_t|i|m
#Type "^(?!disk_o).+" # do not start with "disk_o" # does not work with collectd
</Match>
Target "stop"
</Rule>
# rename "disc_octets" to "bytes"
<Rule "rename_disk_octets">
<Match "regex">
Plugin "^disk$"
Type "^disk_octets$"
</Match>
<Target "set">
TypeInstance "bytes"
</Target>
Target "write"
Target "stop"
</Rule>
# no need to have an additional "disk" in the field name
<Rule "rename_disk_ops">
<Match "regex">
Plugin "^disk$"
Type "^disk_ops$"
</Match>
<Target "set">
TypeInstance "ops"
</Target>
Target "write"
Target "stop"
</Rule>
# rename CPU "active" to "used" and multiply each value by 0.01
<Rule "handle_cpu_active">
<Match "regex">
Plugin "^cpu$"
TypeInstance "^active$"
</Match>
<Target "scale">
Factor 0.01
</Target>
<Target "set">
TypeInstance "used"
</Target>
Target "write"
Target "stop"
</Rule>
# handle all rules for the gpu_nvidia plugin
<Rule "handle_gpu_nvidia">
<Match "regex">
Plugin "^gpu_nvidia$"
</Match>
<Target "jump">
Chain "handle_gpu_nvidia"
</Target>
# set plugin name to nvml for metrics not handled in chain
<Target "set">
Plugin "nvml"
</Target>
</Rule>
</Chain>
<Chain "handle_gpu_nvidia">
<Rule "nvml_no_freq">
<Match "regex">
Type "^freq" #frequency for multiprocessor and memory
</Match>
Target "stop"
</Rule>
<Rule "nvml_no_freemem">
<Match "regex">
TypeInstance "^free"
</Match>
Target "stop"
</Rule>
<Rule "rename_temperature">
<Match "regex">
Type "^temp"
</Match>
<Target "set">
Plugin "nvml"
TypeInstance "temp"
</Target>
Target "write"
Target "stop"
</Rule>
<Rule "rename_memory">
<Match "regex">
Type "^memory$"
TypeInstance "^used$"
</Match>
<Target "set">
Plugin "nvml"
TypeInstance "mem_used"
</Target>
Target "write"
Target "stop"
</Rule>
<Rule "handle_gpu_used">
<Match "regex">
TypeInstance "gpu_used$"
</Match>
<Target "scale">
Factor 0.01
</Target>
<Target "set">
Plugin "nvml"
</Target>
Target "write"
Target "stop"
</Rule>
</Chain>