# PIKA Collection Daemon, LIKWID Perfgroups and Log Rotation
PIKA uses the collection daemon collectd to read metrics and send them to an instance of InfluxDB. To capture all metrics relevant to PIKA, we have developed some additional plugins (see [collectd/collectd-plugins/](collectd/collectd-plugins/)).
Since collectd does not provide log rotation, we added this feature based on *logrotate*.
Furthermore, we use this mechanism to regularly check the log files for errors, summarize them in a shared directory, and send a report once per day to registered email addresses (only if errors occurred).
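For the log rotation, a minimal *logrotate* rule could look as follows; the log path and rotation policy here are assumptions, not the PIKA defaults:
~~~~
# /etc/logrotate.d/collectd (sketch)
/var/log/collectd.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
~~~~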
## PIKA Perfgroups for LIKWID
PIKA uses its own LIKWID perfgroups to avoid or minimize multiplexing of hardware counters.
The collected metrics are
* FLOPS,
* IPC,
* main memory bandwidth and
* power consumption.
PIKA perfgroup files are named *pika_metrics_#.txt*, where *#* is a number (see *pika_metrics_#.txt* in [likwid/perfgroups/](likwid/perfgroups/)*CPU_ARCH*).
### What should I do if no PIKA groups exist for my CPU architecture yet?
If no PIKA perfgroup files exist for a CPU architecture, they can be derived from existing LIKWID perfgroups.
IPC or CPI (the inverse) is defined in almost every LIKWID group, because it is provided by a fixed counter register.
To determine FLOPS, there are usually several predefined groups for the different FLOPS types.
Underlying counters are usually obtained via programmable counter registers, which are only available in limited quantity.
Since PIKA summarizes all types of FLOPS into a single FLOPS value (*flops_any*), normalized to single precision, it might be possible to obtain such a value with only one perfgroup.
If this is not possible, multiple PIKA perfgroup files have to define the individual FLOPS types, e.g. *flops_dp*, *flops_sp*, *flops_avx*, which are then summarized by our collectd LIKWID plugin.
Main memory bandwidth and power consumption are often per-socket counters (at least for Intel CPUs). Predefined LIKWID perfgroups are called *MEM* and *ENERGY*.
For PIKA, we aggregate read and write bandwidth into one value.
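As an illustration, PIKA perfgroup files use the standard LIKWID perfgroup format with *SHORT*, *EVENTSET*, *METRICS* and *LONG* sections. The following is only a sketch for a recent Intel CPU; event names and formulas must be adapted to the actual architecture:
~~~~
SHORT PIKA metrics (sketch)

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
PMC0  FP_ARITH_INST_RETIRED_SCALAR_SINGLE
PMC1  FP_ARITH_INST_RETIRED_SCALAR_DOUBLE

METRICS
ipc FIXC0/FIXC1
flops_any (PMC0+PMC1*2.0)/time

LONG
Derived PIKA group: IPC and FLOPS normalized to single precision
(double precision counts twice).
~~~~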
## Per-Core or Per-Socket Metric?
Whether a metric is reported per core or per socket can only be determined by a test run.
For example, use a benchmark from [LIKWID bench](https://github.com/RRZE-HPC/likwid/wiki/Likwid-Bench) and measure the counters with *likwid-perfctr*:
~~~~
$ likwid-perfctr -g FLOPS likwid-bench -t triad_avx -w S0:128KB -w S1:128KB
~~~~
The given example measures the LIKWID perfgroup *FLOPS* while executing the AVX version of the triad benchmark on sockets *0* and *1*, and thus on all threads of a dual-socket compute node.
If all but two threads show a value of *0*, the metric is per-socket.
List such metrics (comma-separated) with the option *PerSocketMetrics* in the LIKWID plugin block of the [collectd configuration file](collectd/pika-1.2_collectd_template.conf) to avoid sending unwanted zero values to the database.
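For example, with the per-socket metric names used later in this document, the relevant part of the LIKWID plugin block would look like this (a sketch):
~~~~
<Plugin likwid>
  # metrics that are only valid once per socket
  PerSocketMetrics "mem_bw,rapl_power"
</Plugin>
~~~~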
# Configuration of collectd
[collect_template.conf](collect_template.conf) is a template for the collectd configuration file.
It is used by [pika_start_collectd.sh](../../job_control/slurm/taurus/pika_start_collectd.sh).
With PIKA 1.2 the read alignment options have changed, and the template file has changed accordingly to [pika-1.2_collectd_template.conf](pika-1.2_collectd_template.conf).
We added a new metric type for LIKWID. It is defined in [custom_types.db](custom_types.db) and included by the collectd configuration.
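A types.db entry has the form *name datasource:type:min:max*. The entry for the LIKWID metric type might look like the following sketch; see [custom_types.db](custom_types.db) for the actual definition:
~~~~
likwid  value:GAUGE:U:U
~~~~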
# Additional Python and C plugins for collectd
Read plugins:
* LIKWID (C)
* Infiniband bandwidth (sum of send and receive) (Python3)
* Lustre read/write bandwidth and Lustre metadata (Python3)
Write plugins:
* InfluxDB (Python3)
## collectd Configuration
To adjust the collectd configuration there are global and per-plugin options (in the configuration file). *Interval* and *AlignRead* can be set as global options for all plugins, but can also be overwritten on a per-plugin basis. *AlignReadOffset* can only be set per plugin.
| Option | Default | PIKA Default | Description |
|----|----|----|----|
| Interval | 10 | 30 | specifies a sampling interval in seconds, at which read callbacks are triggered |
| AlignRead | not set / false | true | time-aligns the call to read functions to a multiple of the read interval, e.g. to obtain round timestamps or to collect metrics with the same timestamps on different systems |
| AlignReadOffset | not set / 0 | 0.0* (different for each plugin) | specifies an offset (in seconds) by which the call to time-aligned read functions is postponed, automatically enables AlignRead for the plugin |
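A sketch of how these options could appear in the configuration file (placing per-plugin options in the *LoadPlugin* block follows the usual collectd convention; the values are examples):
~~~~
# global options
Interval  30
AlignRead true

# per-plugin override
<LoadPlugin likwid>
  Interval        30
  AlignReadOffset 0.1
</LoadPlugin>
~~~~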
The PIKA plugins provide the following options within the plugin configuration block:
### LIKWID Plugin
| Option | Plugin Default | Description |
|----|----|----|
| NormalizeFlops | not set / false | name of the normalized FLOPS metric, e.g. *flops_any* |
| AccessMode | 0 | 0 for direct access (only as root or with perf), 1 to use the access daemon |
| Mtime | 10 | measurement time per LIKWID perfgroup (in seconds) |
| Groups | not set | comma-separated perfgroups, e.g. "pika_metrics_1,pika_metrics_2" |
| PerSocketMetrics | not set | comma-separated list of per-socket metrics, e.g. "mem_bw,rapl_power" |
| MaxValues | value depends on the counter bit width (provided by LIKWID) | comma-separated list of metrics with their maximum value (*metric:max*), e.g. "ipc:10,flops:1e11,mem_bw:1e12" |
| PerCore | not set / false | summarize (add up) metric values per core (by default metrics are reported per hardware thread) |
| Verbose | 1 | LIKWID verbosity output level |
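Putting these options together, a LIKWID plugin block could look like the following sketch (values taken from the examples in the table, not the shipped PIKA configuration):
~~~~
<Plugin likwid>
  AccessMode       0
  Mtime            10
  Groups           "pika_metrics_1,pika_metrics_2"
  NormalizeFlops   "flops_any"
  PerSocketMetrics "mem_bw,rapl_power"
  MaxValues        "ipc:10,flops:1e11,mem_bw:1e12"
  Verbose          1
</Plugin>
~~~~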
### GPU NVIDIA Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| InstanceByGPUIndex | true | true | add the GPU ID to the plugin instance |
| InstanceByGPUName | true | false | add the GPU name to the plugin instance |
### Lustre Bandwidth Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| path | not set | not set | comma-separated list of paths to Lustre file system instances |
| instances | not set | not set | comma-separated list of Lustre instances (*FileSystemName-MagicNumber*) |
| fsname_and_mount | use the root mount for each file system name | "*:/ws" (use the mount points that end with "ws" for all file system names) | specifies the file system name(s) and the mount point ending, separated by a colon (the option can appear multiple times per plugin) |
| recheck_limit | not set / 0 | 360 | check the Lustre setup every *VALUE* reads |
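As the Lustre plugins are written in Python, their options are set in a *Module* block of the collectd Python plugin. The module name *lustre_bw* below is an assumption for illustration:
~~~~
<Plugin python>
  <Module lustre_bw>
    fsname_and_mount "*:/ws"
    recheck_limit 360
  </Module>
</Plugin>
~~~~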
### Infiniband Bandwidth Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| devices | not set | not set | comma-separated list of Infiniband device directory paths |
| directory | "/sys/class/infiniband" | not set | path to the directory of Infiniband devices |
| recheck_limit | not set / 0 | 1440 | check the Infiniband setup every *VALUE* reads |
### InfluxDB Plugin
| Option | Plugin Default | PIKA Default | Description |
|----|----|----|----|
| host | localhost | ??? | name or IP of the InfluxDB host |
| port | 8086 | 8086 | port number |
| user | not set | ??? | user name |
| pwd | not set | ??? | password |
| db | not set | prope | name of the database |
| batch_size | 200 | depends on number of cores per host | number of values that are buffered until being sent to InfluxDB |
| cache_size | 2000 | depends on number of cores per host | number of values that are cached before dropping them |
| StoreRates | false | true | if enabled, only the difference between consecutive values is stored for collectd counter and derive types |
| PerCore | not set | "cpu:avg" | comma-separated list of metrics whose values shall be aggregated to per-core values, naming: *plugin1:aggregate,plugin2:aggregate*, where *aggregate* is either *sum* or *avg* |
| ssl | false | false | specifies whether communication with the database shall be SSL encrypted |
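The InfluxDB writer is also a Python plugin and is configured in a *Module* block. The module name *influx_write* and the credentials below are placeholders, not actual values:
~~~~
<Plugin python>
  <Module influx_write>
    host "localhost"
    port 8086
    user "pika_user"
    pwd  "pika_password"
    db   "prope"
    StoreRates true
    PerCore "cpu:avg"
    ssl false
  </Module>
</Plugin>
~~~~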
## Installation of Required Tools and C-Plugins
The plugins have been tested with collectd 5.10.0, but should also work with other versions. Make sure that Python3 is available before installing collectd. If you have an existing Python3 installation, it should be sufficient to install influxdb via pip3.
### Collectd
To use the *AlignRead* and *AlignReadOffset* options in the collectd configuration file, a patch from the patches folder has to be applied. A respective pull request has been opened (https://github.com/collectd/collectd/pull/3327).
If *AlignRead* is set to *true*, the call to read functions is time-aligned to a multiple of the read interval, which allows round timestamps or the same timestamps across systems to be recorded. *AlignReadOffset* specifies an offset, in seconds, by which the call to time-aligned read functions is delayed.
Build collectd from sources (including the AlignRead feature):
~~~~
# get collectd sources
git clone https://github.com/rdietric/collectd.git
cd collectd
git checkout alignread
# configure collectd build
./build.sh
PYTHON_CONFIG=$PYTHON_ROOT/bin/python3-config ./configure --prefix=${COLLECTD_INST_PATH} --with-cuda=${CUDA_PATH}
# add the path where the nvml library is located, if building on a system without NVIDIA GPU
export LIBRARY_PATH=$PATH_TO_NVML_LIBRARY:$LIBRARY_PATH
# add paths to plugin.h and collectd.h and to nvml.h as configure for the gpu-nvidia plugin is broken
export C_INCLUDE_PATH=$PWD/src:$PWD/src/daemon:$CUDA_PATH/include:$C_INCLUDE_PATH
# build and install collectd
make -j; make install
~~~~
### LIKWID
LIKWID is available at https://github.com/RRZE-HPC/likwid.git. You can also use a release version and apply a patch from the *patches* folder (if available).
~~~~
# get likwid sources
LIKWID_VERSION=4.3.3
wget https://github.com/RRZE-HPC/likwid/archive/likwid-${LIKWID_VERSION}.tar.gz
tar xfz likwid-${LIKWID_VERSION}.tar.gz
cd likwid-likwid-${LIKWID_VERSION}
patch -p0 < $PATH_TO_PATCH/prope_likwid-${LIKWID_VERSION}_src.patch
# set Likwid install path ($LIKWID_INST_PATH), perf_event as counter source and disable building the access daemon
sed -i "/^PREFIX = .*/ s|.*|PREFIX = $LIKWID_INST_PATH|" config.mk
sed -i "/^ACCESSMODE = .*/ s|.*|ACCESSMODE = perf_event|" config.mk
sed -i "/^BUILDDAEMON = .*/ s|.*|BUILDDAEMON = false|" config.mk
make -j4; make install
cd ..
~~~~
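If you use the PIKA perfgroups, copy them into LIKWID's perfgroup directory for your CPU architecture; the paths below are assumptions:
~~~~
# copy the PIKA perfgroups for the target CPU architecture
cp likwid/perfgroups/$CPU_ARCH/pika_metrics_*.txt \
   $LIKWID_INST_PATH/share/likwid/perfgroups/$CPU_ARCH/
# list the perfgroups LIKWID can find
$LIKWID_INST_PATH/bin/likwid-perfctr -a
~~~~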
### InfluxDB
Download a package from https://portal.influxdata.com/downloads/ and install it according to the instructions.
Add the InfluxDB module to your Python 3 installation with `pip3 install influxdb`.
### Plugins
Only the C plugin(s) have to be built.
Build the LIKWID plugin:
~~~~
export LIKWID_ROOT=/likwid/install/path
export COLLECTD_ROOT=/collectd/install/path
export COLLECTD_SRC=/collectd/sources/src
export COLLECTD_BUILD_DIR=/collectd/build/dir
cd c; make
~~~~
## Testing
### Singularity Container
The PIKA data collection can be tested in a Singularity container (see [singularity folder](singularity)).
### Test collectd
For testing purposes, collectd can be run in the foreground with `-f`:
~~~~
$COLLECTD_INSTALL_PATH/sbin/collectd -f -C $COLLECTD_CONF_FILE
~~~~
[pika_collectd.conf](pika_collectd.conf) is a sample configuration file for collectd.
Before running collectd, paths in this file have to be adapted, e.g. the path to `custom_types.db`, which is needed for the likwid plugin.
You should also disable (comment out) plugins for resources that are not available on the system, e.g. Lustre and Infiniband if you are working on your own notebook.
### Likwid Permission Requirements
If you use LIKWID with perf_event as access mode, you may not have permission to collect metrics.
If this happens, you can set perf_event_paranoid to 0 (requires root privileges):
`sh -c 'echo 0 >/proc/sys/kernel/perf_event_paranoid'`
See https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
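To make this setting persist across reboots, a sysctl drop-in file can be used (the file name is an assumption):
~~~~
echo 'kernel.perf_event_paranoid = 0' > /etc/sysctl.d/99-pika-perf.conf
sysctl --system
~~~~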
# change the following four lines according to your install and source paths
LIKWID_ROOT ?= /usr/local
COLLECTD_ROOT ?= /usr/local/collectd/$(COLLECTD_VERSION)
COLLECTD_SRC ?= /usr/local/sources/collectd-$(COLLECTD_VERSION)/src
COLLECTD_BUILD_DIR ?= $(COLLECTD_SRC)/..
COLLECTD_INC = $(COLLECTD_ROOT)/include/collectd
CC = gcc
CFLAGS = -g -Wall -I$(LIKWID_ROOT)/include -I$(COLLECTD_INC) -I$(COLLECTD_SRC) -I$(COLLECTD_SRC)/daemon -I$(COLLECTD_SRC)/utils/common/ -I$(COLLECTD_BUILD_DIR)/src
LDFLAGS = -L$(LIKWID_ROOT)/lib -L$(COLLECTD_ROOT) -lm
OBJECTS=*.o
all: likwid
likwid:
	$(CC) -DHAVE_CONFIG_H $(CFLAGS) -std=c99 -shared -fpic -o $(COLLECTD_ROOT)/lib/collectd/likwid.so likwid.c $(LDFLAGS) -llikwid
clean:
	rm -f $(OBJECTS)
# C plugins for collectd
## LIKWID plugin
Before running `make`, set the following environment variables according to your source and install paths of collectd and LIKWID.
* LIKWID_ROOT (LIKWID install path)
* COLLECTD_ROOT (collectd install path)
* COLLECTD_SRC (collectd sources)
* COLLECTD_BUILD_DIR (collectd build directory)
Make generates $COLLECTD_ROOT/lib/collectd/likwid.so.
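Once built, the shared object is loaded like any other C plugin. A sketch of the relevant lines in the collectd configuration (paths are assumptions; `custom_types.db` is required for the metric type, see above):
~~~~
TypesDB "/usr/share/collectd/types.db" "/path/to/custom_types.db"
LoadPlugin likwid
~~~~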
#define _POSIX_C_SOURCE 199309L // required for timespec and nanosleep() in C99
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <float.h>
#include <time.h>
#include <likwid.h>
#ifdef TEST_LIKWID
#include <inttypes.h>
#include <stdarg.h> /* va_list, va_start, va_end in plugin_log() */
#include <time.h>
#define STATIC_ARRAY_SIZE(a) (sizeof(a) / sizeof(*(a)))
/********* Collectd time stuff ***********/
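/* Note: cdtime_t is a 64-bit fixed-point representation of time: the upper
 * bits hold whole seconds, the lower 30 bits hold the fractional part
 * scaled by 2^30 (hence the shifts by 30 in the macros below). */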
#define TIME_T_TO_CDTIME_T_STATIC(t) (((cdtime_t)(t)) << 30)
#define TIME_T_TO_CDTIME_T(t) \
(cdtime_t) { TIME_T_TO_CDTIME_T_STATIC(t) }
#define NS_TO_CDTIME_T(ns) \
(cdtime_t) { \
((((cdtime_t)(ns)) / 1000000000) << 30) | \
((((((cdtime_t)(ns)) % 1000000000) << 30) + 500000000) / 1000000000) \
}
#define TIMESPEC_TO_CDTIME_T(ts) \
NS_TO_CDTIME_T(1000000000ULL * (ts)->tv_sec + (ts)->tv_nsec)
#define CDTIME_T_TO_TIME_T(t) \
(time_t) { (time_t)(((t) + (1 << 29)) >> 30) }
/* Type for time as used by "utils_time.h" */
typedef uint64_t cdtime_t;
cdtime_t cdtime(void) /* {{{ */
{
int status;
struct timespec ts = {0, 0};
status = clock_gettime(CLOCK_REALTIME, &ts);
if (status != 0) {
printf("cdtime: clock_gettime failed\n");
return 0;
}
return TIMESPEC_TO_CDTIME_T(&ts);
} /* }}} cdtime_t cdtime */
/********* END: Collectd time stuff ***********/
#ifdef DEBUG
#define DEBUG(...) plugin_log(0, __VA_ARGS__)
#else
#define DEBUG(...)
#endif
#define ERROR(...) plugin_log(0, __VA_ARGS__)
#define WARNING(...) plugin_log(0, __VA_ARGS__)
#define NOTICE(...) plugin_log(0, __VA_ARGS__)
#define INFO(...) plugin_log(0, __VA_ARGS__)
void plugin_log(int level, const char *format, ...) {
char msg[1024];
va_list ap;
va_start(ap, format);
vsnprintf(msg, sizeof(msg), format, ap);
msg[sizeof(msg) - 1] = '\0';
va_end(ap);
fprintf(stderr, "%s\n", msg);
}
typedef void *notification_t;
typedef void *user_data_t;
#else
// headers required for collectd
#include "collectd.h"
#include "common.h" /* collectd auxiliary functions */
#include "plugin.h" /* plugin_register_*, plugin_dispatch_values */
#endif
#define PLUGIN_NAME "likwid"
static bool plugin_disabled = false;
/*! counter register access mode (default: direct access / perf_event) */
static int accessMode = 0;
/*! measurement time per event/metric group as timespec (default: 10 sec) */
struct timespec mTime = {10, 0};
/*! measurement time per group in cdtime_t */
static cdtime_t mTimeCd = 0;
/*! Likwid verbosity output level (default: 1) */
static int likwid_verbose = 1;
/*! Normalize FLOPS to single precision? (default: false) */
static bool normalizeFlops = false;
/*! Summarize multiple FLOPS metrics into single precision FLOPS (can be true
* only if multiple FLOPS metrics are monitored) */
static bool summarizeFlops = false;
/*! Name of the normalized FLOPS metric */
static char *normalizedFlopsName = "flops_any";
/*! storage to normalize FLOPS values */
static double *flopsValues = NULL;
/*! \brief Maximum values for metrics */
typedef struct {
char *metricName;
double maxValue;
} max_value_t;
static max_value_t *maxValues = NULL;
static int numMaxValues = 0;
static uint64_t counterLimit = 0;
/*! \brief Metric type */
typedef struct {
char *name; /*!< metric name */
uint8_t xFlops; /*!< if > 0, it is a FLOPS metric and the value is
the multiplier for normalization */
bool perCpu; /*!< true, if values are per CPU, otherwise per socket is assumed
*/
double *perCoreValues; /*! Sum up HW thread values to core granularity */
double maxValue;
} metric_t;
/*! \brief Metric group type */
typedef struct {
int id; /*!< group ID */
char *name; /*!< group name */
int numMetrics; /*!< number of metrics in this group */
metric_t *metrics; /*!< metrics in this group */
} metric_group_t;
static int numGroups = 0;
static metric_group_t *metricGroups = NULL;
/* required thread array */
static int numThreads = 0; /**< number of HW threads to be monitored */
static int *hwThreads = NULL; /**< array of apic IDs to be monitored */
/*! per-socket metrics */
static int numSockets = 0;
static int *socketThreadIndices =
NULL; /*!< threads containing the per-socket data */
static int numSocketMetrics = 0;
static char **perSocketMetrics = NULL; /*!< array of per socket metric names */
/*!< Optional: sum up hardware thread values to cores, if SMT is enabled. */
static bool summarizePerCore = false;
static uint32_t numCores = 0;
/*! \brief Thread to core mapping structures */
static int *coreIndices = NULL; /*!< Index into the per core data array */
static uint32_t *coreIds = NULL; /*!< ID of the physical core by core index */
/*! Define our own strdup(), as C99 provides no strdup() prototype. */
static char *mystrdup(const char *s) {
size_t len = strlen(s) + 1;
char *result = (char *)malloc(len);
if (result == (char *)0)
return (char *)0;
return (char *)memcpy(result, s, len);
}
/*! Determines by metric name, whether this is a per CPU or per socket
metric. The default is "per CPU" */
static bool _isMetricPerCPU(const char *metric) {
for (int i = 0; i < numSocketMetrics; i++) {
if (0 == strncmp(perSocketMetrics[i], metric, 6)) {
return false;
}
}
return true;
}
/*! \brief Initializes the event sets to be monitored. */
static void _setupGroups() {
if (NULL == metricGroups) {
ERROR(PLUGIN_NAME "No metric groups allocated! Plugin not initialized?");
return;
}
INFO(PLUGIN_NAME ": Setup metric group(s)");
int numFlopMetrics = 0;
// set the group IDs and metric names
for (int g = 0; g < numGroups; g++) {
if (metricGroups[g].name != NULL) {
int gid = perfmon_addEventSet(metricGroups[g].name);
if (gid < 0) {
metricGroups[g].id = -2;
INFO(PLUGIN_NAME ": Failed to add group %s to LIKWID perfmon module "
"(return code: %d)",
metricGroups[g].name, gid);
} else {
// set the group ID
metricGroups[g].id = gid;
// get number of metrics for this group
int numMetrics = perfmon_getNumberOfMetrics(gid);
metricGroups[g].numMetrics = numMetrics;
if (numMetrics == 0) {
WARNING(PLUGIN_NAME ": Group %s has no metrics!",
metricGroups[g].name);
continue;
}
// allocate metric array
metric_t *metrics = (metric_t *)malloc(numMetrics * sizeof(metric_t));
if (NULL == metrics) {
metricGroups[g].numMetrics = 0;
metricGroups[g].id = -2;
WARNING(
PLUGIN_NAME
": Disable group %s as memory for metrics could not be allocated",
metricGroups[g].name);
continue;
}
// set the pointer to the allocated memory for metrics
metricGroups[g].metrics = metrics;
// set metric names and set initial values to -1
for (int m = 0; m < numMetrics; m++) {
metrics[m].name = perfmon_getMetricName(gid, m);
// determine if metric is per CPU or per socket (by name)
metrics[m].perCpu = _isMetricPerCPU(metrics[m].name);
// normalize flops, if enabled
if (normalizeFlops && 0 == strncmp("flops", metrics[m].name, 5)) {
numFlopMetrics++;
size_t flopsStrLen = strlen(metrics[m].name);
// if metric is named exactly like the user-defined normalized FLOPS name, normalization of FLOPS is not needed
if (0 == strcmp(normalizedFlopsName, metrics[m].name)) {
normalizeFlops = false;
metrics[m].xFlops = 0;
INFO(PLUGIN_NAME ": Found metric %s. No normalization needed.", metrics[m].name);
}
// double precision to single precision = factor 2
else if (flopsStrLen >= 8 && 0 == strncmp("dp", metrics[m].name + 6, 2)) {
metrics[m].xFlops = 2;
}
// avx to single precision = factor 4
else if (flopsStrLen >= 9 && 0 == strncmp("avx", metrics[m].name + 6, 3)) {
metrics[m].xFlops = 4;
} else // assume single precision otherwise
{
metrics[m].xFlops = 1;
}
} else {
metrics[m].xFlops = 0;
}
// if HW thread values should be summarized to cores, allocate per
// metric arrays
if (summarizePerCore) {
metrics[m].perCoreValues =
(double *)malloc(numCores * sizeof(double));
if (NULL == metrics[m].perCoreValues) {
WARNING(PLUGIN_NAME
": Malloc failed. Cannot summarize per core!");
summarizePerCore = false;
}
// initialize to invalid values, which will not be submitted
for (int i = 0; i < numCores; i++) {
metrics[m].perCoreValues[i] = -1.0;
}
}
// set maximum value of metric
if(counterLimit != 0) {
metrics[m].maxValue = (double)counterLimit;
} else {
metrics[m].maxValue = DBL_MAX;
}
for (int i = 0; i < numMaxValues; i++) {
if (0 == strncmp(metrics[m].name, maxValues[i].metricName,
strlen(maxValues[i].metricName))) {
metrics[m].maxValue = maxValues[i].maxValue;
}
}
} // END for metrics
}
} else {
// set group ID to invalid
metricGroups[g].id = -1;
}
} // END: for groups
// check if FLOPS have to be aggregated (if more than one FLOP metric is
// collected), which requires to allocate memory for each metric per core
if (numFlopMetrics > 1) {
INFO(PLUGIN_NAME ": Different FLOPS are aggregated.");
summarizeFlops = true;
flopsValues = (double *)malloc(numThreads * sizeof(double));
if (flopsValues) {
// initialize with -1 (invalid value)
for(int i = 0; i < numThreads; i++){
flopsValues[i] = -1.0;
}
} else {
WARNING(PLUGIN_NAME ": Could not allocate memory for normalization of "
"FLOPS. Disable summarization of FLOPS.");
summarizeFlops = false;
}
}
// no need to handle different FLOPS in the same metric group, as this could
// be handled directly in the Likwid metric group files
}
static int likwid_plugin_finalize(void) {
INFO(PLUGIN_NAME ": %s:%d", __FUNCTION__, __LINE__);
// perfmon_finalize(); // segfault
affinity_finalize();
numa_finalize();
topology_finalize();
// free memory where CPU IDs are stored
// INFO(PLUGIN_NAME ": free allocated memory");
if (NULL != hwThreads) {
free(hwThreads);
}
if (NULL != metricGroups) {
for (int i = 0; i < numGroups; i++) {
// memory for group names have been allocated with strdup
if (NULL != metricGroups[i].name) {
free(metricGroups[i].name);
}
}
free(metricGroups);
if (flopsValues) {
free(flopsValues);
}
}
return 0;
}
/*! \brief Initialize the LIKWID monitoring environment */
static int _init_likwid(void) {
topology_init();
numa_init();
affinity_init();
CpuTopology_t cputopo = get_cpuTopology();
HWThread *threadPool = cputopo->threadPool;
numThreads = cputopo->numHWThreads;
hwThreads = (int *)malloc(numThreads * sizeof(int));
if (NULL == hwThreads) {
ERROR(PLUGIN_NAME ": malloc of APIC ID array failed!");
likwid_plugin_finalize();
return 1;
}
for (int i = 0; i < numThreads; i++) {
hwThreads[i] = (int)threadPool[i].apicId;
}
HPMmode(accessMode);
perfmon_setVerbosity(likwid_verbose);
perfmon_init(numThreads, hwThreads);
// determine the HW threads that provide the per-socket data
numSockets = cputopo->numSockets;
socketThreadIndices = malloc(numSockets * sizeof(int));
if (NULL == socketThreadIndices) {
ERROR(PLUGIN_NAME ": malloc of socket thread index array failed!");
return 1;
}
int currentSocketIdx = 0;
for (int i = 0; i < numThreads; i++) {
uint32_t socketId = threadPool[i].packageId;
bool found = false;
for (int s = 0; s < currentSocketIdx; s++) {
if (socketThreadIndices[s] == socketId) {
found = true;
break;
}
}
if (!found) {
socketThreadIndices[currentSocketIdx] = i;
INFO(PLUGIN_NAME ": Collecting per-socket metrics with thread %d", i);
currentSocketIdx++;
if (currentSocketIdx == numSockets) {
break;
}
}
}
// handle per-core summarization
uint32_t numThreadsPerCore = cputopo->numThreadsPerCore;
if (summarizePerCore == false || numThreadsPerCore == 1) {
summarizePerCore = false;