Robert Dietrich committed
# PIKA - Center-Wide and Job-Aware Cluster Monitoring

PIKA is an infrastructure for continuous monitoring and analysis of HPC systems.
It uses the collection daemon collectd to gather metrics, InfluxDB to store time-series data, and MariaDB to store job metadata.
Furthermore, it provides a web frontend for visualizing job data.

Files that are required to run the monitoring daemon (collectd) are located in the daemon folder. This includes the collectd configuration file, the LIKWID event group files, and scripts that are triggered periodically to perform log rotation and error detection.
Prolog and epilog scripts ensure that the PIKA package is installed and the daemon is running. The corresponding files are located in the job_control folder.
Scripts for post-processing, such as the generation of footprints, are located in the post_processing folder.
Scripts that determine the scalability and overhead of the monitoring, as well as regression tests, are located in the test folder.

## Installation
The software stack consists of several components and tools.
To simplify installation, install scripts are provided.
For detailed install instructions see the [README.md](install/README.md) in the install directory.

## Configuration
The following files are used to configure the software stack:

* *pika.conf* 
contains the global, version-independent configuration variables. It also sets some environment variables that are used in the job prolog and epilog. It uses `source` to read the environment variables from *.pika_access*.
* *.pika_access* 
exports the environment variables with the access parameters for the databases. Thus, this file should have restricted read access. You can use [pika_access_template](pika_access_template) to create this file. 
* *pika-VERSION.conf* 
is used for versioning of the PIKA package. It sets the PIKA package version along with the versions of collectd, LIKWID and Python that are used. Finally, it uses `source` to read the environment variables from *pika.conf*.
* *pika_utils.conf* 
provides utility functions for prolog, epilog and other bash scripts. 
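
The shell sketch below illustrates how these files chain together: sourcing a versioned file pulls in *pika.conf*, which in turn pulls in *.pika_access*. It writes minimal stand-in files to a temporary directory; all values, as well as the variable names `EXAMPLE_DB_USER` and `EXAMPLE_DB_PASSWORD`, are hypothetical and do not reflect the real contents of these files. The `chmod 600` call shows one way to restrict read access to the file owner.

```shell
#!/usr/bin/env bash
# Sketch of the configuration chain with hypothetical stand-in files:
# pika-1.2.conf sources pika.conf, which sources .pika_access.
set -euo pipefail

conf_dir=$(mktemp -d)
trap 'rm -rf "$conf_dir"' EXIT

# .pika_access: database access parameters (variable names are hypothetical)
cat > "$conf_dir/.pika_access" <<'EOF'
export EXAMPLE_DB_USER=pika
export EXAMPLE_DB_PASSWORD=secret
EOF
# Restrict read (and write) access to the file owner
chmod 600 "$conf_dir/.pika_access"

# pika.conf: global settings, then reads the access parameters
# (unquoted EOF so $conf_dir is expanded when the file is written)
cat > "$conf_dir/pika.conf" <<EOF
export LOCAL_STORE=/tmp/pika
source "$conf_dir/.pika_access"
EOF

# pika-1.2.conf: version-specific settings, then reads pika.conf
cat > "$conf_dir/pika-1.2.conf" <<EOF
export PIKA_VERSION=1.2
source "$conf_dir/pika.conf"
EOF

# Sourcing only the versioned file makes the variables of all three files available
source "$conf_dir/pika-1.2.conf"
echo "version=$PIKA_VERSION store=$LOCAL_STORE user=$EXAMPLE_DB_USER"
```

Running it prints `version=1.2 store=/tmp/pika user=pika`: a script that sources only the versioned file sees the variables defined in all three files.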

Edit *pika.conf* and change the variables *LOCAL_STORE*, *PIKA_LOGPATH* and *PIKA_INSTALL_PATH* according to your needs or system setup.
*LOCAL_STORE* specifies the path where the prolog places temporary files that are later read by the epilog script. It is also used for locking during the installation and the collectd start procedure.
*PIKA_LOGPATH* specifies the path where the collectd log file *pika_collectd.log* is written.
*PIKA_INSTALL_PATH* specifies the path where the PIKA software (binaries, libraries, etc.) is installed.
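
A minimal sketch of how these three variables might be set, and how a prolog could use *LOCAL_STORE* for locking. The paths are placed under a temporary directory so the sketch runs anywhere; on a real system they would be fixed node paths. The `mkdir`-based lock and the lock name `pika_install.lock` are assumptions for illustration, not necessarily the mechanism PIKA uses.

```shell
#!/usr/bin/env bash
# Hypothetical pika.conf values; a temporary base directory stands in
# for real node paths so the sketch can run anywhere.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT

export LOCAL_STORE="$base/local"           # prolog temp files, read by the epilog
export PIKA_LOGPATH="$base/log"            # directory for pika_collectd.log
export PIKA_INSTALL_PATH="$base/opt/pika"  # binaries, libraries, etc.
mkdir -p "$LOCAL_STORE" "$PIKA_LOGPATH" "$PIKA_INSTALL_PATH"

collectd_log="$PIKA_LOGPATH/pika_collectd.log"

# Assumed locking scheme: mkdir is atomic, so only one concurrent
# prolog can create the lock directory (the lock name is hypothetical).
lock_dir="$LOCAL_STORE/pika_install.lock"
if mkdir "$lock_dir" 2>/dev/null; then
  lock_result=acquired
  # ... installation and collectd start would happen here ...
  rmdir "$lock_dir"   # release the lock
else
  lock_result=busy    # another prolog is installing; wait or skip
fi

echo "lock: $lock_result, collectd log: $collectd_log"
```

Because `mkdir` either creates the directory or fails atomically, only one of several jobs starting on the same node at once proceeds with the installation, while the others can wait or skip it.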

Edit *pika-VERSION.conf* and set the variable *PIKA_ROOT* to the path where the PIKA sources (and also the `*.conf` files) are located.
This file also specifies the collectd batch size (the number of metric values that are collected before being sent to the database) with the variable *PIKA_COLLECTD_BATCH_SIZE*.
Furthermore, it handles special cases for different types of nodes.
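
The settings in *pika-VERSION.conf* could look like the sketch below. The path, the batch size of 200, and the rule that lowers the batch size on nodes whose name starts with `login` are hypothetical, meant only to show the shape of a per-node-type special case. `NODE_NAME` is a hypothetical override so the rule can be tried without changing the host name.

```shell
#!/usr/bin/env bash
# Sketch of pika-VERSION.conf settings (path, sizes and node rule are hypothetical)
set -euo pipefail

# Location of the PIKA sources and the *.conf files
export PIKA_ROOT=/opt/pika/src
# Number of metric values collectd buffers before sending them to the database
export PIKA_COLLECTD_BATCH_SIZE=200

# Hypothetical node-type special case: smaller batches on nodes whose
# name starts with "login". NODE_NAME can override the host name for testing.
node_name=${NODE_NAME:-$(uname -n)}
case "$node_name" in
  login*) PIKA_COLLECTD_BATCH_SIZE=50 ;;
  *) ;;
esac

echo "node=$node_name PIKA_ROOT=$PIKA_ROOT batch_size=$PIKA_COLLECTD_BATCH_SIZE"
```

With `NODE_NAME=login01` the sketch reports a batch size of 50; on other hosts it keeps 200. In general, a larger batch means fewer writes to the database at the cost of a longer delay before new values become visible.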

Finally, create a symbolic link named *pika-current.conf* that points to a *pika-VERSION.conf* file. For example:

    ln -s pika-1.2.conf pika-current.conf

To create a new PIKA software package, copy an existing *pika-VERSION.conf* file to a file with the new version number and change the variables *PIKA_VERSION*, *COLLECTD_VERSION*, *LIKWID_VERSION* and, if necessary, *LIKWID_VERSION_SHA*.
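
The versioning steps can be sketched in the shell as follows. The sketch creates a stand-in *pika-1.2.conf* in a temporary directory, derives *pika-1.3.conf* from it with updated version variables, and repoints *pika-current.conf*. The file contents and all version numbers are placeholders; a real *pika-VERSION.conf* contains more settings, and *LIKWID_VERSION_SHA* would be updated the same way if needed.

```shell
#!/usr/bin/env bash
# Sketch: derive a new package version from an existing pika-VERSION.conf.
# File contents and version numbers are placeholders.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Stand-in for an existing versioned configuration file and its link
cat > pika-1.2.conf <<'EOF'
PIKA_VERSION=1.2
COLLECTD_VERSION=5.10.0
LIKWID_VERSION=5.0.1
EOF
ln -s pika-1.2.conf pika-current.conf

# Copy the file under the new version number with updated variables
sed -e 's/^PIKA_VERSION=.*/PIKA_VERSION=1.3/' \
    -e 's/^COLLECTD_VERSION=.*/COLLECTD_VERSION=5.11.0/' \
    -e 's/^LIKWID_VERSION=.*/LIKWID_VERSION=5.1.0/' \
    pika-1.2.conf > pika-1.3.conf

# Repoint pika-current.conf: -f replaces the existing link,
# -n does not follow it if it points to a directory
ln -sfn pika-1.3.conf pika-current.conf
readlink pika-current.conf
```

Writing the modified copy with `sed ... > new_file` avoids `sed -i`, whose syntax differs between GNU and BSD sed.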