Commit 100f7a0f authored by Robert Dietrich

added some install instructions and removed a deprecated README file

parent 6f1d82b4
# PIKA Install Instructions
The installation consists of two parts: the package for the node monitoring daemon, and the databases for storage and archiving.
## Node Monitoring Daemon
We prepared simple bash scripts that can be used to generate the monitoring package.
*pika_install.conf*, a symbolic link to a *pika_install-VERSION.conf* file, sets a few environment variables that are used by the install scripts located in the *compute_node* directory.
*install_pika.sh* is the main script. It first deletes any old installation of the PIKA package, then runs *install_python3.sh*, *install_likwid.sh* and *install_collectd.sh* (for a manual installation, make sure that collectd is installed last). Finally, it creates a tarball that can be unpacked on the target nodes.
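The build order described above can be sketched as follows. This is a minimal illustration under assumptions, not the actual contents of *install_pika.sh*: the `DRY_RUN` switch, the helper names `run` and `build_pika_package`, and the tarball name `pika-package.tar.gz` are all hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of the install_pika.sh build order (not the real script).
# DRY_RUN, run(), build_pika_package() and the tarball name are assumptions.

run() {
  # In dry-run mode only print the command; otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_pika_package() {
  local prefix="$1"               # e.g. the value of PIKA_INSTALL_PATH
  run rm -rf "$prefix"            # 1. delete any old PIKA installation
  run ./install_python3.sh        # 2. Python 3
  run ./install_likwid.sh         # 3. LIKWID
  run ./install_collectd.sh       # 4. collectd must be installed last
  # 5. pack the installation into a tarball for the target nodes
  run tar -czf pika-package.tar.gz -C "$(dirname "$prefix")" "$(basename "$prefix")"
}
```

Calling `DRY_RUN=1 build_pika_package /opt/pika` only prints the five steps in order, which is a cheap way to check the sequence before running anything destructive.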
All scripts source *pika_install.conf* and can be executed standalone. When run as *root*, the scripts use *PIKA_INSTALL_PATH* as the build and install directory; otherwise, they use *PIKA_BUILD_PATH*.
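The root/non-root rule for the build directory boils down to a UID check. The sketch below assumes that *PIKA_INSTALL_PATH* and *PIKA_BUILD_PATH* have already been set by *pika_install.conf*; the function name `pika_build_dir` and its optional UID argument are hypothetical, added only to make the logic easy to try out.

```bash
#!/bin/bash
# Hypothetical sketch: pick the build directory depending on the caller's UID.
# PIKA_INSTALL_PATH and PIKA_BUILD_PATH are expected from pika_install.conf.
pika_build_dir() {
  local uid="${1:-$(id -u)}"     # optional UID argument, defaults to the caller's UID
  if [ "$uid" -eq 0 ]; then
    echo "$PIKA_INSTALL_PATH"    # root: build directly in the install path
  else
    echo "$PIKA_BUILD_PATH"      # unprivileged user: use the separate build path
  fi
}
```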
The default LIKWID access mode is perf_event. *install_pika.sh* and *install_likwid.sh* optionally accept the parameter *direct* to use the direct access mode for the MSRs (model-specific registers) instead.
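A minimal sketch of how the optional *direct* argument could map to an access mode is shown below. The helper name `likwid_access_mode` is hypothetical; presumably the scripts pass the chosen value on to LIKWID's `ACCESSMODE` build setting in *config.mk*, but that wiring is an assumption and is not shown here.

```bash
#!/bin/bash
# Hypothetical sketch: map the optional script argument to a LIKWID access mode.
likwid_access_mode() {
  case "${1:-}" in
    direct) echo "direct" ;;        # direct access to the MSRs
    "")     echo "perf_event" ;;    # default: Linux perf_event interface
    *)      echo "unknown argument: $1" >&2; return 1 ;;
  esac
}
```

Inside a script this would typically be used as `MODE=$(likwid_access_mode "$1") || exit 1`, so that a mistyped argument aborts the build instead of silently falling back to the default.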
Depending on the configuration (OS, software, hardware, etc.), a different package has to be created for each system. This may require adding more if branches to the configuration files. A PIKA package should be created on a node of the partition or system on which it will later run.
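As an illustration of such if branches, the following hypothetical excerpt selects paths by CPU architecture. The function name `pika_configure_for`, the architectures and all paths are assumptions for this sketch; a real *pika_install-VERSION.conf* may branch on hostnames, partitions or OS versions instead.

```bash
#!/bin/bash
# Hypothetical excerpt of a pika_install-VERSION.conf with per-system branches.
# pika_configure_for, the architectures and the paths are illustrative only.
pika_configure_for() {
  local arch="${1:-$(uname -m)}"   # default: architecture of the current node
  if [ "$arch" = "x86_64" ]; then
    PIKA_INSTALL_PATH="/opt/pika/x86_64"
  elif [ "$arch" = "ppc64le" ]; then
    PIKA_INSTALL_PATH="/opt/pika/ppc64le"
  else
    echo "no PIKA configuration for architecture '$arch'" >&2
    return 1
  fi
  PIKA_BUILD_PATH="$HOME/pika-build/$arch"
}
```

Failing loudly for an unknown system, rather than falling through to a default, helps avoid accidentally building a package on a node type it was not configured for.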
## Databases for Time-Series Data and Metadata
It is recommended to install the databases on different systems or virtual machines, each having access to fast storage.
### InfluxDB
See https://portal.influxdata.com/downloads/ for download and install instructions.
Some recommendations for the configuration (*/etc/influxdb/influxdb.conf*):
```toml
bind-address = ":8088"
[meta]
retention-autocreate = false
[data]
index-version = "inmem"
query-log-enabled = false
cache-max-memory-size = "2g"
[coordinator]
write-timeout = "29s"
max-concurrent-queries = 0
[retention]
enabled = true
check-interval = "6h"
[shard-precreation]
enabled = true
check-interval = "1h"
advance-period = "3h"
[monitor]
store-interval = "30s" # disable or longer
[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
log-enabled = false
max-connection-limit = 0
```
Add the InfluxDB HTTP endpoint to the firewall policy, e.g., on RHEL with:
```bash
firewall-cmd --zone=public --permanent --add-port=8086/tcp
systemctl restart firewalld
```
Start the InfluxDB service, e.g., with:
```bash
systemctl start influxdb
```
Create users, a database and two retention policies using the InfluxDB shell (*influx*):
```sql
CREATE USER admin WITH PASSWORD 'password' WITH ALL PRIVILEGES
CREATE DATABASE pika
CREATE RETENTION POLICY shortterm ON pika DURATION 28d REPLICATION 1 SHARD DURATION 7d DEFAULT
CREATE RETENTION POLICY longterm ON pika DURATION INF REPLICATION 1 SHARD DURATION 7d
CREATE USER readonly WITH PASSWORD 'password'
GRANT READ ON "pika" TO "readonly"
```
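A note on the retention settings: InfluxDB enforces retention by dropping whole shard groups, not individual points, and only once a shard group's end time lies outside the retention period. Assuming a point written at the very start of a 7-day shard group and the retention `check-interval = "6h"` from the configuration, a rough worst-case estimate for how long data in `shortterm` survives is:

```math
t_{\max} \approx \underbrace{28\,\mathrm{d}}_{\text{DURATION}} + \underbrace{7\,\mathrm{d}}_{\text{SHARD DURATION}} + \underbrace{6\,\mathrm{h}}_{\text{check-interval}} = 35\,\mathrm{d}\ 6\,\mathrm{h}
```

Data in `longterm` (`DURATION INF`) is never expired.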
### MariaDB
Install MariaDB, e.g. using the instructions from https://mariadb.com/resources/blog/installing-mariadb-10-on-centos-7-rhel-7/
Add the MariaDB access port to your firewall policy, e.g., on RHEL with:
```bash
firewall-cmd --zone=public --permanent --add-port=3306/tcp
systemctl restart firewalld
```
```bash
#!/bin/bash
#all compute nodes (omitting 3179-3180)
COMPUTE_NODES="taurusi[1001-1270,2001-2108,3001-3178,4001-4264,5001-5612,6001-6612],taurusknl[1-32],taurussmp[1-7]"
#check python on all nodes
clush -w "$COMPUTE_NODES" "ls /opt/prope/sw/python/2.7.14/bin/python; echo -e '\n'" > clush_output_1.txt 2>&1
#install third-party modules for python
clush -w "$COMPUTE_NODES" "PATH=/opt/prope/sw/python/2.7.14/bin ; pip install python-memcached" > clush_output.txt 2>&1
#Test job status:
sacct -j 14819977 -o Elapsed,State,ExitCode
# or
scontrol show job [job_id]
#Stop all diamond daemons on all compute nodes:
clush -t 30 -B -u 30 -w "$COMPUTE_NODES" "source /sw/taurus/tools/prope/sw/prope_config; /sw/taurus/tools/prope/diamond-setup/diamond-manager stop"
#delete prope python from all compute nodes
clush -w "$COMPUTE_NODES" "rm -rf /opt/prope"
#untar python on all compute nodes
#COMPUTE_NODES="taurusi5252"
clush -w "$COMPUTE_NODES" "tar -xzf /sw/taurus/tools/prope/prope-1.0.tar.gz -C /opt"
clush -t 30 -B -u 30 -w "$COMPUTE_NODES" "rm -rf /opt/prope"
clush -t 30 -B -u 30 -w "$COMPUTE_NODES" "tar -xzf /sw/taurus/tools/prope/prope-2.1.tar.gz -C /opt"
##NEW
#get current compute nodes (sinfo may print one line per partition, so join them with commas)
COMPUTE_NODES=$(sinfo -o %N --noheader | paste -sd, -)
#kill diamond on all compute nodes
clush -t 30 -B -u 30 -w "$COMPUTE_NODES" "sudo /usr/local/sbin/kill_all_diamond.sh"
#install new prope version on all compute nodes
clush -t 30 -B -u 30 -w "$COMPUTE_NODES" "sudo /usr/local/sbin/deploy_prope.sh"
```