diff --git a/doc.zih.tu-dresden.de/docs/software/energy_measurement.md b/doc.zih.tu-dresden.de/docs/software/energy_measurement.md
new file mode 100644
index 0000000000000000000000000000000000000000..36c0abc661d10a828384c35155fe6a5c6074a301
--- /dev/null
+++ b/doc.zih.tu-dresden.de/docs/software/energy_measurement.md
@@ -0,0 +1,217 @@
+# Energy Measurement Infrastructure
+
+The Intel Haswell nodes of the ZIH system are equipped with power instrumentation that allows the
+recording and accounting of power dissipation and energy consumption data. The data is made
+available through several different interfaces, which are described below.
+
+## Summary of Measurement Interfaces
+
+| Interface                                  | Sensors         | Rate                            |
+|:-------------------------------------------|:----------------|:--------------------------------|
+| Dataheap (C, Python, VampirTrace, Score-P) | Blade, (CPU)    | 1 Sa/s                          |
+| HDEEM\* (C, Score-P)                       | Blade, CPU, DDR | 1 kSa/s (Blade), 100 Sa/s (VR)  |
+| HDEEM Command Line Interface               | Blade, CPU, DDR | 1 kSa/s (Blade), 100 Sa/s (VR)  |
+| Slurm Accounting (`sacct`)                 | Blade           | Per Job Energy                  |
+| Slurm Profiling (HDF5)                     | Blade           | Up to 1 Sa/s                    |
+
+!!! note
+
+    Please specify `--partition=haswell --exclusive` along with your job request if you wish to use
+    HDEEM.
+
+### Accuracy, Temporal and Spatial Resolution
+
+In addition to the above-mentioned interfaces, you can access the measurements through a
+[C API](#using-the-hdeem-c-api) to get the full temporal and spatial resolution:
+
+- **Blade:** 1 kSa/s for the whole node. This includes both sockets, DRAM,
+  SSD, and other on-board consumers. Since the system is directly
+  water cooled, no cooling components are included in the blade
+  consumption.
+- **Voltage regulators (VR):** 100 Sa/s for each of the six VR
+  measurement points, one for each socket and four for the eight DRAM
+  lanes (two lanes bundled per measurement point).
+
+The GPU blades also have 1 Sa/s power instrumentation, but with lower accuracy.
+
+HDEEM measurements have an accuracy of 2 % for blade (node) measurements and 5 % for voltage
+regulator (CPU, DDR) measurements.
+
+## Command Line Interface
+
+The HDEEM infrastructure can be controlled through command line tools that are made available by
+loading the `hdeem` module. They are commonly used on the node under test to start, stop, and
+query the measurement device.
+
+- `startHdeem`: Start a measurement. After the command succeeds, the
+  measurement data at the 1000 Sa/s and 100 Sa/s rates described above will be
+  recorded on the Board Management Controller (BMC), which is capable
+  of storing up to 8 hours of measurement data.
+- `stopHdeem`: Stop a measurement. No further data is recorded and
+  the previously recorded data remains available on the BMC.
+- `printHdeem`: Read the data from the BMC. By default, the data is
+  written into a CSV file, whose name can be controlled using the
+  `-o` argument.
+- `checkHdeem`: Print the status of the measurement device.
+- `clearHdeem`: Reset and clear the measurement device. No further
+  data can be read from the device after this command is executed
+  until a new measurement is started.
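+
+The following sketch shows what a typical measurement cycle could look like. It assumes an
+interactive job on the node under test with the `hdeem` module loaded; `./application` and the
+output file name `energy.csv` are placeholders.
+
+!!! example "Possible measurement workflow"
+
+    ```console
+    marie@haswell$ module load hdeem
+    marie@haswell$ checkHdeem                # verify that the measurement device is ready
+    marie@haswell$ startHdeem                # start recording on the BMC
+    marie@haswell$ ./application             # run the workload under test
+    marie@haswell$ stopHdeem                 # stop recording; the data remains on the BMC
+    marie@haswell$ printHdeem -o energy.csv  # read the recorded samples into a CSV file
+    marie@haswell$ clearHdeem                # reset the device before the next measurement
+    ```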
+
+## Integration in Application Performance Traces
+
+The per-node power consumption data can be included as metrics in application traces by using the
+provided metric plugins for Score-P (and VampirTrace). The plugins are provided as modules and set
+all necessary environment variables that are required to record data for all nodes that are part of
+the current job.
+
+For 1 Sa/s Blade values (Dataheap):
+
+- [Score-P](scorep.md): use the module `scorep-dataheap`
+- [VampirTrace](../archive/vampirtrace.md): use the module `vampirtrace-plugins/power-1.1`
+  (**Remark:** VampirTrace is outdated!)
+
+For 1000 Sa/s (Blade) and 100 Sa/s (CPU{0,1}, DDR{AB,CD,EF,GH}):
+
+- [Score-P](scorep.md): use the module `scorep-hdeem`. This
+  module requires a recent version of `scorep/sync-...`. Please use
+  the latest that fits your compiler and MPI version.
+
+By default, the modules are set up to record the power data for the nodes they are used on. For
+further information on how to change this behavior, please use `module show` on the respective
+module.
+
+!!! example "Example usage with `gcc`"
+
+    ```console
+    marie@haswell$ module load scorep/trunk-2016-03-17-gcc-xmpi-cuda7.5
+    marie@haswell$ module load scorep-dataheap
+    marie@haswell$ scorep gcc application.c -o application
+    marie@haswell$ srun ./application
+    ```
+
+Once the application has finished, a trace will be available that allows you to correlate
+application functions with the component power consumption of the parallel application.
+
+!!! note
+
+    For energy measurements, only tracing is supported in Score-P/VampirTrace.
+    The modules therefore disable profiling and enable tracing.
+    Please use [Vampir](vampir.md) to view the trace.
+
+![Energy measurements in Vampir](misc/energy_measurements-vampir.png)
+{: align="center"}
+
+!!! note
+
+    The power measurement modules `scorep-dataheap` and `scorep-hdeem` are dynamic and only
+    need to be loaded during execution. However, `scorep-hdeem` does require the application to
+    be linked with a certain version of Score-P.
+
+By default, `scorep-dataheap` records all sensors that are available. Currently, these are the total
+node consumption and the CPUs. `scorep-hdeem` also records all available sensors
+(node, 2x CPU, 4x DDR) by default. You can change the selected sensors by setting the environment
+variables:
+
+??? hint "For HDEEM"
+    `export SCOREP_METRIC_HDEEM_PLUGIN=Blade,CPU*`
+
+??? hint "For Dataheap"
+    `export SCOREP_METRIC_DATAHEAP_PLUGIN=localhost/watts`
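+
+The example below sketches how these pieces could be combined for an HDEEM-instrumented run. It
+assumes `./application` has already been instrumented with Score-P and linked against a matching
+`scorep/sync-...` version as described above; `SCOREP_EXPERIMENT_DIRECTORY` is a generic Score-P
+variable and the directory name used here is only a placeholder.
+
+!!! example "Possible run with a restricted sensor set"
+
+    ```console
+    marie@haswell$ module load scorep-hdeem
+    marie@haswell$ export SCOREP_METRIC_HDEEM_PLUGIN=Blade,CPU*
+    marie@haswell$ export SCOREP_EXPERIMENT_DIRECTORY=scorep_energy_trace
+    marie@haswell$ srun ./application
+    ```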
+
+For more information on how to use Score-P, please refer to the [respective documentation](scorep.md).
+
+## Access Using Slurm Tools
+
+[Slurm](../jobs_and_resources/slurm.md) maintains its own database of job information, including
+energy data. There are two main ways of accessing this data, which are described below.
+
+### Post-Mortem Per-Job Accounting
+
+This is the easiest way of accessing information about the energy consumed by a job and its job
+steps. The Slurm tool `sacct` allows users to query post-mortem energy data for any past job or
+job step by adding the field `ConsumedEnergy` to the `--format` parameter:
+
+```console
+marie@login$ sacct --format="jobid,jobname,ntasks,submit,start,end,ConsumedEnergy,nodelist,state" -j 3967027
+       JobID    JobName   NTasks              Submit               Start                 End ConsumedEnergy        NodeList      State
+------------ ---------- -------- ------------------- ------------------- ------------------- -------------- --------------- ----------
+3967027            bash          2014-01-07T12:25:42 2014-01-07T12:25:52 2014-01-07T12:41:20                    taurusi1159  COMPLETED
+3967027.0         sleep        1 2014-01-07T12:26:07 2014-01-07T12:26:07 2014-01-07T12:26:18              0     taurusi1159  COMPLETED
+3967027.1         sleep        1 2014-01-07T12:29:06 2014-01-07T12:29:06 2014-01-07T12:29:16          1.67K     taurusi1159  COMPLETED
+3967027.2         sleep        1 2014-01-07T12:33:25 2014-01-07T12:33:25 2014-01-07T12:33:36          1.84K     taurusi1159  COMPLETED
+3967027.3         sleep        1 2014-01-07T12:34:06 2014-01-07T12:34:06 2014-01-07T12:34:11          1.09K     taurusi1159  COMPLETED
+3967027.4         sleep        1 2014-01-07T12:38:03 2014-01-07T12:38:03 2014-01-07T12:39:44         18.93K     taurusi1159  COMPLETED
+```
+
+This example job consisted of five job steps, each executing a `sleep` of a different length. Note
+that the `ConsumedEnergy` metric is only applicable to exclusive jobs.
+
+### Slurm Energy Profiling
+
+The `srun` tool offers several options for profiling job steps by adding the `--profile` parameter.
+Possible profiling options are `All`, `Energy`, `Task`, `Lustre`, and `Network`. In all cases, the
+profiling information is stored in an HDF5 file that can be inspected using available HDF5 tools,
+e.g., `h5dump`. The files are stored under `/scratch/profiling/` for each job, job step, and node.
+A description of the data fields in the file can be found
+[in the official documentation](http://slurm.schedmd.com/hdf5_profile_user_guide.html#HDF5).
+In general, the data files contain samples of the current **power** consumption on a per-second
+basis:
+
+```console
+marie@login$ srun --partition haswell64 --acctg-freq=2,energy=1 --profile=energy sleep 10
+srun: job 3967674 queued and waiting for resources
+srun: job 3967674 has been allocated resources
+marie@login$ h5dump /scratch/profiling/marie/3967674_0_taurusi1073.h5
+[...]
+   DATASET "Energy_0000000002 Data" {
+      DATATYPE  H5T_COMPOUND {
+         H5T_STRING {
+            STRSIZE 24;
+            STRPAD H5T_STR_NULLTERM;
+            CSET H5T_CSET_ASCII;
+            CTYPE H5T_C_S1;
+         } "Date_Time";
+         H5T_STD_U64LE "Time";
+         H5T_STD_U64LE "Power";
+         H5T_STD_U64LE "CPU_Frequency";
+      }
+      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
+      DATA {
+      (0): {
+            "",
+            1389097545,   # timestamp
+            174,          # power value
+            1
+         }
+      }
+   }
+```
+
+## Using the HDEEM C API
+
+Please specify `--partition=haswell --exclusive` along with your job request if you wish to use
+HDEEM.
+
+Please download the official documentation at
+[http://www.bull.com/download-hdeem-library-reference-guide](http://www.bull.com/download-hdeem-library-reference-guide).
+
+The HDEEM header and sample code are locally installed on the nodes.
+
+??? hint "HDEEM header location"
+
+    `/usr/include/hdeem.h`
+
+??? hint "HDEEM sample location"
+
+    `/usr/share/hdeem/sample/`
+
+## Further Information and Citing
+
+More information can be found in the paper
+[HDEEM: high definition energy efficiency monitoring](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7016382)
+by Daniel Hackenberg et al. Please cite this paper if you are using HDEEM for your scientific work.
diff --git a/doc.zih.tu-dresden.de/docs/software/misc/energy_measurements-vampir.png b/doc.zih.tu-dresden.de/docs/software/misc/energy_measurements-vampir.png new file mode 100644 index 0000000000000000000000000000000000000000..68bdbe318fc451ebb25a1938b70bb21905ad4358 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/energy_measurements-vampir.png differ diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml index d01ed64792dbe5f90f3d99562946ace2929a279d..7cb8a72a85d94553c0bc2e46a23216885c74cfa1 100644 --- a/doc.zih.tu-dresden.de/mkdocs.yml +++ b/doc.zih.tu-dresden.de/mkdocs.yml @@ -72,6 +72,7 @@ nav: - PIKA: software/pika.md - Perf Tools: software/perf_tools.md - Vampir: software/vampir.md + - Energy Measurement: software/energy_measurement.md - Data Life Cycle Management: - Overview: data_lifecycle/overview.md - Filesystems: @@ -189,7 +190,7 @@ markdown_extensions: - attr_list - footnotes - pymdownx.tabbed: - alternate_style: true + alternate_style: True extra: tud_homepage: https://tu-dresden.de diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell index 262f5eeae1b153648d59418137b8bac2dc2cf5fb..8d81f8032d8f01c58cc410c962e14e369b50c617 100644 --- a/doc.zih.tu-dresden.de/wordlist.aspell +++ b/doc.zih.tu-dresden.de/wordlist.aspell @@ -1,135 +1,284 @@ personal_ws-1.1 en 203 +ALLREDUCE +APIs +AVX Abaqus Addon Addons -ALLREDUCE Altix Amber Amdahl's -analytics Analytics -anonymized Ansys -APIs -AVX -awk +BLAS +BMC BeeGFS +CCM +CFX +CLI +CMake +COMSOL +CONFIG +CPU +CPUID +CPUs +CSV +CUDA +CXFS +CentOS +Chemnitz +DCV +DDP +DDR +DFG +DMTCP +DNS +Dask +DataFrames +DataParallel +Dataheap +DistributedDataParallel +DockerHub +Dockerfile +Dockerfiles +EPYC +ESSL +EasyBlocks +EasyBuild +EasyConfig +Espresso +FFT +FFTW +FMA +Flink +FlinkExample +Fortran +GBit +GDB +GDDR +GFLOPS +GPU +GPUs +GROMACS +GUIs +Galilei +Gauss +Gaussian +GiB +GitHub +GitLab +GitLab's +Gloo +HBM +HDEEM +HDF +HDFS +HDFView +HPC +HPE +HPL +Horovod +Hostnames +IOPS +IPs +ISA +ImageNet +Infiniband +Instrumenter +Itanium +Jupyter +JupyterHub +JupyterLab +KNL +Keras +Kunststofftechnik +LAMMPS +LAPACK +LINPACK +Leichtbau +Linter +LoadLeveler +MEGWARE +MIMD +MKL +MNIST +MathKernel +MathWorks +Mathematica +Memcheck +MiB +Microarchitecture +Miniconda +MobaXTerm +Montecito +Mpi +Multiphysics +Multithreading +NAMD +NCCL +NFS +NGC +NODELIST +NRINGS +NUM +NUMA +NUMAlink +NVLINK +NVMe +NWChem +Neptun +NumPy +Nutzungsbedingungen +Nvidia +OME +OPARI +OTF +OmniOpt +OpenACC +OpenBLAS +OpenCL +OpenGL +OpenMP +OpenMPI +OpenSSH +Opteron +PAPI +PESSL +PGI +PMI +PSOCK +Pandarallel +Perf +PiB +Pika +PowerAI +Pre +Preload +Pthread +Pthreads +PuTTY +PyTorch +PythonAnaconda +Quantum +Quickstart +README +RHEL +RSA +RSS +RStudio +ResNet +Rmpi +Rsync +Runtime +SFTP +SGEMM +SGI +SHA +SHMEM +SLES +SLURMCluster +SMP +SMT +SSHFS +STAR +SUSE +SXM +Sandybridge +Saxonid +ScaDS +ScaLAPACK +Scalasca +SciPy +Scikit +Slurm +SparkExample +SubMathKernel +Superdome +TBB +TCP +TFLOPS +TensorBoard +TensorFlow +Theano +ToDo +Torchvision +Trition +VASP +VMSize +VMs +VPN +Valgrind +Vampir +VampirServer +VampirTrace +VampirTrace's +VirtualGL +WebVNC +WinSCP +Workdir +XArray +XGBoost +XLC +XLF +Xeon +Xming +ZIH +ZIH's +ZSH +analytics +anonymized +awk benchmarking -BLAS broadwell bsub bullx -CCM ccNUMA centauri -CentOS -CFX cgroups checkpointing -Chemnitz citable -CLI -CMake -COMSOL conda config -CONFIG cpu -CPU -CPUID cpus -CPUs crossentropy css -CSV -CUDA cuDNN -CXFS dask -Dask dataframes 
-DataFrames datamover -DataParallel dataset -DCV ddl -DDP -DDR -DFG distr -DistributedDataParallel -DMTCP -DNS -Dockerfile -Dockerfiles -DockerHub dockerized dotfile dotfiles downtime downtimes -EasyBlocks -EasyBuild -EasyConfig ecryptfs engl english env -EPYC -Espresso -ESSL facto fastfs -FFT -FFTW filesystem filesystems flink -Flink -FlinkExample -FMA foreach -Fortran -Galilei -Gauss -Gaussian -GBit -GDB -GDDR -GFLOPS gfortran -GiB gifferent -GitHub -GitLab -GitLab's glibc -Gloo gnuplot gpu -GPU -GPUs gres -GROMACS -GUIs hadoop haswell -HBM -HDF -HDFS -HDFView hiera horovod -Horovod horovodrun hostname -Hostnames hpc -HPC hpcsupport -HPE -HPL html hvd hyperparameter @@ -138,256 +287,110 @@ hyperthreading icc icpc ifort -ImageNet img -Infiniband init inode -Instrumenter -IOPS -IPs ipynb -ISA -Itanium jobqueue jpg jss jupyter -Jupyter -JupyterHub -JupyterLab -Keras -KNL -Kunststofftechnik -LAMMPS -LAPACK lapply -Leichtbau -LINPACK linter -Linter lmod -LoadLeveler localhost lsf lustre markdownlint -Mathematica -MathKernel -MathWorks matlab -MEGWARE mem -Memcheck -MiB -Microarchitecture -MIMD -Miniconda mkdocs -MKL -MNIST -MobaXTerm modenv modenvs modulefile -Montecito mountpoint mpi -Mpi -mpicc mpiCC +mpicc mpicxx mpif mpifort mpirun multicore multiphysics -Multiphysics multithreaded -Multithreading -NAMD natively nbgitpuller nbsp -NCCL -Neptun -NFS -NGC nodelist -NODELIST -NRINGS ntasks -NUM -NUMA -NUMAlink -NumPy -Nutzungsbedingungen -Nvidia -NVLINK -NVMe -NWChem -OME -OmniOpt -OPARI -OpenACC -OpenBLAS -OpenCL -OpenGL -OpenMP openmpi -OpenMPI -OpenSSH -Opteron -OTF overfitting pandarallel -Pandarallel -PAPI parallelization parallelize parallelized parfor pdf perf -Perf performant -PESSL -PGI -PiB -Pika pipelining -PMI png -PowerAI ppc pre -Pre -Preload preloaded preloading prepend preprocessing -PSOCK -Pthread -Pthreads pty -PuTTY pymdownx -PythonAnaconda pytorch -PyTorch -Quantum queue quickstart -Quickstart randint reachability -README reproducibility requeueing resnet -ResNet -RHEL -Rmpi rome romeo -RSA -RSS -RStudio -Rsync runnable runtime -Runtime sacct salloc -Sandybridge -Saxonid sbatch -ScaDS scalability scalable -ScaLAPACK -Scalasca scancel -Scikit -SciPy scontrol scp scs -SFTP -SGEMM -SGI -SHA -SHMEM -SLES -Slurm -SLURMCluster -SMP -SMT -SparkExample spython squeue srun ssd -SSHFS -STAR stderr stdout subdirectories subdirectory -SubMathKernel -Superdome -SUSE -SXM -TBB -TCP -TensorBoard tensorflow -TensorFlow -TFLOPS -Theano tmp todo -ToDo toolchain toolchains torchvision -Torchvision tracefile tracefiles tracepoints transferability -Trition undistinguishable unencrypted uplink userspace -Valgrind -Vampir -VampirServer -VampirTrace -VampirTrace's -VASP vectorization venv virtualenv -VirtualGL -VMs -VMSize -VPN -WebVNC -WinSCP -Workdir workspace workspaces -XArray -Xeon -XGBoost -XLC -XLF -Xming yaml zih -ZIH -ZIH's -ZSH