Commit 4fe35c56 authored by Martin Schroschk's avatar Martin Schroschk

Merge branch 'preview' of gitlab.hrz.tu-chemnitz.de:zih/hpcsupport/hpc-compendium into issue136

parents e2f66b37 99ff0cd7
*package-lock.json
*package.json
*node_modules
**venv/
\ No newline at end of file
**venv/
doc.zih.tu-dresden.de/public/
@@ -45,7 +45,7 @@ Check spelling for changed md-files:
stage: test
script:
- docker run --rm -w /src -e CI_MERGE_REQUEST_TARGET_BRANCH_NAME "${DOCKER_IMAGE}"
doc.zih.tu-dresden.de/util/check-spelling-changes.sh
doc.zih.tu-dresden.de/util/check-spelling.sh
only: [ merge_requests ]
Check links for changed md-files:
@@ -454,10 +454,8 @@ there is a list of conventions w.r.t. spelling and technical wording.
* `Slurm` not `SLURM`
* `Filesystem` not `file system`
* `ZIH system` and `ZIH systems` not `Taurus`, `HRSKII`, `our HPC systems` etc.
**TODO:** Put into file
**TODO:** Implement checks [Issue #13](#13)
* `Workspace` not `work space`
* avoid term `HPC-DA`
### Code Blocks and Command Prompts
# Debugging Tools
Debugging is an essential but also rather time-consuming step during application development. Tools
can dramatically reduce the amount of time spent detecting errors. Besides the "classical" serial
programming errors, which can usually be detected easily with a regular debugger, there are
programming errors that result from the usage of OpenMP, Pthreads, or MPI. These errors may also be
detected with debuggers (preferably debuggers with support for parallel applications); however,
specialized tools like MPI checking tools (e.g. Marmot) or thread checking tools (e.g. Intel Thread
Checker) can simplify this task. The following sections provide detailed information about the
different types of debugging tools:
- [Debuggers] **todo** -- debuggers (with and without support for parallel applications)
- [MPI Usage Error Detection] **todo** -- tools to detect MPI usage errors
- [Thread Checking] **todo** -- tools to detect OpenMP/Pthread usage errors
# Debuggers
# Debugging
This section describes how to start the debuggers on the ZIH systems.
Debugging is an essential but also rather time-consuming step during application development. Tools
can dramatically reduce the amount of time spent detecting errors. Besides the "classical" serial
programming errors, which can usually be detected easily with a regular debugger, there are
programming errors that result from the usage of OpenMP, Pthreads, or MPI. These errors may also be
detected with debuggers (preferably debuggers with support for parallel applications); however,
specialized tools like MPI checking tools (e.g. Marmot) or thread checking tools (e.g. Intel Thread
Checker) can simplify this task.
Detailed information about how to use the debuggers can be found on the
websites of the debuggers (see below).
This page provides detailed information on classic debugging on ZIH systems. The more specific
topic [MPI Usage Error Detection](mpi_usage_error_detection.md) covers tools to detect MPI usage
errors.
## Overview of available Debuggers at ZIH
@@ -17,30 +24,30 @@ website of the debuggers (see below).
## General Advice
- You need to compile your code with the flag `-g` to enable debugging. This tells the compiler to
  include information about variable and function names, source code lines etc. into the
  executable.
- It is also advisable to reduce or even disable optimizations (`-O0` or gcc's `-Og`). At least
  inlining should be disabled (usually `-fno-inline`).
- For parallel applications: try to reproduce the problem with fewer processes or threads before
  using a parallel debugger.
- Use the compiler's check capabilities to find typical problems at compile time or runtime; read
  the manual (`man gcc`, `man ifort`, etc.)
  - Intel C++ example: `icpc -g -std=c++14 -w3 -check=stack,uninit -check-pointers=rw -fp-trap=all`
  - Intel Fortran example: `ifort -g -std03 -warn all -check all -fpe-all=0 -traceback`
- The flag `-traceback` of the Intel Fortran compiler causes the program to print a stack trace and
  the source code location when it terminates abnormally.
- If your program crashes and you get an address of the failing instruction, you can get the
  source code line with the command `addr2line -e <executable> <address>` (if compiled with `-g`);
  see the sketch after this list.
- Use [Memory Debuggers](#memory-debugging) to verify the proper usage of memory.
- Core dumps are useful when your program crashes after a long runtime.
- Slides from user training: [Introduction to Parallel Debugging](misc/debugging_intro.pdf)
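A minimal sketch of these hints in practice (the source file `myprog.c`, the crash, and the printed
address are purely illustrative):

```console
marie@login$ # compile with debug info, without optimization and inlining
marie@login$ gcc -g -O0 -fno-inline -o myprog myprog.c
marie@login$ ./myprog
Segmentation fault (core dumped)
marie@login$ # map a failing instruction address back to a source line
marie@login$ addr2line -e myprog 0x400b2d
/home/marie/myprog.c:42
```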
## GNU Debugger (GDB)
@@ -55,34 +62,28 @@ several ways:
| Attach running program to GDB | `gdb --pid <process ID>` |
| Open a core dump | `gdb <executable> <core file>` |
This [GDB Reference Sheet](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf) makes life easier
if you use GDB often.
Fortran 90 programmers may issue a `module load ddt` before their debug session. This makes the GDB
modified by DDT available, which has better support for Fortran 90 (e.g. derived types).
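A short interactive GDB session might look like this (program name, breakpoint location, and
variable name are hypothetical placeholders):

```console
marie@login$ gdb ./myprog
(gdb) break main
Breakpoint 1 at 0x400b2d: file myprog.c, line 42.
(gdb) run
(gdb) next
(gdb) print my_variable
(gdb) backtrace
(gdb) quit
```

`break`, `run`, `next`, `print`, and `backtrace` are the most frequently used commands; the
reference sheet above lists many more.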
## Arm DDT
![DDT Main Window](misc/ddt-main-window.png)
- Intuitive graphical user interface and great support for parallel applications
- We have 1024 licenses, so many users can use this tool for parallel debugging
- Don't expect that debugging an MPI program with hundreds of processes will always work without
  problems
  - The more processes and nodes involved, the higher the probability of timeouts or other
    problems
  - Debug with as few processes as required to reproduce the bug you want to find
- Module to load before using: `module load ddt`
- Start: `ddt <executable>`
- If the GUI runs too slowly over your remote connection: use
  [WebVNC](../access/graphical_applications_with_webvnc.md) to start a remote desktop session in
  a web browser.
- Slides from user training: [Parallel Debugging with DDT](misc/debugging_ddt.pdf)
### Serial Program Example
@@ -95,9 +96,9 @@ srun: job 123456 has been allocated resources
marie@compute$ ddt ./myprog
```
- The *Run* dialog window of DDT opens.
- Optionally: configure options like program arguments.
- Hit *Run*.
### Multi-threaded Program Example
@@ -110,10 +111,10 @@ srun: job 123457 has been allocated resources
marie@compute$ ddt ./myprog
```
- The *Run* dialog window of DDT opens.
- Optionally: configure options like program arguments.
- If OpenMP: set the number of threads.
- Hit *Run*.
### MPI-Parallel Program Example
@@ -128,27 +129,27 @@ salloc: Granted job allocation 123458
marie@login$ ddt srun ./myprog
```
- The *Run* dialog window of DDT opens.
- If MPI-OpenMP-hybrid: set the number of threads.
- Hit *Run*.
## Memory Debugging
- Memory debuggers find memory management bugs, e.g.
  - Use of uninitialized memory
  - Access to memory outside of allocated bounds
- DDT has memory debugging included (needs to be enabled in the run dialog)
### Valgrind (Memcheck)
- Simulation of the program run in a virtual machine which accurately observes memory operations.
- Extreme run time slow-down: use small program runs!
- Finds more memory errors than other debuggers.
- Further information:
- [Valgrind Website](http://www.valgrind.org)
- [Memcheck Manual](https://www.valgrind.org/docs/manual/mc-manual.html)
(explanation of output, command-line options)
- For serial or multi-threaded programs:
```console
marie@login$ module load Valgrind
@@ -156,12 +157,12 @@ Module Valgrind/3.14.0-foss-2018b and 12 dependencies loaded.
marie@login$ srun -n 1 valgrind ./myprog
```
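A typical excerpt of Memcheck's report at program exit looks like this (the process ID and the
counts shown are placeholders):

```console
==12345== HEAP SUMMARY:
==12345==     in use at exit: 40 bytes in 1 blocks
==12345== LEAK SUMMARY:
==12345==    definitely lost: 40 bytes in 1 blocks
==12345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```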
- Not recommended for MPI parallel programs, since usually the MPI library will throw
  a lot of errors. However, you can use Valgrind in the following way so that every rank
  writes its own Valgrind logfile:
```console
marie@login$ module load Valgrind
Module Valgrind/3.14.0-foss-2018b and 12 dependencies loaded.
marie@login$ srun -n <number of processes> valgrind --log-file=valgrind-%p.out ./myprog
```
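Afterwards, you can skim the per-rank logfiles for problems, for example by grepping for Memcheck's
error summary (the file names follow the `--log-file=valgrind-%p.out` pattern used above; the PIDs
and counts are illustrative):

```console
marie@login$ grep "ERROR SUMMARY" valgrind-*.out
valgrind-12345.out:==12345== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
valgrind-12346.out:==12346== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
```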
# Software Development and Tools
This section provides you with the basic knowledge and tools to get you out of trouble. It will tell
you:
This section provides you with the basic knowledge and tools for software development
on the ZIH systems.
It will tell you:
- How to compile your code
- Using mathematical libraries
- Find caveats and hidden errors in application codes
- Handle debuggers
- Follow system calls and interrupts
- Understand the relationship between correct code and performance
- [General advice for building software](building_software.md)
- [Using compilers](compilers.md)
- [GPU programming](gpu_programming.md)
- How to use libraries
- [Using mathematical libraries](libraries.md)
- How to deal with (or even prevent) bugs
- [Find caveats and hidden errors in MPI application codes](mpi_usage_error_detection.md)
- [Using debuggers](debuggers.md)
- How to investigate the performance and efficiency of your code
- [Pika: monitoring of batch jobs](pika.md)
- [Perf: sampling-based performance analysis](perf_tools.md)
- [Score-P: event tracing of HPC applications](scorep.md)
- [Vampir: trace visualization](vampir.md)
Some hints that are helpful:
- Stick to standards wherever possible, e.g. use the `-std` flag
  for GNU and Intel C/C++ compilers. Computers are short-lived
  creatures; migrating between platforms can be painful. In addition,
  running your code on different platforms greatly increases the
@@ -26,31 +35,10 @@ Some questions you should ask yourself:
- Given that a code is parallel, are the results independent of the
  number of threads or processes?
- Have you ever run your Fortran code with array bound and subroutine
  argument checking (the `-check all` and `-traceback` flags
  for the Intel compilers)? See the sketch after this list.
- Have you checked that your code is not causing floating point
exceptions?
- Does your code work with a different link order of objects?
- Have you made any assumptions regarding storage of data objects in
memory?
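A sketch of such a checking build with the Intel Fortran compiler, using the flags mentioned above
(the source file name is hypothetical):

```console
marie@login$ ifort -g -check all -traceback -fpe-all=0 -o myprog myprog.f90
marie@login$ ./myprog
```

With `-check all`, out-of-bounds array accesses and similar errors abort the run with a message
pointing to the offending statement, and `-traceback` adds the call stack.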
Subsections:
- [Compilers](compilers.md)
- [Debugging Tools](../archive/debugging_tools.md)
- [Debuggers](debuggers.md) (GDB, Allinea DDT, Totalview)
- [Tools to detect MPI usage errors](mpi_usage_error_detection.md) (MUST)
- PerformanceTools.md: [Score-P](scorep.md), [Vampir](vampir.md)
- [Libraries](libraries.md)
Intel Tools Seminar \[Oct. 2013\]
- [TU-Dresden_Intel_Multithreading_Methodologies.pdf] **todo** %ATTACHURL%/TU-Dresden_Intel_Multithreading_Methodologies.pdf:
  Intel Multithreading Methodologies
- [TU-Dresden_Advisor_XE.pdf] **todo** %ATTACHURL%/TU-Dresden_Advisor_XE.pdf:
  Intel Advisor XE - Threading prototyping tool for software architects
- [TU-Dresden_Inspector_XE.pdf] **todo** %ATTACHURL%/TU-Dresden_Inspector_XE.pdf:
  Inspector XE - Memory-, Thread-, Pointer-Checker, Debugger
- [TU-Dresden_Intel_Composer_XE.pdf] **todo** %ATTACHURL%/TU-Dresden_Intel_Composer_XE.pdf:
  Intel Composer - Compilers, Libraries
@@ -60,14 +60,14 @@ nav:
- Software Development and Tools:
- Overview: software/software_development_overview.md
- Building Software: software/building_software.md
- GPU Programming: software/gpu_programming.md
- Compilers: software/compilers.md
- Debuggers: software/debuggers.md
- GPU Programming: software/gpu_programming.md
- Libraries: software/libraries.md
- MPI Error Detection: software/mpi_usage_error_detection.md
- Score-P: software/scorep.md
- Debugging: software/debuggers.md
- Pika: software/pika.md
- Perf Tools: software/perf_tools.md
- PIKA: software/pika.md
- Score-P: software/scorep.md
- Vampir: software/vampir.md
- Data Life Cycle Management:
- Overview: data_lifecycle/overview.md
@@ -113,7 +113,7 @@ nav:
- Overview: archive/overview.md
- Bio Informatics: archive/bioinformatics.md
- CXFS End of Support: archive/cxfs_end_of_support.md
- Debugging Tools: archive/debugging_tools.md
- KNL Nodes: archive/knl_nodes.md
- Load Leveler: archive/load_leveler.md
- Migrate to Atlas: archive/migrate_to_atlas.md
- No IB Jobs: archive/no_ib_jobs.md
#!/bin/bash
set -euo pipefail
scriptpath=${BASH_SOURCE[0]}
basedir=`dirname "$scriptpath"`
basedir=`dirname "$basedir"`
wordlistfile=$(realpath $basedir/wordlist.aspell)
function getNumberOfAspellOutputLines(){
cat - | aspell -p "$wordlistfile" --ignore 2 -l en_US list --mode=markdown | sort -u | wc -l
}
branch="preview"
if [ -n "$CI_MERGE_REQUEST_TARGET_BRANCH_NAME" ]; then
branch="origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
fi
any_fails=false
source_hash=`git merge-base HEAD "$branch"`
#Remove everything except lines beginning with --- or +++
files=`git diff $source_hash | sed -n 's/^[-+]\{3,3\} //p'`
#echo "$files"
#echo "-------------------------"
#Assume that we have pairs of lines (starting with --- and +++).
while read oldfile; do
read newfile
if [ "${newfile: -3}" == ".md" ]; then
if [[ $newfile == *"accessibility.md"* ||
$newfile == *"data_protection_declaration.md"* ||
$newfile == *"legal_notice.md"* ]]; then
echo "Skip $newfile"
else
echo "Check $newfile"
if [ "$oldfile" == "/dev/null" ]; then
#Added files should not introduce new spelling mistakes
previous_count=0
else
previous_count=`git show "$source_hash:${oldfile:2}" | getNumberOfAspellOutputLines`
fi
if [ "$newfile" == "/dev/null" ]; then
#Deleted files do not contain any spelling mistakes
current_count=0
else
#Remove the prefix "b/"
newfile=${newfile:2}
current_count=`cat "$newfile" | getNumberOfAspellOutputLines`
fi
if [ $current_count -gt $previous_count ]; then
echo "-- File $newfile"
echo "Change increases spelling mistake count (from $previous_count to $current_count)"
any_fails=true
fi
fi
fi
done <<< "$files"
if [ "$any_fails" == true ]; then
exit 1
fi
#!/bin/bash
set -euo pipefail
scriptpath=${BASH_SOURCE[0]}
basedir=`dirname "$scriptpath"`
basedir=`dirname "$basedir"`
wordlistfile=$basedir/wordlist.aspell
acmd="aspell -p $wordlistfile --ignore 2 -l en_US list --mode=markdown"
function spell_check () {
file_to_check=$1
ret=$(cat "$file_to_check" | $acmd)
if [ ! -z "$ret" ]; then
echo "-- File $file_to_check"
echo "$ret" | sort -u
fi
}
wordlistfile=$(realpath $basedir/wordlist.aspell)
branch="origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME:-preview}"
aspellmode=
if aspell dump modes | grep -q markdown; then
aspellmode="--mode=markdown"
fi
function usage() {
cat <<-EOF
usage: $0 [file]
Outputs all words of the file (or, if no argument is given, all files in the current directory, recursively) that the spell checker cannot recognize.
If a file is given, outputs all words of the file that the spell checker cannot recognize.
If the file is omitted, checks whether any changed file contains more unrecognizable words than before the change.
If you are sure a word is correct, you can put it in $wordlistfile.
EOF
}
function getAspellOutput(){
aspell -p "$wordlistfile" --ignore 2 -l en_US $aspellmode list | sort -u
}
function getNumberOfAspellOutputLines(){
getAspellOutput | wc -l
}
function isMistakeCountIncreasedByChanges(){
any_fails=false
#Unfortunately, sort depends on locale and docker does not provide much.
#Therefore, it uses bytewise comparison. We avoid problems with the command tr.
if ! sed 1d "$wordlistfile" | tr '[:upper:]' '[:lower:]' | sort -C; then
echo "Unsorted wordlist in $wordlistfile"
any_fails=true
fi
source_hash=`git merge-base HEAD "$branch"`
#Remove everything except lines beginning with --- or +++
files=`git diff $source_hash | sed -E -n 's#^(---|\+\+\+) ((/|./)[^[:space:]]+)$#\2#p'`
#echo "$files"
#echo "-------------------------"
#Assume that we have pairs of lines (starting with --- and +++).
while read oldfile; do
read newfile
if [ "${newfile: -3}" == ".md" ]; then
if [[ $newfile == *"accessibility.md"* ||
$newfile == *"data_protection_declaration.md"* ||
$newfile == *"legal_notice.md"* ]]; then
echo "Skip $newfile"
else
echo "Check $newfile"
if [ "$oldfile" == "/dev/null" ]; then
#Added files should not introduce new spelling mistakes
previous_count=0
else
previous_count=`git show "$source_hash:${oldfile:2}" | getNumberOfAspellOutputLines`
fi
if [ "$newfile" == "/dev/null" ]; then
#Deleted files do not contain any spelling mistakes
current_count=0
else
#Remove the prefix "b/"
newfile=${newfile:2}
current_count=`cat "$newfile" | getNumberOfAspellOutputLines`
fi
if [ $current_count -gt $previous_count ]; then
echo "-- File $newfile"
echo "Change increases spelling mistake count (from $previous_count to $current_count)"
any_fails=true
fi
fi
fi
done <<< "$files"
if [ "$any_fails" == true ]; then
return 1
fi
return 0
}
if [ $# -eq 1 ]; then
case $1 in
help | -help | --help)
@@ -30,13 +90,11 @@ if [ $# -eq 1 ]; then
exit
;;
*)
spell_check $1
cat "$1" | getAspellOutput
;;
esac
elif [ $# -eq 0 ]; then
for i in `find -name \*.md`; do
spell_check $i
done
isMistakeCountIncreasedByChanges
else
usage
fi
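A sketch of how the spell check script can be invoked from the repository root; with a file
argument it prints all unrecognized words in that file, without an argument it compares all changed
files against the merge base as in the CI job (the markdown file path is only an example):

```console
marie@login$ doc.zih.tu-dresden.de/util/check-spelling.sh doc.zih.tu-dresden.de/docs/software/debuggers.md
marie@login$ doc.zih.tu-dresden.de/util/check-spelling.sh
```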
@@ -18,7 +18,8 @@ i file \+system
i \<taurus\> taurus\.hrsk /taurus
i \<hrskii\>
i hpc \+system
i hpc[ -]\+da\>"
i hpc[ -]\+da\>
i work[ -]\+space"
function grepExceptions () {
if [ $# -gt 0 ]; then
@@ -73,7 +74,7 @@ fi
cnt=0
for f in $files; do
if [ "$f" != doc.zih.tu-dresden.de/README.md -a "${f: -3}" == ".md" ]; then
if [ "$f" != doc.zih.tu-dresden.de/README.md -a "${f: -3}" == ".md" -a -f "$f" ]; then
echo "Check wording in file $f"
while IFS=$'\t' read -r flags pattern exceptionPatterns; do
while IFS=$'\t' read -r -a exceptionPatternsArray; do
personal_ws-1.1 en 1805
Altix
analytics
BeeGFS
benchmarking
bsub
ccNUMA
citable
CPU
CPUs
CUDA
CXFS
DDR
DFG
EasyBuild
fastfs
filesystem
Filesystem
Flink
Fortran
GFLOPS
gfortran
GiB
gnuplot
Gnuplot
GPU
hadoop
Haswell
HDFS
Horovod
HPC
HPL
icc
icpc
ifort
ImageNet
Infiniband
Itanium
jpg
Jupyter
Keras
KNL
LINPACK
LoadLeveler
lsf
LSF
MEGWARE
MIMD
MKL
Montecito
mountpoint
MPI
mpicc
mpiCC
mpicxx
mpif
mpifort
mpirun
multicore
multithreaded
Neptun
NFS
NUMA
NUMAlink
OPARI
OpenACC
OpenCL
OpenMP
openmpi
OpenMPI
Opteron
PAPI
pdf
Perf
Pika
pipelining
png
rome
romeo
RSA
salloc
Saxonid
sbatch
ScaDS
Scalasca
scancel
scontrol
scp
SGI
SGEMM
SHA
SHMEM
SLES
Slurm
SMP
squeue
srun
SSD
TensorFlow
Theano
Vampir
ZIH
DFG
NUMAlink
ccNUMA
NUMA
Montecito
Opteron
Saxonid
MIMD
LSF
lsf
Itanium
mpif
mpicc
mpiCC
mpicxx
mpirun
mpifort
ifort
icc
icpc
gfortran
Altix
Neptun
Trition
stderr
stdout
SUSE
SLES
Fortran
SMP
MEGWARE
SGI
CXFS
NFS
CPUs
GFLOPS
TBB
TensorFlow
TFLOPS
png
jpg
pdf
bsub
OpenMPI
openmpi
multicore
fastfs
Theano
tmp
MKL
TBB
LoadLeveler
Gnuplot
gnuplot
RSA
SHA
pipelining
LINPACK
HPL
SGEMM
Trition
Vampir
Xeon
DDR
GiB
KNL
stdout
stderr
multithreaded
ZIH