diff --git a/twiki2md/root/Compendium.DataManagement/WorkSpaces.md b/twiki2md/root/Compendium.DataManagement/WorkSpaces.md deleted file mode 100644 index 7d922ca512a92301700b3423cb452f4b8514189d..0000000000000000000000000000000000000000 --- a/twiki2md/root/Compendium.DataManagement/WorkSpaces.md +++ /dev/null @@ -1,257 +0,0 @@ -# Workspaces - - - -Storage systems come with different flavours in terms of - -- size -- streaming bandwidth -- IOPS rate - -With a limited price one cannot have all in one. That is the reason why -our fast parallel file systems have restrictions wrt. age of files (see -[TermsOfUse](TermsOfUse)). The mechanism of workspaces enables users to -better manage the data life cycle of their HPC data. Workspaces are -primarily login-related. The tool concept of "workspaces" is common in a -large number of HPC centers. The idea is to request for a workspace -directory in a certain storage system - connected with an expiry date. -After a grace period the data is deleted automatically. The **maximum** -lifetime of a workspace depends on the storage system and is listed -below: - -- ssd: 1 day default, 30 days maximum, -- beegfs_global0: 1 day default, 30 days maximum, -- scratch: 1 day default, 100 days maximum, -- warm_archive: 1 day default, 1 year maximum. - -All workspaces can be extended twice (update: 10 times in scratch now). -There is no problem to use the fastest file systems we have, but keep -track on your data and move it to a cheaper system once you have done -your computations. - -## Workspace commands - -To list all available workspaces use: - - mark@tauruslogin6:~> mark@tauruslogin5:~> ws_find -l<br />Available filesystems:<br />scratch<br />warm_archive<br />ssd<br />beegfs_global0 - -To create a workspace, specify a unique name and its life time like -this: - - mark@tauruslogin6:~> ws_allocate -F scratch SPECint 50 - Info: creating workspace. 
- /scratch/ws/mark-SPECint - remaining extensions : 10 - remaining time in days: 50 - -**Important:** You can (and should) also add your email address and a -relative date for notification: - - mark@tauruslogin6:~> ws_allocate -F scratch -r 7 -m name.lastname@tu-dresden.de SPECint 50 - -\<verbatim><mark@tauruslogin6>:\~\> ws_allocate: \[options\] -workspace_name duration Options: -h \[ --help \] produce help message -V -\[ --version \] show version -d \[ --duration \] arg (=1) duration in -days -n \[ --name \] arg workspace name -F \[ --filesystem \] arg -filesystem -r \[ --reminder \] arg reminder to be sent n days before -expiration -m \[ --mailaddress \] arg mailaddress to send reminder to -(works only with tu-dresden.de addresses) -x \[ --extension \] extend -workspace -u \[ --username \] arg username -g \[ --group \] group -workspace -c \[ --comment \] arg comment\</verbatim> - -The maximum duration depends on the storage system: - -\<table border="2" cellpadding="2" cellspacing="2"> \<tbody> \<tr> \<td -style="padding-left: 30px;">**Storage system ( use with parameter -F ) -\<br />**\</td> \<td style="padding-left: 30px;"> **Duration** \</td> -\<td style="padding-left: 30px;">**Remarks\<br />**\</td> \</tr> \<tr -style="padding-left: 30px;"> \<td style="padding-left: 30px;">ssd\</td> -\<td style="padding-left: 30px;">30 days\</td> \<td style="padding-left: -30px;">High-IOPS file system (/lustre/ssd) on SSDs.\</td> \</tr> \<tr -style="padding-left: 30px;"> \<td style="padding-left: -30px;">beegfs\</td> \<td style="padding-left: 30px;">30 days\</td> \<td -style="padding-left: 30px;">High-IOPS file system (/lustre/ssd) -onNVMes.\</td> \</tr> \<tr style="padding-left: 30px;"> \<td -style="padding-left: 30px;">scratch\</td> \<td style="padding-left: -30px;">100 days\</td> \<td style="padding-left: 30px;">Scratch file -system (/scratch) with high streaming bandwidth, based on spinning -disks.\</td> \</tr> \<tr style="padding-left: 30px;"> \<td -style="padding-left: 30px;">warm_archive\</td> \<td style="padding-left: -30px;">1 year\</td> \<td style="padding-left: 30px;">Capacity file -system based on spinning disks.\</td> \</tr> \</tbody> \</table> A -workspace can be extended twice. With this command, a *new* duration for -the workspace is set (*not cumulative*): - - mark@tauruslogin6:~> ws_extend -F scratch SPECint 100 - Info: extending workspace. - /scratch/ws/mark-SPECint - remaining extensions : 1 - remaining time in days: 100 - -For email notification, you can either use the option `-m` in the -`ws_allocate` command line or use `ws_send_ical` to get an entry in your -calendar. (%RED%This works only with \<span>tu-dresden.de -\</span>addresses<span class="twiki-macro ENDCOLOR"></span>. Please -configure email redirection if you want to use another address.) - - mark@tauruslogin6:~> ws_send_ical -m ulf.markwardt@tu-dresden.de -F scratch SPECint - -You can easily get an overview of your currently used workspaces with -**`ws_list`**. 
- - mark@tauruslogin6:~> ws_list - id: benchmark_storage - workspace directory : /warm_archive/ws/mark-benchmark_storage - remaining time : 364 days 23 hours - creation time : Thu Jul 4 13:40:31 2019 - expiration date : Fri Jul 3 13:40:30 2020 - filesystem name : warm_archive - available extensions : 2 - id: SPECint - workspace directory : /scratch/ws/mark-SPECint - remaining time : 99 days 23 hours - creation time : Thu Jul 4 13:36:51 2019 - expiration date : Sat Oct 12 13:36:51 2019 - filesystem name : scratch - available extensions : 1 - -With\<span> **\<span>ws_release -F \<file system> \<workspace -name>\</span>**\</span>, you can delete your workspace. - -### Restoring expired workspaces - -**At expiration time** (or when you manually release your workspace), -your workspace will be moved to a special, hidden directory. For a month -(in \_warm*archive*: 2 months), you can still restore your data into a -valid workspace. For that, use - - mark@tauruslogin6:~> ws_restore -l -F scratch - -to get a list of your expired workspaces, and then restore them like -that into an existing, active workspace **newws**: - - mark@tauruslogin6:~> ws_restore -F scratch myuser-myws-1234567 newws - -**NOTE**: the expired workspace has to be specified using the full name -as listed by `ws_restore -l`, including username prefix and timestamp -suffix (otherwise, it cannot be uniquely identified). \<br />The target -workspace, on the other hand, must be given with just its short name as -listed by `ws_list`, without the username prefix. - -### Linking workspaces in home - -It might be valuable to have links to personal workspaces within a -certain directory, e.g., the user home directory. The command -\`ws_register DIR\` will create and manage links to all personal -workspaces within in the directory \`DIR\`. Calling this command will do -the following: - -- The directory \`DIR\` will be created if necessary -- Links to all personal workspaces will be managed: - - Creates links to all available workspaces if not already present - - Removes links to released workspaces \<p> \</p> \<p> \</p> \<p> - \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> - \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> - \</p> \<p> \</p> - -**Remark:** An automatic update of the workspace links can be invoked by -putting the command \`ws_register DIR\` in the user's personal shell -configuration file (e.g., .bashrc, .zshrc). - -## How to Use Workspaces - -We see three typical use cases for the use of workspaces: - -### Per-Job-Storage - -A batch job needs a directory for temporary data. This can be deleted -afterwards. - -Here an example for the use with Gaussian: - - #!/bin/bash - #SBATCH --partition=haswell - #SBATCH --time=96:00:00 - #SBATCH --nodes=1 - #SBATCH --ntasks=1 - #SBATCH --cpus-per-task=24 - - module load modenv/classic - module load gaussian - - COMPUTE_DIR=gaussian_$SLURM_JOB_ID - export GAUSS_SCRDIR=$(ws_allocate -F ssd $COMPUTE_DIR 7) - echo $GAUSS_SCRDIR - - srun g16 inputfile.gjf logfile.log - - test -d $GAUSS_SCRDIR && rm -rf $GAUSS_SCRDIR/* - ws_release -F ssd $COMPUTE_DIR - -In a similar manner, other jobs can make use of temporary workspaces. - -### Data for a Campaign - -For a series of calculations that works on the same data, you could -allocate a workspace in the scratch for e.g. 100 days: - - mark@tauruslogin6:~> ws_allocate -F scratch my_scratchdata 100 - Info: creating workspace. 
- /scratch/ws/mark-my_scratchdata - remaining extensions : 2 - remaining time in days: 99 - -If you want to share it with your project group, set the correct access -attributes, eg. - - mark@tauruslogin6:~> chmod g+wrx /scratch/ws/mark-my_scratchdata - -And verify it with: - - mark@tauruslogin6:~> ls -la /scratch/ws/mark-my_scratchdata <br />total 8<br />drwxrwx--- 2 mark hpcsupport 4096 Jul 10 09:03 .<br />drwxr-xr-x 5 operator adm 4096 Jul 10 09:01 .. - -### Mid-Term Storage - -For data that seldomly changes but consumes a lot of space, the warm -archive can be used. \<br />Note that this is **mounted read-only**on -the compute nodes, so you cannot use it as a work directory for your -jobs! - - mark@tauruslogin6:~> ws_allocate -F warm_archive my_inputdata 365 - /warm_archive/ws/mark-my_inputdata - remaining extensions : 2 - remaining time in days: 365 - -**Attention:** The warm archive is not built for billions of files. -There is a quota active of 100.000 files per group. Maybe you might want -to tar your data. To see your active quota use: - - mark@tauruslogin6:~> qinfo quota /warm_archive/ws/ - Consuming Entity Type Limit Current Usage - GROUP: hpcsupport LOGICAL_DISK_SPACE 100 TB 51 GB (0%) - GROUP: hpcsupport FILE_COUNT 100000 4 (0%) - GROUP: swtest LOGICAL_DISK_SPACE 100 TB 5 GB (0%) - GROUP: swtest FILE_COUNT 100000 38459 (38%) - TENANT: 8a2373d6-7aaf-4df3-86f5-a201281afdbb LOGICAL_DISK_SPACE 5 PB 1 TB (0%) - -Note that the workspaces reside under the mountpoint `/warm_archive/ws/` -and not \<span>/warm_archive\</span>anymore. - -### Troubleshooting - -If you are getting the error: - - Error: could not create workspace directory! - -you should check the \<span>"locale" \</span>setting of your ssh client. -Some clients (e.g. the one from MacOSX) set values that are not valid on -Taurus. You should overwrite LC_CTYPE and set it to a valid locale value -like: - - export LC_CTYPE=de_DE.UTF-8 - -A list of valid locales can be retrieved via \<br /> - - locale -a - -Please use only UTF8 (or plain) settings. Avoid "iso" codepages! diff --git a/twiki2md/root/PerformanceTools/ScoreP.md b/twiki2md/root/PerformanceTools/ScoreP.md deleted file mode 100644 index 5a32563b7cc5e560a53f95c711aad568c81e9057..0000000000000000000000000000000000000000 --- a/twiki2md/root/PerformanceTools/ScoreP.md +++ /dev/null @@ -1,136 +0,0 @@ -# Score-P - -The Score-P measurement infrastructure is a highly scalable and -easy-to-use tool suite for profiling, event tracing, and online analysis -of HPC applications.\<br />Currently, it works with the analysis tools -[Vampir](Vampir), Scalasca, Periscope, and Tau.\<br />Score-P supports -lots of features e.g. - -- MPI, SHMEM, OpenMP, pthreads, and hybrid programs -- Manual source code instrumentation -- Monitoring of CUDA applications -- Recording hardware counter by using PAPI library -- Function filtering and grouping - -Only the basic usage is shown in this Wiki. For a comprehensive Score-P -user manual refer to the [Score-P website](http://www.score-p.org). - -Before using Score-P, set up the correct environment with - - module load scorep - -To make measurements with Score-P, the user's application program needs -to be instrumented, i.e., at specific important points (\`\`events'') -Score-P measurement calls have to be activated. By default, Score-P -handles this automatically. In order to enable instrumentation of -function calls, MPI as well as OpenMP events, the user only needs to -prepend the Score-P wrapper to the usual compiler and linker commands. 
-Following wrappers exist: - -The following sections show some examples depending on the -parallelization type of the program. - -## Serial programs - -| | | -|----------------------|------------------------------------| -| original | ifort a.f90 b.f90 -o myprog | -| with instrumentation | scorep ifort a.f90 b.f90 -o myprog | - -This will instrument user functions (if supported by the compiler) and -link the Score-P library. - -## MPI parallel programs - -If your MPI implementation uses MPI compilers, Score-P will detect MPI -parallelization automatically: - -| | | -|----------------------|-------------------------------| -| original | mpicc hello.c -o hello | -| with instrumentation | scorep mpicc hello.c -o hello | - -MPI implementations without own compilers (as on the Altix) require the -user to link the MPI library manually. Even in this case, Score-P will -detect MPI parallelization automatically: - -| | | -|----------------------|-----------------------------------| -| original | icc hello.c -o hello -lmpi | -| with instrumentation | scorep icc hello.c -o hello -lmpi | - -However, if Score-P falis to detect MPI parallelization automatically -you can manually select MPI instrumentation: - -| | | -|----------------------|---------------------------------------------| -| original | icc hello.c -o hello -lmpi | -| with instrumentation | scorep --mpp=mpi icc hello.c -o hello -lmpi | - -If you want to instrument MPI events only (creates less overhead and -smaller trace files) use the option --nocompiler to disable automatic -instrumentation of user functions. - -## OpenMP parallel programs - -When Score-P detects OpenMP flags on the command line, OPARI2 is invoked -for automatic source code instrumentation of OpenMP events: - -| | | -|----------------------|---------------------------------| -| original | ifort -openmp pi.f -o pi | -| with instrumentation | scorep ifort -openmp pi.f -o pi | - -## Hybrid MPI/OpenMP parallel programs - -With a combination of the above mentioned approaches, hybrid -applications can be instrumented: - -| | | -|----------------------|--------------------------------------------| -| original | mpif90 -openmp hybrid.F90 -o hybrid | -| with instrumentation | scorep mpif90 -openmp hybrid.F90 -o hybrid | - -## Score-P instrumenter option overview - -| Type of instrumentation | Instrumenter switch | Default value | Runtime measurement control | -|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------:|:-----------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| MPI | --mpp=mpi | (auto) | (see Sec. [Selection of MPI Groups](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#mpi_groups) ) | -| SHMEM | --mpp=shmem | (auto) | | -| OpenMP | --thread=omp | (auto) | | -| Pthread | --thread=pthread | (auto) | | -| Compiler (see Sec. [Automatic Compiler Instrumentation](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#compiler_instrumentation) ) | --compiler/--nocompiler | enabled | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) | -| PDT instrumentation (see Sec. 
[Source-Code Instrumentation Using PDT](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#tau_instrumentation) ) | --pdt/--nopdt | disabled | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) |
| POMP2 user regions (see Sec. [Semi-Automatic Instrumentation of POMP2 User Regions](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#pomp_instrumentation) ) | --pomp/--nopomp | depends on OpenMP usage | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) |
| Manual (see Sec. [Manual Region Instrumentation](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#manual_instrumentation) ) | --user/--nouser | disabled | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) and selective recording (see Sec. [Selective Recording](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#selective_recording) ) |

## Application Measurement

After the application run, you will find an experiment directory in your current working directory, which contains all recorded data.

In general, you can record a profile and/or an event trace. Whether a profile and/or a trace is recorded is controlled by the environment variables [`SCOREP_ENABLE_PROFILING`](https://silc.zih.tu-dresden.de/scorep-current/html/scorepmeasurementconfig.html#SCOREP_ENABLE_PROFILING) and [`SCOREP_ENABLE_TRACING`](https://silc.zih.tu-dresden.de/scorep-current/html/scorepmeasurementconfig.html#SCOREP_ENABLE_TRACING). If the value of one of these variables is zero or false, the corresponding profiling/tracing is disabled; otherwise, Score-P records a profile and/or trace. By default, profiling is enabled and tracing is disabled. For more information, please see [the list of Score-P measurement configuration variables](https://silc.zih.tu-dresden.de/scorep-current/html/scorepmeasurementconfig.html).

You may want to start with a profiling run because of its lower space requirements. Based on the profiling results, you can then configure the trace buffer limits, filtering, or selective recording for tracing runs.

Score-P allows you to configure several parameters via environment variables. After the measurement run, you will find a `scorep.cfg` file in your experiment directory which contains the configuration of the measurement run. If you have not set configuration values explicitly, the file will contain the default values.
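As a minimal sketch, enabling tracing in addition to profiling for a single run could look like this (the binary name and the `srun` launch with 4 processes are placeholders for your actual job setup):

    export SCOREP_ENABLE_PROFILING=true   # keep the profile (default behaviour)
    export SCOREP_ENABLE_TRACING=true     # additionally record an event trace
    srun -n 4 ./myprog                    # launch the Score-P-instrumented binary as usual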
-- Main.RonnyTschueter - 2014-09-11

diff --git a/twiki2md/root/SoftwareDevelopment/Debuggers.md b/twiki2md/root/SoftwareDevelopment/Debuggers.md
deleted file mode 100644
index 0e4c0b8fc565565cc38515f1de3ad60eaee24873..0000000000000000000000000000000000000000
--- a/twiki2md/root/SoftwareDevelopment/Debuggers.md
+++ /dev/null
@@ -1,247 +0,0 @@
# Debuggers

This section describes how to start the debuggers on the HPC systems of ZIH.

Detailed information about how to use the debuggers can be found on the debuggers' websites (see below).

## Overview of available Debuggers

|                    | **GNU Debugger**                  | **DDT**                                                                                      | **Totalview**                                            |
|--------------------|-----------------------------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------|
| Interface          | command line                      | graphical user interface                                                                     |                                                          |
| Languages          | C, C++, Fortran                   | C, C++, Fortran, F95                                                                         |                                                          |
| Parallel Debugging | Threads                           | Threads, MPI, hybrid                                                                         |                                                          |
| Debugger Backend   | GDB                               |                                                                                              | own backend                                              |
| Website            | <http://www.gnu.org/software/gdb> | [arm.com](https://developer.arm.com/products/software-development-tools/hpc/documentation)  | <https://www.roguewave.com/products-services/totalview>  |
| Licenses at ZIH    | free                              | 1024                                                                                         | 32                                                       |

## General Advice

- You need to compile your code with the flag `-g` to enable debugging. This tells the compiler to include information about variable and function names, source code lines, etc. in the executable.
- It is also recommended to reduce or even disable optimizations (`-O0`). At least inlining should be disabled (usually `-fno-inline`).
- For parallel applications: try to reproduce the problem with fewer processes before using a parallel debugger.
- The flag `-traceback` of the Intel Fortran compiler causes it to print a stack trace and the source code location when the program terminates abnormally.
- If your program crashes and you get an address of the failing instruction, you can get the source code line with the command `addr2line -e <executable> <address>`.
- Use the compiler's checking capabilities to find typical problems at compile time or run time:
  - Read the manual (`man gcc`, `man ifort`, etc.)
  - Intel C compile time checks: `-Wall -Wp64 -Wuninitialized -strict-ansi`
  - Intel Fortran compile time checks: `-warn all -std95`
  - Intel Fortran run time checks: `-C -fpe0 -traceback`
- Use [memory debuggers](Compendium.Debuggers#Memory_Debugging) to verify the proper usage of memory.
- Core dumps are useful when your program crashes after a long runtime.
- More hints: [Slides about typical Bugs in parallel Programs](%ATTACHURL%/typical_bugs.pdf)

## GNU Debugger

The GNU Debugger (GDB) offers only limited to no support for parallel applications and Fortran 90. However, it might be the debugger you are most used to. GDB works best for serial programs. You can start GDB in several ways:

|                               | Command                        |
|-------------------------------|--------------------------------|
| Run program under GDB         | `gdb <executable>`             |
| Attach running program to GDB | `gdb --pid <process ID>`       |
| Open a core dump              | `gdb <executable> <core file>` |

This [GDB Reference Sheet](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf) makes life easier when you use GDB often.

Fortran 90 programmers who would like to use GDB should issue `module load ddt` before their debug session. This makes the DDT-modified GDB available, which has better support for Fortran 90 (e.g., derived types).
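A minimal serial GDB session could look like this (program name, source file, and line number are placeholders):

    % gcc -g -O0 myprog.c -o myprog    # build with debug info, optimizations disabled
    % gdb ./myprog
    (gdb) break myprog.c:42            # stop at a source line
    (gdb) run                          # start the program
    (gdb) backtrace                    # show the call stack when it stops
    (gdb) print my_variable            # inspect a variable
    (gdb) continue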
- -## DDT - -\<img alt="" src="%ATTACHURL%/ddt.png" title="DDT Main Window" -width="500" /> - -- Commercial tool of Arm shipped as "Forge" together with MAP profiler -- Intuitive graphical user interface -- Great support for parallel applications -- We have 1024 licences, so many user can use this tool for parallel - debugging -- Don't expect that debugging an MPI program with 100ths of process - will work without problems - - The more processes and nodes involved, the higher is the - probability for timeouts or other problems - - Debug with as few processes as required to reproduce the bug you - want to find -- Module to load before using: `module load ddt` -- Start: `ddt <executable>` -- If you experience problems in DDTs configuration when changing the - HPC system, you should issue `rm -r ~/.ddt.` -- More Info - - [Slides about basic DDT - usage](%ATTACHURL%/parallel_debugging_ddt.pdf) - - [Official - Userguide](https://developer.arm.com/docs/101136/latest/ddt) - -### Serial Program Example (Taurus, Venus) - - % module load ddt - % salloc --x11 -n 1 --time=2:00:00 - salloc: Granted job allocation 123456 - % ddt ./myprog - -- uncheck MPI, uncheck OpenMP -- hit *Run*. - -### Multithreaded Program Example (Taurus, Venus) - - % module load ddt - % salloc --x11 -n 1 --cpus-per-task=<number of threads> --time=2:00:00 - salloc: Granted job allocation 123456 - % srun --x11=first ddt ./myprog - -- uncheck MPI -- select OpenMP, set number of threads (if OpenMP) -- hit *Run* - -### MPI-Parallel Program Example (Taurus, Venus) - - % module load ddt - % module load bullxmpi # Taurus only - % salloc --x11 -n <number of processes> --time=2:00:00 - salloc: Granted job allocation 123456 - % ddt -np <number of processes> ./myprog - -- select MPI -- set the MPI implementation to "SLURM (generic)" -- set number of processes -- hit *Run* - -## Totalview - -\<img alt="" src="%ATTACHURL%/totalview-main.png" title="Totalview Main -Window" /> - -- Commercial tool -- Intuitive graphical user interface -- Good support for parallel applications -- Great support for complex data structures, also for Fortran 90 - derived types -- We have only 32 licences - - Large parallel runs are not possible - - Use DDT for these cases -- Module to load before using: `module load totalview` -- Start: `totalview <executable>` - -### Serial Program Example (Taurus, Venus) - - % module load totalview - % salloc --x11 -n 1 --time=2:00:00 - salloc: Granted job allocation 123456 - % srun --x11=first totalview ./myprog - -- ensure *Parallel system* is set to *None* in the *Parallel* tab -- hit *Ok* -- hit the *Go* button to start the program - -### Multithreaded Program Example (Taurus, Venus) - - % module load totalview - % salloc --x11 -n 1 --cpus-per-task=<number of threads> --time=2:00:00 - salloc: Granted job allocation 123456 - % export OMP_NUM_THREADS=<number of threads> # or any other method to set the number of threads - % srun --x11=first totalview ./myprog - -- ensure *Parallel system* is set to *None* in the *Parallel* tab -- hit *Ok* -- set breakpoints, if necessary -- hit the *Go* button to start the program - -### MPI-Parallel Program Example (Taurus, Venus) - - % module load totalview - % module load bullxmpi # Taurus only - % salloc -n <number of processes> --time=2:00:00 - salloc: Granted job allocation 123456 - % totalview - -- select your executable program with the button *Browse...* -- ensure *Parallel system* is set to *SLURM* in the *Parallel* tab -- set the number of Tasks in the *Parallel* tab -- hit *Ok* -- 
set breakpoints, if necessary -- hit the *Go* button to start the program - -## Memory Debugging - -- Memory debuggers find memory management bugs, e.g. - - Use of non-initialized memory - - Access memory out of allocated bounds -- Very valuable tools to find bugs -- DDT and Totalview have memory debugging included (needs to be - enabled before run) - -### Valgrind - -- <http://www.valgrind.org> -- Simulation of the program run in a virtual machine which accurately - observes memory operations -- Extreme run time slow-down - - Use small program runs -- Sees more memory errors than the other debuggers -- Not available on mars - -<!-- --> - -- for serial programs: - -<!-- --> - - % module load Valgrind - % valgrind ./myprog - -- for MPI parallel programs (every rank writes own valgrind logfile): - -<!-- --> - - % module load Valgrind - % mpirun -np 4 valgrind --log-file=valgrind.%p.out ./myprog - -### DUMA - -**Note: DUMA is no longer installed on our systems** - -- DUMA = Detect Unintended Memory Access -- <http://duma.sourceforge.net> -- Replaces memory management functions through own versions and keeps - track of allocated memory -- Easy to use -- Triggers program crash when an error is detected - - use GDB or other debugger to find location -- Almost no run-time slow-down -- Does not see so many bugs like Valgrind - -<!-- --> - - % module load duma - icc -o myprog myprog.o ... -lduma # link program with DUMA library (-lduma) - % bsub -W 2:00 -n 1 -Is bash - <<Waiting for dispatch ...>> - % gdb ./myprog diff --git a/twiki2md/root/SoftwareDevelopment/Miscellaneous.md b/twiki2md/root/SoftwareDevelopment/Miscellaneous.md deleted file mode 100644 index 61b4678c0b36bb33d429aee814b682add78c77a3..0000000000000000000000000000000000000000 --- a/twiki2md/root/SoftwareDevelopment/Miscellaneous.md +++ /dev/null @@ -1,50 +0,0 @@ -# Miscellaneous - -## Check Assembler Code - -If a binary `a.out` was built to include symbolic information (option -"-g") one can have a look at a commented disassembly with - - objdump -dS a.out - -to see something like: - - do { - checksum += d.D.ACC(idx).i[0] * d.D.ACC(idx).i[1]; - 8049940: 89 d9 mov %ebx,%ecx - 8049942: 2b 8d 08 fa ff ff sub -0x5f8(%ebp),%ecx - 8049948: 8b 55 c4 mov -0x3c(%ebp),%edx - 804994b: 2b 95 0c fa ff ff sub -0x5f4(%ebp),%edx - 8049951: 8b 45 c8 mov -0x38(%ebp),%eax - 8049954: 2b 85 10 fa ff ff sub -0x5f0(%ebp),%eax - 804995a: 0f af 85 78 fd ff ff imul -0x288(%ebp),%eax - 8049961: 01 c2 add %eax,%edx - 8049963: 0f af 95 74 fd ff ff imul -0x28c(%ebp),%edx - 804996a: 01 d1 add %edx,%ecx - 804996c: 8d 0c 49 lea (%ecx,%ecx,2),%ecx - 804996f: c1 e1 03 shl $0x3,%ecx - 8049972: 03 8d b8 fd ff ff add -0x248(%ebp),%ecx - 8049978: dd 01 fldl (%ecx) - 804997a: dd 41 08 fldl 0x8(%ecx) - 804997d: d9 c1 fld %st(1) - 804997f: d8 c9 fmul %st(1),%st - 8049981: dc 85 b8 f9 ff ff faddl -0x648(%ebp) - checksum += d.D.ACC(idx).i[2] * d.D.ACC(idx).i[0]; - 8049987: dd 41 10 fldl 0x10(%ecx) - 804998a: dc cb fmul %st,%st(3) - 804998c: d9 cb fxch %st(3) - 804998e: de c1 faddp %st,%st(1) - 8049990: d9 c9 fxch %st(1) - checksum -= d.D.ACC(idx).i[1] * d.D.ACC(idx).i[2]; - 8049992: de ca fmulp %st,%st(2) - 8049994: de e1 fsubp %st,%st(1) - 8049996: dd 9d b8 f9 ff ff fstpl -0x648(%ebp) - } - -## I/O from/to binary files - -## Compilation Problem Isolator - -`icpi` - --- Main.mark - 2009-12-16 diff --git a/twiki2md/root/WebHome.md b/twiki2md/root/WebHome.md deleted file mode 100644 index 7d09bf683d0fd8839e76a34f49156f8a14322f26..0000000000000000000000000000000000000000 --- 
a/twiki2md/root/WebHome.md +++ /dev/null @@ -1,47 +0,0 @@ -# Foreword - -This compendium is work in progress, since we try to incorporate more -information with increasing experience and with every question you ask -us. We invite you to take part in the improvement of these pages by -correcting or adding useful information or commenting the pages. - -Ulf Markwardt - -# Contents - -- [Introduction](Introduction) -- [Access](Access), [TermsOfUse](TermsOfUse), [login](Login), [project - management](ProjectManagement), [ step-by step - examples](StepByStepTaurus) -- Our HPC Systems - - [Taurus: general purpose HPC cluster (HRSK-II)](SystemTaurus) - - [Venus: SGI Ultraviolet](SystemVenus) - - **[HPC for Data Analytics](HPCDA)** -- **[Data Management](Data Management)**, [WorkSpaces](WorkSpaces) -- [Batch Systems](Batch Systems) -- HPC Software - - [Runtime Environment](Runtime Environment) - - [Available Software](Applications) - - [Custom EasyBuild Environment](Custom EasyBuild Environment) -- [Software Development](Software Development) - - [BuildingSoftware](BuildingSoftware) - - [GPU Programming](GPU Programming) - -<!-- --> - -- [Checkpoint/Restart](CheckpointRestart) -- [Containers](Containers) -- [Further Documentation](Further Documentation) - -<!-- --> - -- [Older Hardware](Hardware) - -# News - -- 2021-05-10 GPU sub-cluster "\<a href="AlphaCentauri" - title="AlphaCentauri">AlphaCentauri\</a>" ready for production -- 2021-03-18 [HPC Introduction - - Slides](%ATTACHURL%/HPC-Introduction.pdf) -- 2021-01-20 new file system /beegfs/global0, introducing [Slurm - features](Slurmfeatures) diff --git a/twiki2md/root/WebHome/Container.md b/twiki2md/root/WebHome/Container.md deleted file mode 100644 index ddb6309aea882459160acb0cd5e88223624bc545..0000000000000000000000000000000000000000 --- a/twiki2md/root/WebHome/Container.md +++ /dev/null @@ -1,280 +0,0 @@ -# Singularity - - - -If you wish to containerize your workflow/applications, you can use -Singularity containers on Taurus. As opposed to Docker, this solution is -much more suited to being used in an HPC environment. Existing Docker -containers can easily be converted. - -Website: \<a href="<https://www.sylabs.io>" -target="\_blank"><https://www.sylabs.io>\</a>\<br />Docu: \<a -href="<https://www.sylabs.io/docs/>" -target="\_blank"><https://www.sylabs.io/docs/>\</a> - -ZIH wiki sites: - -- [Example Definitions](SingularityExampleDefinitions) -- [Building Singularity images on Taurus](VMTools) -- [Hints on Advanced usage](SingularityRecipeHints) - -It is available on Taurus without loading any module. - -## Local installation - -One advantage of containers is that you can create one on a local -machine (e.g. your laptop) and move it to the HPC system to execute it -there. This requires a local installation of singularity. The easiest -way to do so is: 1 Check if go is installed by executing \`go version\`. 
If it is **not** installed:

    wget https://storage.googleapis.com/golang/getgo/installer_linux && chmod +x installer_linux && ./installer_linux && source $HOME/.bash_profile

Then follow the instructions to [install Singularity](https://github.com/sylabs/singularity/blob/master/INSTALL.md#clone-the-repo):

Clone the repo:

    mkdir -p ${GOPATH}/src/github.com/sylabs && cd ${GOPATH}/src/github.com/sylabs && git clone https://github.com/sylabs/singularity.git && cd singularity

Check out the version you want (see the [GitHub Releases page](https://github.com/sylabs/singularity/releases) for available releases), e.g.:

    git checkout v3.2.1

Build and install:

    cd ${GOPATH}/src/github.com/sylabs/singularity && ./mconfig && cd ./builddir && make && sudo make install

## Container creation

Since creating a new container requires access to system-level tools and thus root privileges, it is not possible for users to generate new custom containers on Taurus directly. You can, however, import an existing container from, e.g., Docker.

In case you wish to create a new container, you can do so on your own local machine where you have the necessary privileges and then simply copy your container file to Taurus and use it there. This does not work on our **ml** partition, as it uses Power9 as its architecture, which is different from the x86 architecture in common computers/laptops. **For that you can use the [VM Tools](VMTools).**

### Creating a container

Creating a container is done by writing a definition file and passing it to

    singularity build myContainer.sif myDefinition.def

NOTE: This must be done on a machine (or [VM](Cloud)) with root rights.

A definition file contains a bootstrap [header](https://sylabs.io/guides/3.2/user-guide/definition_files.html#header) where you choose the base and [sections](https://sylabs.io/guides/3.2/user-guide/definition_files.html#sections) where you install your software.

The most common approach is to start from an existing Docker image from Docker Hub. For example, to start from an [ubuntu image](https://hub.docker.com/_/ubuntu), copy the following into a new file called ubuntu.def (or any other filename of your choosing):

    Bootstrap: docker
    From: ubuntu:trusty

    %runscript
        echo "This is what happens when you run the container..."

    %post
        apt-get install g++

Then you can call:

    singularity build ubuntu.sif ubuntu.def

And it will install Ubuntu with g++ inside your container, according to your definition file.

More bootstrap options are available. The following example, for instance, bootstraps a basic CentOS 7 image:
    BootStrap: yum
    OSVersion: 7
    MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
    Include: yum

    %runscript
        echo "This is what happens when you run the container..."

    %post
        echo "Hello from inside the container"
        yum -y install vim-minimal

More examples of definition files can be found at <https://github.com/singularityware/singularity/tree/master/examples>.

### Importing a docker container

You can import an image directly from the Docker repository (Docker Hub):

    singularity build my-container.sif docker://ubuntu:latest

As opposed to bootstrapping a container, importing from Docker does **not require root privileges** and therefore works on Taurus directly.

Creating a singularity container directly from a local docker image is possible but not recommended. Steps:

    # Start a docker registry
    $ docker run -d -p 5000:5000 --restart=always --name registry registry:2

    # Push local docker container to it
    $ docker tag alpine localhost:5000/alpine
    $ docker push localhost:5000/alpine

    # Create def file for singularity like this...
    $ cat example.def
    Bootstrap: docker
    Registry: http://localhost:5000
    From: alpine

    # Build singularity container
    $ singularity build --nohttps alpine.sif example.def

### Starting from a Dockerfile

As singularity definition files and Dockerfiles are very similar, you can start creating a definition file from an existing Dockerfile by "translating" each section.

There are tools to automate this. One of them is [spython](https://github.com/singularityhub/singularity-cli), which can be installed with `pip` (add `--user` if you don't want to install it system-wide):

`pip3 install -U spython`

With this you can simply issue the following command to convert a Dockerfile in the current folder into a singularity definition file:

`spython recipe Dockerfile myDefinition.def`

Now please **verify** your generated definition and adjust where required!

There are some notable differences between singularity definitions and Dockerfiles:

1. Command chains in Dockerfiles (`apt-get update && apt-get install foo`) must be split into separate commands (`apt-get update; apt-get install foo`). Otherwise a failing command before the ampersand is considered "checked" and does not fail the build.
2. The environment variables section in Singularity is only set on execution of the final image, not during the build as with Docker. So `ENV` sections from Docker must be translated to an entry in the *%environment* section and **additionally** set in the *%runscript* section if the variable is used there.
3. `VOLUME` sections from Docker cannot be represented in Singularity containers. Use the runtime option `-B` to bind folders manually.
4. `CMD` and `ENTRYPOINT` from Docker do not have a direct representation in Singularity. The closest is to check if any arguments are given in the *%runscript* section and call the command from `ENTRYPOINT` with those; if none are given, call `ENTRYPOINT` with the arguments of `CMD`:

        if [ $# -gt 0 ]; then
            <ENTRYPOINT> "$@"
        else
            <ENTRYPOINT> <CMD>
        fi

## Using the containers

### Entering a shell in your container

A read-only shell can be entered as follows:

    singularity shell my-container.sif

**IMPORTANT:** In contrast to, for instance, Docker, this will mount various folders from the host system including $HOME. This may lead to problems with, e.g., Python that stores local packages in the home folder, which may not work inside the container.
It also makes reproducibility harder. It is therefore recommended to use `--contain/-c` to not bind $HOME (and others like /tmp) automatically and instead set up your binds manually via the `-B` parameter. Example:

    singularity shell --contain -B /scratch,/my/folder-on-host:/folder-in-container my-container.sif

You can write into those folders by default. If this is not desired, add an `:ro` for read-only to the bind specification (e.g. `-B /scratch:/scratch:ro`). Note that we already defined bind paths for /scratch, /projects and /sw in our global singularity.conf, so you needn't use the `-B` parameter for those.

If you wish, for instance, to install additional packages, you have to use the `-w` parameter to enter your container with it being writable. This, again, must be done on a system where you have the necessary privileges, otherwise you can only edit files that your user has the permissions for. E.g.:

    singularity shell -w my-container.sif
    Singularity.my-container.sif> yum install htop

The `-w` parameter should only be used to make permanent changes to your container, not for your productive runs (it can only be used writable by one user at the same time). You should write your output to the usual Taurus file systems like /scratch.

### Launching applications in your container

#### Running a command inside the container

While the `shell` command can be useful for tests and setup, you can also launch your applications inside the container directly using `exec`:

    singularity exec my-container.sif /opt/myapplication/bin/run_myapp

This can be useful if you wish to create a wrapper script that transparently calls a containerized application for you. E.g.:

    #!/bin/bash

    X=`which singularity 2>/dev/null`
    if [ "z$X" = "z" ] ; then
        echo "Singularity not found. Is the module loaded?"
        exit 1
    fi

    singularity exec /scratch/p_myproject/my-container.sif /opt/myapplication/run_myapp "$@"

The better approach, however, is to use `singularity run`, which executes whatever was set in the *%runscript* section of the definition file with the arguments you pass to it.

Example: build the following definition file into an image:

    Bootstrap: docker
    From: ubuntu:trusty

    %post
        apt-get install -y g++
        echo '#include <iostream>' > main.cpp
        echo 'int main(int argc, char** argv){ std::cout << argc << " args for " << argv[0] << std::endl; }' >> main.cpp
        g++ main.cpp -o myCoolApp
        mv myCoolApp /usr/local/bin/myCoolApp

    %runscript
        myCoolApp "$@"

    singularity build my-container.sif example.def

Then you can run your application via

    singularity run my-container.sif first_arg 2nd_arg

Alternatively you can execute the container directly, which is equivalent:

    ./my-container.sif first_arg 2nd_arg

With this you can even masquerade an application with a singularity container as if it was an actual program by naming the container just like the binary:

    mv my-container.sif myCoolApp

### Use-cases

One common use-case for containers is that you need an operating system with a newer GLIBC version than what is available on Taurus. For example, the bullx Linux on Taurus used to be based on RHEL 6 with a rather dated GLIBC version 2.12, so some binary-distributed applications no longer worked on it. You can use one of our pre-made CentOS 7 container images (`/scratch/singularity/centos7.img`) to circumvent this problem. Example:

    $ singularity exec /scratch/singularity/centos7.img ldd --version
    ldd (GNU libc) 2.17
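Another typical pattern is to call the containerized application from inside a batch job. The following is only a rough sketch reusing the wrapper-style call from above; the resources, paths, and application are placeholders for your own setup:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # /scratch, /projects and /sw are bound into the container automatically (see above);
    # add further host folders with -B if your job needs them
    srun singularity exec /scratch/p_myproject/my-container.sif /opt/myapplication/run_myapp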
diff --git a/twiki2md/root/WebHome/DataManagement.md b/twiki2md/root/WebHome/DataManagement.md
deleted file mode 100644
index de1f952975243963821797089d69bfe888029070..0000000000000000000000000000000000000000
--- a/twiki2md/root/WebHome/DataManagement.md
+++ /dev/null
@@ -1,16 +0,0 @@
# HPC Data Management

To efficiently handle different types of storage systems, please design your data workflow according to characteristics such as the I/O footprint (bandwidth/IOPS) of the application, the size of the data, the number of files, and the duration of storage. In general, the mechanism of so-called **[Workspaces](WorkSpaces)** is compulsory for all HPC users to store data for a defined duration; depending on the requirements and the storage system, this time span might range from days to a few years.

- [HPC file systems](FileSystems)
- [Intermediate Archive](IntermediateArchive)
- [Special data containers](Special data containers)
- [Move data between file systems](DataMover)
- [Move data to/from ZIH's file systems](ExportNodes)
- [Longterm Preservation for Research Data](PreservationResearchData)

diff --git a/twiki2md/root/WebHome/Impressum.md b/twiki2md/root/WebHome/Impressum.md
deleted file mode 100644
index 9c34f3d81fc15e76a28a506a3a7bdefc357e4715..0000000000000000000000000000000000000000
--- a/twiki2md/root/WebHome/Impressum.md
+++ /dev/null
@@ -1,14 +0,0 @@
The [Imprint of TU Dresden](https://tu-dresden.de/impressum) applies, with the following changes:

**Contact / Operator:**

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen
01062 Dresden

Tel.: +49 351 463-40000
Fax: +49 351 463-42328
E-Mail: <servicedesk@tu-dresden.de>

**Concept, technical implementation, provider:**

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen
Prof. Dr. Wolfgang E. Nagel
01062 Dresden

Tel.: +49 351 463-35450
Fax: +49 351 463-37773
E-Mail: <zih@tu-dresden.de>

diff --git a/twiki2md/root/WebHome/Test.md b/twiki2md/root/WebHome/Test.md
deleted file mode 100644
index a1b5589b3a8f2727efdab33a7265c35b5427552e..0000000000000000000000000000000000000000
--- a/twiki2md/root/WebHome/Test.md
+++ /dev/null
@@ -1,2 +0,0 @@
<span class="twiki-macro TREE" web="Compendium" formatting="ullist"></span>

-- Main.MatthiasKraeusslein - 2021-05-10