From 7eb7f79ee1c6a4054f54db77c31da8337ed2b720 Mon Sep 17 00:00:00 2001 From: Martin Schroschk <martin.schroschk@tu-dresden.de> Date: Thu, 1 Jul 2021 13:29:27 +0200 Subject: [PATCH] Archive: fix checks and remove useless pages --- .../docs/archive/AnnouncementOfQuotas.md | 32 +-- .../docs/archive/DebuggingTools.md | 30 +-- .../docs/archive/Hardware.md | 18 +- .../docs/archive/HardwareDeimos.md | 50 ++-- .../docs/archive/HardwarePhobos.md | 37 ++- .../docs/archive/HardwareTriton.md | 25 +- .../docs/archive/HardwareVenus.md | 2 - .../docs/archive/Introduction.md | 18 -- .../docs/archive/KnlNodes.md | 40 +-- .../docs/archive/MigrateToAtlas.md | 32 +-- .../docs/archive/PlatformLSF.md | 230 ++++++++---------- .../docs/archive/StepByStepTaurus.md | 10 - .../docs/archive/SystemVenus.md | 81 +++--- .../docs/archive/TaurusII.md | 15 +- .../docs/archive/UNICORERestAPI.md | 4 +- .../docs/archive/VampirTrace.md | 12 +- .../docs/archive/VenusOpen.md | 9 - 17 files changed, 282 insertions(+), 363 deletions(-) delete mode 100644 doc.zih.tu-dresden.de/docs/archive/Introduction.md delete mode 100644 doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md delete mode 100644 doc.zih.tu-dresden.de/docs/archive/VenusOpen.md diff --git a/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md b/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md index bdae4d0ed..bc04e86de 100644 --- a/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md +++ b/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md @@ -1,20 +1,16 @@ # Quotas for the home file system -The quotas of the home file system are meant to help the users to keep -in touch with their data. Especially in HPC, millions of temporary files -can be created within hours. We have identified this as a main reason -for performance degradation of the HOME file system. To stay in -operation with out HPC systems we regrettably have to fall back to this -unpopular technique. - -Based on a balance between the allotted disk space and the usage over -the time, reasonable quotas (mostly above current used space) for the -projects have been defined. The will be activated by the end of April -2012. - -If a project exceeds its quota (total size OR total number of files) it -cannot submit jobs into the batch system. Running jobs are not affected. -The following commands can be used for monitoring: +The quotas of the home file system are meant to help the users to keep in touch with their data. +Especially in HPC, millions of temporary files can be created within hours. We have identified this +as a main reason for performance degradation of the HOME file system. To stay in operation with out +HPC systems we regrettably have to fall back to this unpopular technique. + +Based on a balance between the allotted disk space and the usage over the time, reasonable quotas +(mostly above current used space) for the projects have been defined. The will be activated by the +end of April 2012. + +If a project exceeds its quota (total size OR total number of files) it cannot submit jobs into the +batch system. Running jobs are not affected. The following commands can be used for monitoring: - `quota -s -g` shows the file system usage of all groups the user is a member of. @@ -37,9 +33,9 @@ In case a project is above its limits, please - for later use (weeks...months) at the HPC systems, build tar archives with meaningful names or IDs and store them in the [DMF system](#AnchorDataMigration). Avoid using this system - (`/hpc_fastfs`) for files \< 1 MB! 
+ (`/hpc_fastfs`) for files < 1 MB! - refer to the hints for - [long term preservation for research data](PreservationResearchData.md). + [long term preservation for research data](../data_management/PreservationResearchData.md). ## No Alternatives @@ -58,5 +54,3 @@ The current situation is this: `/hpc_fastfs`. In case of problems don't hesitate to ask for support. - -Ulf Markwardt, Claudia Schmidt diff --git a/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md b/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md index 30b7ae112..0d902d2cf 100644 --- a/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md +++ b/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md @@ -1,20 +1,14 @@ -Debugging is an essential but also rather time consuming step during -application development. Tools dramatically reduce the amount of time -spent to detect errors. Besides the "classical" serial programming -errors, which may usually be easily detected with a regular debugger, -there exist programming errors that result from the usage of OpenMP, -Pthreads, or MPI. These errors may also be detected with debuggers -(preferably debuggers with support for parallel applications), however, -specialized tools like MPI checking tools (e.g. Marmot) or thread -checking tools (e.g. Intel Thread Checker) can simplify this task. The -following sections provide detailed information about the different -types of debugging tools: +# Debugging Tools -- [Debuggers](Debuggers) -- debuggers (with and without support for - parallel applications) -- [MPI Usage Error Detection](MPI Usage Error Detection) -- tools to - detect MPI usage errors -- [Thread Checking](Thread Checking) -- tools to detect OpenMP/Pthread - usage errors +Debugging is an essential but also rather time consuming step during application development. Tools +dramatically reduce the amount of time spent to detect errors. Besides the "classical" serial +programming errors, which may usually be easily detected with a regular debugger, there exist +programming errors that result from the usage of OpenMP, Pthreads, or MPI. These errors may also be +detected with debuggers (preferably debuggers with support for parallel applications), however, +specialized tools like MPI checking tools (e.g. Marmot) or thread checking tools (e.g. Intel Thread +Checker) can simplify this task. The following sections provide detailed information about the +different types of debugging tools: --- Main.hilbrich - 2009-12-21 +- [Debuggers] **todo** Debuggers -- debuggers (with and without support for parallel applications) +- [MPI Usage Error Detection] **todo** MPI Usage Error Detection -- tools to detect MPI usage errors +- [Thread Checking] **todo** Thread Checking -- tools to detect OpenMP/Pthread usage errors diff --git a/doc.zih.tu-dresden.de/docs/archive/Hardware.md b/doc.zih.tu-dresden.de/docs/archive/Hardware.md index bac841daf..449a2cf64 100644 --- a/doc.zih.tu-dresden.de/docs/archive/Hardware.md +++ b/doc.zih.tu-dresden.de/docs/archive/Hardware.md @@ -1,17 +1,17 @@ # Hardware -Here, you can find basic information about the hardware installed at -ZIH. We try to keep this list up-to-date. +Here, you can find basic information about the hardware installed at ZIH. We try to keep this list +up-to-date. 
-- [BULL HPC-Cluster Taurus](HardwareTaurus) -- [SGI Ultraviolet (UV)](HardwareVenus) +- [BULL HPC-Cluster Taurus](TaurusII.md) +- [SGI Ultraviolet (UV)](HardwareVenus.md) Hardware hosted by ZIH: Former systems -- [PC-Farm Deimos](HardwareDeimos) -- [SGI Altix](HardwareAltix) -- [PC-Farm Atlas](HardwareAtlas) -- [PC-Cluster Triton](HardwareTriton) -- [HPC-Windows-Cluster Titan](HardwareTitan) +- [PC-Farm Deimos](HardwareDeimos.md) +- [SGI Altix](HardwareAltix.md) +- [PC-Farm Atlas](HardwareAtlas.md) +- [PC-Cluster Triton](HardwareTriton.md) +- [HPC-Windows-Cluster Titan](HardwareTitan.md) diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md b/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md index 643fab9f4..81a69258c 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md @@ -1,5 +1,3 @@ - - # Linux Networx PC-Farm Deimos The PC farm `Deimos` is a heterogenous cluster based on dual core AMD @@ -7,36 +5,38 @@ Opteron CPUs. The nodes are operated by the Linux operating system SuSE SLES 10 with a 2.6 kernel. Currently, the following hardware is installed: -\|CPUs \|AMD Opteron X85 dual core \| \|RAM per core \|2 GB \| \|Number -of cores \|2584 \| \|total peak performance \|13.4 TFLOPS \| \|single -chip nodes \|384 \| \|dual nodes \|230 \| \|quad nodes \|88 \| \|quad -nodes (32 GB RAM) \|24 \| - -\<P> All nodes share a 68 TB [file -system](RuntimeEnvironment#Filesystem) on DDN hardware. Each node has -per core 40 GB local disk space for scratch mounted on `/tmp` . The jobs -for the compute nodes are scheduled by the [Platform LSF](Platform LSF) +|CPUs |AMD Opteron X85 dual core | +|RAM per core |2 GB | +|Number of cores |2584 | +|total peak performance |13.4 TFLOPS | +|single chip nodes |384 | +|dual nodes |230 | +|quad nodes |88 | +|quad nodes (32 GB RAM) |24 | + +All nodes share a 68 TB on DDN hardware. Each node has per core 40 GB local disk space for scratch +mounted on `/tmp` . The jobs for the compute nodes are scheduled by the +[Platform LSF](PlatformLSF.md) batch system from the login nodes `deimos.hrsk.tu-dresden.de` . -Two separate Infiniband networks (10 Gb/s) with low cascading switches -provide the communication and I/O infrastructure for low latency / high -throughput data traffic. An additional gigabit Ethernet network is used -for control and service purposes. +Two separate Infiniband networks (10 Gb/s) with low cascading switches provide the communication and +I/O infrastructure for low latency / high throughput data traffic. An additional gigabit Ethernet +network is used for control and service purposes. -Users with a login on the [SGI Altix](HardwareAltix) can access their -home directory via NFS below the mount point `/hpc_work`. +Users with a login on the [SGI Altix](HardwareAltix.md) can access their home directory via NFS +below the mount point `/hpc_work`. ## CPU The cluster is based on dual-core AMD Opteron X85 processor. One core has the following basic properties: -\|clock rate \|2.6 GHz \| \|floating point units \|2 \| \|peak -performance \|5.2 GFLOPS \| \|L1 cache \|2x64 kB \| \|L2 cache \|1 MB \| -\|memory bus \|128 bit x 200 MHz \| - -The CPU belongs to the x86_64 family. Since it is fully capable of -running x86-code, one should compare the performances of the 32 and 64 -bit versions of the same code. 
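A minimal sketch of such a comparison, assuming the GNU compiler is available on the system (the file name `myprog.c` and the flags `-m32`/`-m64` are generic GCC usage, not taken from this page):

```Bash
# build the same source once as a 64-bit and once as a 32-bit binary
gcc -O2 -m64 -o myprog_64 myprog.c
gcc -O2 -m32 -o myprog_32 myprog.c

# compare the runtimes of both variants
time ./myprog_64
time ./myprog_32
```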
+|clock rate |2.6 GHz | +|floating point units |2 | +|peak performance |5.2 GFLOPS | +|L1 cache |2x64 kB | +|L2 cache |1 MB | +|memory bus |128 bit x 200 MHz | -<span class="twiki-macro COMMENT"></span> +The CPU belongs to the x86_64 family. Since it is fully capable of running x86-code, one should +compare the performances of the 32 and 64 bit versions of the same code. diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md b/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md index 3221dc590..774c9507c 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md @@ -1,38 +1,37 @@ - - # Linux Networx PC-Cluster Phobos -------- **Phobos was shut down on 1 November 2010.** ------- +**Phobos was shut down on 1 November 2010.** `Phobos` is a cluster based on AMD Opteron CPUs. The nodes are operated by the Linux operating system SuSE SLES 9 with a 2.6 kernel. Currently, the following hardware is installed: -\|CPUs \|AMD Opteron 248 (single core) \| \|total peak performance -\|563.2 GFLOPS \| \|Number of nodes \|64 compute + 1 master \| \|CPUs -per node \|2 \| \|RAM per node \|4 GB \| +|CPUs \|AMD Opteron 248 (single core) | +|total peak performance |563.2 GFLOPS | +|Number of nodes |64 compute + 1 master | +|CPUs per node |2 | +|RAM per node |4 GB | -\<P> All nodes share a 4.4 TB SAN [file system](FileSystems). Each node -has additional local disk space mounted on `/scratch`. The jobs for the -compute nodes are scheduled by a [Platform LSF](Platform LSF) batch -system running on the login node `phobos.hrsk.tu-dresden.de`. +All nodes share a 4.4 TB SAN. Each node has additional local disk space mounted on `/scratch`. The jobs for the +compute nodes are scheduled by a [Platform LSF](PlatformLSF.md) batch system running on the login +node `phobos.hrsk.tu-dresden.de`. -Two separate Infiniband networks (10 Gb/s) with low cascading switches -provide the infrastructure for low latency / high throughput data -traffic. An additional GB/Ethernetwork is used for control and service -purposes. +Two separate Infiniband networks (10 Gb/s) with low cascading switches provide the infrastructure +for low latency / high throughput data traffic. An additional GB/Ethernetwork is used for control +and service purposes. ## CPU `Phobos` is based on single-core AMD Opteron 248 processor. It has the following basic properties: -\|clock rate \|2.2 GHz \| \|floating point units \|2 \| \|peak -performance \|4.4 GFLOPS \| \|L1 cache \|2x64 kB \| \|L2 cache \|1 MB \| -\|memory bus \|128 bit x 200 MHz \| +|clock rate |2.2 GHz | +|floating point units |2 | +|peak performance |4.4 GFLOPS | +|L1 cache |2x64 kB | +|L2 cache |1 MB | +|memory bus |128 bit x 200 MHz | The CPU belongs to the x86_64 family. Although it is fully capable of running x86-code, one should always try to use 64-bit programs due to their potentially higher performance. - -<span class="twiki-macro COMMENT"></span> diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md b/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md index ce88271b9..17fd54449 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md @@ -6,23 +6,28 @@ is a cluster based on quadcore Intel Xeon CPUs. The nodes are operated by the Linux operating system SuSE SLES 11. 
Currently, the following hardware is installed: -\|CPUs \|Intel quadcore E5530 \| \|RAM per core \|6 GB \| \|Number of -cores \|512 \| \|total peak performance \|4.9 TFLOPS \| \|dual nodes -\|64 \| +|CPUs |Intel quadcore E5530 | +|RAM per core |6 GB | +|Number of cores |512 | +|total peak performance |4.9 TFLOPS | +|dual nodes |64 | -The jobs for the compute nodes are scheduled by the -[LoadLeveler](LoadLeveler) batch system from the login node -triton.hrsk.tu-dresden.de . +The jobs for the compute nodes are scheduled by the [LoadLeveler](LoadLeveler.md) batch system from +the login node triton.hrsk.tu-dresden.de . ## CPU The cluster is based on dual-core Intel Xeon E5530 processor. One core has the following basic properties: -\|clock rate \|2.4 GHz \| \|Cores \|4 \| \|Threads \|8 \| \|Intel Smart -Cache \|8MB \| \|Intel QPI Speed \|5.86 GT/s \| \|Max TDP \|80 W \| +|clock rate |2.4 GHz | +|Cores |4 | +|Threads |8 | +|Intel Smart Cache |8MB | +|Intel QPI Speed |5.86 GT/s | +|Max TDP |80 W | -# Software +### Software | Compilers | Version | |:--------------------------------|---------------:| @@ -45,4 +50,4 @@ Cache \|8MB \| \|Intel QPI Speed \|5.86 GT/s \| \|Max TDP \|80 W \| | NAMD | 2.7b1 | | QuantumEspresso | 4.1.3 | | **Tools** | | -| [Totalview Debugger](Debuggers) | 8.8 | +| [Totalview Debugger] **todo** debuggers | 8.8 | diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md b/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md index 00c6046dc..be90985ea 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md @@ -18,5 +18,3 @@ additional hardware hyperthreads. Venus uses the same HOME file system as all our other HPC installations. For computations, please use `/scratch`. - -... [More information on file systems](FileSystems) diff --git a/doc.zih.tu-dresden.de/docs/archive/Introduction.md b/doc.zih.tu-dresden.de/docs/archive/Introduction.md deleted file mode 100644 index ae6de5f86..000000000 --- a/doc.zih.tu-dresden.de/docs/archive/Introduction.md +++ /dev/null @@ -1,18 +0,0 @@ -# Introduction - -The Center for Information Services and High Performance Computing (ZIH) -is a central scientific unit of TU Dresden with a strong competence in -parallel computing and software tools. We have a strong commitment to -support *real users*, collaborating to create new algorithms, -applications and to tackle the problems that need to be solved to create -new scientific insight with computational methods. Our compute complex -"Hochleistungs-Rechner-/-Speicher-Komplex" (HRSK) is focused on -data-intensive computing. High scalability, big memory and fast -I/O-systems are the outstanding properties of this project, aside from -the significant performance increase. The infrastructure is provided not -only to TU Dresden but to all universities and public research -institutes in Saxony. - -\<img alt="" -src="<http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/hpc/bilder/hpc_hardware07>" -title="HRSK overview" /> diff --git a/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md b/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md index f779a68bc..78e4cabc7 100644 --- a/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md +++ b/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md @@ -1,25 +1,29 @@ -# Intel Xeon Phi (Knights Landing) %RED%- Out of Service<span class="twiki-macro ENDCOLOR"></span> +# Intel Xeon Phi (Knights Landing) + +Xeon Phi nodes are **Out of Service**! 
The nodes `taurusknl[1-32]` are equipped with -- Intel Xeon Phi procesors: 64 cores Intel Xeon Phi 7210 (1,3 GHz) -- 96 GB RAM DDR4 -- 16 GB MCDRAM -- /scratch, /lustre/ssd, /projects, /home are mounted +- Intel Xeon Phi procesors: 64 cores Intel Xeon Phi 7210 (1,3 GHz) +- 96 GB RAM DDR4 +- 16 GB MCDRAM +- `/scratch`, `/lustre/ssd`, `/projects`, `/home` are mounted Benchmarks, so far (single node): -- HPL (Linpack): 1863.74 GFlops -- SGEMM (single precision) MKL: 4314 GFlops -- Stream (only 1.4 GiB memory used): 431 GB/s +- HPL (Linpack): 1863.74 GFlops +- SGEMM (single precision) MKL: 4314 GFlops +- Stream (only 1.4 GiB memory used): 431 GB/s Each of them can run 4 threads, so one can start a job here with e.g. - srun -p knl -N 1 --mem=90000 -n 1 -c 64 a.out +```Bash +srun -p knl -N 1 --mem=90000 -n 1 -c 64 a.out +``` In order to get their optimal performance please re-compile your code with the most recent Intel compiler and explicitely set the compiler -flag **`-xMIC-AVX512`**. +flag `-xMIC-AVX512`. MPI works now, we recommend to use the latest Intel MPI version (intelmpi/2017.1.132). To utilize the OmniPath Fabric properly, make @@ -33,23 +37,21 @@ request): | Nodes | Cluster Mode | Memory Mode | |:-------------------|:-------------|:------------| -| taurusknl\[1-28\] | Quadrant | Cache | -| taurusknl29 | Quadrant | Flat | -| taurusknl\[30-32\] | SNC4 | Flat | +| `taurusknl[1-28]` | Quadrant | Cache | +| `taurusknl29` | Quadrant | Flat | +| `taurusknl[30-32]` | SNC4 | Flat | They have SLURM features set, so that you can request them specifically -by using the SLURM parameter **--constraint** where multiple values can -be linked with the & operator, e.g. **--constraint="SNC4&Flat"**. If you +by using the SLURM parameter `--constraint` where multiple values can +be linked with the & operator, e.g. `--constraint="SNC4&Flat"`. If you don't set a constraint, your job will run preferably on the nodes with Quadrant+Cache. Note that your performance might take a hit if your code is not NUMA-aware and does not make use of the Flat memory mode while running on the nodes that have those modes set, so you might want to use ---constraint="Quadrant&Cache" in such a case to ensure your job does not +`--constraint="Quadrant&Cache"` in such a case to ensure your job does not run on an unfavorable node (which might happen if all the others are already allocated). -\<a -href="<http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/>" -title="Knl Best Practice Guide">KNL Best Practice Guide\</a> from PRACE +[Knl Best Practice Guide](https://prace-ri.eu/training-support/best-practice-guides/best-practice-guide-knights-landing/) diff --git a/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md b/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md index fa53026bc..688f390e8 100644 --- a/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md +++ b/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md @@ -1,7 +1,6 @@ # Migration to Atlas - Atlas is a different machine than -Deimos, please have a look at the table: +Atlas is a different machine than Deimos, please have a look at the table: | | | | |---------------------------------------------------|------------|-----------| @@ -19,11 +18,7 @@ codenamed "Bulldozer" is designed for multi-threaded use. We have grouped the module definitions for a better overview. This is only for displaying the available modules, not for loading a module. All -available modules can be made visible with `module load ALL; module av` -. 
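As a short sketch, the sequence described above can be typed directly into a shell; `<modulename>` is a placeholder for whatever module you actually need:

```Bash
module load ALL            # make all module groups visible
module av                  # list the modules that are now available
module load <modulename>   # load the module you need for your job
```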
For more details, please see [module -groups.](RuntimeEnvironment#Module_Groups) - -#BatchSystem +available modules can be made visible with `module load ALL; module av`. ## Batch System @@ -58,8 +53,7 @@ nodes you have to be more precise in your resource requests. - In ninety nine percent of the cases it is enough when you specify your processor requirements with `-n <n>` and your memory requirements with `-M <memory per process in MByte>`. -- Please use \<span class="WYSIWYG_TT">-x\</span>("exclusive use of a - hosts") only with care and when you really need it. +- Please use `-x` ("exclusive use of a hosts") only with care and when you really need it. - The option `-x` in combination with `-n 1` leads to an "efficiency" of only 1.5% - in contrast with 50% on the single socket nodes at Deimos. @@ -69,15 +63,14 @@ nodes you have to be more precise in your resource requests. - Please use `-M <memory per process in MByte>` to specify your memory requirements per process. - Please don't use `-R "span[hosts=1]"` or `-R "span[ptile=<n>]"` or - any other \<span class="WYSIWYG_TT">-R "..."\</span>option, the - batch system is smart enough to select the best hosts in accordance + any other `-R "..."` option, the batch system is smart enough to select the best hosts in accordance with your processor and memory requirements. - Jobs with a processor requirement ≤ 64 will always be scheduled on one node. - Larger jobs will use just as many hosts as needed, e.g. 160 processes will be scheduled on three hosts. -For more details, please see the pages on [LSF](PlatformLSF). +For more details, please see the pages on [LSF](PlatformLSF.md). ## Software @@ -95,21 +88,18 @@ degradation. Please include "Atlas" in your subject. ### Development -From the benchmarking point of view, the best compiler for the AMD -Bulldozer processor, the best compiler comes from the Open64 suite. For -convenience, other compilers are installed, Intel 12.1 shows good -results as well. Please check the best compiler flags at [this -overview](http://developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf). -For best performance, please use [ACML](Libraries#ACML) as BLAS/LAPACK -library. +From the benchmarking point of view, the best compiler for the AMD Bulldozer processor, the best +compiler comes from the Open64 suite. For convenience, other compilers are installed, Intel 12.1 +shows good results as well. Please check the best compiler flags at +[this overview] developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf. ### MPI parallel applications Please note the more convenient syntax on Atlas. Therefore, please use a command like +```Bash bsub -W 2:00 -M 200 -n 8 mpirun a.out +``` to submit your MPI parallel applications. - -- Set DENYTOPICVIEW = WikiGuest diff --git a/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md b/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md index 56a86433a..699db5c9b 100644 --- a/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md +++ b/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md @@ -1,9 +1,8 @@ # Platform LSF -**`%RED%This Page is deprecated! The current bachsystem on Taurus and Venus is [[Compendium.Slurm][Slurm]]!%ENDCOLOR%`** +**This Page is deprecated!** The current bachsystem on Taurus is [Slurm][../jobs/Slurm.md] - The HRSK-I systems are operated -with the batch system LSF running on *Mars*, *Atlas* resp.. +The HRSK-I systems are operated with the batch system LSF running on *Mars*, *Atlas* resp.. 
## Job Submission @@ -14,40 +13,44 @@ Some options of `bsub` are shown in the following table: | bsub option | Description | |:-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| -n \<N> | set number of processors (cores) to N(default=1) | -| -W \<hh:mm> | set maximum wall clock time to \<hh:mm> | -| -J \<name> | assigns the specified name to the job | -| -eo \<errfile> | writes the standard error output of the job to the specified file (overwriting) | -| -o \<outfile> | appends the standard output of the job to the specified file | -| -R span\[hosts=1\] | use only one SMP node (automatically set by the batch system) | -| -R span\[ptile=2\] | run 2 tasks per node | -| -x | disable other jobs to share the node ( Atlas ). | -| -m | specify hosts to run on ( [see below](#HostList)) | -| -M \<M> | specify per-process (per-core) memory limit (in MB), the job's memory limit is derived from that number (N proc \* M MB); see examples and [Attn. #2](#AttentionNo2) below | -| -P \<project> | specifiy project | +| `-n \<N> ` | set number of processors (cores) to N(default=1) | +| `-W \<hh:mm> ` | set maximum wall clock time to `<hh:mm>` | +| `-J \<name> ` | assigns the specified name to the job | +| `-eo \<errfile> ` | writes the standard error output of the job to the specified file (overwriting) | +| `-o \<outfile> ` | appends the standard output of the job to the specified file | +| `-R span\[hosts=1\]` | use only one SMP node (automatically set by the batch system) | +| `-R span\[ptile=2\]` | run 2 tasks per node | +| `-x ` | disable other jobs to share the node ( Atlas ). | +| `-m ` | specify hosts to run on ( [see below](#HostList)) | +| `-M \<M> ` | specify per-process (per-core) memory limit (in MB), the job's memory limit is derived from that number (N proc \* M MB); see examples and [Attn. #2](#AttentionNo2) below | +| `-P \<project> ` | specifiy project | You can use the `%J` -macro to merge the job ID into names. It might be more convenient to put the options directly in a job file which you can submit using - bsub < my_jobfile +```Bash +bsub <my_jobfile> +``` The following example job file shows how you can make use of it: - #!/bin/bash - #BSUB -J my_job # the job's name - #BSUB -W 4:00 # max. wall clock time 4h - #BSUB -R "span[hosts=1]" # run on a single node - #BSUB -n 4 # number of processors - #BSUB -M 500 # 500MB per core memory limit - #BSUB -o out.%J # output file - #BSUB -u name@tu-dresden.de # email address; works ONLY with @tu-dresden.de - - echo Starting Program - cd $HOME/work - a.out # e.g. an OpenMP program - echo Finished Program +```Bash +#!/bin/bash +#BSUB -J my_job # the job's name +#BSUB -W 4:00 # max. wall clock time 4h +#BSUB -R "span[hosts=1]" # run on a single node +#BSUB -n 4 # number of processors +#BSUB -M 500 # 500MB per core memory limit +#BSUB -o out.%J # output file +#BSUB -u name@tu-dresden.de # email address; works ONLY with @tu-dresden.de + +echo Starting Program +cd $HOME/work +a.out # e.g. an OpenMP program +echo Finished Program +``` **Understanding memory limits** The option -M to bsub defines how much memory may be consumed by a single process of the job. The job memory @@ -58,51 +61,31 @@ memory limit of 2400 MB. If any one of your processes consumes more than than 2400 MB of memory in sum, then the job will be killed by LSF. 
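As a sketch of the arithmetic above, the quoted example would be submitted like this (`a.out` and the wall clock time are placeholders):

```Bash
bsub -W 2:00 -n 4 -M 600 a.out   # 4 processes x 600 MB = 2400 MB job memory limit
```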
- For serial programs, the given limit is the same for the process and - the whole job, e.g. 500 MB - -<!-- --> - - bsub -W 1:00 -n 1 -M 500 myprog - + the whole job, e.g. 500 MB `bsub -W 1:00 -n 1 -M 500 myprog` - For MPI-parallel programs, the job memory limit is N processes \* - memory limit, e.g. 32\*800 MB = 25600 MB - -<!-- --> - - bsub -W 8:00 -n 32 -M 800 mympiprog - + memory limit, e.g. 32\*800 MB = 25600 MB `bsub -W 8:00 -n 32 -M 800 mympiprog` - For OpenMP-parallel programs, the same applies as with MPI-parallel programs, e.g. 8\*2000 MB = 16000 MB + `bsub -W 4:00 -n 8 -M 2000 myompprog` -<!-- --> - - bsub -W 4:00 -n 8 -M 2000 myompprog +LSF sets the user environment according to the environment at the time of submission. -LSF sets the user environment according to the environment at the time -of submission. - -Based on the given information the job scheduler puts your job into the -appropriate queue. These queues are subject to permanent changes. You -can check the current situation using the command `bqueues -l` . There -are a couple of rules and restrictions to balance the system loads. One -idea behind them is to prevent users from occupying the machines -unfairly. An indicator for the priority of a job placement in a queue is -therefore the ratio between used and granted CPU time for a certain +Based on the given information the job scheduler puts your job into the appropriate queue. These +queues are subject to permanent changes. You can check the current situation using the command +`bqueues -l` . There are a couple of rules and restrictions to balance the system loads. One idea +behind them is to prevent users from occupying the machines unfairly. An indicator for the priority +of a job placement in a queue is therefore the ratio between used and granted CPU time for a certain period. -`Attention`: If you do not give the maximum runtime of your program, the +**Attention:** If you do not give the maximum runtime of your program, the default runtime for the specified queue is taken. This is way below the maximal possible runtime (see table [below](#JobQueues)). -#AttentionNo2 `Attention #2`: Some systems enforce a limit on how much -memory each process and your job as a whole may allocate. If your job or -any of its processes exceed this limit (N proc.\*limit for the job), -your job will be killed. If memory limiting is in place, there also -exists a default limit which will be applied to your job if you do not -specify one. Please find the limits along with the description of the -machines' [queues](#JobQueues) below. - -#InteractiveJobs +**Attention 2:** Some systems enforce a limit on how much memory each process and your job as a +whole may allocate. If your job or any of its processes exceed this limit (N proc.\*limit for the +job), your job will be killed. If memory limiting is in place, there also exists a default limit +which will be applied to your job if you do not specify one. Please find the limits along with the +description of the machines' [queues](#JobQueues) below. ### Interactive Jobs @@ -115,16 +98,20 @@ extensive production runs! Use the bsub options `-Is` for an interactive and, additionally on *Atlas*, `-XF` for an X11 job like: - bsub -Is -XF matlab +```Bash +bsub -Is -XF matlab +``` or for an interactive job with a bash use - bsub -Is -n 2 -W <hh:mm> -P <project> bash +```Bash +bsub -Is -n 2 -W <hh:mm> -P <project> bash +``` You can check the current usage of the system with the command `bhosts` to estimate the time to schedule. 
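A possible sequence for this workflow; the core count and wall clock time are example values only:

```Bash
bhosts                                    # check how busy the hosts currently are
bsub -Is -n 2 -W 2:00 -P <project> bash   # then request an interactive shell
```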
-#ParallelJobs +## ParallelJobs ### Parallel Jobs @@ -132,21 +119,14 @@ For submitting parallel jobs, a few rules have to be understood and followed. In general they depend on the type of parallelization and the architecture. -#OpenMPJobs - #### OpenMP Jobs An SMP-parallel job can only run within a node (or a partition), so it is necessary to include the option `-R "span[hosts=1]"` . The maximum number of processors for an SMP-parallel program is 506 on a large Altix -partition, and 64 on \<tt>*Atlas*\</tt> . A simple example of a job file +partition, and 64 on *Atlas*. A simple example of a job file for an OpenMP job can be found above (section [3.4](#LSF-OpenMP)). -[Further information on pinning -threads.](RuntimeEnvironment#Placing_Threads_or_Processes_on) - -#MpiJobs - #### MPI Jobs There are major differences for submitting MPI-parallel jobs on the @@ -167,22 +147,26 @@ defined when the job array is created. Here is an example how an array job can looks like: - #!/bin/bash +```Bash +#!/bin/bash - #BSUB -W 00:10 - #BSUB -n 1 - #BSUB -J "myTask[1-100:2]" # create job array with 50 tasks - #BSUB -o logs/out.%J.%I # appends the standard output of the job to the specified file that - # contains the job information (%J) and the task information (%I) - #BSUB -e logs/err.%J.%I # appends the error output of the job to the specified file that - # contains the job information (%J) and the task information (%I) +#BSUB -W 00:10 +#BSUB -n 1 +#BSUB -J "myTask[1-100:2]" # create job array with 50 tasks +#BSUB -o logs/out.%J.%I # appends the standard output of the job to the specified file that + # contains the job information (%J) and the task information (%I) +#BSUB -e logs/err.%J.%I # appends the error output of the job to the specified file that + # contains the job information (%J) and the task information (%I) - echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX" +echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX" +``` Alternatively, you can use the following single command line to submit an array job: - bsub -n 1 -W 00:10 -J "myTask[1-100:2]" -o "logs/out.%J.%I" -e "logs/err.%J.%I" "echo Hello Job \$LSB_JOBID Task \$LSB_JOBINDEX" +```Bash +bsub -n 1 -W 00:10 -J "myTask[1-100:2]" -o "logs/out.%J.%I" -e "logs/err.%J.%I" "echo Hello Job \$LSB_JOBID Task \$LSB_JOBINDEX" +``` For further details please read the LSF manual. @@ -200,36 +184,36 @@ detailed information see the man pages of bsub with `man bsub`. 
Here is an example how a chain job can looks like: - #!/bin/bash - - #job parameters - time="4:00" - mem="rusage[mem=2000] span[host=1]" - n="8" - - #iteration parameters - start=1 - end=10 - i=$start - - #create chain job with 10 jobs - while [ "$i" -lt "`expr $end + 1`" ] - do - if [ "$i" -eq "$start" ];then - #create jobname - JOBNAME="${USER}_job_$i" - bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" <job> - else - #create jobname - OJOBNAME=$JOBNAME - JOBNAME="${USER}_job_$i" - #only start a job if the preceding job has the status done - bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" -w "done($OJOBNAME)" <job> - fi - i=`expr $i + 1` - done - -#JobQueues +```Bash +#!/bin/bash + +#job parameters +time="4:00" +mem="rusage[mem=2000] span[host=1]" +n="8" + +#iteration parameters +start=1 +end=10 +i=$start + +#create chain job with 10 jobs +while [ "$i" -lt "`expr $end + 1`" ] +do + if [ "$i" -eq "$start" ];then + #create jobname + JOBNAME="${USER}_job_$i" + bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" <job> + else + #create jobname + OJOBNAME=$JOBNAME + JOBNAME="${USER}_job_$i" + #only start a job if the preceding job has the status done + bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" -w "done($OJOBNAME)" <job> + fi + i=`expr $i + 1` +done +``` ## Job Queues @@ -251,12 +235,15 @@ The command `bhosts` shows the load on the hosts. For a more convenient overview the command `lsfshowjobs` displays information on the LSF status like this: - You have 1 running job using 64 cores - You have 1 pending job +```Bash +You have 1 running job using 64 cores +You have 1 pending job +``` and the command `lsfnodestat` displays the node and core status of machine like this: +```Bash # ------------------------------------------- nodes available: 714/714 nodes damaged: 0 @@ -269,7 +256,8 @@ jobs damaged: 0 \| # ------------------------------------------- -normal working cores: 2556 cores free for jobs: 265 \</pre> +normal working cores: 2556 cores free for jobs: 265 +``` The command `bjobs` allows to monitor your running jobs. It has the following options: @@ -287,15 +275,15 @@ following options: If you run code that regularily emits status or progress messages, using the command -`watch -n10 tail -n2 '*out'` +```Bash +watch -n10 tail -n2 '*out' +``` in your `$HOME/.lsbatch` directory is a very handy way to keep yourself informed. Note that this only works if you did not use the `-o` option of `bsub`, If you used `-o`, replace `*out` with the list of file names you passed to this very option. -#HostList - ## Host List The `bsub` option `-m` can be used to specify a list of hosts for @@ -305,5 +293,3 @@ execution. This is especially useful for memory intensive computations. Jupiter, saturn, and uranus have 4 GB RAM per core, mars only 1GB. So it makes sense to specify '-m "jupiter saturn uranus". 
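A hedged example of such a submission; only the host list itself is taken from the advice above, the remaining resource values are illustrative:

```Bash
bsub -m "jupiter saturn uranus" -n 8 -M 2000 -W 4:00 a.out
```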
- -\</noautolink> diff --git a/doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md b/doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md deleted file mode 100644 index 03aa8538a..000000000 --- a/doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md +++ /dev/null @@ -1,10 +0,0 @@ -# Step by step examples for working on Taurus - -(in development) - -- From Windows: - [login](Login#Prerequisites_for_Access_to_a_Linux_Cluster_From_a_Windows_Workstation) - and file transfer -- Short introductionary presentation on the module an job system on - taurus with focus on AI/ML: [Using taurus for - AI](%ATTACHURL%/Scads_-_Using_taurus_for_AI.pdf) diff --git a/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md b/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md index f8b7d14cc..94aa24f36 100644 --- a/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md +++ b/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md @@ -1,16 +1,9 @@ # Venus - - ## Information about the hardware Detailed information on the currect HPC hardware can be found -[here.](HardwareVenus) - -## Applying for Access to the System - -Project and login application forms for taurus are available -[here](Access). +[here](HardwareVenus.md). ## Login to the System @@ -18,63 +11,65 @@ Login to the system is available via ssh at `venus.hrsk.tu-dresden.de`. The RSA fingerprints of the Phase 2 Login nodes are: - MD5:63:65:c6:d6:4e:5e:03:9e:07:9e:70:d1:bc:b4:94:64 +```Bash +MD5:63:65:c6:d6:4e:5e:03:9e:07:9e:70:d1:bc:b4:94:64 +``` and - SHA256:Qq1OrgSCTzgziKoop3a/pyVcypxRfPcZT7oUQ3V7E0E - -You can find an list of fingerprints [here](Login#SSH_access). +```Bash +SHA256:Qq1OrgSCTzgziKoop3a/pyVcypxRfPcZT7oUQ3V7E0E +``` ## MPI -The installation of the Message Passing Interface on Venus (SGI MPT) -supports the MPI 2.2 standard (see `man mpi` ). There is no command like -`mpicc`, instead you just have to use the "serial" compiler (e.g. `icc`, -`icpc`, or `ifort`) and append `-lmpi` to the linker command line. +The installation of the Message Passing Interface on Venus (SGI MPT) supports the MPI 2.2 standard +(see `man mpi` ). There is no command like `mpicc`, instead you just have to use the "serial" +compiler (e.g. `icc`, `icpc`, or `ifort`) and append `-lmpi` to the linker command line. Example: - <span class='WYSIWYG_HIDDENWHITESPACE'> </span>% icc -o myprog -g -O2 -xHost myprog.c -lmpi<span class='WYSIWYG_HIDDENWHITESPACE'> </span> +```Bash +% icc -o myprog -g -O2 -xHost myprog.c -lmpi +``` Notes: -- C++ programmers: You need to link with both libraries: - `-lmpi++ -lmpi`. -- Fortran programmers: The MPI module is only provided for the Intel - compiler and does not work with gfortran. +- C++ programmers: You need to link with both libraries: + `-lmpi++ -lmpi`. +- Fortran programmers: The MPI module is only provided for the Intel + compiler and does not work with gfortran. -Please follow the following guidelines to run your parallel program -using the batch system on Venus. +Please follow the following guidelines to run your parallel program using the batch system on Venus. ## Batch system -Applications on an HPC system can not be run on the login node. They -have to be submitted to compute nodes with dedicated resources for the -user's job. Normally a job can be submitted with these data: +Applications on an HPC system can not be run on the login node. They have to be submitted to compute +nodes with dedicated resources for the user's job. 
Normally a job can be submitted with these data: -- number of CPU cores, -- requested CPU cores have to belong on one node (OpenMP programs) or - can distributed (MPI), -- memory per process, -- maximum wall clock time (after reaching this limit the process is - killed automatically), -- files for redirection of output and error messages, -- executable and command line parameters. +- number of CPU cores, +- requested CPU cores have to belong on one node (OpenMP programs) or + can distributed (MPI), +- memory per process, +- maximum wall clock time (after reaching this limit the process is + killed automatically), +- files for redirection of output and error messages, +- executable and command line parameters. -The batch sytem on Venus is Slurm. For general information on Slurm, -please follow [this link](Slurm). +The batch sytem on Venus is Slurm. For general information on Slurm, please follow +[this link](../jobs/Slurm.md). ### Submission of Parallel Jobs -The MPI library running on the UV is provided by SGI and highly -optimized for the ccNUMA architecture of this machine. +The MPI library running on the UV is provided by SGI and highly optimized for the ccNUMA +architecture of this machine. -On Venus, you can only submit jobs with a core number which is a -multiple of 8 (a whole CPU chip and 128 GB RAM). Parallel jobs can be -started like this: +On Venus, you can only submit jobs with a core number which is a multiple of 8 (a whole CPU chip and +128 GB RAM). Parallel jobs can be started like this: - <span class='WYSIWYG_HIDDENWHITESPACE'> </span>srun -n 16 a.out<span class='WYSIWYG_HIDDENWHITESPACE'> </span> +```Bash +srun -n 16 a.out +``` **Please note:** There are different MPI libraries on Taurus and Venus, so you have to compile the binaries specifically for their target. @@ -83,4 +78,4 @@ so you have to compile the binaries specifically for their target. - The large main memory on the system allows users to create ramdisks within their own jobs. The documentation on how to use these - ramdisks can be found [here](RamDiskDocumentation). + ramdisks can be found [here](RamDiskDocumentation.md). diff --git a/doc.zih.tu-dresden.de/docs/archive/TaurusII.md b/doc.zih.tu-dresden.de/docs/archive/TaurusII.md index 1517542e7..03fa87e0e 100644 --- a/doc.zih.tu-dresden.de/docs/archive/TaurusII.md +++ b/doc.zih.tu-dresden.de/docs/archive/TaurusII.md @@ -10,22 +10,17 @@ updated, and they will be merged with phase 2. Basic information for Taurus, phase 2: -- Please use the login nodes\<span class="WYSIWYG_TT"> - tauruslogin\[3-5\].hrsk.tu-dresden.de\</span> for the new system. +- Please use the login nodes `tauruslogin\[3-5\].hrsk.tu-dresden.de` for the new system. - We have mounted the same file systems like on our other HPC systems: - - /home/ - - /projects/ - - /sw - - Taurus phase 2 has it's own /scratch file system (capacity 2.5 - PB). + - `/home/` + - `/projects/` + - `/sw` + - Taurus phase 2 has it's own `/scratch` file system (capacity 2.5 PB). - All nodes have 24 cores. - Memory capacity is 64/128/256 GB per node. The batch system handles your requests like in phase 1. We have other memory-per-core limits! - Our 64 GPU nodes now have 2 cards with 2 GPUs, each. -For more details, please refer to our updated -[documentation](SystemTaurus). - Thank you for testing the system with us! 
Ulf Markwardt diff --git a/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md b/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md index 02cc0bf61..3cc59e7be 100644 --- a/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md +++ b/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md @@ -15,6 +15,4 @@ Some useful examples of job submission via REST are available at: The base address for the Taurus system at the ZIH is: -<https://unicore.zih.tu-dresden.de:8080/TAURUS/rest/core> - --- Main.AlvaroAguilera - 2017-02-01 +unicore.zih.tu-dresden.de:8080/TAURUS/rest/core diff --git a/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md b/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md index eee845e9c..76d267cf1 100644 --- a/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md +++ b/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md @@ -2,7 +2,7 @@ VampirTrace is a performance monitoring tool, that produces tracefiles during a program run. These tracefiles can be analyzed and visualized by -the tool [Vampir](Compendium.Vampir). Vampir Supports lots of features +the tool [Vampir] **todo** Vampir. Vampir Supports lots of features e.g. - MPI, OpenMP, pthreads, and hybrid programs @@ -13,12 +13,14 @@ e.g. - Function filtering and grouping Only the basic usage is shown in this Wiki. For a comprehensive -VampirTrace user manual refer to the [VampirTrace -Website](http://www.tu-dresden.de/zih/vampirtrace). +VampirTrace user manual refer to the +[VampirTrace Website](http://www.tu-dresden.de/zih/vampirtrace). Before using VampirTrace, set up the correct environment with - module load vampirtrace +```Bash +module load vampirtrace +``` To make measurements with VampirTrace, the user's application program needs to be instrumented, i.e., at specific important points @@ -99,5 +101,3 @@ applications can be instrumented: By default, running a VampirTrace instrumented application should result in a tracefile in the current working directory where the application was executed. - --- Main.jurenz - 2009-12-17 diff --git a/doc.zih.tu-dresden.de/docs/archive/VenusOpen.md b/doc.zih.tu-dresden.de/docs/archive/VenusOpen.md deleted file mode 100644 index cfee2b385..000000000 --- a/doc.zih.tu-dresden.de/docs/archive/VenusOpen.md +++ /dev/null @@ -1,9 +0,0 @@ -# Venus open to HPC projects - -The new HPC server [Venus](SystemVenus) is open to all HPC projects -running on Mars with a quota of 20000 CPU h for testing the system. -Projects without access to Mars have to apply for the new resorce. - -To increase the CPU quota beyond this limit, a follow-up (but full) -proposal is needed. This should be done via the new project management -system. -- GitLab