diff --git a/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md b/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md index bdae4d0eda1f17b662b351bd37fd666e2b17325e..bc04e86de79a293cdd144fa8d9023abb5e12b970 100644 --- a/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md +++ b/doc.zih.tu-dresden.de/docs/archive/AnnouncementOfQuotas.md @@ -1,20 +1,16 @@ # Quotas for the home file system -The quotas of the home file system are meant to help the users to keep -in touch with their data. Especially in HPC, millions of temporary files -can be created within hours. We have identified this as a main reason -for performance degradation of the HOME file system. To stay in -operation with out HPC systems we regrettably have to fall back to this -unpopular technique. - -Based on a balance between the allotted disk space and the usage over -the time, reasonable quotas (mostly above current used space) for the -projects have been defined. The will be activated by the end of April -2012. - -If a project exceeds its quota (total size OR total number of files) it -cannot submit jobs into the batch system. Running jobs are not affected. -The following commands can be used for monitoring: +The quotas of the home file system are meant to help the users to keep in touch with their data. +Especially in HPC, millions of temporary files can be created within hours. We have identified this +as a main reason for performance degradation of the HOME file system. To stay in operation with our +HPC systems we regrettably have to fall back to this unpopular technique. + +Based on a balance between the allotted disk space and the usage over time, reasonable quotas +(mostly above current used space) for the projects have been defined. They will be activated by the +end of April 2012. + +If a project exceeds its quota (total size OR total number of files) it cannot submit jobs into the +batch system. Running jobs are not affected. 
The following commands can be used for monitoring: - `quota -s -g` shows the file system usage of all groups the user is a member of. @@ -37,9 +33,9 @@ In case a project is above its limits, please - for later use (weeks...months) at the HPC systems, build tar archives with meaningful names or IDs and store them in the [DMF system](#AnchorDataMigration). Avoid using this system - (`/hpc_fastfs`) for files \< 1 MB! + (`/hpc_fastfs`) for files < 1 MB! - refer to the hints for - [long term preservation for research data](PreservationResearchData.md). + [long term preservation for research data](../data_management/PreservationResearchData.md). ## No Alternatives @@ -58,5 +54,3 @@ The current situation is this: `/hpc_fastfs`. In case of problems don't hesitate to ask for support. - -Ulf Markwardt, Claudia Schmidt diff --git a/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md b/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md index 30b7ae112263d12a4c9c0028a336a212682db5ed..0d902d2cfeb23f9ca1763df909d6746b16be81da 100644 --- a/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md +++ b/doc.zih.tu-dresden.de/docs/archive/DebuggingTools.md @@ -1,20 +1,14 @@ -Debugging is an essential but also rather time consuming step during -application development. Tools dramatically reduce the amount of time -spent to detect errors. Besides the "classical" serial programming -errors, which may usually be easily detected with a regular debugger, -there exist programming errors that result from the usage of OpenMP, -Pthreads, or MPI. These errors may also be detected with debuggers -(preferably debuggers with support for parallel applications), however, -specialized tools like MPI checking tools (e.g. Marmot) or thread -checking tools (e.g. Intel Thread Checker) can simplify this task. 
The -following sections provide detailed information about the different -types of debugging tools: +# Debugging Tools -- [Debuggers](Debuggers) -- debuggers (with and without support for - parallel applications) -- [MPI Usage Error Detection](MPI Usage Error Detection) -- tools to - detect MPI usage errors -- [Thread Checking](Thread Checking) -- tools to detect OpenMP/Pthread - usage errors +Debugging is an essential but also rather time consuming step during application development. Tools +dramatically reduce the amount of time spent to detect errors. Besides the "classical" serial +programming errors, which may usually be easily detected with a regular debugger, there exist +programming errors that result from the usage of OpenMP, Pthreads, or MPI. These errors may also be +detected with debuggers (preferably debuggers with support for parallel applications), however, +specialized tools like MPI checking tools (e.g. Marmot) or thread checking tools (e.g. Intel Thread +Checker) can simplify this task. The following sections provide detailed information about the +different types of debugging tools: --- Main.hilbrich - 2009-12-21 +- [Debuggers] **todo** Debuggers -- debuggers (with and without support for parallel applications) +- [MPI Usage Error Detection] **todo** MPI Usage Error Detection -- tools to detect MPI usage errors +- [Thread Checking] **todo** Thread Checking -- tools to detect OpenMP/Pthread usage errors diff --git a/doc.zih.tu-dresden.de/docs/archive/Hardware.md b/doc.zih.tu-dresden.de/docs/archive/Hardware.md index bac841daf83fbe4a5a9c4204e4d22ac18db61af3..449a2cf644d7453fc20856a74074ab11d6f51f15 100644 --- a/doc.zih.tu-dresden.de/docs/archive/Hardware.md +++ b/doc.zih.tu-dresden.de/docs/archive/Hardware.md @@ -1,17 +1,17 @@ # Hardware -Here, you can find basic information about the hardware installed at -ZIH. We try to keep this list up-to-date. +Here, you can find basic information about the hardware installed at ZIH. 
We try to keep this list +up-to-date. -- [BULL HPC-Cluster Taurus](HardwareTaurus) -- [SGI Ultraviolet (UV)](HardwareVenus) +- [BULL HPC-Cluster Taurus](TaurusII.md) +- [SGI Ultraviolet (UV)](HardwareVenus.md) Hardware hosted by ZIH: Former systems -- [PC-Farm Deimos](HardwareDeimos) -- [SGI Altix](HardwareAltix) -- [PC-Farm Atlas](HardwareAtlas) -- [PC-Cluster Triton](HardwareTriton) -- [HPC-Windows-Cluster Titan](HardwareTitan) +- [PC-Farm Deimos](HardwareDeimos.md) +- [SGI Altix](HardwareAltix.md) +- [PC-Farm Atlas](HardwareAtlas.md) +- [PC-Cluster Triton](HardwareTriton.md) +- [HPC-Windows-Cluster Titan](HardwareTitan.md) diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md b/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md index 643fab9f4ab4ac6119f830cd1af09a93d2c1b858..81a69258cc34162695b00499c7166af6daaf7b17 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareDeimos.md @@ -1,5 +1,3 @@ - - # Linux Networx PC-Farm Deimos The PC farm `Deimos` is a heterogenous cluster based on dual core AMD @@ -7,36 +5,38 @@ Opteron CPUs. The nodes are operated by the Linux operating system SuSE SLES 10 with a 2.6 kernel. Currently, the following hardware is installed: -\|CPUs \|AMD Opteron X85 dual core \| \|RAM per core \|2 GB \| \|Number -of cores \|2584 \| \|total peak performance \|13.4 TFLOPS \| \|single -chip nodes \|384 \| \|dual nodes \|230 \| \|quad nodes \|88 \| \|quad -nodes (32 GB RAM) \|24 \| - -\<P> All nodes share a 68 TB [file -system](RuntimeEnvironment#Filesystem) on DDN hardware. Each node has -per core 40 GB local disk space for scratch mounted on `/tmp` . 
The jobs -for the compute nodes are scheduled by the [Platform LSF](Platform LSF) +|CPUs |AMD Opteron X85 dual core | +|RAM per core |2 GB | +|Number of cores |2584 | +|total peak performance |13.4 TFLOPS | +|single chip nodes |384 | +|dual nodes |230 | +|quad nodes |88 | +|quad nodes (32 GB RAM) |24 | + +All nodes share a 68 TB file system on DDN hardware. Each node has 40 GB local disk space per core for scratch +mounted on `/tmp`. The jobs for the compute nodes are scheduled by the +[Platform LSF](PlatformLSF.md) batch system from the login nodes `deimos.hrsk.tu-dresden.de`. -Two separate Infiniband networks (10 Gb/s) with low cascading switches -provide the communication and I/O infrastructure for low latency / high -throughput data traffic. An additional gigabit Ethernet network is used -for control and service purposes. +Two separate Infiniband networks (10 Gb/s) with low cascading switches provide the communication and +I/O infrastructure for low latency / high throughput data traffic. An additional gigabit Ethernet +network is used for control and service purposes. -Users with a login on the [SGI Altix](HardwareAltix) can access their -home directory via NFS below the mount point `/hpc_work`. +Users with a login on the [SGI Altix](HardwareAltix.md) can access their home directory via NFS +below the mount point `/hpc_work`. ## CPU The cluster is based on dual-core AMD Opteron X85 processor. One core has the following basic properties: -\|clock rate \|2.6 GHz \| \|floating point units \|2 \| \|peak -performance \|5.2 GFLOPS \| \|L1 cache \|2x64 kB \| \|L2 cache \|1 MB \| -\|memory bus \|128 bit x 200 MHz \| - -The CPU belongs to the x86_64 family. Since it is fully capable of -running x86-code, one should compare the performances of the 32 and 64 -bit versions of the same code. 
+|clock rate |2.6 GHz | +|floating point units |2 | +|peak performance |5.2 GFLOPS | +|L1 cache |2x64 kB | +|L2 cache |1 MB | +|memory bus |128 bit x 200 MHz | -<span class="twiki-macro COMMENT"></span> +The CPU belongs to the x86_64 family. Since it is fully capable of running x86-code, one should +compare the performances of the 32 and 64 bit versions of the same code. diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md b/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md index 3221dc5902d518faf8257e8aac0af9a89e55fbb8..c5ecccb5487d43f6f9e723d65b5553653c38ee88 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwarePhobos.md @@ -1,38 +1,37 @@ - - # Linux Networx PC-Cluster Phobos -------- **Phobos was shut down on 1 November 2010.** ------- +**Phobos was shut down on 1 November 2010.** `Phobos` is a cluster based on AMD Opteron CPUs. The nodes are operated by the Linux operating system SuSE SLES 9 with a 2.6 kernel. Currently, the following hardware is installed: -\|CPUs \|AMD Opteron 248 (single core) \| \|total peak performance -\|563.2 GFLOPS \| \|Number of nodes \|64 compute + 1 master \| \|CPUs -per node \|2 \| \|RAM per node \|4 GB \| +|CPUs |AMD Opteron 248 (single core) | +|total peak performance |563.2 GFLOPS | +|Number of nodes |64 compute + 1 master | +|CPUs per node |2 | +|RAM per node |4 GB | -\<P> All nodes share a 4.4 TB SAN [file system](FileSystems). Each node -has additional local disk space mounted on `/scratch`. The jobs for the -compute nodes are scheduled by a [Platform LSF](Platform LSF) batch -system running on the login node `phobos.hrsk.tu-dresden.de`. +All nodes share a 4.4 TB SAN file system. Each node has additional local disk space mounted on `/scratch`. The +jobs for the compute nodes are scheduled by a [Platform LSF](PlatformLSF.md) batch system running on +the login node `phobos.hrsk.tu-dresden.de`. 
-Two separate Infiniband networks (10 Gb/s) with low cascading switches -provide the infrastructure for low latency / high throughput data -traffic. An additional GB/Ethernetwork is used for control and service -purposes. +Two separate Infiniband networks (10 Gb/s) with low cascading switches provide the infrastructure +for low latency / high throughput data traffic. An additional gigabit Ethernet network is used for control +and service purposes. ## CPU `Phobos` is based on single-core AMD Opteron 248 processor. It has the following basic properties: -\|clock rate \|2.2 GHz \| \|floating point units \|2 \| \|peak -performance \|4.4 GFLOPS \| \|L1 cache \|2x64 kB \| \|L2 cache \|1 MB \| -\|memory bus \|128 bit x 200 MHz \| +|clock rate |2.2 GHz | +|floating point units |2 | +|peak performance |4.4 GFLOPS | +|L1 cache |2x64 kB | +|L2 cache |1 MB | +|memory bus |128 bit x 200 MHz | The CPU belongs to the x86_64 family. Although it is fully capable of running x86-code, one should always try to use 64-bit programs due to their potentially higher performance. - -<span class="twiki-macro COMMENT"></span> diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareTitan.md b/doc.zih.tu-dresden.de/docs/archive/HardwareTitan.md index 4388cdbc89f858b450c2fd7e9a98fb79649de39f..6c383c94feafa9628f234b00a0f28f31c9f4902d 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareTitan.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareTitan.md @@ -1,5 +1,3 @@ - - # Windows HPC Server 2008 - Cluster Titan The Dell Blade Server `Titan` is a homogenous cluster based on quad core diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md b/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md index ce88271b90c13c5a5208f26f3c2368dd91cce8df..17fd54449f8e971624cdb72e02d15da981e3a33d 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareTriton.md @@ -6,23 +6,28 @@ is a cluster based on quadcore Intel Xeon CPUs. 
The nodes are operated by the Linux operating system SuSE SLES 11. Currently, the following hardware is installed: -\|CPUs \|Intel quadcore E5530 \| \|RAM per core \|6 GB \| \|Number of -cores \|512 \| \|total peak performance \|4.9 TFLOPS \| \|dual nodes -\|64 \| +|CPUs |Intel quadcore E5530 | +|RAM per core |6 GB | +|Number of cores |512 | +|total peak performance |4.9 TFLOPS | +|dual nodes |64 | -The jobs for the compute nodes are scheduled by the -[LoadLeveler](LoadLeveler) batch system from the login node -triton.hrsk.tu-dresden.de . +The jobs for the compute nodes are scheduled by the [LoadLeveler](LoadLeveler.md) batch system from +the login node `triton.hrsk.tu-dresden.de`. ## CPU The cluster is based on dual-core Intel Xeon E5530 processor. One core has the following basic properties: -\|clock rate \|2.4 GHz \| \|Cores \|4 \| \|Threads \|8 \| \|Intel Smart -Cache \|8MB \| \|Intel QPI Speed \|5.86 GT/s \| \|Max TDP \|80 W \| +|clock rate |2.4 GHz | +|Cores |4 | +|Threads |8 | +|Intel Smart Cache |8 MB | +|Intel QPI Speed |5.86 GT/s | +|Max TDP |80 W | -# Software +## Software | Compilers | Version | |:--------------------------------|---------------:| @@ -45,4 +50,4 @@ Cache \|8MB \| \|Intel QPI Speed \|5.86 GT/s \| \|Max TDP \|80 W \| | NAMD | 2.7b1 | | QuantumEspresso | 4.1.3 | | **Tools** | | -| [Totalview Debugger](Debuggers) | 8.8 | +| [Totalview Debugger] **todo** debuggers | 8.8 | diff --git a/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md b/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md index 00c6046dc4438b07b246dd5499a541dec9e1da26..be90985eace893cbf28753d5fbd2463402338e67 100644 --- a/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md +++ b/doc.zih.tu-dresden.de/docs/archive/HardwareVenus.md @@ -18,5 +18,3 @@ additional hardware hyperthreads. Venus uses the same HOME file system as all our other HPC installations. For computations, please use `/scratch`. - -... 
[More information on file systems](FileSystems) diff --git a/doc.zih.tu-dresden.de/docs/archive/Introduction.md b/doc.zih.tu-dresden.de/docs/archive/Introduction.md deleted file mode 100644 index ae6de5f861f391677befb65fd95574b1c950b9c8..0000000000000000000000000000000000000000 --- a/doc.zih.tu-dresden.de/docs/archive/Introduction.md +++ /dev/null @@ -1,18 +0,0 @@ -# Introduction - -The Center for Information Services and High Performance Computing (ZIH) -is a central scientific unit of TU Dresden with a strong competence in -parallel computing and software tools. We have a strong commitment to -support *real users*, collaborating to create new algorithms, -applications and to tackle the problems that need to be solved to create -new scientific insight with computational methods. Our compute complex -"Hochleistungs-Rechner-/-Speicher-Komplex" (HRSK) is focused on -data-intensive computing. High scalability, big memory and fast -I/O-systems are the outstanding properties of this project, aside from -the significant performance increase. The infrastructure is provided not -only to TU Dresden but to all universities and public research -institutes in Saxony. - -\<img alt="" -src="<http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/hpc/bilder/hpc_hardware07>" -title="HRSK overview" /> diff --git a/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md b/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md index f779a68bc3530a3394d1b137b5210414dab9ccb2..78e4cabc7b4f40574e084834d69175cdd9fa29ef 100644 --- a/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md +++ b/doc.zih.tu-dresden.de/docs/archive/KnlNodes.md @@ -1,25 +1,29 @@ -# Intel Xeon Phi (Knights Landing) %RED%- Out of Service<span class="twiki-macro ENDCOLOR"></span> +# Intel Xeon Phi (Knights Landing) + +Xeon Phi nodes are **Out of Service**! 
The nodes `taurusknl[1-32]` are equipped with -- Intel Xeon Phi procesors: 64 cores Intel Xeon Phi 7210 (1,3 GHz) -- 96 GB RAM DDR4 -- 16 GB MCDRAM -- /scratch, /lustre/ssd, /projects, /home are mounted +- Intel Xeon Phi processors: 64 cores Intel Xeon Phi 7210 (1.3 GHz) +- 96 GB RAM DDR4 +- 16 GB MCDRAM +- `/scratch`, `/lustre/ssd`, `/projects`, `/home` are mounted Benchmarks, so far (single node): -- HPL (Linpack): 1863.74 GFlops -- SGEMM (single precision) MKL: 4314 GFlops -- Stream (only 1.4 GiB memory used): 431 GB/s +- HPL (Linpack): 1863.74 GFlops +- SGEMM (single precision) MKL: 4314 GFlops +- Stream (only 1.4 GiB memory used): 431 GB/s Each of them can run 4 threads, so one can start a job here with e.g. - srun -p knl -N 1 --mem=90000 -n 1 -c 64 a.out +```Bash +srun -p knl -N 1 --mem=90000 -n 1 -c 64 a.out +``` In order to get their optimal performance please re-compile your code with the most recent Intel compiler and explicitely set the compiler -flag **`-xMIC-AVX512`**. +flag `-xMIC-AVX512`. MPI works now, we recommend to use the latest Intel MPI version (intelmpi/2017.1.132). To utilize the OmniPath Fabric properly, make @@ -33,23 +37,21 @@ request): | Nodes | Cluster Mode | Memory Mode | |:-------------------|:-------------|:------------| -| taurusknl\[1-28\] | Quadrant | Cache | -| taurusknl29 | Quadrant | Flat | -| taurusknl\[30-32\] | SNC4 | Flat | +| `taurusknl[1-28]` | Quadrant | Cache | +| `taurusknl29` | Quadrant | Flat | +| `taurusknl[30-32]` | SNC4 | Flat | They have SLURM features set, so that you can request them specifically -by using the SLURM parameter **--constraint** where multiple values can -be linked with the & operator, e.g. **--constraint="SNC4&Flat"**. If you +by using the SLURM parameter `--constraint` where multiple values can +be linked with the & operator, e.g. `--constraint="SNC4&Flat"`. If you don't set a constraint, your job will run preferably on the nodes with Quadrant+Cache. 
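The constraint mechanism described above can be sketched as a command line. This is only an illustrative sketch (it prints the command instead of submitting anything, and `a.out` stands in for a real binary); the partition and resource flags are reused from the `srun` example earlier in this section:

```Bash
# Sketch only: compose (and print, rather than submit) an srun call that
# pins the job to the Quadrant/Cache KNL nodes via a Slurm feature constraint.
constraint="Quadrant&Cache"
echo srun -p knl --constraint="$constraint" -N 1 --mem=90000 -n 1 -c 64 a.out
```

Setting the variable to e.g. `SNC4&Flat` would request the SNC4/Flat nodes from the table above instead; dropping the `echo` would actually launch the job step.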
Note that your performance might take a hit if your code is not NUMA-aware and does not make use of the Flat memory mode while running on the nodes that have those modes set, so you might want to use ---constraint="Quadrant&Cache" in such a case to ensure your job does not +`--constraint="Quadrant&Cache"` in such a case to ensure your job does not run on an unfavorable node (which might happen if all the others are already allocated). -\<a -href="<http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/>" -title="Knl Best Practice Guide">KNL Best Practice Guide\</a> from PRACE +[KNL Best Practice Guide](https://prace-ri.eu/training-support/best-practice-guides/best-practice-guide-knights-landing/) diff --git a/doc.zih.tu-dresden.de/docs/archive/LoadLeveler.md b/doc.zih.tu-dresden.de/docs/archive/LoadLeveler.md index 1fd54a80791a537355d88aa720f4e31e64a9908c..fb85aaf079e6769005a461ee226f5329210feb69 100644 --- a/doc.zih.tu-dresden.de/docs/archive/LoadLeveler.md +++ b/doc.zih.tu-dresden.de/docs/archive/LoadLeveler.md @@ -1,7 +1,5 @@ # LoadLeveler - IBM Tivoli Workload Scheduler - - ## Job Submission First of all, to submit a job to LoadLeveler a job file needs to be @@ -14,19 +12,21 @@ created. 
This job file can be passed to the command: An example job file may look like this: - #@ job_name = my_job - #@ output = $(job_name).$(jobid).out - #@ error = $(job_name).$(jobid).err - #@ class = short - #@ group = triton-ww | triton-ipf | triton-ism | triton-et - #@ wall_clock_limit = 00:30:00 - #@ resources = ConsumableMemory(1 gb) - #@ environment = COPY_ALL - #@ notification = complete - #@ notify_user = your_email@adress - #@ queue - - ./my_serial_program +```Bash +#@ job_name = my_job +#@ output = $(job_name).$(jobid).out +#@ error = $(job_name).$(jobid).err +#@ class = short +#@ group = triton-ww | triton-ipf | triton-ism | triton-et +#@ wall_clock_limit = 00:30:00 +#@ resources = ConsumableMemory(1 gb) +#@ environment = COPY_ALL +#@ notification = complete +#@ notify_user = your_email@address +#@ queue + +./my_serial_program +``` This example requests a serial job with a runtime of 30 minutes and a overall memory requirement of 1GByte. There are four groups available, @@ -38,22 +38,24 @@ usage. 
An example job file may look like this: - #@ job_name = my_job - #@ output = $(job_name).$(jobid).out - #@ error = $(job_name).$(jobid).err - #@ job_type = parallel - #@ node = 2 - #@ tasks_per_node = 8 - #@ class = short - #@ group = triton-ww | triton-ipf | triton-ism | triton-et - #@ wall_clock_limit = 00:30:00 - #@ resources = ConsumableMemory(1 gb) - #@ environment = COPY_ALL - #@ notification = complete - #@ notify_user = your_email@adress - #@ queue - - mpirun -x OMP_NUM_THREADS=1 -x LD_LIBRARY_PATH -np 16 ./my_mpi_program +```Bash +#@ job_name = my_job +#@ output = $(job_name).$(jobid).out +#@ error = $(job_name).$(jobid).err +#@ job_type = parallel +#@ node = 2 +#@ tasks_per_node = 8 +#@ class = short +#@ group = triton-ww | triton-ipf | triton-ism | triton-et +#@ wall_clock_limit = 00:30:00 +#@ resources = ConsumableMemory(1 gb) +#@ environment = COPY_ALL +#@ notification = complete +#@ notify_user = your_email@address +#@ queue + +mpirun -x OMP_NUM_THREADS=1 -x LD_LIBRARY_PATH -np 16 ./my_mpi_program +``` This example requests a parallel job with 16 processes (2 nodes, 8 tasks per node), a runtime of 30 minutes, 1GByte memory requirement per task @@ -83,22 +85,24 @@ loaded, e.g issue: An example job file may look like this: - #@ job_name = my_job - #@ output = $(job_name).$(jobid).out - #@ error = $(job_name).$(jobid).err - #@ job_type = parallel - #@ node = 4 - #@ tasks_per_node = 8 - #@ class = short - #@ group = triton-ww | triton-ipf | triton-ism | triton-et - #@ wall_clock_limit = 00:30:00 - #@ resources = ConsumableMemory(1 gb) - #@ environment = COPY_ALL - #@ notification = complete - #@ notify_user = your_email@adress - #@ queue - - mpirun -x OMP_NUM_THREADS=8 -x LD_LIBRARY_PATH -np 4 --bynode ./my_hybrid_program +```Bash +#@ job_name = my_job +#@ output = $(job_name).$(jobid).out +#@ error = $(job_name).$(jobid).err +#@ job_type = parallel +#@ node = 4 +#@ tasks_per_node = 8 +#@ class = short +#@ group = triton-ww | triton-ipf | triton-ism | 
triton-et +#@ wall_clock_limit = 00:30:00 +#@ resources = ConsumableMemory(1 gb) +#@ environment = COPY_ALL +#@ notification = complete +#@ notify_user = your_email@address +#@ queue + +mpirun -x OMP_NUM_THREADS=8 -x LD_LIBRARY_PATH -np 4 --bynode ./my_hybrid_program +``` This example requests a parallel job with 32 processes (4 nodes, 8 tasks per node), a runtime of 30 minutes, 1GByte memory requirement per task @@ -174,24 +178,26 @@ Interactive Jobs can be submitted by the command: Loadleveler Runtime Variables give you some information within the job script, for example: - #@ job_name = my_job - #@ output = $(job_name).$(jobid).out - #@ error = $(job_name).$(jobid).err - #@ job_type = parallel - #@ node = 2 - #@ tasks_per_node = 8 - #@ class = short - #@ wall_clock_limit = 00:30:00 - #@ resources = ConsumableMemory(1 gb) - #@ environment = COPY_ALL - #@ notification = complete - #@ notify_user = your_email@adress - #@ queue - - echo $LOADL_PROCESSOR_LIST - echo $LOADL_STEP_ID - echo $LOADL_JOB_NAME - mpirun -np 16 ./my_mpi_program +```Bash +#@ job_name = my_job +#@ output = $(job_name).$(jobid).out +#@ error = $(job_name).$(jobid).err +#@ job_type = parallel +#@ node = 2 +#@ tasks_per_node = 8 +#@ class = short +#@ wall_clock_limit = 00:30:00 +#@ resources = ConsumableMemory(1 gb) +#@ environment = COPY_ALL +#@ notification = complete +#@ notify_user = your_email@address +#@ queue + +echo $LOADL_PROCESSOR_LIST +echo $LOADL_STEP_ID +echo $LOADL_JOB_NAME +mpirun -np 16 ./my_mpi_program +``` Further Information: \[\[http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl35.admin.doc/am2ug_envvars.html\]\[Full description of variables\]\]. The `llclass` command provides information about each queue. 
Example output: - Name MaxJobCPU MaxProcCPU Free Max Description - d+hh:mm:ss d+hh:mm:ss Slots Slots - --------------- -------------- -------------- ----- ----- --------------------- - interactive undefined undefined 32 32 interactive, exclusive shared nodes, max. 12h runtime - triton_ism undefined undefined 8 80 exclusive, serial + parallel queue, nodes shared, unlimited runtime - openend undefined undefined 272 384 serial + parallel queue, nodes shared, unlimited runtime - long undefined undefined 272 384 serial + parallel queue, nodes shared, max. 7 days runtime - medium undefined undefined 272 384 serial + parallel queue, nodes shared, max. 3 days runtime - short undefined undefined 272 384 serial + parallel queue, nodes shared, max. 4 hours runtime +```Bash +Name MaxJobCPU MaxProcCPU Free Max Description + d+hh:mm:ss d+hh:mm:ss Slots Slots +--------------- -------------- -------------- ----- ----- --------------------- +interactive undefined undefined 32 32 interactive, exclusive shared nodes, max. 12h runtime +triton_ism undefined undefined 8 80 exclusive, serial + parallel queue, nodes shared, unlimited runtime +openend undefined undefined 272 384 serial + parallel queue, nodes shared, unlimited runtime +long undefined undefined 272 384 serial + parallel queue, nodes shared, max. 7 days runtime +medium undefined undefined 272 384 serial + parallel queue, nodes shared, max. 3 days runtime +short undefined undefined 272 384 serial + parallel queue, nodes shared, max. 4 hours runtime +``` ## Job Monitoring ### All Jobs in the Queue - # llq +```Bash +# llq +``` #### All of One's Own Jobs - # llq -u username +```Bash +# llq -u username +``` ### Details About Why A Job Has Not Yet Started - # llq -s job-id +```Bash +# llq -s job-id +``` The key information is located at the end of the output, and will look similar to the following: - ==================== EVALUATIONS FOR JOB STEP l1f1n01.4604.0 ==================== - The class of this job step is "workq". 
- Total number of available initiators of this class on all machines in the cluster: 0 - Minimum number of initiators of this class required by job step: 4 - The number of available initiators of this class is not sufficient for this job step. - Not enough resources to start now. - Not enough resources for this step as backfill. +```Bash +==================== EVALUATIONS FOR JOB STEP l1f1n01.4604.0 ==================== +The class of this job step is "workq". +Total number of available initiators of this class on all machines in the cluster: 0 +Minimum number of initiators of this class required by job step: 4 +The number of available initiators of this class is not sufficient for this job step. +Not enough resources to start now. +Not enough resources for this step as backfill. +``` Or it will tell you the **estimated start** time: - ==================== EVALUATIONS FOR JOB STEP l1f1n01.8207.0 ==================== - The class of this job step is "checkpt". - Total number of available initiators of this class on all machines in the cluster: 8 - Minimum number of initiators of this class required by job step: 32 - The number of available initiators of this class is not sufficient for this job step. - Not enough resources to start now. - This step is top-dog. - Considered at: Fri Jul 13 12:12:04 2007 - Will start by: Tue Jul 17 18:10:32 2007 +```Bash +==================== EVALUATIONS FOR JOB STEP l1f1n01.8207.0 ==================== +The class of this job step is "checkpt". +Total number of available initiators of this class on all machines in the cluster: 8 +Minimum number of initiators of this class required by job step: 32 +The number of available initiators of this class is not sufficient for this job step. +Not enough resources to start now. +This step is top-dog. 
+Considered at: Fri Jul 13 12:12:04 2007 +Will start by: Tue Jul 17 18:10:32 2007 +``` ### Generate a long listing rather than the standard one - # llq -l job-id +```Bash +# llq -l job-id +``` This command will give you detailed job information. @@ -285,11 +305,15 @@ This command will give you detailed job information. ### A Particular Job - # llcancel job-id +```Bash +# llcancel job-id +``` ### All of One's Jobs - # llcancel -u username +```Bash +# llcancel -u username +``` ## Job History and Usage Summaries @@ -300,23 +324,27 @@ jobs run under LoadLeveler. This file is An example of usage would be as follows: - # llsummary -u estrabd /var/loadl/archive/history.archive +```Bash +# llsummary -u estrabd /var/loadl/archive/history.archive +``` And the output would look something like: - Name Jobs Steps Job Cpu Starter Cpu Leverage - estrabd 118 128 07:55:57 00:00:45 634.6 - TOTAL 118 128 07:55:57 00:00:45 634.6 - Class Jobs Steps Job Cpu Starter Cpu Leverage - checkpt 13 23 03:09:32 00:00:18 631.8 - interactive 105 105 04:46:24 00:00:26 660.9 - TOTAL 118 128 07:55:57 00:00:45 634.6 - Group Jobs Steps Job Cpu Starter Cpu Leverage - No_Group 118 128 07:55:57 00:00:45 634.6 - TOTAL 118 128 07:55:57 00:00:45 634.6 - Account Jobs Steps Job Cpu Starter Cpu Leverage - NONE 118 128 07:55:57 00:00:45 634.6 - TOTAL 118 128 07:55:57 00:00:45 634.6 +```Bash + Name Jobs Steps Job Cpu Starter Cpu Leverage + estrabd 118 128 07:55:57 00:00:45 634.6 + TOTAL 118 128 07:55:57 00:00:45 634.6 + Class Jobs Steps Job Cpu Starter Cpu Leverage + checkpt 13 23 03:09:32 00:00:18 631.8 +interactive 105 105 04:46:24 00:00:26 660.9 + TOTAL 118 128 07:55:57 00:00:45 634.6 + Group Jobs Steps Job Cpu Starter Cpu Leverage + No_Group 118 128 07:55:57 00:00:45 634.6 + TOTAL 118 128 07:55:57 00:00:45 634.6 + Account Jobs Steps Job Cpu Starter Cpu Leverage + NONE 118 128 07:55:57 00:00:45 634.6 + TOTAL 118 128 07:55:57 00:00:45 634.6 +``` The **llsummary** tool has a lot of options, which are discussed 
in its man pages. @@ -327,89 +355,90 @@ man pages. And the output would look something like: - root@triton[0]:~# llstatus - Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys - n01 Avail 0 0 Idle 0 0.00 2403 AMD64 Linux2 - n02 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n03 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n04 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n05 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n06 Avail 0 0 Idle 0 0.71 9999 AMD64 Linux2 - n07 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n08 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n09 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n10 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n11 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n12 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n13 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n14 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n15 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n16 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n17 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n18 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n19 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n20 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n21 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n22 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n23 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n24 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n25 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n26 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n27 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n28 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n29 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n30 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n31 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n32 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n33 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n34 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n35 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n36 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n37 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n38 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n39 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n40 Avail 0 0 Idle 0 0.00 
9999 AMD64 Linux2 - n41 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n42 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n43 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n44 Avail 0 0 Idle 0 0.01 9999 AMD64 Linux2 - n45 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n46 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n47 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n48 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n49 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n50 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n51 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n52 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n53 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n54 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n55 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n56 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n57 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n58 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n59 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n60 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n61 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n62 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n63 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - n64 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 - triton Avail 0 0 Idle 0 0.00 585 AMD64 Linux2 - - AMD64/Linux2 65 machines 0 jobs 0 running tasks - Total Machines 65 machines 0 jobs 0 running tasks - - The Central Manager is defined on triton - - The BACKFILL scheduler is in use - - All machines on the machine_list are present. 
+```Bash +root@triton[0]:~# llstatus +Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys +n01 Avail 0 0 Idle 0 0.00 2403 AMD64 Linux2 +n02 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n03 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n04 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n05 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n06 Avail 0 0 Idle 0 0.71 9999 AMD64 Linux2 +n07 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n08 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n09 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n10 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n11 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n12 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n13 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n14 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n15 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n16 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n17 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n18 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n19 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n20 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n21 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n22 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n23 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n24 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n25 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n26 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n27 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n28 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n29 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n30 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n31 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n32 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n33 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n34 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n35 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n36 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n37 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n38 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n39 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n40 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n41 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n42 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n43 Avail 0 0 
Idle 0 0.00 9999 AMD64 Linux2 +n44 Avail 0 0 Idle 0 0.01 9999 AMD64 Linux2 +n45 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n46 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n47 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n48 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n49 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n50 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n51 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n52 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n53 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n54 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n55 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n56 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n57 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n58 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n59 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n60 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n61 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n62 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n63 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +n64 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 +triton Avail 0 0 Idle 0 0.00 585 AMD64 Linux2 + +AMD64/Linux2 65 machines 0 jobs 0 running tasks +Total Machines 65 machines 0 jobs 0 running tasks + +The Central Manager is defined on triton + +The BACKFILL scheduler is in use + +All machines on the machine_list are present. 
+``` Detailed status information for a specific node: - # llstatus -l n54 +```Bash +# llstatus -l n54 +``` Further information: -\[\[http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl.doc/llbooks.html\]\[IBM -Documentation (see version 3.5)\]\] - --- Main.mark - 2010-06-01 +[IBM Documentation (see version 3.5)](http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl.doc/llbooks.html) diff --git a/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md b/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md index fa53026bc88450ae2e8ab77320b117ba38761467..688f390e874dd43587d3191559b3ed12738c46cc 100644 --- a/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md +++ b/doc.zih.tu-dresden.de/docs/archive/MigrateToAtlas.md @@ -1,7 +1,6 @@ # Migration to Atlas - Atlas is a different machine than -Deimos, please have a look at the table: +Atlas is a different machine than Deimos, please have a look at the table: | | | | |---------------------------------------------------|------------|-----------| @@ -19,11 +18,7 @@ codenamed "Bulldozer" is designed for multi-threaded use. We have grouped the module definitions for a better overview. This is only for displaying the available modules, not for loading a module. All -available modules can be made visible with `module load ALL; module av` -. For more details, please see [module -groups.](RuntimeEnvironment#Module_Groups) - -#BatchSystem +available modules can be made visible with `module load ALL; module av`. ## Batch System @@ -58,8 +53,7 @@ nodes you have to be more precise in your resource requests. - In ninety nine percent of the cases it is enough when you specify your processor requirements with `-n <n>` and your memory requirements with `-M <memory per process in MByte>`. -- Please use \<span class="WYSIWYG_TT">-x\</span>("exclusive use of a - hosts") only with care and when you really need it. 
+- Please use `-x` ("exclusive use of a host") only with care and when you really need it. - The option `-x` in combination with `-n 1` leads to an "efficiency" of only 1.5% - in contrast with 50% on the single socket nodes at Deimos. @@ -69,15 +63,14 @@ nodes you have to be more precise in your resource requests. - Please use `-M <memory per process in MByte>` to specify your memory requirements per process. - Please don't use `-R "span[hosts=1]"` or `-R "span[ptile=<n>]"` or - any other \<span class="WYSIWYG_TT">-R "..."\</span>option, the - batch system is smart enough to select the best hosts in accordance + any other `-R "..."` option, the batch system is smart enough to select the best hosts in accordance with your processor and memory requirements. - Jobs with a processor requirement ≤ 64 will always be scheduled on one node. - Larger jobs will use just as many hosts as needed, e.g. 160 processes will be scheduled on three hosts. -For more details, please see the pages on [LSF](PlatformLSF). +For more details, please see the pages on [LSF](PlatformLSF.md). ## Software @@ -95,21 +88,18 @@ degradation. Please include "Atlas" in your subject. ### Development -From the benchmarking point of view, the best compiler for the AMD -Bulldozer processor, the best compiler comes from the Open64 suite. For -convenience, other compilers are installed, Intel 12.1 shows good -results as well. Please check the best compiler flags at [this -overview](http://developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf). -For best performance, please use [ACML](Libraries#ACML) as BLAS/LAPACK -library. +From the benchmarking point of view, the best compiler for the AMD Bulldozer processor comes from +the Open64 suite. For convenience, other compilers are installed; Intel 12.1 shows good results as +well. Please check the best compiler flags in +[this overview](http://developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf).
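Following the flags overview referenced above, a compile line for the Bulldozer nodes might look like the sketch below. The Open64 driver name (`opencc`) and the `-march` value are assumptions taken from AMD's quick-reference sheet and are not verified on Atlas; the command is only assembled and echoed here, not executed.

```Bash
# Hypothetical compile line for the Bulldozer (bdver1) nodes; the driver
# name and flags are assumptions from AMD's quick-reference sheet.
CC=opencc
CFLAGS="-O3 -march=bdver1"
echo "$CC $CFLAGS -o myprog myprog.c"
```

Check `module av` on the target system for the compiler modules that are actually installed before relying on these names.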
### MPI parallel applications Please note the more convenient syntax on Atlas. Therefore, please use a command like +```Bash bsub -W 2:00 -M 200 -n 8 mpirun a.out +``` to submit your MPI parallel applications. - -- Set DENYTOPICVIEW = WikiGuest diff --git a/doc.zih.tu-dresden.de/docs/archive/Phase2Migration.md b/doc.zih.tu-dresden.de/docs/archive/Phase2Migration.md index 7491426caeedd38e70be7f585bfb64a81767265c..484b0196eb7de8e6aca1eb7927c6957899bacf87 100644 --- a/doc.zih.tu-dresden.de/docs/archive/Phase2Migration.md +++ b/doc.zih.tu-dresden.de/docs/archive/Phase2Migration.md @@ -1,53 +1,63 @@ +# Migration towards Phase 2 - -### How to copy your data from an old scratch (Atlas, Venus, Taurus I) to our new scratch (Taurus II) +## How to copy your data from an old scratch (Atlas, Venus, Taurus I) to our new scratch (Taurus II) Currently, only the Taurus (I) scratch is mounted on Taurus (II). To move files from Venus/Atlas to Taurus (II) you have to do an intermediate step over Taurus (I). -#### How to copy data from Atlas/Venus scratch to scratch of Taurus I (first step) +## How to copy data from Atlas/Venus scratch to scratch of Taurus I (first step) First you have to log in to Taurus I. - ssh <username>@tauruslogin[1-2].hrsk.tu-dresden.de +```Bash +ssh <username>@tauruslogin[1-2].hrsk.tu-dresden.de +``` After you are logged in, you can use our tool called Datamover to copy your data from A to B. - dtcp -r /atlas_scratch/<project or user>/<directory> /scratch/<project or user>/<directory> +```Bash +dtcp -r /atlas_scratch/<project or user>/<directory> /scratch/<project or user>/<directory> - e.g. file: dtcp -r /atlas_scratch/rotscher/file.txt /scratch/rotscher/ - e.g. directory: dtcp -r /atlas_scratch/rotscher/directory /scratch/rotscher/ +e.g. file: dtcp -r /atlas_scratch/rotscher/file.txt /scratch/rotscher/ +e.g.
directory: dtcp -r /atlas_scratch/rotscher/directory /scratch/rotscher/ +``` -#### How to copy data from scratch of Taurus I to scratch of Taurus II (second step) +## How to copy data from scratch of Taurus I to scratch of Taurus II (second step) First you have to log in to Taurus II. - ssh <username>@tauruslogin[3-5].hrsk.tu-dresden.de +```Bash +ssh <username>@tauruslogin[3-5].hrsk.tu-dresden.de +``` After you are logged in, you can use our tool called Datamover to copy your data from A to B. - dtcp -r /phase1_scratch/<project or user>/<directory> /scratch/<project or user>/<directory> - - e.g. file: dtcp -r /phase1_scratch/rotscher/file.txt /scratch/rotscher/ - e.g. directory: dtcp -r /phase1_scratch/rotscher/directory /scratch/rotscher/ -### Examples on how to use data transfer commands: +```Bash +dtcp -r /phase1_scratch/<project or user>/<directory> /scratch/<project or user>/<directory> -#### Copying data from Atlas' /scratch to Taurus' /scratch +e.g. file: dtcp -r /phase1_scratch/rotscher/file.txt /scratch/rotscher/ +e.g.
directory: dtcp -r /phase1_scratch/rotscher/directory /scratch/rotscher/ +``` - % dtcp -r /atlas_scratch/jurenz/results /taurus_scratch/jurenz/ +## Examples on how to use data transfer commands: -#### Moving data from Venus' /scratch to Taurus' /scratch ### Copying data from Atlas' /scratch to Taurus' /scratch - % dtmv /venus_scratch/jurenz/results/ /taurus_scratch/jurenz/venus_results +```Bash +% dtcp -r /atlas_scratch/jurenz/results /taurus_scratch/jurenz/ +``` -#### TGZ data from Taurus' /scratch to the Archive ### Moving data from Venus' /scratch to Taurus' /scratch - % dttar -czf /archiv/jurenz/taurus_results_20140523.tgz /taurus_scratch/jurenz/results +```Bash +% dtmv /venus_scratch/jurenz/results/ /taurus_scratch/jurenz/venus_results +``` -- Set DENYTOPICVIEW = WikiGuest +### TGZ data from Taurus' /scratch to the Archive --- Main.MatthiasKraeusslein - 2015-08-20 +```Bash +% dttar -czf /archiv/jurenz/taurus_results_20140523.tgz /taurus_scratch/jurenz/results +``` diff --git a/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md b/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md index 56a86433a3945ed5b8b267af3a7de02997103117..699db5c9ba6f732514d2fad7d5070b2ffe81fdc4 100644 --- a/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md +++ b/doc.zih.tu-dresden.de/docs/archive/PlatformLSF.md @@ -1,9 +1,8 @@ # Platform LSF -**`%RED%This Page is deprecated! The current bachsystem on Taurus and Venus is [[Compendium.Slurm][Slurm]]!%ENDCOLOR%`** +**This page is deprecated!** The current batch system on Taurus and Venus is [Slurm](../jobs/Slurm.md). - The HRSK-I systems are operated -with the batch system LSF running on *Mars*, *Atlas* resp.. +The HRSK-I systems are operated with the batch system LSF running on *Mars* and *Atlas*, respectively.
## Job Submission @@ -14,40 +13,44 @@ Some options of `bsub` are shown in the following table: | bsub option | Description | |:-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| -n \<N> | set number of processors (cores) to N(default=1) | -| -W \<hh:mm> | set maximum wall clock time to \<hh:mm> | -| -J \<name> | assigns the specified name to the job | -| -eo \<errfile> | writes the standard error output of the job to the specified file (overwriting) | -| -o \<outfile> | appends the standard output of the job to the specified file | -| -R span\[hosts=1\] | use only one SMP node (automatically set by the batch system) | -| -R span\[ptile=2\] | run 2 tasks per node | -| -x | disable other jobs to share the node ( Atlas ). | -| -m | specify hosts to run on ( [see below](#HostList)) | -| -M \<M> | specify per-process (per-core) memory limit (in MB), the job's memory limit is derived from that number (N proc \* M MB); see examples and [Attn. #2](#AttentionNo2) below | -| -P \<project> | specifiy project | +| `-n <N>` | set number of processors (cores) to N (default: 1) | +| `-W <hh:mm>` | set maximum wall clock time to `<hh:mm>` | +| `-J <name>` | assigns the specified name to the job | +| `-eo <errfile>` | writes the standard error output of the job to the specified file (overwriting) | +| `-o <outfile>` | appends the standard output of the job to the specified file | +| `-R span[hosts=1]` | use only one SMP node (automatically set by the batch system) | +| `-R span[ptile=2]` | run 2 tasks per node | +| `-x` | prevent other jobs from sharing the node (*Atlas*) | +| `-m` | specify hosts to run on ([see below](#HostList)) | +| `-M <M>` | specify per-process (per-core) memory limit (in MB); the job's memory limit is derived from that number (N proc \* M MB); see examples and [Attn.
#2](#AttentionNo2) below | +| `-P <project>` | specify project | You can use the `%J` -macro to merge the job ID into names. It might be more convenient to put the options directly in a job file which you can submit using - bsub < my_jobfile +```Bash +bsub < my_jobfile +``` The following example job file shows how you can make use of it: - #!/bin/bash - #BSUB -J my_job # the job's name - #BSUB -W 4:00 # max. wall clock time 4h - #BSUB -R "span[hosts=1]" # run on a single node - #BSUB -n 4 # number of processors - #BSUB -M 500 # 500MB per core memory limit - #BSUB -o out.%J # output file - #BSUB -u name@tu-dresden.de # email address; works ONLY with @tu-dresden.de - - echo Starting Program - cd $HOME/work - a.out # e.g. an OpenMP program - echo Finished Program +```Bash +#!/bin/bash +#BSUB -J my_job # the job's name +#BSUB -W 4:00 # max. wall clock time 4h +#BSUB -R "span[hosts=1]" # run on a single node +#BSUB -n 4 # number of processors +#BSUB -M 500 # 500MB per core memory limit +#BSUB -o out.%J # output file +#BSUB -u name@tu-dresden.de # email address; works ONLY with @tu-dresden.de + +echo Starting Program +cd $HOME/work +a.out # e.g. an OpenMP program +echo Finished Program +``` **Understanding memory limits** The option `-M` to `bsub` defines how much memory may be consumed by a single process of the job. The job memory @@ -58,51 +61,31 @@ memory limit of 2400 MB. If any one of your processes consumes more than 2400 MB of memory in sum, then the job will be killed by LSF. - For serial programs, the given limit is the same for the process and - the whole job, e.g. 500 MB -<!-- --> - bsub -W 1:00 -n 1 -M 500 myprog - + the whole job, e.g. 500 MB `bsub -W 1:00 -n 1 -M 500 myprog` - For MPI-parallel programs, the job memory limit is N processes \* - memory limit, e.g.
32\*800 MB = 25600 MB `bsub -W 8:00 -n 32 -M 800 mympiprog` - For OpenMP-parallel programs, the same applies as with MPI-parallel programs, e.g. 8\*2000 MB = 16000 MB + `bsub -W 4:00 -n 8 -M 2000 myompprog` -<!-- --> - - bsub -W 4:00 -n 8 -M 2000 myompprog +LSF sets the user environment according to the environment at the time of submission. -LSF sets the user environment according to the environment at the time -of submission. -Based on the given information the job scheduler puts your job into the -appropriate queue. These queues are subject to permanent changes. You -can check the current situation using the command `bqueues -l` . There -are a couple of rules and restrictions to balance the system loads. One -idea behind them is to prevent users from occupying the machines -unfairly. An indicator for the priority of a job placement in a queue is -therefore the ratio between used and granted CPU time for a certain +Based on the given information the job scheduler puts your job into the appropriate queue. These +queues are subject to permanent changes. You can check the current situation using the command +`bqueues -l`. There are a couple of rules and restrictions to balance the system loads. One idea +behind them is to prevent users from occupying the machines unfairly. An indicator for the priority +of a job placement in a queue is therefore the ratio between used and granted CPU time for a certain period. -`Attention`: If you do not give the maximum runtime of your program, the +**Attention:** If you do not give the maximum runtime of your program, the default runtime for the specified queue is taken. This is way below the maximal possible runtime (see table [below](#JobQueues)).
If memory limiting is in place, there also -exists a default limit which will be applied to your job if you do not -specify one. Please find the limits along with the -description of the -machines' [queues](#JobQueues) below. - -#InteractiveJobs +**Attention 2:** Some systems enforce a limit on how much memory each process and your job as a +whole may allocate. If your job or any of its processes exceeds this limit (N proc.\*limit for the +job), your job will be killed. If memory limiting is in place, there also exists a default limit +which will be applied to your job if you do not specify one. Please find the limits along with the +description of the machines' [queues](#JobQueues) below. ### Interactive Jobs @@ -115,16 +98,20 @@ extensive production runs! Use the bsub options `-Is` for an interactive and, additionally on *Atlas*, `-XF` for an X11 job like: - bsub -Is -XF matlab +```Bash +bsub -Is -XF matlab +``` or for an interactive job with a bash use - bsub -Is -n 2 -W <hh:mm> -P <project> bash +```Bash +bsub -Is -n 2 -W <hh:mm> -P <project> bash +``` You can check the current usage of the system with the command `bhosts` to estimate the time to schedule. -#ParallelJobs
-[Further information on pinning -threads.](RuntimeEnvironment#Placing_Threads_or_Processes_on) - -#MpiJobs - #### MPI Jobs There are major differences for submitting MPI-parallel jobs on the @@ -167,22 +147,26 @@ defined when the job array is created. Here is an example how an array job can looks like: - #!/bin/bash +```Bash +#!/bin/bash - #BSUB -W 00:10 - #BSUB -n 1 - #BSUB -J "myTask[1-100:2]" # create job array with 50 tasks - #BSUB -o logs/out.%J.%I # appends the standard output of the job to the specified file that - # contains the job information (%J) and the task information (%I) - #BSUB -e logs/err.%J.%I # appends the error output of the job to the specified file that - # contains the job information (%J) and the task information (%I) +#BSUB -W 00:10 +#BSUB -n 1 +#BSUB -J "myTask[1-100:2]" # create job array with 50 tasks +#BSUB -o logs/out.%J.%I # appends the standard output of the job to the specified file that + # contains the job information (%J) and the task information (%I) +#BSUB -e logs/err.%J.%I # appends the error output of the job to the specified file that + # contains the job information (%J) and the task information (%I) - echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX" +echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX" +``` Alternatively, you can use the following single command line to submit an array job: - bsub -n 1 -W 00:10 -J "myTask[1-100:2]" -o "logs/out.%J.%I" -e "logs/err.%J.%I" "echo Hello Job \$LSB_JOBID Task \$LSB_JOBINDEX" +```Bash +bsub -n 1 -W 00:10 -J "myTask[1-100:2]" -o "logs/out.%J.%I" -e "logs/err.%J.%I" "echo Hello Job \$LSB_JOBID Task \$LSB_JOBINDEX" +``` For further details please read the LSF manual. @@ -200,36 +184,36 @@ detailed information see the man pages of bsub with `man bsub`. 
Here is an example of what a chain job can look like: - #!/bin/bash - - #job parameters - time="4:00" - mem="rusage[mem=2000] span[host=1]" - n="8" - - #iteration parameters - start=1 - end=10 - i=$start - - #create chain job with 10 jobs - while [ "$i" -lt "`expr $end + 1`" ] - do - if [ "$i" -eq "$start" ];then - #create jobname - JOBNAME="${USER}_job_$i" - bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" <job> - else - #create jobname - OJOBNAME=$JOBNAME - JOBNAME="${USER}_job_$i" - #only start a job if the preceding job has the status done - bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" -w "done($OJOBNAME)" <job> - fi - i=`expr $i + 1` - done - -#JobQueues +```Bash +#!/bin/bash + +#job parameters +time="4:00" +mem="rusage[mem=2000] span[hosts=1]" +n="8" + +#iteration parameters +start=1 +end=10 +i=$start + +#create chain job with 10 jobs +while [ "$i" -lt "`expr $end + 1`" ] +do + if [ "$i" -eq "$start" ];then + #create jobname + JOBNAME="${USER}_job_$i" + bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" <job> + else + #create jobname + OJOBNAME=$JOBNAME + JOBNAME="${USER}_job_$i" + #only start a job if the preceding job has the status done + bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" -w "done($OJOBNAME)" <job> + fi + i=`expr $i + 1` +done +```
For a more convenient overview the command `lsfshowjobs` displays information on the LSF status like this: - You have 1 running job using 64 cores - You have 1 pending job +```Bash +You have 1 running job using 64 cores +You have 1 pending job +``` and the command `lsfnodestat` displays the node and core status of machine like this: +```Bash # ------------------------------------------- nodes available: 714/714 nodes damaged: 0 @@ -269,7 +256,8 @@ jobs damaged: 0 \| # ------------------------------------------- -normal working cores: 2556 cores free for jobs: 265 \</pre> +normal working cores: 2556 cores free for jobs: 265 +``` The command `bjobs` allows to monitor your running jobs. It has the following options: @@ -287,15 +275,15 @@ following options: If you run code that regularily emits status or progress messages, using the command -`watch -n10 tail -n2 '*out'` +```Bash +watch -n10 tail -n2 '*out' +``` in your `$HOME/.lsbatch` directory is a very handy way to keep yourself informed. Note that this only works if you did not use the `-o` option of `bsub`, If you used `-o`, replace `*out` with the list of file names you passed to this very option. -#HostList - ## Host List The `bsub` option `-m` can be used to specify a list of hosts for @@ -305,5 +293,3 @@ execution. This is especially useful for memory intensive computations. Jupiter, saturn, and uranus have 4 GB RAM per core, mars only 1GB. So it makes sense to specify '-m "jupiter saturn uranus". 
- -\</noautolink> diff --git a/doc.zih.tu-dresden.de/docs/archive/RamDiskDocumentation.md b/doc.zih.tu-dresden.de/docs/archive/RamDiskDocumentation.md index d024d32032e64cb18f424400e6d12f96dea67e46..c7e50b20763d264214fe1ef11222739befe423ca 100644 --- a/doc.zih.tu-dresden.de/docs/archive/RamDiskDocumentation.md +++ b/doc.zih.tu-dresden.de/docs/archive/RamDiskDocumentation.md @@ -1,3 +1,5 @@ +# Ramdisk + ## Using parts of the main memory as a temporary file system On systems with a very large main memory, it is for some workloads very @@ -21,11 +23,15 @@ single ramdisk can be created (but you can create and delete a ramdisk multiple times during a job). You need to load the corresponding software module via - module load ramdisk +```Bash +module load ramdisk +``` Afterwards, the ramdisk can be created with the command - make-ramdisk «size of the ramdisk in GB» +```Bash +make-ramdisk «size of the ramdisk in GB» +``` The path to the ramdisk is fixed to `/ramdisks/«JOBID»`. @@ -36,7 +42,9 @@ provide a script that uses multiple threads to copy a directory tree. It can also be used to transfer single files but will only use one thread in this case. It is used as follows: - parallel-copy.sh «source directory or file» «target directory» +```Bash +parallel-copy.sh «source directory or file» «target directory» +``` It is not specifically tailored to be used with the ramdisk. It can be used for any copy process between two locations. @@ -46,14 +54,16 @@ used for any copy process between two locations. A ramdisk will automatically be deleted at the end of the job. As an alternative, you can delete your own ramdisk via the command - kill-ramdisk +```Bash +kill-ramdisk +``` -. It is possible, that the deletion of the ramdisk fails. The reason for +It is possible that the deletion of the ramdisk fails.
The reason for this is typically that some process still has a file open within the ramdisk or that there is still a program using the ramdisk or having the ramdisk as its current path. Locating the processes that block the destruction of the ramdisk is possible using the command - - lsof +d /ramdisks/«JOBID» - --- Main.MichaelKluge - 2013-03-22 + +```Bash +lsof +d /ramdisks/«JOBID» +``` diff --git a/doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md b/doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md deleted file mode 100644 index 03aa8538a5e5a779f8a179d2dac5fb6ee4f1378b..0000000000000000000000000000000000000000 --- a/doc.zih.tu-dresden.de/docs/archive/StepByStepTaurus.md +++ /dev/null @@ -1,10 +0,0 @@ -# Step by step examples for working on Taurus - -(in development) - -- From Windows: - [login](Login#Prerequisites_for_Access_to_a_Linux_Cluster_From_a_Windows_Workstation) - and file transfer -- Short introductionary presentation on the module an job system on - taurus with focus on AI/ML: [Using taurus for - AI](%ATTACHURL%/Scads_-_Using_taurus_for_AI.pdf) diff --git a/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md b/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md index f8b7d14cc378a50c19a34ea758f12cebe882d8f3..94aa24f360633717694f131dca20f3ab4b79da9c 100644 --- a/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md +++ b/doc.zih.tu-dresden.de/docs/archive/SystemVenus.md @@ -1,16 +1,9 @@ # Venus - - ## Information about the hardware Detailed information on the current HPC hardware can be found -[here.](HardwareVenus) - -## Applying for Access to the System - -Project and login application forms for taurus are available -[here](Access). +[here](HardwareVenus.md). ## Login to the System @@ -18,63 +11,65 @@ Login to the system is available via ssh at `venus.hrsk.tu-dresden.de`.
The RSA fingerprints of the Phase 2 Login nodes are: - MD5:63:65:c6:d6:4e:5e:03:9e:07:9e:70:d1:bc:b4:94:64 +```Bash +MD5:63:65:c6:d6:4e:5e:03:9e:07:9e:70:d1:bc:b4:94:64 +``` and - SHA256:Qq1OrgSCTzgziKoop3a/pyVcypxRfPcZT7oUQ3V7E0E - -You can find an list of fingerprints [here](Login#SSH_access). +```Bash +SHA256:Qq1OrgSCTzgziKoop3a/pyVcypxRfPcZT7oUQ3V7E0E +``` ## MPI -The installation of the Message Passing Interface on Venus (SGI MPT) -supports the MPI 2.2 standard (see `man mpi` ). There is no command like -`mpicc`, instead you just have to use the "serial" compiler (e.g. `icc`, -`icpc`, or `ifort`) and append `-lmpi` to the linker command line. +The installation of the Message Passing Interface on Venus (SGI MPT) supports the MPI 2.2 standard +(see `man mpi`). There is no command like `mpicc`; instead, you just have to use the "serial" +compiler (e.g. `icc`, `icpc`, or `ifort`) and append `-lmpi` to the linker command line. Example: - <span class='WYSIWYG_HIDDENWHITESPACE'> </span>% icc -o myprog -g -O2 -xHost myprog.c -lmpi<span class='WYSIWYG_HIDDENWHITESPACE'> </span> +```Bash +% icc -o myprog -g -O2 -xHost myprog.c -lmpi +``` Notes: -- C++ programmers: You need to link with both libraries: - `-lmpi++ -lmpi`. -- Fortran programmers: The MPI module is only provided for the Intel - compiler and does not work with gfortran. +- C++ programmers: You need to link with both libraries: + `-lmpi++ -lmpi`. +- Fortran programmers: The MPI module is only provided for the Intel + compiler and does not work with gfortran. -Please follow the following guidelines to run your parallel program -using the batch system on Venus. +Please follow these guidelines to run your parallel program using the batch system on Venus. ## Batch system -Applications on an HPC system can not be run on the login node. They -have to be submitted to compute nodes with dedicated resources for the -user's job.
Normally a job can be submitted with these data: +Applications on an HPC system cannot be run on the login node. They have to be submitted to compute +nodes with dedicated resources for the user's job. Normally a job can be submitted with these data: -- number of CPU cores, -- requested CPU cores have to belong on one node (OpenMP programs) or - can distributed (MPI), -- memory per process, -- maximum wall clock time (after reaching this limit the process is - killed automatically), -- files for redirection of output and error messages, -- executable and command line parameters. +- number of CPU cores, +- requested CPU cores have to belong to one node (OpenMP programs) or + can be distributed (MPI), +- memory per process, +- maximum wall clock time (after reaching this limit the process is + killed automatically), +- files for redirection of output and error messages, +- executable and command line parameters. -The batch sytem on Venus is Slurm. For general information on Slurm, -please follow [this link](Slurm). +The batch system on Venus is Slurm. For general information on Slurm, please follow +[this link](../jobs/Slurm.md). ### Submission of Parallel Jobs -The MPI library running on the UV is provided by SGI and highly -optimized for the ccNUMA architecture of this machine. +The MPI library running on the UV is provided by SGI and highly optimized for the ccNUMA +architecture of this machine. -On Venus, you can only submit jobs with a core number which is a -multiple of 8 (a whole CPU chip and 128 GB RAM). Parallel jobs can be -started like this: +On Venus, you can only submit jobs with a core number which is a multiple of 8 (a whole CPU chip and
Parallel jobs can be started like this: - <span class='WYSIWYG_HIDDENWHITESPACE'> </span>srun -n 16 a.out<span class='WYSIWYG_HIDDENWHITESPACE'> </span> +```Bash +srun -n 16 a.out +``` **Please note:** There are different MPI libraries on Taurus and Venus, so you have to compile the binaries specifically for their target. @@ -83,4 +78,4 @@ so you have to compile the binaries specifically for their target. - The large main memory on the system allows users to create ramdisks within their own jobs. The documentation on how to use these - ramdisks can be found [here](RamDiskDocumentation). + ramdisks can be found [here](RamDiskDocumentation.md). diff --git a/doc.zih.tu-dresden.de/docs/archive/TaurusII.md b/doc.zih.tu-dresden.de/docs/archive/TaurusII.md index 1517542e731665466bc8e25e17ba52f04cd25fb6..03fa87e0eb20045b39a74359206ba3aebf5df549 100644 --- a/doc.zih.tu-dresden.de/docs/archive/TaurusII.md +++ b/doc.zih.tu-dresden.de/docs/archive/TaurusII.md @@ -10,22 +10,17 @@ updated, and they will be merged with phase 2. Basic information for Taurus, phase 2: -- Please use the login nodes\<span class="WYSIWYG_TT"> - tauruslogin\[3-5\].hrsk.tu-dresden.de\</span> for the new system. +- Please use the login nodes `tauruslogin[3-5].hrsk.tu-dresden.de` for the new system. - We have mounted the same file systems like on our other HPC systems: - - /home/ - - /projects/ - - /sw - - Taurus phase 2 has it's own /scratch file system (capacity 2.5 - PB). + - `/home/` + - `/projects/` + - `/sw` + - Taurus phase 2 has its own `/scratch` file system (capacity 2.5 PB). - All nodes have 24 cores. - Memory capacity is 64/128/256 GB per node. The batch system handles your requests like in phase 1. We have other memory-per-core limits! - Our 64 GPU nodes now have 2 cards with 2 GPUs, each. -For more details, please refer to our updated -[documentation](SystemTaurus). - Thank you for testing the system with us!
Ulf Markwardt diff --git a/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md b/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md index 02cc0bf61c588ef6a8de4b739306bea862980fc6..3cc59e7beb48a69a2b939542b14fef28cf4047fc 100644 --- a/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md +++ b/doc.zih.tu-dresden.de/docs/archive/UNICORERestAPI.md @@ -15,6 +15,4 @@ Some useful examples of job submission via REST are available at: The base address for the Taurus system at the ZIH is: -<https://unicore.zih.tu-dresden.de:8080/TAURUS/rest/core> - --- Main.AlvaroAguilera - 2017-02-01 +`https://unicore.zih.tu-dresden.de:8080/TAURUS/rest/core` diff --git a/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md b/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md index eee845e9c4a778e58d55610a3090fc5b1c669900..76d267cf1d5eb7115dd26417b42638ca16e07040 100644 --- a/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md +++ b/doc.zih.tu-dresden.de/docs/archive/VampirTrace.md @@ -2,7 +2,7 @@ VampirTrace is a performance monitoring tool, that produces tracefiles during a program run. These tracefiles can be analyzed and visualized by -the tool [Vampir](Compendium.Vampir). Vampir Supports lots of features +the tool [Vampir] **todo** Vampir. Vampir supports lots of features e.g. - MPI, OpenMP, pthreads, and hybrid programs @@ -13,12 +13,14 @@ e.g. - Function filtering and grouping Only the basic usage is shown in this Wiki. For a comprehensive -VampirTrace user manual refer to the [VampirTrace -Website](http://www.tu-dresden.de/zih/vampirtrace). +VampirTrace user manual refer to the +[VampirTrace Website](http://www.tu-dresden.de/zih/vampirtrace).
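The UNICORE REST base address above can be queried with a plain HTTP client (a sketch; `<user>:<password>` stands for valid ZIH credentials, and since this system is archived the endpoint may no longer respond):

```Bash
# Ask the UNICORE/X core endpoint for a JSON description of the site.
# -k skips certificate verification; drop it if the CA is trusted.
curl -k -u <user>:<password> \
     -H "Accept: application/json" \
     https://unicore.zih.tu-dresden.de:8080/TAURUS/rest/core
```

A successful response lists the available resources (jobs, storages, sites) as JSON links.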
Before using VampirTrace, set up the correct environment with - module load vampirtrace +```Bash +module load vampirtrace +``` To make measurements with VampirTrace, the user's application program needs to be instrumented, i.e., at specific important points @@ -99,5 +101,3 @@ applications can be instrumented: By default, running a VampirTrace instrumented application should result in a tracefile in the current working directory where the application was executed. - --- Main.jurenz - 2009-12-17 diff --git a/doc.zih.tu-dresden.de/docs/archive/VenusOpen.md b/doc.zih.tu-dresden.de/docs/archive/VenusOpen.md deleted file mode 100644 index cfee2b3855f8a6f054858335b558f19cb454eb75..0000000000000000000000000000000000000000 --- a/doc.zih.tu-dresden.de/docs/archive/VenusOpen.md +++ /dev/null @@ -1,9 +0,0 @@ -# Venus open to HPC projects - -The new HPC server [Venus](SystemVenus) is open to all HPC projects -running on Mars with a quota of 20000 CPU h for testing the system. -Projects without access to Mars have to apply for the new resorce. - -To increase the CPU quota beyond this limit, a follow-up (but full) -proposal is needed. This should be done via the new project management -system.
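The VampirTrace instrumentation step described above is typically done by swapping the regular compiler for a VampirTrace compiler wrapper (a sketch; the wrapper names `vtcc`/`vtcxx`/`vtf90` follow the usual VampirTrace convention, and `myprog.c` is a placeholder source file):

```Bash
module load vampirtrace

# Compile with the VampirTrace wrapper instead of the plain compiler;
# the wrapper inserts the measurement hooks automatically.
vtcc -o myprog myprog.c        # use vtcxx / vtf90 for C++ / Fortran

# Running the instrumented binary writes a tracefile into the current
# working directory, ready for analysis with Vampir.
./myprog
```

For MPI programs, combine the wrapper with the MPI link flags as shown in the Venus MPI section.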