diff --git a/doc.zih.tu-dresden.de/docs/access.md b/doc.zih.tu-dresden.de/docs/access.md
index dd398784ebe822e6a4a55209e68a0e7872499f0a..e774a09124fe26155af269c1bf48ad0940b74fc8 100644
--- a/doc.zih.tu-dresden.de/docs/access.md
+++ b/doc.zih.tu-dresden.de/docs/access.md
@@ -3,8 +3,8 @@ ## SSH access

Important note: ssh to Taurus is only possible from inside TU Dresden Campus. Users from outside
-should use VPN ([see
-here](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn)).
+should use VPN
+([see here](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn)).

The recommended way to connect to the HPC login servers directly via ssh:

@@ -17,4 +17,4 @@ during the access procedure. Accept the host verifying and enter your password.

by login nodes in your Taurus home directory. This method requires two conditions: Linux OS,
workstation within the campus network. For other options and details check the Login page.

-Useful links: [Access](todo), [Project Request Form](todo), [Terms Of Use](todo)
+Useful links: [Access]**todo link**, [Project Request Form](req_resources.md), [Terms Of Use]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/data_management/datamanagement.md b/doc.zih.tu-dresden.de/docs/data_management/datamanagement.md
index e28c628fb659290a865cdf41af335a4587e86010..2b59932e3882d80a8c6136982d0fcd7650103232 100644
--- a/doc.zih.tu-dresden.de/docs/data_management/datamanagement.md
+++ b/doc.zih.tu-dresden.de/docs/data_management/datamanagement.md
@@ -11,7 +11,7 @@ the same **data storage** or set of them, the same **set of software** (packages

### Taxonomy of File Systems

As soon as you have access to Taurus you have to manage your data. The main concept of
-working with data on Taurus bases on [Workspaces](../workspaces). Use it properly:
+working with data on Taurus is based on [Workspaces](workspaces.md). Use them properly:

* use a **/home** directory for the limited amount of personal data, simple examples and the results
  of calculations. The home directory is not a working directory! However, /home file system is
@@ -29,14 +29,14 @@ afterwards.

*Note:* Keep in mind that every working space has a storage duration (i.e. ssd - 30 days). Thus be
careful with the expire date otherwise it could vanish. The core data of your project should be
-[backed up](todo) and [archived](todo)(for the most [important](todo) data).
+[backed up]**todo link** and [archived]**todo link** (for the most [important]**todo link** data).

### Backup

The backup is a crucial part of any project. Organize it at the beginning of the project. If you
will lose/delete your data in the "no back up" file systems it can not be restored! The backup on
Taurus is **only** available in the **/home** and the **/projects** file systems! Backed up files
-could be restored by the user. Details could be found [here](todo).
+can be restored by the user. Details can be found [here]**todo link**.

### Folder Structure and Organizing Data

@@ -49,21 +49,21 @@ project. We recommend following the rules for your work regarding:

  steps.
* Naming Data: Keep short, but meaningful names; Keep standard file endings; File names don’t
  replace documentation and metadata; Use standards of your discipline; Make rules for your
-  project, document and keep them (See the [README recommendations](todo) below)
+  project, document and keep them (see the [README recommendations]**todo link** below).

This is the example of an organisation (hierarchical) for the folder structure.
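+
+A rough sketch of such a hierarchy, following the input-process-output pattern mentioned below
+(all project and folder names are hypothetical placeholders):
+
+```bash
+# create an example project skeleton inside your workspace or project directory
+mkdir -p my_project/{input,processing,output,documentation}
+touch my_project/README.txt   # see the README recommendation below
+```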
Use it as a visual illustration of the above:

**todo** Insert graphic *Organizing_Data-using_file_systems.png*

-Keep in mind [input-process-output pattern](todo) for the work with folder structure.
+Keep in mind the [input-process-output pattern]**todo link** when working with the folder structure.

### README Recommendation

-In general, [README](todo) is just simple general information of software/project that exists in the
-same directory/repository of the project. README is used to explain the details project and the
-**structure** of the project/folder in a short way. We recommend using readme as for entire project as
-for every important folder in the project.
+In general, a [README]**todo link** is a short file with general information about the
+software/project that lives in the same directory/repository as the project. A README is used to
+briefly explain the details and the **structure** of the project/folder. We recommend using a
+README for the entire project as well as for every important folder in it.

Example of the structure for the README: Think first: What is calculated why? (Description); What is
@@ -81,9 +81,9 @@
Version:

### Metadata

-Another important aspect is the [Metadata](todo). It is sufficient to use [Metadata](todo) for your
-project on Taurus. [Metadata standards](todo) will help to do it easier (i.e. [Dublin core](todo),
-[OME](todo))
+Another important aspect is [metadata]**todo link**. It is worthwhile to use [metadata]**todo
+link** for your project on Taurus. [Metadata standards]**todo link** make this easier (e.g.
+[Dublin core]**todo link**, [OME]**todo link**).

### Data Hygiene

@@ -97,14 +97,14 @@ you don’t need throughout its lifecycle.

As was written before the module concept is the basic concept for using software on Taurus.
Uniformity of the project has to be achieved by using the same set of software on different levels.
It could be done by using environments. There are two types of environments should be distinguished:
-runtime environment (the project level, use scripts to load [modules](todo)), Python virtual
+runtime environment (the project level, use scripts to load [modules]**todo link**), Python virtual
environment. The concept of the environment will give an opportunity to use the same version of the
software on every level of the project for every project member.

### Private individual and project modules files

-[Private individual and project module files](todo) will be discussed in [chapter 7](todo). Project
-modules list is a powerful instrument for effective teamwork.
+[Private individual and project module files]**todo link** will be discussed in [chapter 7]**todo
+link**. A project modules list is a powerful instrument for effective teamwork.

### Python virtual environment

@@ -116,11 +116,11 @@ packages).

**Vitualenv (venv)** is a standard Python tool to create isolated Python environments. We
recommend using venv to work with Tensorflow and Pytorch on Taurus. It has been integrated into the
-standard library under the [venv module](todo). **Conda** is the second way to use a virtual
+standard library under the [venv module]**todo link**. **Conda** is the second way to use a virtual
environment on the Taurus. Conda is an open-source package management system and environment
management system from the Anaconda.

-[Detailed information](todo) about using the virtual environment.
+See the [detailed information]**todo link** about using virtual environments.
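+
+As a short sketch of the venv workflow (the module name and the paths are examples and may differ
+on Taurus):
+
+```bash
+module load Python                       # load a Python module (example name)
+python3 -m venv ~/venvs/my_project       # create an isolated environment (example path)
+source ~/venvs/my_project/bin/activate   # activate it for the current shell
+pip install torch                        # packages are installed into the venv only
+deactivate                               # leave the environment
+```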
## Application Software Availability

@@ -131,10 +131,10 @@ documented and gives the opportunity to comfort efficient and safe work.

## Access rights

The concept of **permissions** and **ownership** is crucial in Linux. See the
-[HPC-introduction](todo) slides for the understanding of the main concept. Standard Linux changing
-permission command (i.e `chmod`) valid for Taurus as well. The **group** access level contains
-members of your project group. Be careful with 'write' permission and never allow to change the
-original data.
+[HPC-introduction]**todo link** slides for an understanding of the main concept. The standard Linux
+command for changing permissions (i.e. `chmod`) works on Taurus as well. The **group** access level
+contains the members of your project group. Be careful with the 'write' permission and never allow
+the original data to be changed.
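+
+For illustration, typical permission changes might look like this (file names are placeholders):
+
+```bash
+chmod g+r shared_results.txt    # grant the project group read access
+chmod g-w,o-rwx raw_input.dat   # protect original data against changes by group and others
+ls -l raw_input.dat             # verify the resulting permissions and ownership
+```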
-Useful links: [Data Management](todo), [File Systems](todo), [Get Started with HPC-DA](todo),
-[Project Management](todo), [Preservation research data[(todo)
+Useful links: [Data Management]**todo link**, [File Systems]**todo link**, [Get Started with
+HPC-DA]**todo link**, [Project Management]**todo link**, [Preservation research data]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/data_moving.md b/doc.zih.tu-dresden.de/docs/data_moving.md
index 1a84e92a8f97f0ffdcb5e2160b3837191c379520..a67c6b812091bcb2ced8a817499c1ccddf4fe1cb 100644
--- a/doc.zih.tu-dresden.de/docs/data_moving.md
+++ b/doc.zih.tu-dresden.de/docs/data_moving.md
@@ -34,4 +34,4 @@ command. Except for the 'dt' prefix, their syntax is the same as the shell comma

Keep in mind: The warm_archive is not writable for jobs. However, you can store the data in the warm
archive with the datamover.

-Useful links: [Data Mover](todo), [Export Nodes](todo)
+Useful links: [Data Mover]**todo link**, [Export Nodes]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md
index 871de3021d52a414041419b169b1dbcc2fc7f5dc..5cd74f0a279aca4339c9edb21676618953598c31 100644
--- a/doc.zih.tu-dresden.de/docs/index.md
+++ b/doc.zih.tu-dresden.de/docs/index.md
@@ -63,4 +63,6 @@ for this very project is to generate the static html using `mkdocs` and deploy t

### Contribute

-Contributions are highly welcome. Please refere to [CONTRIBUTE.md](CONTRIBUTE.md) file of this project.
+Contributions are highly welcome. Please refer to the
+[README.md](https://gitlab.hrz.tu-chemnitz.de/zih/hpc-compendium/hpc-compendium/-/blob/main/doc.zih.tu-dresden.de/README.md)
+file of this project.
diff --git a/doc.zih.tu-dresden.de/docs/software/containers.md b/doc.zih.tu-dresden.de/docs/software/containers.md
index 06aedfe1d2e36b65531458dd6da995e4abc927c4..a5eb5244053d4add03eb04ef3316e45cb396b706 100644
--- a/doc.zih.tu-dresden.de/docs/software/containers.md
+++ b/doc.zih.tu-dresden.de/docs/software/containers.md
@@ -1,13 +1,13 @@
# Use of Containers

-[Containerization](todo) encapsulating or packaging up software code and all its dependencies to run
-uniformly and consistently on any infrastructure. On Taurus [Singularity](todo) used as a standard
-container solution. Singularity enables users to have full control of their environment. This means
-that you don’t have to ask an HPC support to install anything for you - you can put it in a
-Singularity container and run! As opposed to Docker (the most famous container solution),
-Singularity is much more suited to being used in an HPC environment and more efficient in many
-cases. Docker containers can easily be used in Singularity. Information about the use of Singularity
-on Taurus can be found [here](todo).
+[Containerization]**todo link** means encapsulating or packaging up software code and all its
+dependencies so that it runs uniformly and consistently on any infrastructure. On Taurus,
+[Singularity]**todo link** is used as the standard container solution. Singularity enables users to
+have full control of their environment. This means that you don’t have to ask HPC support to
+install anything for you - you can put it in a Singularity container and run it! As opposed to
+Docker (the most famous container solution), Singularity is much better suited to an HPC
+environment and more efficient in many cases. Docker containers can easily be used in Singularity.
+Information about the use of Singularity on Taurus can be found [here]**todo link**.

In some cases using Singularity requires a Linux machine with root privileges (e.g. using the ml
partition), the same architecture and a compatible kernel. For many reasons, users on Taurus cannot
@@ -15,11 +15,11 @@ be granted root permissions. A solution is a Virtual Machine (VM) on the ml part
users to gain root permissions in an isolated environment. There are two main options on how to
work with VM on Taurus:

- 1. [VM tools](todo). Automative algorithms for using virtual machines;
- 1. [Manual method](todo). It required more operations but gives you more flexibility and reliability.
+ 1. [VM tools]**todo link**. Automated workflows for using virtual machines;
+ 1. [Manual method]**todo link**. It requires more operations but gives you more flexibility and reliability.

-Additional Information: Examples of the definition for the Singularity container ([here](todo)) and
-some hints ([here](todo)).
+Additional Information: Examples of the definition for the Singularity container ([here]**todo
+link**) and some hints ([here]**todo link**).

-Useful links: [Containers](todo), [Custom EasyBuild Environment](todo), [Virtual machine on
-Taurus](todo)
+Useful links: [Containers]**todo link**, [Custom EasyBuild Environment]**todo link**, [Virtual
+machine on Taurus]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/software/overview.md b/doc.zih.tu-dresden.de/docs/software/overview.md
index 503c8e9b4057ac755f1512648f69f7cbc1071b98..db945921f7123eb342edc2d9c7bf0fb8be6ef834 100644
--- a/doc.zih.tu-dresden.de/docs/software/overview.md
+++ b/doc.zih.tu-dresden.de/docs/software/overview.md
@@ -1,19 +1,20 @@
# Overview

-According to [What software do I need](todo), first of all, check the [Software module list](todo).
-Keep in mind that there are two different environments: **scs5** (for the x86 architecture) and
-**ml** (environment for the Machine Learning partition based on the Power9 architecture).
+According to [What software do I need]**todo link**, first of all, check the [Software module
+list]**todo link**. Keep in mind that there are two different environments: **scs5** (for the x86
+architecture) and **ml** (the environment for the Machine Learning partition based on the Power9
+architecture).

Work with the software on Taurus could be started only after allocating the resources by [batch
-systems](todo). By default, you are in the login nodes. They are not specified for the work, only
-for the login. Allocating resources will be done by batch system [SLURM](todo).
+systems]**todo link**. By default, you are on the login nodes, which are not intended for
+computational work but only for logging in. Resources are allocated by the batch system
+[SLURM]**todo link**.
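+
+For example, a minimal interactive allocation could look like this (partition name, resources and
+time limit are placeholders):
+
+```bash
+# request one task with two CPUs for 30 minutes and open a shell on a compute node
+srun -p haswell -n 1 --cpus-per-task=2 --time=00:30:00 --pty bash
+```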
There are a lot of different possibilities to work with software on Taurus:

## Modules

Usage of software on HPC systems is managed by a **modules system**. Thus, it is crucial to
-be familiar with the [modules concept and commands](../modules/modules). Modules are a way to use
+be familiar with the [modules concept and commands](modules.md). Modules are a way to use
frameworks, compilers, loader, libraries, and utilities. A module is a user interface that provides
utilities for the dynamic modification of a user's environment without manual modifications. You
could use them for `srun`, batch jobs (`sbatch`) and the Jupyterhub.
@@ -21,15 +22,16 @@ could use them for `srun`, batch jobs (`sbatch`) and the Jupyterhub.

## JupyterNotebook

The Jupyter Notebook is an open-source web application that allows creating documents containing
-live code, equations, visualizations, and narrative text. There is [jupyterhub](todo) on Taurus,
-where you can simply run your Jupyter notebook on HPC nodes using modules, preloaded or custom
-virtual environments. Moreover, you can run a [manually created remote jupyter server](todo) for
-more specific cases.
+live code, equations, visualizations, and narrative text. There is [jupyterhub]**todo link** on
+Taurus, where you can simply run your Jupyter notebook on HPC nodes using modules, preloaded or
+custom virtual environments. Moreover, you can run a [manually created remote jupyter server]**todo
+link** for more specific cases.

## Containers

-Some tasks require using containers. It can be done on Taurus by [Singularity](todo). Details could
-be found in the [following chapter](todo).
+Some tasks require using containers. This can be done on Taurus with [Singularity]**todo link**.
+Details can be found in the [following chapter]**todo link**.

-Useful links: [Libraries](todo), [Deep Learning](todo), [Jupyter Hub](todo), [Big Data
-Frameworks](todo), [R](todo), [Applications for various fields of science](todo)
+Useful links: [Libraries]**todo link**, [Deep Learning]**todo link**, [Jupyter Hub]**todo link**,
+[Big Data Frameworks]**todo link**, [R]**todo link**, [Applications for various fields of
+science]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/specific_software.md b/doc.zih.tu-dresden.de/docs/specific_software.md
index 37e7bba8acc5933365d8561ac9660d086e7d679c..fd98e303e5448ae7ce128ddfbc4e78c63e754075 100644
--- a/doc.zih.tu-dresden.de/docs/specific_software.md
+++ b/doc.zih.tu-dresden.de/docs/specific_software.md
@@ -4,29 +4,29 @@

The modular concept is the easiest way to work with the software on Taurus. It allows to user to
switch between different versions of installed programs and provides utilities for the dynamic
-modification of a user's environment. The information can be found [here](todo).
+modification of a user's environment. More information can be found [here]**todo link**.

### Private project and user modules files

-[Private project module files](todo) allow you to load your group-wide installed software into your
-environment and to handle different versions. It allows creating your own software environment for
-the project. You can create a list of modules that will be loaded for every member of the team. It
-gives opportunity on unifying work of the team and defines the reproducibility of results. Private
-modules can be loaded like other modules with module load.
+[Private project module files]**todo link** allow you to load your group-wide installed software
+into your environment and to handle different versions. This allows you to create your own software
+environment for the project. You can create a list of modules that will be loaded for every member
+of the team. It helps to unify the work of the team and supports the reproducibility of results.
+Private modules can be loaded like other modules with `module load`.
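+
+As a sketch (the module names and the private module path are examples, not verified settings):
+
+```bash
+module avail                                      # list software provided on the system
+module load Python                                # load a module (example name)
+module use /projects/p_myproject/privatemodules   # make private modules visible (example path)
+module load my_project_module                     # load a private module like any other
+module list                                       # show currently loaded modules
+```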
-[Private user module files](todo) allow you to load your own installed software into your
+[Private user module files]**todo link** allow you to load your own installed software into your
environment. It works in the same manner as to project modules but for your private use.

## Use of containers

-[Containerization](todo) encapsulating or packaging up software code and all its dependencies to run
-uniformly and consistently on any infrastructure. On Taurus [Singularity](todo) used as a standard
-container solution. Singularity enables users to have full control of their environment. This means
-that you don’t have to ask an HPC support to install anything for you - you can put it in a
-Singularity container and run! As opposed to Docker (the most famous container solution),
-Singularity is much more suited to being used in an HPC environment and more efficient in many
-cases. Docker containers can easily be used in Singularity. Information about the use of Singularity
-on Taurus can be found [here](todo).
+[Containerization]**todo link** means encapsulating or packaging up software code and all its
+dependencies so that it runs uniformly and consistently on any infrastructure. On Taurus,
+[Singularity]**todo link** is used as the standard container solution. Singularity enables users to
+have full control of their environment. This means that you don’t have to ask HPC support to
+install anything for you - you can put it in a Singularity container and run it! As opposed to
+Docker (the most famous container solution), Singularity is much better suited to an HPC
+environment and more efficient in many cases. Docker containers can easily be used in Singularity.
+Information about the use of Singularity on Taurus can be found [here]**todo link**.

In some cases using Singularity requires a Linux machine with root privileges (e.g. using the ml
partition), the same architecture and a compatible kernel. For many reasons, users on Taurus cannot
@@ -34,11 +34,11 @@ be granted root permissions. A solution is a Virtual Machine (VM) on the ml part
users to gain root permissions in an isolated environment. There are two main options on how to
work with VM on Taurus:

- 1. [VM tools](todo). Automative algorithms for using virtual machines;
- 1. [Manual method](todo). It required more operations but gives you more flexibility and reliability.
+ 1. [VM tools]**todo link**. Automated workflows for using virtual machines;
+ 1. [Manual method]**todo link**. It requires more operations but gives you more flexibility and reliability.

-Additional Information: Examples of the definition for the Singularity container ([here](todo)) and
-some hints ([here](todo)).
+Additional Information: Examples of the definition for the Singularity container ([here]**todo
+link**) and some hints ([here]**todo link**).
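+
+As an illustration, a typical workflow might look like this (the image, definition file and script
+names are placeholders; building an image usually needs root privileges, e.g. inside the VM
+mentioned above):
+
+```bash
+singularity build my_image.sif my_definition.def    # build an image from a definition file
+singularity exec my_image.sif python3 my_script.py  # run a single command inside the container
+singularity shell my_image.sif                      # open an interactive shell in the container
+```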
-Useful links: [Containers](todo), [Custom EasyBuild Environment](todo), [Virtual machine on
-Taurus](todo)
+Useful links: [Containers]**todo link**, [Custom EasyBuild Environment]**todo link**, [Virtual
+machine on Taurus]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/support.md b/doc.zih.tu-dresden.de/docs/support.md
index 8ba4d682f6dc4b29717d6a3ba3d587359d3c0396..d85f71226115f277cef27bdb6841e276e85ec1d9 100644
--- a/doc.zih.tu-dresden.de/docs/support.md
+++ b/doc.zih.tu-dresden.de/docs/support.md
@@ -1,19 +1,19 @@
# What if everything didn't help?

-## Create a ticket: how do I do that?
+## Create a Ticket: How Do I Do That?

The best way to ask about the help is to create a ticket. In order to do that you have to write a
message to the <a href="mailto:hpcsupport@zih.tu-dresden.de">hpcsupport@zih.tu-dresden.de</a> with a
-detailed description of your problem. If it possible please add logs, used environment and write a
+detailed description of your problem. If possible, please add logs, the used environment, and a
minimal executable example for the purpose to recreate the error or issue.

## Communication with HPC Support

There is the HPC support team who is responsible for the support of HPC users and stable work of the
-cluster. You could find the [details](todo) in the right part of any page of the compendium.
+cluster. You can find the [details]**todo link** in the right part of any page of the compendium.
However, please, before the contact with the HPC support team check the documentation carefully
-(starting points: [main page](todo), [HPC-DA](todo)), use a search and then create a ticket. The
-ticket is a preferred way to solve the issue, but in some terminable cases, you can call to ask for
-help.
+(starting points: [main page]**todo link**, [HPC-DA]**todo link**), use the search and then create a
+ticket. A ticket is the preferred way to solve an issue, but in urgent cases, you can also call to
+ask for help.

-Useful link: [Further Documentation](todo)
+Useful link: [Further Documentation]**todo link**
diff --git a/doc.zih.tu-dresden.de/docs/use_of_hardware.md b/doc.zih.tu-dresden.de/docs/use_of_hardware.md
index 5900c92d0faae343a6fe8b75f701a7e606138b95..605c9561e8ca41020bc89f6ce04a3bf367b99997 100644
--- a/doc.zih.tu-dresden.de/docs/use_of_hardware.md
+++ b/doc.zih.tu-dresden.de/docs/use_of_hardware.md
@@ -2,8 +2,9 @@

To run the software, do some calculations or compile your code compute nodes have to be used. Login
nodes which are using for login can not be used for your computations. Submit your tasks (by using
-[jobs](todo)) to compute nodes. The [Slurm](todo) (scheduler to handle your jobs) is using on Taurus
-for this purposes. [HPC Introduction](todo) is a good resource to get started with it.
+[jobs]**todo link**) to compute nodes. [Slurm](jobs/index.md) (the scheduler that handles your jobs)
+is used on Taurus for this purpose. [HPC Introduction]**todo link** is a good resource to get
+started with it.

## What do I need a CPU or GPU?

a single GPU's core can handle is small), GPUs are not as versatile as CPUs.

## Selection of Suitable Hardware

-Available [hardware](todo): Normal compute nodes (Haswell[[64](todo), [128](todo), [256](todo)],
-Broadwell, [Rome](todo)), Large SMP nodes, Accelerator(GPU) nodes: (gpu2 partition, [ml
-partition](todo)).
+Available [hardware]**todo link**: normal compute nodes (Haswell [[64]**todo link**, [128]**todo link**,
+[256]**todo link**], Broadwell, [Rome]**todo link**), large SMP nodes, accelerator (GPU) nodes (gpu2
+partition, [ml partition]**todo link**).

The exact partition could be specified by `-p` flag with the srun command or in your batch job.
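+
+A minimal job file might look like this (partition, resource values, and the program name are
+placeholders, not verified settings):
+
+```bash
+#!/bin/bash
+#SBATCH -p haswell            # partition, e.g. haswell or gpu2
+#SBATCH -N 1                  # one node
+#SBATCH -n 1                  # one task
+#SBATCH --cpus-per-task=4     # four CPUs for an SMP-parallel program
+#SBATCH --time=01:00:00       # maximum run time
+
+srun ./my_program             # my_program is a placeholder
+```
+
+Submit it with `sbatch my_jobfile.sh`, as described below.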
@@ -33,11 +34,11 @@ perfect for this task.

**OpenMP jobs:** An SMP-parallel job can only run **within a node**, so it is necessary to include
the options `-N 1` and `-n 1`. Using `--cpus-per-task N` Slurm will start one task and you will have
N CPUs.
-The maximum number of processors for an SMP-parallel program is 896 on Taurus ([SMP](todo) island).
+The maximum number of processors for an SMP-parallel program is 896 on Taurus ([SMP]**todo link** island).

**GPUs** partitions are best suited for **repetitive** and **highly-parallel** computing tasks. If
-you have a task with potential [data parallelism](todo) most likely that you need the GPUs. Beyond
-video rendering, GPUs excel in tasks such as machine learning, financial simulations and risk
+you have a task with potential [data parallelism]**todo link**, you most likely need the GPUs.
+Beyond video rendering, GPUs excel in tasks such as machine learning, financial simulations and risk
modeling. Use the gpu2 and ml partition only if you need GPUs! Otherwise using the x86 partitions
(e.g Haswell) most likely would be more beneficial.

@@ -49,13 +50,14 @@ with the `--x11` option. To use an interactive job you have to specify `-X` flag

However, using srun directly on the shell will lead to blocking and launch an interactive job. Apart
from short test runs, it is recommended to launch your jobs into the background by using batch jobs.
For that, you can conveniently put the parameters directly into the job file which you can submit
-using sbatch [options] <job file>.
+using `sbatch [options] <job file>`.

-## Processing of data for input and output
+## Processing of Data for Input and Output

Pre-processing and post-processing of the data is a crucial part for the majority of data-dependent
projects. The quality of this work influence on the computations. However, pre- and post-processing
in many cases can be done completely or partially on a local pc and then transferred to the Taurus.
Please use Taurus for the computation-intensive tasks.

-Useful links: [Batch Systems](todo), [Hardware Taurus](todo), [HPC-DA](todo), [Slurm](todo)
+Useful links: [Batch Systems]**todo link**, [Hardware Taurus]**todo link**, [HPC-DA]**todo link**,
+[Slurm]**todo link**