diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md
index 615282e59c23aa93844116a5a58939274bf5f12f..17394c63f91dc6536cc6351a91d52f6a972cb278 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/lustre.md
@@ -180,7 +180,7 @@ Useful options:
 To list your personal filesystem usage and limits (quota), invoke
 
 ```console
-marie@login$ lfs quota -h -u $LOGIN <filesystem>
+marie@login$ lfs quota -h -u $USER <filesystem>
 ```
 
 Useful options:
diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md
index c22ef202a4408ab09d938219fa9be8b896cd7ae1..ab6e8ae4f7760d5fc768bf366b55a0e6289cc5c1 100644
--- a/doc.zih.tu-dresden.de/docs/index.md
+++ b/doc.zih.tu-dresden.de/docs/index.md
@@ -42,3 +42,8 @@ We offer a rich and colorful bouquet of courses from classical *HPC introduction
 for a detailed overview of the courses and the respective dates at ZIH.
 
 * [HPC introduction slides](misc/HPC-Introduction.pdf) (Nov. 2022)
+
+Furthermore, the Center for Scalable Data Analytics and Artificial Intelligence
+[ScaDS.AI](https://scads.ai) Dresden/Leipzig offers various training courses with an HPC focus.
+The current schedule and registration are available at the
+[ScaDS.AI trainings page](https://scads.ai/transfer-2/teaching-and-training/).
diff --git a/doc.zih.tu-dresden.de/docs/software/cicd.md b/doc.zih.tu-dresden.de/docs/software/cicd.md
new file mode 100644
index 0000000000000000000000000000000000000000..7294622a292acb18d586ccee3108332f6e555272
--- /dev/null
+++ b/doc.zih.tu-dresden.de/docs/software/cicd.md
@@ -0,0 +1,126 @@
+# CI/CD on HPC
+
+We provide a **GitLab Runner** that allows you to run a GitLab pipeline on the ZIH systems. With
+that, you can continuously build, test, and benchmark your HPC software in the target environment.
+
+## Requirements
+
+- You (and ideally every involved developer) need an [HPC-Login](../application/overview.md).
+- You manage your source code in a repository at the
+  [TU Chemnitz GitLab instance](https://gitlab.hrz.tu-chemnitz.de).
+
+## Setup process
+
+1. Open your repository in the browser.
+
+2. Hover over *Settings*, then click on *CI/CD*.
+
+    ![Settings menu of the repository](misc/menu3_en.png)
+    { align=center }
+
+3. *Expand* the *Runners* section.
+
+    ![Runners section in the CI/CD settings](misc/menu4_en.png)
+    { align=center }
+
+4. Copy the *registration token*.
+
+    ![Registration token in the Runners section](misc/menu12_en.png)
+    { align=center }
+
+5. Now, you can request the registration of your repository with the
+   [HPC-Support](../support/support.md). In the ticket, you need to add the URL of the GitLab
+   repository and the registration token.
+
+!!! warning
+
+    At the moment, only repositories hosted at the TU Chemnitz GitLab are supported.
+
+## GitLab pipelines
+
+As the ZIH provides the CI/CD functionality as a GitLab runner, you can run any pipeline that
+already works on other runners with the CI/CD at the ZIH systems. This also means that, to
+configure the actual steps performed once your pipeline runs, you need to define the
+`.gitlab-ci.yml` file in the root of your repository. There is
+[comprehensive documentation](https://gitlab.hrz.tu-chemnitz.de/help/ci/index.md) and a
+[reference for the `.gitlab-ci.yml` file](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index)
+available at every GitLab instance. There is also a
+[quick start guide](https://gitlab.hrz.tu-chemnitz.de/help/ci/quick_start/index.md).
+
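+To verify the setup, a minimal `.gitlab-ci.yml` with a single job might look like the following
+sketch (the job name and commands are purely illustrative):
+
+``` yaml
+hello-job:
+  script:
+    # print the host the job actually runs on
+    - hostname
+    - echo "Hello from the ZIH systems"
+```
+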
+The main difference to other GitLab runners is that every pipeline job will be scheduled as an
+individual HPC job on the ZIH systems. Therefore, an important aspect is the possibility to set
+Slurm parameters. While scheduling jobs allows you to run code directly on the target system, it
+also means that each pipeline has to wait for resource allocation. Hence, you may want to restrict
+which commits run the complete pipeline, and which commits run only a part of it.
+
+### Passing Slurm parameters
+
+You can pass Slurm parameters via the
+[`variables` keyword](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#variables), either
+globally for the whole YAML file, or on a per-job basis.
+
+Use the variable `SCHEDULER_PARAMETERS` and define the same parameters you would use for
+[`srun` or `sbatch`](../jobs_and_resources/slurm.md).
+
+!!! warning
+
+    The parameters `--job-name`, `--output`, and `--wait` are handled by the GitLab runner and must
+    not be used. If used, the run will fail.
+
+!!! tip
+
+    Make sure to set the `--account` parameter such that the allocation of HPC resources is
+    accounted correctly.
+
+!!! example
+
+    The following YAML file defines a configuration section `.test-job` and two jobs,
+    `test-job-haswell` and `test-job-ml`, extending from it. The two jobs share the
+    `before_script`, `script`, and `after_script` configuration, but differ in the
+    `SCHEDULER_PARAMETERS`. The jobs `test-job-haswell` and `test-job-ml` are scheduled on the
+    partitions `haswell` and `ml`, respectively.
+
+    ``` yaml
+    .test-job:
+      before_script:
+        - date
+        - pwd
+        - hostname
+      script:
+        - date
+        - pwd
+        - hostname
+      after_script:
+        - date
+        - pwd
+        - hostname
+
+    test-job-haswell:
+      extends: .test-job
+      variables:
+        SCHEDULER_PARAMETERS: -p haswell
+
+    test-job-ml:
+      extends: .test-job
+      variables:
+        SCHEDULER_PARAMETERS: -p ml
+    ```
+
+## Current limitations
+
+- Every runner job is currently limited to **one hour**. Once this time limit passes, the runner
+  job gets canceled regardless of the runtime requested from Slurm. This time *includes* the
+  waiting time for HPC resources.
+
+## Pitfalls and Recommendations
+
+- While the [`before_script`](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#before_script)
+  and [`script`](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#script) arrays of commands
+  are executed on the allocated resources, the
+  [`after_script`](https://gitlab.hrz.tu-chemnitz.de/help/ci/yaml/index#after_script) runs on the
+  GitLab runner node. We recommend that you do not use `after_script`.
+
+- Each of your runner jobs will likely be executed in a slightly different directory on the shared
+  filesystem. Some build systems, for example CMake, expect the configure and build steps to be
+  executed in the same directory. In this case, we recommend using a single job for both steps, as
+  sketched below.
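+
+A minimal sketch of such a combined configure-and-build job might look like the following (the
+partition, module name, and CMake options are placeholders that you need to adapt to your
+project):
+
+``` yaml
+build-job:
+  variables:
+    SCHEDULER_PARAMETERS: -p haswell -n 1 -c 4
+  script:
+    # Configure and build in the same runner job, so that both steps run
+    # in the same working directory on the shared filesystem.
+    - module load CMake
+    - cmake -S . -B build
+    - cmake --build build --parallel 4
+```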
diff --git a/doc.zih.tu-dresden.de/docs/software/misc/menu12_en.png b/doc.zih.tu-dresden.de/docs/software/misc/menu12_en.png
new file mode 100644
index 0000000000000000000000000000000000000000..a46d3b590299ef478d3afb077b45fe8697480570
Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/menu12_en.png differ
diff --git a/doc.zih.tu-dresden.de/docs/software/misc/menu3_en.png b/doc.zih.tu-dresden.de/docs/software/misc/menu3_en.png
new file mode 100644
index 0000000000000000000000000000000000000000..54356995d0a898fe8673d7c6d13ce57b8749df00
Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/menu3_en.png differ
diff --git a/doc.zih.tu-dresden.de/docs/software/misc/menu4_en.png b/doc.zih.tu-dresden.de/docs/software/misc/menu4_en.png
new file mode 100644
index 0000000000000000000000000000000000000000..caebd286dd951d5865e5baf16596877e1e31c1d4
Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/menu4_en.png differ
diff --git a/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md b/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md
index d7a90eb4bf0af3fe417fe9e6c89d7c44a400be28..392124c49f16e5fc4c2dcb1774782d70180682c6 100644
--- a/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md
+++ b/doc.zih.tu-dresden.de/docs/software/nanoscale_simulations.md
@@ -128,6 +128,8 @@ However hereafter we have an example on how that might look like for Gaussian:
 #SBATCH --ntasks=1
 #SBATCH --constraint=fs_lustre_ssd
 #SBATCH --cpus-per-task=24
+#SBATCH --mem-per-cpu=2050
+# only 2050 MB RAM per CPU on haswell, as Gaussian somehow crashes when using the full 2541 MB
 
 # Load the software you need here
 module purge
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index 3e424e9894e8ee37e91de55edfd3ecc4d679a239..6b59050a90bdec12d51ea958b69252047bfee303 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -54,6 +54,7 @@ nav:
     - Singularity for Power9 Architecture: software/singularity_power9.md
     - Virtual Machines: software/virtual_machines.md
     - GPU-accelerated Containers for Deep Learning (NGC Containers): software/ngc_containers.md
+    - CI/CD: software/cicd.md
     - External Licenses: software/licenses.md
     - Computational Fluid Dynamics (CFD): software/cfd.md
     - Mathematics Applications: software/mathematics.md