diff --git a/doc.zih.tu-dresden.de/README.md b/doc.zih.tu-dresden.de/README.md index f1d0e97563caae06b8859b8c0632e7dacc2fb641..bf1b82f52a145f959068fa063d9dbdf31fb2eae3 100644 --- a/doc.zih.tu-dresden.de/README.md +++ b/doc.zih.tu-dresden.de/README.md @@ -41,8 +41,6 @@ Now, create a local clone of your fork #### Install Dependencies See [Installation with Docker](#preview-using-mkdocs-with-dockerfile). -**TODO:** virtual environment -**TODO:** What we need for markdownlinter and checks? <!--- All branches are protected, i.e., only ZIH staff can create branches and push to them ---> diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md index 60d43b4e73f285901931f652c55aedabc393c451..60f6f081cf4a1c2ea76663bccd65e9ff866597fb 100644 --- a/doc.zih.tu-dresden.de/docs/index.md +++ b/doc.zih.tu-dresden.de/docs/index.md @@ -26,4 +26,4 @@ Contributions from user-side are highly welcome. Please find out more in our [gu **2021-10-05** Offline-maintenance (black building test) -**2021-09-29** Introduction to HPC at ZIH ([slides](misc/HPC-Introduction.pdf)) +**2021-09-29** Introduction to HPC at ZIH ([HPC introduction slides](misc/HPC-Introduction.pdf)) diff --git a/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md b/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md index d482d89a45a3849054af19a75ccaf64daeb6e9eb..3a0bc91ab60320f00911fb6bfe8cb07eb23c5e85 100644 --- a/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md +++ b/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md @@ -1,133 +1,155 @@ # EasyBuild -Sometimes the \<a href="SoftwareModulesList" target="\_blank" -title="List of Modules">modules installed in the cluster\</a> are not -enough for your purposes and you need some other software or a different -version of a software. - -\<br />For most commonly used software, chances are high that there is -already a *recipe* that EasyBuild provides, which you can use. 
But what -is Easybuild? - -\<a href="<https://easybuilders.github.io/easybuild/>" -target="\_blank">EasyBuild\</a>\<span style="font-size: 1em;"> is the -software used to build and install software on, and create modules for, -Taurus.\</span> - -\<span style="font-size: 1em;">The aim of this page is to introduce -users to working with EasyBuild and to utilizing it to create -modules**.**\</span> - -**Prerequisites:** \<a href="Login" target="\_blank">access\</a> to the -Taurus system and basic knowledge about Linux, \<a href="SystemTaurus" -target="\_blank" title="SystemTaurus">Taurus\</a> and the \<a -href="RuntimeEnvironment" target="\_blank" -title="RuntimeEnvironment">modules system \</a>on Taurus. - -\<span style="font-size: 1em;">EasyBuild uses a configuration file -called recipe or "EasyConfig", which contains all the information about -how to obtain and build the software:\</span> +Sometimes the [modules](modules.md) installed in the cluster are not enough for your purposes and +you need some other software or a different version of a software. + +For most commonly used software, chances are high that there is already a *recipe* that EasyBuild +provides, which you can use. But what is EasyBuild? + +[EasyBuild](https://easybuild.io/) is the software used to build and install +software on ZIH systems. + +The aim of this page is to introduce users to working with EasyBuild and to utilizing it to create +modules. + +## Prerequisites + +1. [Shell access](../access/ssh_login.md) to ZIH systems +1. basic knowledge about: + - [the ZIH system](../jobs_and_resources/hardware_overview.md) + - [the module system](modules.md) on ZIH systems + +EasyBuild uses a configuration file called recipe or "EasyConfig", which contains all the +information about how to obtain and build the software: - Name - Version - Toolchain (think: Compiler + some more) - Download URL -- Buildsystem (e.g. configure && make or cmake && make) +- Buildsystem (e.g. 
`configure && make` or `cmake && make`) - Config parameters - Tests to ensure a successful build -The "Buildsystem" part is implemented in so-called "EasyBlocks" and -contains the common workflow. Sometimes those are specialized to -encapsulate behaviour specific to multiple/all versions of the software. -\<span style="font-size: 1em;">Everything is written in Python, which -gives authors a great deal of flexibility.\</span> +The build system part is implemented in so-called "EasyBlocks" and contains the common workflow. +Sometimes, those are specialized to encapsulate behaviour specific to multiple/all versions of the +software. Everything is written in Python, which gives authors a great deal of flexibility. ## Set up a custom module environment and build your own modules -Installation of the new software (or version) does not require any -specific credentials. +Installation of the new software (or version) does not require any specific credentials. -\<br />Prerequisites: 1 An existing EasyConfig 1 a place to put your -modules. \<span style="font-size: 1em;">Step by step guide:\</span> +### Prerequisites -1\. Create a \<a href="WorkSpaces" target="\_blank">workspace\</a> where -you'll install your modules. You need a place where your modules will be -placed. This needs to be done only once : +1. An existing EasyConfig +1. a place to put your modules. - ws_allocate -F scratch EasyBuild 50 # +### Step by step guide -2\. Allocate nodes. You can do this with interactive jobs (see the -example below) and/or put commands in a batch file and source it. The -latter is recommended for non-interactive jobs, using the command sbatch -in place of srun. For the sake of illustration, we use an interactive -job as an example. The node parameters depend, to some extent, on the -architecture you want to use. ML nodes for the Power9 and others for the -x86. We will use Haswell nodes. 
+**Step 1:** Create a [workspace](../data_lifecycle/workspaces.md#allocate-a-workspace) where you +install your modules. You need a place where your modules are placed. This needs to be done only +once: - srun -p haswell -N 1 -c 4 --time=08:00:00 --pty /bin/bash +```console +marie@login$ ws_allocate -F scratch EasyBuild 50 +marie@login$ ws_list | grep 'directory.*EasyBuild' + workspace directory : /scratch/ws/1/marie-EasyBuild +``` -\*Using EasyBuild on the login nodes is not allowed\* +**Step 2:** Allocate nodes. You can do this with interactive jobs (see the example below) and/or +put commands in a batch file and source it. The latter is recommended for non-interactive jobs, +using the command `sbatch` instead of `srun`. For the sake of illustration, we use an +interactive job as an example. Depending on the partitions that you want the module to be usable on +later, you need to select nodes with the same architecture. Thus, use nodes from partition ml for +building, if you want to use the module on nodes of that partition. In this example, we assume +that we want to use the module on nodes with x86 architecture and thus, Haswell nodes will be used. -3\. Load EasyBuild module. +```console +marie@login$ srun --partition=haswell --nodes=1 --cpus-per-task=4 --time=08:00:00 --pty /bin/bash -l +``` - module load EasyBuild +!!! warning -\<br />4. Specify Workspace. The rest of the guide is based on it. -Please create an environment variable called \`WORKSPACE\` with the -location of your Workspace: + Using EasyBuild on the login nodes is not allowed. - WORKSPACE=<location_of_your_workspace> # For example: WORKSPACE=/scratch/ws/anpo879a-EasyBuild +**Step 3:** Specify the workspace. The rest of the guide is based on it. Please create an +environment variable called `WORKSPACE` with the path to your workspace: -5\. Load the correct modenv according to your current or target -architecture: \`ml modenv/scs5\` for x86 (default) or \`modenv/ml\` for -Power9 (ml partition). 
Load EasyBuild module +```console +marie@compute$ export WORKSPACE=/scratch/ws/1/marie-EasyBuild #see output of ws_list above +``` - ml modenv/scs5 - module load EasyBuild +**Step 4:** Load the correct module environment `modenv` according to your current or target +architecture: -6\. Set up your environment: +=== "x86 (default, e. g. partition haswell)" + ```console + marie@compute$ module load modenv/scs5 + ``` +=== "Power9 (partition ml)" + ```console + marie@ml$ module load modenv/ml + ``` - export EASYBUILD_ALLOW_LOADED_MODULES=EasyBuild,modenv/scs5 - export EASYBUILD_DETECT_LOADED_MODULES=unload - export EASYBUILD_BUILDPATH="/tmp/${USER}-EasyBuild${SLURM_JOB_ID:-}" - export EASYBUILD_SOURCEPATH="${WORKSPACE}/sources" - export EASYBUILD_INSTALLPATH="${WORKSPACE}/easybuild-$(basename $(readlink -f /sw/installed))" - export EASYBUILD_INSTALLPATH_MODULES="${EASYBUILD_INSTALLPATH}/modules" - module use "${EASYBUILD_INSTALLPATH_MODULES}/all" - export LMOD_IGNORE_CACHE=1 +**Step 5:** Load module `EasyBuild` -7\. \<span style="font-size: 13px;">Now search for an existing -EasyConfig: \</span> +```console +marie@compute$ module load EasyBuild +``` - eb --search TensorFlow +**Step 6:** Set up your environment: -\<span style="font-size: 13px;">8. 
Build the EasyConfig and its -dependencies\</span> +```console +marie@compute$ export EASYBUILD_ALLOW_LOADED_MODULES=EasyBuild,modenv/scs5 +marie@compute$ export EASYBUILD_DETECT_LOADED_MODULES=unload +marie@compute$ export EASYBUILD_BUILDPATH="/tmp/${USER}-EasyBuild${SLURM_JOB_ID:-}" +marie@compute$ export EASYBUILD_SOURCEPATH="${WORKSPACE}/sources" +marie@compute$ export EASYBUILD_INSTALLPATH="${WORKSPACE}/easybuild-$(basename $(readlink -f /sw/installed))" +marie@compute$ export EASYBUILD_INSTALLPATH_MODULES="${EASYBUILD_INSTALLPATH}/modules" +marie@compute$ module use "${EASYBUILD_INSTALLPATH_MODULES}/all" +marie@compute$ export LMOD_IGNORE_CACHE=1 +``` - eb TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4.eb -r +**Step 7:** Now search for an existing EasyConfig: -\<span style="font-size: 13px;">After this is done (may take A LONG -time), you can load it just like any other module.\</span> +```console +marie@compute$ eb --search TensorFlow +``` -9\. To use your custom build modules you only need to rerun step 4, 5, 6 -and execute the usual: +**Step 8:** Build the EasyConfig and its dependencies (option `-r`) - module load <name_of_your_module> # For example module load TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4 +```console +marie@compute$ eb TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4.eb -r +``` -The key is the \`module use\` command which brings your modules into -scope so \`module load\` can find them and the LMOD_IGNORE_CACHE line -which makes LMod pick up the custom modules instead of searching the +This may take a long time. After this is done, you can load it just like any other module. + +**Step 9:** To use your custom build modules you only need to rerun steps 3, 4, 5, 6 and execute +the usual: + +```console +marie@compute$ module load TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4 #replace with the name of your module +``` + +The key is the `module use` command, which brings your modules into scope, so `module load` can find +them. 
The `LMOD_IGNORE_CACHE` line makes `LMod` pick up the custom modules instead of searching the system cache. ## Troubleshooting -When building your EasyConfig fails, you can first check the log -mentioned and scroll to the bottom to see what went wrong. +When building your EasyConfig fails, you can first check the log mentioned and scroll to the bottom +to see what went wrong. + +It might also be helpful to inspect the build environment EasyBuild uses. For that you can run: + +```console +marie@compute$ eb myEC.eb --dump-env-script +``` + +This command creates a sourceable `.env`-file with `module load` and `export` commands that show +what EasyBuild does before running, e.g., the configuration step. -It might also be helpful to inspect the build environment EB uses. For -that you can run \`eb myEC.eb --dump-env-script\` which creates a -sourceable .env file with \`module load\` and \`export\` commands that -show what EB does before running, e.g., the configure step. +It might also be helpful to use -It might also be helpful to use '\<span style="font-size: 1em;">export -LMOD_IGNORE_CACHE=0'\</span> +```console +marie@compute$ export LMOD_IGNORE_CACHE=0 +``` diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml index 4efbb60c85f44b6cb8d80c33cfb251c7a52003a3..8867e2df2618e2b1a7fea1f19069f9cfca995f2e 100644 --- a/doc.zih.tu-dresden.de/mkdocs.yml +++ b/doc.zih.tu-dresden.de/mkdocs.yml @@ -69,7 +69,6 @@ nav: - PAPI Library: software/papi.md - Pika: software/pika.md - Perf Tools: software/perf_tools.md - - Score-P: software/scorep.md - Vampir: software/vampir.md - Data Life Cycle Management: - Overview: data_lifecycle/overview.md diff --git a/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh b/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh index e4786c07e52177ba9a19bf7e5b571ac0d9057fb6..38e9015599922fdcec93fecebb9fd638cfa576d8 100755 --- a/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh +++ 
b/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh @@ -23,7 +23,7 @@ s \<SLURM\> doc.zih.tu-dresden.de/docs/contrib/content_rules.md i file \+system HDFS Use \"ZIH systems\" or \"ZIH system\" instead of \"Taurus\". \"taurus\" is only allowed when used in ssh commands and other very specific situations. -doc.zih.tu-dresden.de/docs/contrib/content_rules.md +doc.zih.tu-dresden.de/docs/contrib/content_rules.md doc.zih.tu-dresden.de/docs/archive/phase2_migration.md i \<taurus\> taurus\.hrsk /taurus /TAURUS ssh ^[0-9]\+:Host taurus$ \"HRSKII\" should be avoided, use \"ZIH system\" instead. doc.zih.tu-dresden.de/docs/contrib/content_rules.md @@ -35,13 +35,13 @@ i hpc[ -]\+da\> i attachurl Replace \"todo\" with real content. - +doc.zih.tu-dresden.de/docs/archive/system_triton.md i \<todo\> <!--.*todo.*--> -Replace \"Coming soon\" with real content. +Replace variations of \"Coming soon\" with real content. -i \<coming soon\> +i \(\<coming soon\>\|This .* under construction\|posted here\) Avoid spaces at end of lines. - +doc.zih.tu-dresden.de/docs/accessibility.md i [[:space:]]$ When referencing partitions, put keyword \"partition\" in front of partition name, e. g. \"partition ml\", not \"ml partition\". doc.zih.tu-dresden.de/docs/contrib/content_rules.md