diff --git a/doc.zih.tu-dresden.de/README.md b/doc.zih.tu-dresden.de/README.md
index f1d0e97563caae06b8859b8c0632e7dacc2fb641..bf1b82f52a145f959068fa063d9dbdf31fb2eae3 100644
--- a/doc.zih.tu-dresden.de/README.md
+++ b/doc.zih.tu-dresden.de/README.md
@@ -41,8 +41,6 @@ Now, create a local clone of your fork
 #### Install Dependencies
 
 See [Installation with Docker](#preview-using-mkdocs-with-dockerfile).
 
-**TODO:** virtual environment
-**TODO:** What we need for markdownlinter and checks?
 
 <!--- All branches are protected, i.e., only ZIH staff can create branches and push to them --->
diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md
index b6b0f25d3963da0529f26274a3daf4bdfcb0bbe0..f9a916195ecbf814cf426beb4d26885500b3b3de 100644
--- a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md
+++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md
@@ -1,7 +1,7 @@
 # JupyterHub
 
 With our JupyterHub service we offer you a quick and easy way to work with Jupyter notebooks on ZIH
-systems. This page covers starting and stopping JuperterHub sessions, error handling and customizing
+systems. This page covers starting and stopping JupyterHub sessions, error handling and customizing
 the environment.
 
 We also provide a comprehensive documentation on how to use
@@ -21,7 +21,8 @@ cannot give extensive support in every case.
 
 !!! note
 
     This service is only available for users with an active HPC project.
-    See [here](../access/overview.md) how to apply for an HPC project.
+    See [Application for Login and Resources](../application/overview.md), if you need to apply for
+    an HPC project.
 
 JupyterHub is available at
 [https://taurus.hrsk.tu-dresden.de/jupyter](https://taurus.hrsk.tu-dresden.de/jupyter).
@@ -100,7 +101,7 @@ running the code. We currently offer one for Python, C++, MATLAB and R.
 
 ## Stop a Session
 
-It is good practise to stop your session once your work is done. This releases resources for other
+It is good practice to stop your session once your work is done. This releases resources for other
 users and your quota is less charged. If you just log out or close the window, your server
 continues running and **will not stop** until the Slurm job runtime hits the limit (usually 8 hours).
@@ -147,8 +148,8 @@ Useful pages for valid batch system parameters:
 
 If the connection to your notebook server unexpectedly breaks, you will get this error message.
 Sometimes your notebook server might hit a batch system or hardware limit and gets killed. Then
-usually the logfile of the corresponding batch job might contain useful information. These logfiles
-are located in your `home` directory and have the name `jupyter-session-<jobid>.log`.
+usually the log file of the corresponding batch job might contain useful information. These log
+files are located in your `home` directory and have the name `jupyter-session-<jobid>.log`.
 
 ## Advanced Tips
@@ -309,4 +310,4 @@ You can switch kernels of existing notebooks in the kernel menu:
 You have now the option to preload modules from the [module system](../software/modules.md). Select
 multiple modules that will be preloaded before your notebook server starts. The list of available
 modules depends on the module environment you want to start the session in (`scs5` or
-`ml`). The right module environment will be chosen by your selected partition. 
+`ml`). The right module environment will be chosen by your selected partition.
diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md
index 92ad16d1325173c384c7472658239baca3e26157..797d9fc8e455b14e40a5ec7f3737874b2ac500ae 100644
--- a/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md
+++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md
@@ -1,7 +1,7 @@
 # JupyterHub for Teaching
 
-On this page we want to introduce to you some useful features if you
-want to use JupyterHub for teaching.
+On this page, we want to introduce to you some useful features if you want to use JupyterHub for
+teaching.
 
 !!! note
 
@@ -9,23 +9,21 @@ want to use JupyterHub for teaching.
 
 Please be aware of the following notes:
 
-- ZIH systems operate at a lower availability level than your usual Enterprise Cloud VM. There
-  can always be downtimes, e.g. of the filesystems or the batch system.
+- ZIH systems operate at a lower availability level than your usual Enterprise Cloud VM. There can
+  always be downtimes, e.g. of the filesystems or the batch system.
 - Scheduled downtimes are announced by email. Please plan your courses accordingly.
 - Access to HPC resources is handled through projects. See your course as a project. Projects need
   to be registered beforehand (more info on the page [Access](../application/overview.md)).
 - Don't forget to
   [add your users](../application/project_management.md#manage-project-members-dis-enable)
-  (eg. students or tutors) to your project.
+  (e.g. students or tutors) to your project.
 - It might be a good idea to
   [request a reservation](../jobs_and_resources/overview.md#exclusive-reservation-of-hardware)
-  of part of the compute resources for your project/course to
-  avoid unnecessary waiting times in the batch system queue.
+  of part of the compute resources for your project/course to avoid unnecessary waiting times in
+  the batch system queue.
 
 ## Clone a Repository With a Link
 
-This feature bases on
-[nbgitpuller](https://github.com/jupyterhub/nbgitpuller).
-Documentation can be found at
-[this page](https://jupyterhub.github.io/nbgitpuller/).
+This feature is based on [nbgitpuller](https://github.com/jupyterhub/nbgitpuller). Further information
+can be found in the [external documentation about nbgitpuller](https://jupyterhub.github.io/nbgitpuller/).
 
 This extension for Jupyter notebooks can clone every public git repository into the users work
 directory. It's offering a quick way to distribute notebooks and other material to your students.
@@ -50,14 +48,14 @@ The following parameters are available:
 |---|---|
 |`repo` | path to git repository|
 |`branch` | branch in the repository to pull from default: `master`|
-|`urlpath` | URL to redirect the user to a certain file [more info](https://jupyterhub.github.io/nbgitpuller/topic/url-options.html#urlpath)|
+|`urlpath` | URL to redirect the user to a certain file, [more info about parameter urlpath](https://jupyterhub.github.io/nbgitpuller/topic/url-options.html#urlpath)|
 |`depth` | clone only a certain amount of latest commits not recommended|
 
 This [link generator](https://jupyterhub.github.io/nbgitpuller/link?hub=https://taurus.hrsk.tu-dresden.de/jupyter/)
 might help creating those links
 
-## Spawner Options Passthrough with URL Parameters
+## Spawn Options Pass-through with URL Parameters
 
 The spawn form now offers a quick start mode by passing URL parameters.
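+
+For example, a quick start link that preselects the partition in the spawn form could look like the
+following sketch. Please note that the exact parameter names and their encoding are defined by the
+current spawn form, so treat this link only as an illustration and adapt it to your needs:
+
+```
+https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/~(partition~'interactive)
+```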
diff --git a/doc.zih.tu-dresden.de/docs/application/project_request_form.md b/doc.zih.tu-dresden.de/docs/application/project_request_form.md
index b5b9e348a94c4178d382e5ca27d67047c06f1481..e829f316cb26f11b9b9048a889c8b5e918b2e870 100644
--- a/doc.zih.tu-dresden.de/docs/application/project_request_form.md
+++ b/doc.zih.tu-dresden.de/docs/application/project_request_form.md
@@ -36,15 +36,16 @@ Any project have:
 
 ## Third step: Hardware
 
 {loading=lazy width=300 style="float:right"}
 
-This step inquire the required hardware. You can find the specifications
-[here](../jobs_and_resources/hardware_overview.md).
+This step inquires about the required hardware. The
+[hardware specifications](../jobs_and_resources/hardware_overview.md) might help you to estimate,
+e. g., the compute time.
 
-Please fill in the total computing time you expect in the project runtime.  The compute time is
+Please fill in the total computing time you expect in the project runtime. The compute time is
 given in cores per hour (CPU/h), this refers to the 'virtual' cores for nodes with hyperthreading.
-If they require GPUs, then this is given as GPU units per hour (GPU/h).  Please add 6 CPU hours per
+If they require GPUs, then this is given as GPU units per hour (GPU/h). Please add 6 CPU hours per
 GPU hour in your application.
 
-The project home is a shared storage in your project.  Here you exchange data or install software
+The project home is a shared storage in your project. Here you exchange data or install software
 for your project group in userspace. The directory is not intended for active calculations, for
 this the scratch is available.
diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md
index 60d43b4e73f285901931f652c55aedabc393c451..60f6f081cf4a1c2ea76663bccd65e9ff866597fb 100644
--- a/doc.zih.tu-dresden.de/docs/index.md
+++ b/doc.zih.tu-dresden.de/docs/index.md
@@ -26,4 +26,4 @@ Contributions from user-side are highly welcome. Please find out more in our [gu
 
 **2021-10-05** Offline-maintenance (black building test)
 
-**2021-09-29** Introduction to HPC at ZIH ([slides](misc/HPC-Introduction.pdf))
+**2021-09-29** Introduction to HPC at ZIH ([HPC introduction slides](misc/HPC-Introduction.pdf))
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
index 3d342f628fc7abfeb851500d3cc6fc785d1a03e2..4ab5ca41a5a8c11d4a52c03661b5810d4d09a65d 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
@@ -64,7 +64,8 @@ True
 
 ### Python Virtual Environments
 
-Virtual environments allow users to install additional python packages and create an isolated
+[Virtual environments](../software/python_virtual_environments.md) allow users to install
+additional python packages and create an isolated
 runtime environment. We recommend using `virtualenv` for this purpose.
 
 ```console
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
index 2af016d0188ae4f926b45e7b8fdc14b039e8baa3..65e445f354d08a3473e226cc97c45ff6c01e8c48 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
@@ -58,10 +58,10 @@ For MPI-parallel jobs one typically allocates one core per task that has to be s
 ### Multiple Programs Running Simultaneously in a Job
 
 In this short example, our goal is to run four instances of a program concurrently in a **single**
-batch script. Of course we could also start a batch script four times with `sbatch` but this is not
-what we want to do here. Please have a look at
-[this subsection](#multiple-programs-running-simultaneously-in-a-job)
-in case you intend to run GPU programs simultaneously in a **single** job.
+batch script. Of course, we could also start a batch script four times with `sbatch` but this is not
+what we want to do here. However, you can also find an example about
+[how to run GPU programs simultaneously in a single job](#running-multiple-gpu-applications-simultaneously-in-a-batch-job)
+below.
 
 !!! example " "
 
@@ -355,4 +355,4 @@ file) that will be executed one after each other with different CPU numbers:
 
 ## Array-Job with Afterok-Dependency and Datamover Usage
 
-This is a *todo*
+This part is under construction.
diff --git a/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md b/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md
index 247d35c545a70013e45160f6f45f67d9cca80e4b..9600fe81d30531b2fc85bda91a67ab730414d97b 100644
--- a/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md
+++ b/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md
@@ -38,8 +38,8 @@ The usage of Flink with Jupyter notebooks is currently under examination.
 
 ### Default Configuration
 
-The Spark module is available in both `scs5` and `ml` environments.
-Thus, Spark can be executed using different CPU architectures, e.g., Haswell and Power9.
+The Spark and Flink modules are available in both `scs5` and `ml` environments.
+Thus, Spark and Flink can be executed using different CPU architectures, e.g., Haswell and Power9.
 
 Let us assume that two nodes should be used for the computation. Use a `srun` command similar to
 the following to start an interactive session using the partition haswell. The following code
@@ -61,8 +61,9 @@ Once you have the shell, load desired Big Data framework using the command
     marie@compute$ module load Flink
     ```
 
-Before the application can be started, the Spark cluster needs to be set up. To do this, configure
-Spark first using configuration template at `$SPARK_HOME/conf`:
+Before the application can be started, the cluster with the allocated nodes needs to be set up. To
+do this, configure the cluster first using the configuration template at `$SPARK_HOME/conf` for
+Spark or `$FLINK_ROOT_DIR/conf` for Flink:
 
 === "Spark"
     ```console
@@ -74,7 +75,7 @@ Spark first using configuration template at `$SPARK_HOME/conf`:
     ```
 
 This places the configuration in a directory called `cluster-conf-<JOB_ID>` in your `home`
-directory, where `<JOB_ID>` stands for the id of the Slurm job. After that, you can start Spark in
+directory, where `<JOB_ID>` stands for the id of the Slurm job. After that, you can start in
 the usual way:
 
 === "Spark"
@@ -86,7 +87,7 @@ the usual way:
     marie@compute$ start-cluster.sh
     ```
 
-The Spark processes should now be set up and you can start your application, e. g.:
+The necessary background processes should now be set up and you can start your application, e. g.:
 
 === "Spark"
     ```console
diff --git a/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md b/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md
index d482d89a45a3849054af19a75ccaf64daeb6e9eb..3a0bc91ab60320f00911fb6bfe8cb07eb23c5e85 100644
--- a/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md
+++ b/doc.zih.tu-dresden.de/docs/software/custom_easy_build_environment.md
@@ -1,133 +1,155 @@
 # EasyBuild
 
-Sometimes the \<a href="SoftwareModulesList" target="\_blank"
-title="List of Modules">modules installed in the cluster\</a> are not
-enough for your purposes and you need some other software or a different
-version of a software.
-
-\<br />For most commonly used software, chances are high that there is
-already a *recipe* that EasyBuild provides, which you can use. But what
-is Easybuild?
-
-\<a href="<https://easybuilders.github.io/easybuild/>"
-target="\_blank">EasyBuild\</a>\<span style="font-size: 1em;"> is the
-software used to build and install software on, and create modules for,
-Taurus.\</span>
-
-\<span style="font-size: 1em;">The aim of this page is to introduce
-users to working with EasyBuild and to utilizing it to create
-modules**.**\</span>
-
-**Prerequisites:** \<a href="Login" target="\_blank">access\</a> to the
-Taurus system and basic knowledge about Linux, \<a href="SystemTaurus"
-target="\_blank" title="SystemTaurus">Taurus\</a> and the \<a
-href="RuntimeEnvironment" target="\_blank"
-title="RuntimeEnvironment">modules system \</a>on Taurus.
-
-\<span style="font-size: 1em;">EasyBuild uses a configuration file
-called recipe or "EasyConfig", which contains all the information about
-how to obtain and build the software:\</span>
+Sometimes the [modules](modules.md) installed in the cluster are not enough for your purposes and
+you need some other software or a different version of a software.
+
+For most commonly used software, chances are high that there is already a *recipe* that EasyBuild
+provides, which you can use. But what is EasyBuild?
+
+[EasyBuild](https://easybuild.io/) is the software used to build and install
+software on ZIH systems.
+
+The aim of this page is to introduce users to working with EasyBuild and to utilizing it to create
+modules.
+
+## Prerequisites
+
+1. [Shell access](../access/ssh_login.md) to ZIH systems
+1. basic knowledge about:
+    - [the ZIH system](../jobs_and_resources/hardware_overview.md)
+    - [the module system](modules.md) on ZIH systems
+
+EasyBuild uses a configuration file called recipe or "EasyConfig", which contains all the
+information about how to obtain and build the software:
 
 - Name
 - Version
 - Toolchain (think: Compiler + some more)
 - Download URL
-- Buildsystem (e.g. configure && make or cmake && make)
+- Buildsystem (e.g. `configure && make` or `cmake && make`)
 - Config parameters
 - Tests to ensure a successful build
 
-The "Buildsystem" part is implemented in so-called "EasyBlocks" and
-contains the common workflow. Sometimes those are specialized to
-encapsulate behaviour specific to multiple/all versions of the software.
-
-\<span style="font-size: 1em;">Everything is written in Python, which
-gives authors a great deal of flexibility.\</span>
+The build system part is implemented in so-called "EasyBlocks" and contains the common workflow.
+Sometimes, those are specialized to encapsulate behaviour specific to multiple/all versions of the
+software. Everything is written in Python, which gives authors a great deal of flexibility.
 
 ## Set up a custom module environment and build your own modules
 
-Installation of the new software (or version) does not require any
-specific credentials.
+Installation of the new software (or version) does not require any specific credentials.
 
-\<br />Prerequisites: 1 An existing EasyConfig 1 a place to put your
-modules. \<span style="font-size: 1em;">Step by step guide:\</span>
+### Prerequisites
 
-1\. Create a \<a href="WorkSpaces" target="\_blank">workspace\</a> where
-you'll install your modules. You need a place where your modules will be
-placed. This needs to be done only once :
+1. An existing EasyConfig
+1. A place to put your modules.
 
-    ws_allocate -F scratch EasyBuild 50 #
+### Step by step guide
 
-2\. Allocate nodes. You can do this with interactive jobs (see the
-example below) and/or put commands in a batch file and source it. The
-latter is recommended for non-interactive jobs, using the command sbatch
-in place of srun. For the sake of illustration, we use an interactive
-job as an example. The node parameters depend, to some extent, on the
-architecture you want to use. ML nodes for the Power9 and others for the
-x86. We will use Haswell nodes.
+**Step 1:** Create a [workspace](../data_lifecycle/workspaces.md#allocate-a-workspace) where you
+install your modules. You need a place where your modules are placed. This needs to be done only
+once:
 
-    srun -p haswell -N 1 -c 4 --time=08:00:00 --pty /bin/bash
+```console
+marie@login$ ws_allocate -F scratch EasyBuild 50
+marie@login$ ws_list | grep 'directory.*EasyBuild'
+    workspace directory : /scratch/ws/1/marie-EasyBuild
+```
 
-\*Using EasyBuild on the login nodes is not allowed\*
+**Step 2:** Allocate nodes. You can do this with interactive jobs (see the example below) and/or
+put commands in a batch file and source it. The latter is recommended for non-interactive jobs,
+using the command `sbatch` instead of `srun`. For the sake of illustration, we use an
+interactive job as an example. Depending on the partitions that you want the module to be usable on
+later, you need to select nodes with the same architecture. Thus, use nodes from partition ml for
+building, if you want to use the module on nodes of that partition. In this example, we assume
+that we want to use the module on nodes with x86 architecture and thus, Haswell nodes will be used.
 
-3\. Load EasyBuild module.
+```console
+marie@login$ srun --partition=haswell --nodes=1 --cpus-per-task=4 --time=08:00:00 --pty /bin/bash -l
+```
 
-    module load EasyBuild
+!!! warning
 
-\<br />4. Specify Workspace. The rest of the guide is based on it.
-Please create an environment variable called \`WORKSPACE\` with the
-location of your Workspace:
+    Using EasyBuild on the login nodes is not allowed.
 
-    WORKSPACE=<location_of_your_workspace> # For example: WORKSPACE=/scratch/ws/anpo879a-EasyBuild
+**Step 3:** Specify the workspace. The rest of the guide is based on it. Please create an
+environment variable called `WORKSPACE` with the path to your workspace:
 
-5\. Load the correct modenv according to your current or target
-architecture: \`ml modenv/scs5\` for x86 (default) or \`modenv/ml\` for
-Power9 (ml partition). Load EasyBuild module
+```console
+marie@compute$ export WORKSPACE=/scratch/ws/1/marie-EasyBuild #see output of ws_list above
+```
 
-    ml modenv/scs5
-    module load EasyBuild
+**Step 4:** Load the correct module environment `modenv` according to your current or target
+architecture:
 
-6\. Set up your environment:
+=== "x86 (default, e. g. partition haswell)"
+    ```console
+    marie@compute$ module load modenv/scs5
+    ```
+=== "Power9 (partition ml)"
+    ```console
+    marie@ml$ module load modenv/ml
+    ```
 
-    export EASYBUILD_ALLOW_LOADED_MODULES=EasyBuild,modenv/scs5
-    export EASYBUILD_DETECT_LOADED_MODULES=unload
-    export EASYBUILD_BUILDPATH="/tmp/${USER}-EasyBuild${SLURM_JOB_ID:-}"
-    export EASYBUILD_SOURCEPATH="${WORKSPACE}/sources"
-    export EASYBUILD_INSTALLPATH="${WORKSPACE}/easybuild-$(basename $(readlink -f /sw/installed))"
-    export EASYBUILD_INSTALLPATH_MODULES="${EASYBUILD_INSTALLPATH}/modules"
-    module use "${EASYBUILD_INSTALLPATH_MODULES}/all"
-    export LMOD_IGNORE_CACHE=1
+**Step 5:** Load module `EasyBuild`
 
-7\. \<span style="font-size: 13px;">Now search for an existing
-EasyConfig: \</span>
+```console
+marie@compute$ module load EasyBuild
+```
 
-    eb --search TensorFlow
+**Step 6:** Set up your environment:
 
-\<span style="font-size: 13px;">8. Build the EasyConfig and its
-dependencies\</span>
+```console
+marie@compute$ export EASYBUILD_ALLOW_LOADED_MODULES=EasyBuild,modenv/scs5
+marie@compute$ export EASYBUILD_DETECT_LOADED_MODULES=unload
+marie@compute$ export EASYBUILD_BUILDPATH="/tmp/${USER}-EasyBuild${SLURM_JOB_ID:-}"
+marie@compute$ export EASYBUILD_SOURCEPATH="${WORKSPACE}/sources"
+marie@compute$ export EASYBUILD_INSTALLPATH="${WORKSPACE}/easybuild-$(basename $(readlink -f /sw/installed))"
+marie@compute$ export EASYBUILD_INSTALLPATH_MODULES="${EASYBUILD_INSTALLPATH}/modules"
+marie@compute$ module use "${EASYBUILD_INSTALLPATH_MODULES}/all"
+marie@compute$ export LMOD_IGNORE_CACHE=1
+```
 
-    eb TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4.eb -r
+**Step 7:** Now search for an existing EasyConfig:
 
-\<span style="font-size: 13px;">After this is done (may take A LONG
-time), you can load it just like any other module.\</span>
+```console
+marie@compute$ eb --search TensorFlow
+```
 
-9\. To use your custom build modules you only need to rerun step 4, 5, 6
-and execute the usual:
+**Step 8:** Build the EasyConfig and its dependencies (option `-r`)
 
-    module load <name_of_your_module> # For example module load TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4
+```console
+marie@compute$ eb TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4.eb -r
+```
 
-The key is the \`module use\` command which brings your modules into
-scope so \`module load\` can find them and the LMOD_IGNORE_CACHE line
-which makes LMod pick up the custom modules instead of searching the
+This may take a long time. After this is done, you can load it just like any other module.
+
+**Step 9:** To use your custom build modules you only need to rerun steps 3, 4, 5, 6 and execute
+the usual:
+
+```console
+marie@compute$ module load TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4 #replace with the name of your module
+```
+
+The key is the `module use` command, which brings your modules into scope, so `module load` can find
+them. The `LMOD_IGNORE_CACHE` line makes `LMod` pick up the custom modules instead of searching the
 system cache.
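+
+If you prefer to build non-interactively (see step 2), the whole procedure can also be put into a
+job file and submitted with `sbatch`. The following sketch only illustrates this: it assumes the
+workspace, partition and EasyConfig from the example above, so please adapt paths, resources and
+the EasyConfig to your needs:
+
+```bash
+#!/bin/bash
+#SBATCH --partition=haswell
+#SBATCH --nodes=1
+#SBATCH --cpus-per-task=4
+#SBATCH --time=08:00:00
+
+# Workspace created beforehand with: ws_allocate -F scratch EasyBuild 50 (step 1)
+export WORKSPACE=/scratch/ws/1/marie-EasyBuild
+
+# Load module environment and EasyBuild (steps 4 and 5)
+module load modenv/scs5
+module load EasyBuild
+
+# Same environment setup as in step 6
+export EASYBUILD_ALLOW_LOADED_MODULES=EasyBuild,modenv/scs5
+export EASYBUILD_DETECT_LOADED_MODULES=unload
+export EASYBUILD_BUILDPATH="/tmp/${USER}-EasyBuild${SLURM_JOB_ID:-}"
+export EASYBUILD_SOURCEPATH="${WORKSPACE}/sources"
+export EASYBUILD_INSTALLPATH="${WORKSPACE}/easybuild-$(basename $(readlink -f /sw/installed))"
+export EASYBUILD_INSTALLPATH_MODULES="${EASYBUILD_INSTALLPATH}/modules"
+module use "${EASYBUILD_INSTALLPATH_MODULES}/all"
+export LMOD_IGNORE_CACHE=1
+
+# Build the EasyConfig and its dependencies (step 8)
+eb TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4.eb -r
+```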
 
 ## Troubleshooting
 
-When building your EasyConfig fails, you can first check the log
-mentioned and scroll to the bottom to see what went wrong.
+When building your EasyConfig fails, you can first check the log mentioned and scroll to the bottom
+to see what went wrong.
+
+It might also be helpful to inspect the build environment EasyBuild uses. For that you can run:
+
+```console
+marie@compute$ eb myEC.eb --dump-env-script
+```
+
+This command creates a sourceable `.env`-file with `module load` and `export` commands that show
+what EasyBuild does before running, e.g., the configuration step.
 
-It might also be helpful to inspect the build environment EB uses. For
-that you can run \`eb myEC.eb --dump-env-script\` which creates a
-sourceable .env file with \`module load\` and \`export\` commands that
-show what EB does before running, e.g., the configure step.
+It might also be helpful to use:
 
-It might also be helpful to use '\<span style="font-size: 1em;">export
-LMOD_IGNORE_CACHE=0'\</span>
+```console
+marie@compute$ export LMOD_IGNORE_CACHE=0
+```
diff --git a/doc.zih.tu-dresden.de/docs/software/data_analytics.md b/doc.zih.tu-dresden.de/docs/software/data_analytics.md
index 44414493405bc36ffed74bb85fb805b331308af7..b4a5f7f8b9f86c9a47fec20b875970efd4d787b2 100644
--- a/doc.zih.tu-dresden.de/docs/software/data_analytics.md
+++ b/doc.zih.tu-dresden.de/docs/software/data_analytics.md
@@ -24,7 +24,8 @@ marie@compute$ module spider <software_name>
 
 Refer to the section covering [modules](modules.md) for further information on the modules system.
 Additional software or special versions of [individual modules](custom_easy_build_environment.md)
-can be installed individually by each user. If possible, the use of virtual environments is
+can be installed individually by each user. If possible, the use of
+[virtual environments](python_virtual_environments.md) is
 recommended (e.g. for Python). Likewise, software can be used within [containers](containers.md).
 
 For the transfer of larger amounts of data into and within the system, the
diff --git a/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md b/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md
index 38190764e6c9efedb275ec9ff4324d916c851566..8f61fe49fd56642aaded82cf711ca92d0035b99f 100644
--- a/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md
+++ b/doc.zih.tu-dresden.de/docs/software/hyperparameter_optimization.md
@@ -270,9 +270,9 @@ This GUI guides through the configuration process and as result a configuration
 automatically according to the GUI input. If you are more familiar with using OmniOpt later on,
 this configuration file can be modified directly without using the GUI.
 
-A screenshot of the GUI, including a properly configuration for the MNIST fashion example is shown
-below. The GUI, in which the below displayed values are already entered, can be reached
-[here](https://imageseg.scads.ai/omnioptgui/?maxevalserror=5&mem_per_worker=1000&number_of_parameters=3&param_0_values=10%2C50%2C100&param_1_values=8%2C16%2C32&param_2_values=10%2C15%2C30&param_0_name=out-layer1&param_1_name=batchsize&param_2_name=batchsize&account=&projectname=mnist_fashion_optimization_set_1&partition=alpha&searchtype=tpe.suggest&param_0_type=hp.choice&param_1_type=hp.choice&param_2_type=hp.choice&max_evals=1000&objective_program=bash%20%3C%2Fpath%2Fto%2Fwrapper-script%2Frun-mnist-fashion.sh%3E%20--out-layer1%3D%28%24x_0%29%20--batchsize%3D%28%24x_1%29%20--epochs%3D%28%24x_2%29&workdir=%3C%2Fscratch%2Fws%2Fomniopt-workdir%2F%3E).
+A screenshot of
+[the GUI](https://imageseg.scads.ai/omnioptgui/?maxevalserror=5&mem_per_worker=1000&number_of_parameters=3&param_0_values=10%2C50%2C100&param_1_values=8%2C16%2C32&param_2_values=10%2C15%2C30&param_0_name=out-layer1&param_1_name=batchsize&param_2_name=batchsize&account=&projectname=mnist_fashion_optimization_set_1&partition=alpha&searchtype=tpe.suggest&param_0_type=hp.choice&param_1_type=hp.choice&param_2_type=hp.choice&max_evals=1000&objective_program=bash%20%3C%2Fpath%2Fto%2Fwrapper-script%2Frun-mnist-fashion.sh%3E%20--out-layer1%3D%28%24x_0%29%20--batchsize%3D%28%24x_1%29%20--epochs%3D%28%24x_2%29&workdir=%3C%2Fscratch%2Fws%2Fomniopt-workdir%2F%3E),
+including a proper configuration for the MNIST fashion example is shown below.
 
 Please modify the paths for `objective program` and `workdir` according to your needs.
diff --git a/doc.zih.tu-dresden.de/docs/software/papi.md b/doc.zih.tu-dresden.de/docs/software/papi.md
index 9d96cc58f4453692ad7b57abe3e56abda1539290..2de80b4e8a0f420a6b42cd01a3de027b5fb89be2 100644
--- a/doc.zih.tu-dresden.de/docs/software/papi.md
+++ b/doc.zih.tu-dresden.de/docs/software/papi.md
@@ -20,8 +20,8 @@ To collect performance events, PAPI provides two APIs, the *high-level* and *low
 
 The high-level API provides the ability to record performance events inside instrumented regions of
 serial, multi-processing (MPI, SHMEM) and thread (OpenMP, Pthreads) parallel applications. It is
-designed for simplicity, not flexibility. For more details click
-[here](https://bitbucket.org/icl/papi/wiki/PAPI-HL.md).
+designed for simplicity, not flexibility. More details can be found in the
+[PAPI wiki High-Level API description](https://bitbucket.org/icl/papi/wiki/PAPI-HL.md).
 
 The following code example shows the use of the high-level API by marking a code section.
 
@@ -86,19 +86,19 @@ more output files in JSON format.
 
 ### Low-Level API
 
-The low-level API manages hardware events in user-defined groups
-called Event Sets. It is meant for experienced application programmers and tool developers wanting
-fine-grained measurement and control of the PAPI interface. It provides access to both PAPI preset
-and native events, and supports all installed components. For more details on the low-level API,
-click [here](https://bitbucket.org/icl/papi/wiki/PAPI-LL.md).
+The low-level API manages hardware events in user-defined groups called Event Sets. It is meant for
+experienced application programmers and tool developers wanting fine-grained measurement and
+control of the PAPI interface. It provides access to both PAPI preset and native events, and
+supports all installed components. The PAPI wiki also contains a page with more details on the
+[low-level API](https://bitbucket.org/icl/papi/wiki/PAPI-LL.md).
 
 ## Usage on ZIH Systems
 
 Before you start a PAPI measurement, check which events are available on the desired architecture.
-For this purpose PAPI offers the tools `papi_avail` and `papi_native_avail`. If you want to measure
+For this purpose, PAPI offers the tools `papi_avail` and `papi_native_avail`. If you want to measure
 multiple events, please check which events can be measured concurrently using the tool
-`papi_event_chooser`. For more details on the PAPI tools click
-[here](https://bitbucket.org/icl/papi/wiki/PAPI-Overview.md#markdown-header-papi-utilities).
+`papi_event_chooser`. The PAPI wiki contains more details on
+[the PAPI tools](https://bitbucket.org/icl/papi/wiki/PAPI-Overview.md#markdown-header-papi-utilities).
 
 !!! hint
 
@@ -133,8 +133,7 @@ compile your application against the PAPI library.
 !!! hint
 
     The PAPI modules on ZIH systems are only installed with the default `perf_event` component. If you
-    want to measure, e.g., GPU events, you have to install your own PAPI. Instructions on how to
-    download and install PAPI can be found
-    [here](https://bitbucket.org/icl/papi/wiki/Downloading-and-Installing-PAPI.md). To install PAPI
-    with additional components, you have to specify them during configure, for details click
-    [here](https://bitbucket.org/icl/papi/wiki/PAPI-Overview.md#markdown-header-components).
+    want to measure, e.g., GPU events, you have to install your own PAPI. Please see the
+    [external instructions on how to download and install PAPI](https://bitbucket.org/icl/papi/wiki/Downloading-and-Installing-PAPI.md).
+    To install PAPI with additional components, you have to specify them during configure as
+    described for the [Installation of Components](https://bitbucket.org/icl/papi/wiki/PAPI-Overview.md#markdown-header-components).
diff --git a/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md b/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md
index e19daeeb6731aa32eb993f2495e6ec443bebe2dd..67b10817c738b414a3302388b5cca3392ff96bb1 100644
--- a/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md
+++ b/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md
@@ -93,8 +93,6 @@ are in the virtual environment. You can deactivate the conda environment as foll
 (conda-env) marie@compute$ conda deactivate #Leave the virtual environment
 ```
 
-TODO: Link to this page from other DA/ML topics. insert link in alpha centauri
-
 ??? example
 
     This is an example on partition Alpha. The example creates a virtual environment, and installs
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index 4efbb60c85f44b6cb8d80c33cfb251c7a52003a3..8867e2df2618e2b1a7fea1f19069f9cfca995f2e 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -69,7 +69,6 @@ nav:
       - PAPI Library: software/papi.md
       - Pika: software/pika.md
      - Perf Tools: software/perf_tools.md
-      - Score-P: software/scorep.md
      - Vampir: software/vampir.md
   - Data Life Cycle Management:
     - Overview: data_lifecycle/overview.md
diff --git a/doc.zih.tu-dresden.de/util/check-spelling.sh b/doc.zih.tu-dresden.de/util/check-spelling.sh
index 8448d0bbffe534b0fd676dbd00ca82e17e7d167d..0d574c1e6adeadacb895f31209b16a9d7f25a123 100755
--- a/doc.zih.tu-dresden.de/util/check-spelling.sh
+++ b/doc.zih.tu-dresden.de/util/check-spelling.sh
@@ -7,6 +7,7 @@ basedir=`dirname "$scriptpath"`
 basedir=`dirname "$basedir"`
 wordlistfile=$(realpath $basedir/wordlist.aspell)
 branch="origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME:-preview}"
+files_to_skip=(doc.zih.tu-dresden.de/docs/accessibility.md doc.zih.tu-dresden.de/docs/data_protection_declaration.md data_protection_declaration.md)
 aspellmode=
 if aspell dump modes | grep -q markdown; then
   aspellmode="--mode=markdown"
@@ -14,9 +15,10 @@ fi
 
 function usage() {
   cat <<-EOF
-usage: $0 [file]
+usage: $0 [file | -a]
 If file is given, outputs all words of the file, that the spell checker cannot recognize.
-If file is omitted, checks whether any changed file contains more unrecognizable words than before the change.
+If parameter -a (or --all) is given instead of the file, checks all markdown files.
+Otherwise, checks whether any changed file contains more unrecognizable words than before the change.
 If you are sure a word is correct, you can put it in $wordlistfile.
 EOF
 }
@@ -29,12 +31,52 @@ function getNumberOfAspellOutputLines(){
   getAspellOutput | wc -l
 }
 
+function isWordlistSorted(){
+  #Unfortunately, sort depends on locale and docker does not provide much.
+  #Therefore, it uses bytewise comparison. We avoid problems with the command tr.
+  if sed 1d "$wordlistfile" | tr [:upper:] [:lower:] | sort -C; then
+    return 1
+  fi
+  return 0
+}
+
+function shouldSkipFile(){
+  printf '%s\n' "${files_to_skip[@]}" | grep -xq $1
+}
+
+function checkAllFiles(){
+  any_fails=false
+
+  if isWordlistSorted; then
+    echo "Unsorted wordlist in $wordlistfile"
+    any_fails=true
+  fi
+
+  files=$(git ls-tree --full-tree -r --name-only HEAD $basedir/ | grep .md)
+  while read file; do
+    if [ "${file: -3}" == ".md" ]; then
+      if shouldSkipFile ${file}; then
+        echo "Skip $file"
+      else
+        echo "Check $file"
+        echo "-- File $file"
+        if { cat "$file" | getAspellOutput | tee /dev/fd/3 | grep -xq '.*'; } 3>&1; then
+          any_fails=true
+        fi
+      fi
+    fi
+  done <<< "$files"
+
+  if [ "$any_fails" == true ]; then
+    return 1
+  fi
+  return 0
+}
+
 function isMistakeCountIncreasedByChanges(){
   any_fails=false
 
-  #Unfortunately, sort depends on locale and docker does not provide much.
-  #Therefore, it uses bytewise comparison. We avoid problems with the command tr.
-  if ! sed 1d "$wordlistfile" | tr [:upper:] [:lower:] | sort -C; then
+  if isWordlistSorted; then
    echo "Unsorted wordlist in $wordlistfile"
    any_fails=true
  fi
@@ -48,9 +90,7 @@ function isMistakeCountIncreasedByChanges(){
  while read oldfile; do
    read newfile
    if [ "${newfile: -3}" == ".md" ]; then
-      if [[ $newfile == *"accessibility.md"* ||
-            $newfile == *"data_protection_declaration.md"* ||
-            $newfile == *"legal_notice.md"* ]]; then
+      if shouldSkipFile ${newfile:2}; then
        echo "Skip $newfile"
      else
        echo "Check $newfile"
@@ -90,6 +130,9 @@ if [ $# -eq 1 ]; then
      usage
      exit
      ;;
+    -a | --all)
+      checkAllFiles
+      ;;
    *)
      cat "$1" | getAspellOutput
      ;;
diff --git a/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh b/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh
index 7895f576e46e66caa9e14f3d77a74deb918fdab0..38e9015599922fdcec93fecebb9fd638cfa576d8 100755
--- a/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh
+++ b/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh
@@ -23,7 +23,7 @@ s \<SLURM\>
 doc.zih.tu-dresden.de/docs/contrib/content_rules.md
 i file \+system HDFS
 Use \"ZIH systems\" or \"ZIH system\" instead of \"Taurus\". \"taurus\" is only allowed when used in ssh commands and other very specific situations.
-doc.zih.tu-dresden.de/docs/contrib/content_rules.md
+doc.zih.tu-dresden.de/docs/contrib/content_rules.md doc.zih.tu-dresden.de/docs/archive/phase2_migration.md
 i \<taurus\> taurus\.hrsk /taurus /TAURUS ssh ^[0-9]\+:Host taurus$
 \"HRSKII\" should be avoided, use \"ZIH system\" instead.
 doc.zih.tu-dresden.de/docs/contrib/content_rules.md
@@ -35,20 +35,20 @@ i hpc[ -]\+da\>
 i attachurl
 Replace \"todo\" with real content.
-
+doc.zih.tu-dresden.de/docs/archive/system_triton.md
 i \<todo\> <!--.*todo.*-->
-Replace \"Coming soon\" with real content.
+Replace variations of \"Coming soon\" with real content.
 
-i \<coming soon\>
+i \(\<coming soon\>\|This .* under construction\|posted here\)
 Avoid spaces at end of lines.
-
+doc.zih.tu-dresden.de/docs/accessibility.md
 i [[:space:]]$
 When referencing partitions, put keyword \"partition\" in front of partition name, e. g. \"partition ml\", not \"ml partition\".
 doc.zih.tu-dresden.de/docs/contrib/content_rules.md
 i \(alpha\|ml\|haswell\|romeo\|gpu\|smp\|julia\|hpdlf\|scs5\)-\?\(interactive\)\?[^a-z]*partition
 Give hints in the link text. Words such as \"here\" or \"this link\" are meaningless.
 doc.zih.tu-dresden.de/docs/contrib/content_rules.md
-i \[\s\?\(documentation\|here\|this \(link\|page\|subsection\)\|slides\?\|manpage\)\s\?\]
+i \[\s\?\(documentation\|here\|more info\|this \(link\|page\|subsection\)\|slides\?\|manpage\)\s\?\]
 Use \"workspace\" instead of \"work space\" or \"work-space\".
 doc.zih.tu-dresden.de/docs/contrib/content_rules.md
 i work[ -]\+space"
diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell
index 443647e74a9cc4a7e17e92f381c914de04e1b0f3..73af7da3010a0570c99b148f180440c20f8277cd 100644
--- a/doc.zih.tu-dresden.de/wordlist.aspell
+++ b/doc.zih.tu-dresden.de/wordlist.aspell
@@ -65,6 +65,8 @@ DockerHub
 dockerized
 dotfile
 dotfiles
+downtime
+downtimes
 EasyBuild
 ecryptfs
 engl
@@ -142,6 +144,7 @@ Itanium
 jobqueue
 jpg
 jss
+jupyter
 Jupyter
 JupyterHub
 JupyterLab
@@ -194,6 +197,7 @@ multithreaded
 Multithreading
 NAMD
 natively
+nbgitpuller
 nbsp
 NCCL
 Neptun
@@ -260,6 +264,8 @@ pytorch
 PyTorch
 Quantum
 queue
+quickstart
+Quickstart
 randint
 reachability
 README