diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md index bdbaa5a1523ec2fc06150195e18764cf14b618ef..a0152832601f63ef68ecb2191414265896237434 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md @@ -18,11 +18,11 @@ In the following, a brief overview on relevant topics w.r.t. data life cycle man The main concept of working with data on ZIH systems bases on [Workspaces](workspaces.md). Use it properly: - * use a `/home` directory for the limited amount of personal data, simple examples and the results - of calculations. The home directory is not a working directory! However, `/home` filesystem is - [backed up](#backup) using snapshots; - * use `workspaces` as a place for working data (i.e. data sets); Recommendations of choosing the - correct storage system for workspace presented below. +* use a `/home` directory for the limited amount of personal data, simple examples and the results + of calculations. The home directory is not a working directory! However, `/home` filesystem is + [backed up](#backup) using snapshots; +* use `workspaces` as a place for working data (i.e. data sets); Recommendations of choosing the + correct storage system for workspace presented below. ### Taxonomy of Filesystems @@ -31,38 +31,28 @@ It is important to design your data workflow according to characteristics, like storage to efficiently use the provided storage and filesystems. The page [filesystems](file_systems.md) holds a comprehensive documentation on the different filesystems. -<!--In general, the mechanisms of -so-called--> <!--[Workspaces](workspaces.md) are compulsory for all HPC users to store data for a -defined duration ---> <!--depending on the requirements and the storage system this time span might -range from days to a few--> <!--years.--> -<!--- [HPC filesystems](file_systems.md)--> -<!--- [Intermediate Archive](intermediate_archive.md)--> -<!--- [Special data containers] **todo** Special data containers (was no valid link in old compendium)--> -<!--- [Move data between filesystems](../data_transfer/data_mover.md)--> -<!--- [Move data to/from ZIH's filesystems](../data_transfer/export_nodes.md)--> -<!--- [Longterm Preservation for ResearchData](preservation_research_data.md)--> !!! hint "Recommendations to choose of storage system" * For data that seldom changes but consumes a lot of space, the [warm_archive](file_systems.md#warm_archive) can be used. (Note that this is mounted **read-only** on the compute nodes). - * For a series of calculations that works on the same data please use a `scratch` based [workspace](workspaces.md). + * For a series of calculations that works on the same data please use a `scratch` based + [workspace](workspaces.md). * **SSD**, in its turn, is the fastest available filesystem made only for large parallel applications running with millions of small I/O (input, output operations). * If the batch job needs a directory for temporary data then **SSD** is a good choice as well. The data can be deleted afterwards. Keep in mind that every workspace has a storage duration. Thus, be careful with the expire date -otherwise it could vanish. The core data of your project should be [backed up](#backup) and -[archived]**todo link** (for the most [important]**todo link** data). +otherwise it could vanish. 
The core data of your project should be [backed up](#backup) and the most +important data should be [archived](preservation_research_data.md). ### Backup The backup is a crucial part of any project. Organize it at the beginning of the project. The backup mechanism on ZIH systems covers **only** the `/home` and `/projects` filesystems. Backed up -files can be restored directly by the users. Details can be found -[here](file_systems.md#backup-and-snapshots-of-the-file-system). +files can be restored directly by users, see [Snapshots](permanent.md#snapshots). !!! warning @@ -73,13 +63,13 @@ files can be restored directly by the users. Details can be found Organizing of living data using the filesystem helps for consistency of the project. We recommend following the rules for your work regarding: - * Organizing the data: Never change the original data; Automatize the organizing the data; Clearly - separate intermediate and final output in the filenames; Carry identifier and original name - along in your analysis pipeline; Make outputs clearly identifiable; Document your analysis - steps. - * Naming Data: Keep short, but meaningful names; Keep standard file endings; File names - don’t replace documentation and metadata; Use standards of your discipline; Make rules for your - project, document and keep them (See the [README recommendations]**todo link** below) +* Organizing the data: Never change the original data; Automatize the organizing the data; Clearly + separate intermediate and final output in the filenames; Carry identifier and original name + along in your analysis pipeline; Make outputs clearly identifiable; Document your analysis + steps. +* Naming Data: Keep short, but meaningful names; Keep standard file endings; File names + don’t replace documentation and metadata; Use standards of your discipline; Make rules for your + project, document and keep them (See the [README recommendations](#readme-recommendation) below) This is the example of an organization (hierarchical) for the folder structure. Use it as a visual illustration of the above: @@ -128,49 +118,10 @@ Don't forget about data hygiene: Classify your current data into critical (need its life cycle (from creation, storage and use to sharing, archiving and destruction); Erase the data you don’t need throughout its life cycle. -<!--## Software Packages--> - -<!--As was written before the module concept is the basic concept for using software on ZIH systems.--> -<!--Uniformity of the project has to be achieved by using the same set of software on different levels.--> -<!--It could be done by using environments. There are two types of environments should be distinguished:--> -<!--runtime environment (the project level, use scripts to load [modules]**todo link**), Python virtual--> -<!--environment. The concept of the environment will give an opportunity to use the same version of the--> -<!--software on every level of the project for every project member.--> - -<!--### Private Individual and Project Modules Files--> - -<!--[Private individual and project module files]**todo link** will be discussed in [chapter 7]**todo--> -<!--link**. Project modules list is a powerful instrument for effective teamwork.--> - -<!--### Python Virtual Environment--> - -<!--If you are working with the Python then it is crucial to use the virtual environment on ZIH systems. 
The--> -<!--main purpose of Python virtual environments (don't mess with the software environment for modules)--> -<!--is to create an isolated environment for Python projects (self-contained directory tree that--> -<!--contains a Python installation for a particular version of Python, plus a number of additional--> -<!--packages).--> - -<!--**Vitualenv (venv)** is a standard Python tool to create isolated Python environments. We--> -<!--recommend using venv to work with Tensorflow and Pytorch on ZIH systems. It has been integrated into the--> -<!--standard library under the [venv module]**todo link**. **Conda** is the second way to use a virtual--> -<!--environment on the ZIH systems. Conda is an open-source package management system and environment--> -<!--management system from the Anaconda.--> - -<!--[Detailed information]**todo link** about using the virtual environment.--> - -<!--## Application Software Availability--> - -<!--Software created for the purpose of the project should be available for all members of the group.--> -<!--The instruction of how to use the software: installation of packages, compilation etc should be--> -<!--documented and gives the opportunity to comfort efficient and safe work.--> - -## Access rights +## Access Rights The concept of **permissions** and **ownership** is crucial in Linux. See the -[HPC-introduction]**todo link** slides for the understanding of the main concept. Standard Linux -changing permission command (i.e `chmod`) valid for ZIH systems as well. The **group** access level -contains members of your project group. Be careful with 'write' permission and never allow to change -the original data. - -Useful links: [Data Management]**todo link**, [Filesystems]**todo link**, -[Project Management]**todo link**, [Preservation research data[**todo link** +[slides of HPC introduction](../misc/HPC-Introduction.pdf) for understanding of the main concept. +Standard Linux changing permission command (i.e `chmod`) valid for ZIH systems as well. The +**group** access level contains members of your project group. Be careful with 'write' permission +and never allow to change the original data. diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md index 027888e273ff85e331e75652e4df6bcb996b7254..d970f38b9abb883fca7864a7af01a6f7037623a8 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md @@ -101,9 +101,10 @@ maximum durations. A workspace can be extended multiple times, depending on the | Filesystem (use with parameter `-F`) | Duration, days | Extensions | Remarks | |:------------------------------------:|:----------:|:-------:|:-----------------------------------:| -| `ssd` | 30 | 2 | High-IOPS filesystem (`/lustre/ssd`) on SSDs. | -| `beegfs` | 30 | 2 | High-IOPS filesystem (`/lustre/ssd`) on NVMes. | -| `scratch` | 100 | 10 | Scratch filesystem (`/scratch`) with high streaming bandwidth, based on spinning disks | +| `ssd` | 30 | 2 | High-IOPS filesystem (`/lustre/ssd`, symbolic link: `/ssd`) on SSDs. | +| `beegfs_global0` (deprecated) | 30 | 2 | High-IOPS filesystem (`/beegfs/global0`) on NVMes. | +| `beegfs` | 30 | 2 | High-IOPS filesystem (`/beegfs`) on NVMes. 
| +| `scratch` | 100 | 10 | Scratch filesystem (`/lustre/ssd`, symbolic link: `/scratch`) with high streaming bandwidth, based on spinning disks | | `warm_archive` | 365 | 2 | Capacity filesystem based on spinning disks | To extent your workspace use the following command: diff --git a/doc.zih.tu-dresden.de/docs/software/cfd.md b/doc.zih.tu-dresden.de/docs/software/cfd.md index 492cb96d24f3761e2820fdba34eaa6b0a35db320..186d7b3a5a97a2daf06d8618c7c91dc91d7ab971 100644 --- a/doc.zih.tu-dresden.de/docs/software/cfd.md +++ b/doc.zih.tu-dresden.de/docs/software/cfd.md @@ -42,7 +42,7 @@ marie@login$ # source $FOAM_CSH module load OpenFOAM source $FOAM_BASH cd /scratch/ws/1/marie-example-workspace # work directory using workspace - srun pimpleFoam -parallel > "$OUTFILE" + srun pimpleFoam -parallel > "$OUTFILE" ``` ## Ansys CFX @@ -62,7 +62,7 @@ geometry and mesh generator cfx5pre, and the post-processor cfx5post. module load ANSYS cd /scratch/ws/1/marie-example-workspace # work directory using workspace - cfx-parallel.sh -double -def StaticMixer.def + cfx-parallel.sh -double -def StaticMixer.def ``` ## Ansys Fluent diff --git a/doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md index bc9ac622530f2b355adef7337fb5d49447d79be1..00ce0c5c4c3ddbd3654161bab69ee0a493cb4350 100644 --- a/doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md +++ b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md @@ -212,11 +212,11 @@ for the partition `alpha` (queue at the dask terms) on the ZIH system: ```python from dask_jobqueue import SLURMCluster -cluster = SLURMCluster(queue='alpha', +cluster = SLURMCluster(queue='alpha', cores=8, - processes=2, - project='p_marie', - memory="8GB", + processes=2, + project='p_marie', + memory="8GB", walltime="00:30:00") ``` @@ -235,15 +235,15 @@ from distributed import Client from dask_jobqueue import SLURMCluster from dask import delayed -cluster = SLURMCluster(queue='alpha', +cluster = SLURMCluster(queue='alpha', cores=8, - processes=2, - project='p_marie', - memory="80GB", + processes=2, + project='p_marie', + memory="80GB", walltime="00:30:00", extra=['--resources gpu=1']) -cluster.scale(2) #scale it to 2 workers! +cluster.scale(2) #scale it to 2 workers! client = Client(cluster) #command will show you number of workers (python objects corresponds to jobs) ``` @@ -288,7 +288,7 @@ for the Monte-Carlo estimation of Pi. uid = int( sp.check_output('id -u', shell=True).decode('utf-8').replace('\n','') ) portdash = 10001 + uid - #create a Slurm cluster, please specify your project + #create a Slurm cluster, please specify your project cluster = SLURMCluster(queue='alpha', cores=2, project='p_marie', memory="8GB", walltime="00:30:00", extra=['--resources gpu=1'], scheduler_options={"dashboard_address": f":{portdash}"}) @@ -309,12 +309,12 @@ for the Monte-Carlo estimation of Pi. def calc_pi_mc(size_in_bytes, chunksize_in_bytes=200e6): """Calculate PI using a Monte Carlo estimate.""" - + size = int(size_in_bytes / 8) chunksize = int(chunksize_in_bytes / 8) - + xy = da.random.uniform(0, 1, size=(size / 2, 2), chunks=(chunksize / 2, 2)) - + in_circle = ((xy ** 2).sum(axis=-1) < 1) pi = 4 * in_circle.mean() @@ -327,11 +327,11 @@ for the Monte-Carlo estimation of Pi. 
f"\tErr: {abs(pi - np.pi) : 10.3e}\n" f"\tWorkers: {num_workers}" f"\t\tTime: {time_delta : 7.3f}s") - + #let's loop over different volumes of double-precision random numbers and estimate it for size in (1e9 * n for n in (1, 10, 100)): - + start = time() pi = calc_pi_mc(size).compute() elaps = time() - start @@ -339,7 +339,7 @@ for the Monte-Carlo estimation of Pi. print_pi_stats(size, pi, time_delta=elaps, num_workers=len(cluster.scheduler.workers)) #Scaling the Cluster to twice its size and re-run the experiments - + new_num_workers = 2 * len(cluster.scheduler.workers) print(f"Scaling from {len(cluster.scheduler.workers)} to {new_num_workers} workers.") @@ -349,11 +349,11 @@ for the Monte-Carlo estimation of Pi. sleep(120) client - + #Re-run same experiments with doubled cluster - for size in (1e9 * n for n in (1, 10, 100)): - + for size in (1e9 * n for n in (1, 10, 100)): + start = time() pi = calc_pi_mc(size).compute() elaps = time() - start diff --git a/doc.zih.tu-dresden.de/docs/software/distributed_training.md b/doc.zih.tu-dresden.de/docs/software/distributed_training.md index bd45768f67c862b2a0137bd2a1656723fa6dfd91..1008e33f6a60ba3b4b189deeae2d0f2b14066ffd 100644 --- a/doc.zih.tu-dresden.de/docs/software/distributed_training.md +++ b/doc.zih.tu-dresden.de/docs/software/distributed_training.md @@ -183,7 +183,7 @@ DDP uses collective communications in the [torch.distributed](https://pytorch.org/tutorials/intermediate/dist_tuto.html) package to synchronize gradients and buffers. -The tutorial can be found [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html). +Please also look at the [official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html). To use distributed data parallelism on ZIH systems, please make sure the `--ntasks-per-node` parameter is equal to the number of GPUs you use per node. @@ -234,7 +234,7 @@ marie@compute$ module spider Horovod # Check available modules Horovod/0.19.5-fosscuda-2019b-TensorFlow-2.2.0-Python-3.7.4 Horovod/0.21.1-TensorFlow-2.4.1 [...] -marie@compute$ module load Horovod/0.19.5-fosscuda-2019b-TensorFlow-2.2.0-Python-3.7.4 +marie@compute$ module load Horovod/0.19.5-fosscuda-2019b-TensorFlow-2.2.0-Python-3.7.4 ``` Or if you want to use Horovod on the partition `alpha`, you can load it with the dependencies: diff --git a/doc.zih.tu-dresden.de/docs/software/ngc_containers.md b/doc.zih.tu-dresden.de/docs/software/ngc_containers.md index 835259ce9d6ff5bb48912911f5f02bae7d449596..f19612d9a3310f869a483c20328d51168317552a 100644 --- a/doc.zih.tu-dresden.de/docs/software/ngc_containers.md +++ b/doc.zih.tu-dresden.de/docs/software/ngc_containers.md @@ -53,7 +53,7 @@ Create a container from the image from the NGC catalog. 
(For this example, the alpha is used): ```console -marie@login$ srun --partition=alpha --nodes=1 --ntasks-per-node=1 --ntasks=1 --gres=gpu:1 --time=08:00:00 --pty --mem=50000 bash +marie@login$ srun --partition=alpha --nodes=1 --ntasks-per-node=1 --ntasks=1 --gres=gpu:1 --time=08:00:00 --pty --mem=50000 bash marie@compute$ cd /scratch/ws/<name_of_your_workspace>/containers #please create a Workspace diff --git a/doc.zih.tu-dresden.de/docs/software/overview.md b/doc.zih.tu-dresden.de/docs/software/overview.md index f8f4bf32b66c73234ad6db3cb728662e0d33dd7e..9d2d86d7c06989acfcb9415f908fc1453538b6a8 100644 --- a/doc.zih.tu-dresden.de/docs/software/overview.md +++ b/doc.zih.tu-dresden.de/docs/software/overview.md @@ -12,7 +12,7 @@ so called dotfiles in your home directory, e.g., `~/.bashrc` or `~/.bash_profile ## Software Environment There are different options to work with software on ZIH systems: [modules](#modules), -[JupyterNotebook](#jupyternotebook) and [containers](#containers). Brief descriptions and related +[Jupyter Notebook](#jupyternotebook) and [containers](#containers). Brief descriptions and related links on these options are provided below. !!! note @@ -21,16 +21,6 @@ links on these options are provided below. * `scs5` environment for the x86 architecture based compute resources * and `ml` environment for the Machine Learning partition based on the Power9 architecture. -According to [What software do I need]**todo link**, first of all, check the [Software module -list]**todo link**. - -<!--Work with the software on ZIH systems could be started only after allocating the resources by [batch--> -<!--systems]**todo link**.--> - -<!--After logging in, you are on one of the login nodes. They are not meant for work, but only for the--> -<!--login process and short tests. Allocating resources will be done by batch system--> -<!--[Slurm](../jobs_and_resources/slurm.md).--> - ## Modules Usage of software on ZIH systems, e.g., frameworks, compilers, loader and libraries, is @@ -47,7 +37,7 @@ The [Jupyter Notebook](https://jupyter.org/) is an open-source web application t documents containing live code, equations, visualizations, and narrative text. There is a [JupyterHub](../access/jupyterhub.md) service on ZIH systems, where you can simply run your Jupyter notebook on compute nodes using [modules](#modules), preloaded or custom virtual environments. -Moreover, you can run a [manually created remote jupyter server](../archive/install_jupyter.md) +Moreover, you can run a [manually created remote Jupyter server](../archive/install_jupyter.md) for more specific cases. ## Containers diff --git a/doc.zih.tu-dresden.de/docs/software/perf_tools.md b/doc.zih.tu-dresden.de/docs/software/perf_tools.md index 16007698726b0430f84ef20acc80cb9e1766d64d..83398f49cb68a3255e051ae866a3679124559bef 100644 --- a/doc.zih.tu-dresden.de/docs/software/perf_tools.md +++ b/doc.zih.tu-dresden.de/docs/software/perf_tools.md @@ -1,8 +1,8 @@ # Introduction `perf` consists of two parts: the kernel space implementation and the userland tools. This wiki -entry focusses on the latter. These tools are installed on taurus, and others and provides support -for sampling applications and reading performance counters. +entry focusses on the latter. These tools are installed on ZIH systems, and others and provides +support for sampling applications and reading performance counters. ## Configuration @@ -34,18 +34,18 @@ Run `perf stat <Your application>`. This will provide you with a general overview on some counters. 
```Bash -Performance counter stats for 'ls':= - 2,524235 task-clock # 0,352 CPUs utilized - 15 context-switches # 0,006 M/sec - 0 CPU-migrations # 0,000 M/sec - 292 page-faults # 0,116 M/sec - 6.431.241 cycles # 2,548 GHz - 3.537.620 stalled-cycles-frontend # 55,01% frontend cycles idle - 2.634.293 stalled-cycles-backend # 40,96% backend cycles idle - 6.157.440 instructions # 0,96 insns per cycle - # 0,57 stalled cycles per insn - 1.248.527 branches # 494,616 M/sec - 34.044 branch-misses # 2,73% of all branches +Performance counter stats for 'ls':= + 2,524235 task-clock # 0,352 CPUs utilized + 15 context-switches # 0,006 M/sec + 0 CPU-migrations # 0,000 M/sec + 292 page-faults # 0,116 M/sec + 6.431.241 cycles # 2,548 GHz + 3.537.620 stalled-cycles-frontend # 55,01% frontend cycles idle + 2.634.293 stalled-cycles-backend # 40,96% backend cycles idle + 6.157.440 instructions # 0,96 insns per cycle + # 0,57 stalled cycles per insn + 1.248.527 branches # 494,616 M/sec + 34.044 branch-misses # 2,73% of all branches 0,007167707 seconds time elapsed ``` @@ -142,10 +142,10 @@ If you added a callchain, it also gives you a callchain profile.\<br /> \*Discla not an appropriate way to gain exact numbers. So this is merely a rough overview and not guaranteed to be absolutely correct.\*\<span style="font-size: 1em;"> \</span> -### On Taurus +### On ZIH systems -On Taurus, users are not allowed to see the kernel functions. If you have multiple events defined, -then the first thing you select in `perf report` is the type of event. Press right +On ZIH systems, users are not allowed to see the kernel functions. If you have multiple events +defined, then the first thing you select in `perf report` is the type of event. Press right ```Bash Available samples @@ -165,7 +165,7 @@ If you'd select cycles, you would get such a screen: ```Bash Events: 96 cycles + 49,13% test_gcc_perf test_gcc_perf [.] main.omp_fn.0 -+ 34,48% test_gcc_perf test_gcc_perf [.] ++ 34,48% test_gcc_perf test_gcc_perf [.] + 6,92% test_gcc_perf test_gcc_perf [.] omp_get_thread_num@plt + 5,20% test_gcc_perf libgomp.so.1.0.0 [.] omp_get_thread_num + 2,25% test_gcc_perf test_gcc_perf [.] main.omp_fn.1 diff --git a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md index c6c660d3c5ac052f3362ad950f6ad395e4420bdf..2527bbe91cbb735824598cc90311b88df2eab808 100644 --- a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md +++ b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md @@ -4,21 +4,21 @@ The following instructions are primarily aimed at users who want to build their [Singularity](containers.md) containers on ZIH systems. The Singularity container setup requires a Linux machine with root privileges, the same architecture -and a compatible kernel. If some of these requirements can not be fulfilled, then there is -also the option of using the provided virtual machines (VM) on ZIH systems. +and a compatible kernel. If some of these requirements cannot be fulfilled, then there is also the +option of using the provided virtual machines (VM) on ZIH systems. -Currently, starting VMs is only possible on partitions `ml` and HPDLF. The VMs on the ML nodes are +Currently, starting VMs is only possible on partitions `ml` and `hpdlf`. The VMs on the ML nodes are used to build singularity containers for the Power9 architecture and the HPDLF nodes to build Singularity containers for the x86 architecture. 
## Create a Virtual Machine -The `--cloud=kvm` Slurm parameter specifies that a virtual machine should be started. +The Slurm parameter `--cloud=kvm` specifies that a virtual machine should be started. ### On Power9 Architecture ```console -marie@login$ srun -p ml -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash +marie@login$ srun --partition=ml --nodes=1 --cpus-per-task=4 --hint=nomultithread --cloud=kvm --pty /bin/bash srun: job 6969616 queued and waiting for resources srun: job 6969616 has been allocated resources bash-4.2$ @@ -27,7 +27,7 @@ bash-4.2$ ### On x86 Architecture ```console -marie@login$ srun -p hpdlf -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash +marie@login$ srun --partition=hpdlf --nodes=1 --cpus-per-task=4 --hint=nomultithread --cloud=kvm --pty /bin/bash srun: job 2969732 queued and waiting for resources srun: job 2969732 has been allocated resources bash-4.2$ @@ -35,17 +35,17 @@ bash-4.2$ ## Access a Virtual Machine -Since the a security issue on ZIH systems, we restricted the filesystem permissions. Now you have to -wait until the file `/tmp/${SLURM_JOB_USER}\_${SLURM_JOB_ID}/activate` is created, then you can try +After a security issue on ZIH systems, we restricted the filesystem permissions. Now, you have to +wait until the file `/tmp/${SLURM_JOB_USER}_${SLURM_JOB_ID}/activate` is created. Then, you can try to connect via `ssh` into the virtual machine, but it could be that the virtual machine needs some -more seconds to boot and start the SSH daemon. So you may need to try the `ssh` command multiple +more seconds to boot and accept the connection. So you may need to try the `ssh` command multiple times till it succeeds. ```console bash-4.2$ cat /tmp/marie_2759627/activate #!/bin/bash -if ! grep -q -- "Key for the VM on the partition ml" "/home/rotscher/.ssh/authorized_keys" >& /dev/null; then +if ! grep -q -- "Key for the VM on the partition ml" "/home/marie/.ssh/authorized_keys" > /dev/null; then cat "/tmp/marie_2759627/kvm.pub" >> "/home/marie/.ssh/authorized_keys" else sed -i "s|.*Key for the VM on the partition ml.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the partition ml|" "/home/marie/.ssh/authorized_keys" @@ -71,7 +71,7 @@ We provide [tools](virtual_machines_tools.md) to automate these steps. You may j The available space inside the VM can be queried with `df -h`. Currently the whole VM has 8 GB and with the installed operating system, 6.6 GB of available space. -Sometimes the Singularity build might fail because of a disk out-of-memory error. In this case it +Sometimes, the Singularity build might fail because of a disk out-of-memory error. In this case, it might be enough to delete leftover temporary files from Singularity: ```console @@ -111,4 +111,4 @@ Bootstraps **shub** and **library** should be avoided. ### Transport Endpoint is not Connected This happens when the SSHFS mount gets unmounted because it is not very stable. It is sufficient to -run `\~/mount_host_data.sh` again or just the SSHFS command inside that script. +run `~/mount_host_data.sh` again or just the SSHFS command inside that script. 
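The wait-and-retry procedure described in the virtual machines page above lends itself to a small helper. The following is only a sketch, not part of the documented tooling: the job ID (`2759627`), the use of `$USER` for the job user, executing the `activate` file directly, and the `VM_IP` placeholder for the VM's address are all assumptions made for illustration.

```bash
#!/bin/bash
# Sketch: from within the interactive allocation, wait for the VM activation
# file, register the VM key, then retry ssh until the VM accepts connections.
# JOBID and VM_IP are placeholders -- adapt them to your own allocation.
JOBID=2759627
ACTIVATE="/tmp/${USER}_${JOBID}/activate"
VM_IP="<replace with the address of your VM>"   # hypothetical placeholder

# Wait until the VM job has created the activation file.
until [ -f "$ACTIVATE" ]; do
    sleep 5
done

# The activate script (shown above) adds the VM key to ~/.ssh/authorized_keys;
# executing it here is an assumption for this sketch.
bash "$ACTIVATE"

# The VM may still be booting; retry until the SSH connection succeeds.
until ssh "$VM_IP"; do
    sleep 10
done
```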
diff --git a/doc.zih.tu-dresden.de/util/grep-forbidden-words.sh b/doc.zih.tu-dresden.de/util/grep-forbidden-words.sh index 84bd92a0936d4aebc01b13189254e7a4affcc7ca..cfb2b91b57457b701c5b80e76c6346d460cf4602 100755 --- a/doc.zih.tu-dresden.de/util/grep-forbidden-words.sh +++ b/doc.zih.tu-dresden.de/util/grep-forbidden-words.sh @@ -18,7 +18,9 @@ i file \+system HDFS i \<taurus\> taurus\.hrsk /taurus /TAURUS i \<hrskii\> i hpc[ -]\+da\> -i ATTACHURL +i attachurl +i \<todo\> <!--.*todo.*--> +i [[:space:]]$ i \(alpha\|ml\|haswell\|romeo\|gpu\|smp\|julia\|hpdlf\|scs5\)-\?\(interactive\)\?[^a-z]*partition i \[\s\?\(documentation\|here\|this \(link\|page\|subsection\)\|slides\?\|manpage\)\s\?\] i work[ -]\+space" @@ -46,13 +48,15 @@ function usage () { echo " -f Search in a specific markdown file" echo " -s Silent mode" echo " -h Show help message" + echo " -c Show git matches in color" } # Options all_files=false silent=false file="" -while getopts ":ahsf:" option; do +color="" +while getopts ":ahsf:c" option; do case $option in a) all_files=true @@ -64,6 +68,9 @@ while getopts ":ahsf:" option; do s) silent=true ;; + c) + color=" --color=always " + ;; h) usage exit;; @@ -106,7 +113,7 @@ for f in $files; do grepflag=-i ;; esac - if grep -n $grepflag "$pattern" "$f" | grepExceptions "${exceptionPatternsArray[@]}" ; then + if grep -n $grepflag $color "$pattern" "$f" | grepExceptions "${exceptionPatternsArray[@]}" ; then ((cnt=cnt+1)) fi done <<< $exceptionPatterns diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell index 70146e2deabb58de6a3c59a8ae73c0cd2c8b3dbe..c54073a4814792a22cd904a36266f46e4cfef91f 100644 --- a/doc.zih.tu-dresden.de/wordlist.aspell +++ b/doc.zih.tu-dresden.de/wordlist.aspell @@ -63,6 +63,8 @@ Dockerfile Dockerfiles DockerHub dockerized +dotfile +dotfiles EasyBuild ecryptfs engl @@ -129,6 +131,7 @@ img Infiniband init inode +IOPS IPs Itanium jobqueue
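The `-c` switch added to `grep-forbidden-words.sh` above passes `--color=always` to `grep`. A hypothetical invocation from the repository root, checking a single markdown file with highlighted matches (the file path is chosen only for illustration), could look like:

```console
marie@local$ doc.zih.tu-dresden.de/util/grep-forbidden-words.sh -c -f doc.zih.tu-dresden.de/docs/software/cfd.md
```

Combined with `-a` instead of `-f`, the same check would run over all markdown files.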