diff --git a/doc.zih.tu-dresden.de/docs/data_transfer/export_nodes.md b/doc.zih.tu-dresden.de/docs/data_transfer/export_nodes.md index 80ea758c57b09601cadd001aa018c56a2f219a3f..71ca41949f44d558c2ca3384d7651e9e85b19125 100644 --- a/doc.zih.tu-dresden.de/docs/data_transfer/export_nodes.md +++ b/doc.zih.tu-dresden.de/docs/data_transfer/export_nodes.md @@ -9,9 +9,13 @@ that you cannot log in via SSH to the export nodes, but only use `scp`, `rsync` The export nodes are reachable under the hostname `taurusexport.hrsk.tu-dresden.de` (or `taurusexport3.hrsk.tu-dresden.de` and `taurusexport4.hrsk.tu-dresden.de`). +Please keep in mind that there are different +[filesystems](../data_lifecycle/file_systems.md#recommendations-for-filesystem-usage). Choose the +one that matches your needs. + ## Access From Linux -There are at least three tool to exchange data between your local workstation and ZIH systems. All +There are at least three tools to exchange data between your local workstation and ZIH systems. They are explained in the following section in more detail. !!! important @@ -33,13 +37,27 @@ in a directory, the option `-r` has to be specified. marie@local$ scp -r <directory> taurusexport:<target-location> ``` -??? example "Example: Copy a file from ZIH systems to your workstation" + For example, if you want to copy your data file `mydata.csv` to the directory `input` in your + home directory, you would use the following: ```console - marie@login$ scp taurusexport:<file> <target-location> + marie@local$ scp mydata.csv taurusexport:input/ + ``` + +??? example "Example: Copy a file from ZIH systems to your workstation" + + ```bash + marie@local$ scp taurusexport:<file> <target-location> # Add -r to copy whole directory - marie@login$ scp -r taurusexport:<directory> <target-location> + marie@local$ scp -r taurusexport:<directory> <target-location> + ``` + + For example, if you have a directory named `output` in your home directory on ZIH systems and + you want to copy it to the directory `/tmp` on your workstation, you would use the following: + + ```console + marie@local$ scp -r taurusexport:output /tmp ``` ### SFTP diff --git a/doc.zih.tu-dresden.de/docs/software/big_data_frameworks_spark.md b/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md similarity index 50% rename from doc.zih.tu-dresden.de/docs/software/big_data_frameworks_spark.md rename to doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md index 81a66719d20afa05b9bea1dcdf8b7e3175fc18c1..6cd6a94fb93c6861393d7ba3cb3b8689a28f7637 100644 --- a/doc.zih.tu-dresden.de/docs/software/big_data_frameworks_spark.md +++ b/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md @@ -1,13 +1,18 @@ -# Big Data Frameworks: Apache Spark +# Big Data Frameworks [Apache Spark](https://spark.apache.org/), [Apache Flink](https://flink.apache.org/) and [Apache Hadoop](https://hadoop.apache.org/) are frameworks for processing and integrating Big Data. These frameworks are also offered as software [modules](modules.md) in both `ml` and `scs5` software environments. 
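+Depending on the partition you are working on, one of these module environments is loaded by
+default; if needed, you can switch it explicitly before looking for the frameworks. This is only a
+sketch and assumes the `modenv/scs5` and `modenv/ml` meta-modules offered on ZIH systems:
+
+```console
+marie@login$ module load modenv/ml    # e.g., to browse the modules of the `ml` environment
+```
+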
 You can check module versions and availability with the command
 
-```console
-marie@login$ module avail Spark
-```
+=== "Spark"
+    ```console
+    marie@login$ module avail Spark
+    ```
+=== "Flink"
+    ```console
+    marie@login$ module avail Flink
+    ```
 
 **Prerequisites:** To work with the frameworks, you need [access](../access/ssh_login.md) to ZIH
 systems and basic knowledge about data analysis and the batch system
@@ -15,7 +20,8 @@ systems and basic knowledge about data analysis and the batch system
 
 The usage of Big Data frameworks is different from other modules due to their master-worker
 approach. That means, before an application can be started, one has to do additional steps.
-In the following, we assume that a Spark application should be started.
+In the following, we assume that a Spark application should be started and give alternative
+commands for Flink where applicable.
 
 The steps are:
@@ -26,49 +32,72 @@ The steps are:
 
 Apache Spark can be used in [interactive](#interactive-jobs) and [batch](#batch-jobs) jobs as well
 as via [Jupyter notebooks](#jupyter-notebook). All three ways are outlined in the following.
+The usage of Flink with Jupyter notebooks is currently under examination.
 
 ## Interactive Jobs
 
 ### Default Configuration
 
-The Spark module is available in both `scs5` and `ml` environments.
-Thus, Spark can be executed using different CPU architectures, e.g., Haswell and Power9.
+The Spark and Flink modules are available in both `scs5` and `ml` environments.
+Thus, Spark and Flink can be executed using different CPU architectures, e.g., Haswell and Power9.
 
 Let us assume that two nodes should be used for the computation. Use a `srun` command similar to
 the following to start an interactive session using the partition haswell. The following code
-snippet shows a job submission to haswell nodes with an allocation of two nodes with 60 GB main
+snippet shows a job submission to haswell nodes with an allocation of two nodes with 60000 MB main
 memory exclusively for one hour:
 
 ```console
-marie@login$ srun --partition=haswell --nodes=2 --mem=60g --exclusive --time=01:00:00 --pty bash -l
+marie@login$ srun --partition=haswell --nodes=2 --mem=60000M --exclusive --time=01:00:00 --pty bash -l
 ```
 
-Once you have the shell, load Spark using the command
+Once you have the shell, load the desired Big Data framework using the command
 
-```console
-marie@compute$ module load Spark
-```
+=== "Spark"
+    ```console
+    marie@compute$ module load Spark
+    ```
+=== "Flink"
+    ```console
+    marie@compute$ module load Flink
+    ```
 
-Before the application can be started, the Spark cluster needs to be set up. To do this, configure
-Spark first using configuration template at `$SPARK_HOME/conf`:
+Before the application can be started, the cluster with the allocated nodes needs to be set up. To
+do this, configure the cluster first using the configuration template at `$SPARK_HOME/conf` for
+Spark or `$FLINK_ROOT_DIR/conf` for Flink:
 
-```console
-marie@compute$ source framework-configure.sh spark $SPARK_HOME/conf
-```
+=== "Spark"
+    ```console
+    marie@compute$ source framework-configure.sh spark $SPARK_HOME/conf
+    ```
+=== "Flink"
+    ```console
+    marie@compute$ source framework-configure.sh flink $FLINK_ROOT_DIR/conf
+    ```
 
 This places the configuration in a directory called `cluster-conf-<JOB_ID>` in your `home`
-directory, where `<JOB_ID>` stands for the id of the Slurm job. After that, you can start Spark in
+directory, where `<JOB_ID>` stands for the id of the Slurm job.
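+If you want to inspect what has been generated, you can simply list this directory from within the
+allocation. This is only a sketch; the exact set of files depends on the framework:
+
+```console
+marie@compute$ ls ~/cluster-conf-${SLURM_JOB_ID}
+```
+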
+After that, you can start in
 the usual way:
 
-```console
-marie@compute$ start-all.sh
-```
+=== "Spark"
+    ```console
+    marie@compute$ start-all.sh
+    ```
+=== "Flink"
+    ```console
+    marie@compute$ start-cluster.sh
+    ```
 
-The Spark processes should now be set up and you can start your application, e. g.:
+The necessary background processes should now be set up and you can start your application, e.g.:
 
-```console
-marie@compute$ spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 1000
-```
+=== "Spark"
+    ```console
+    marie@compute$ spark-submit --class org.apache.spark.examples.SparkPi \
+      $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 1000
+    ```
+=== "Flink"
+    ```console
+    marie@compute$ flink run $FLINK_ROOT_DIR/examples/batch/KMeans.jar
+    ```
 
 !!! warning
 
@@ -80,37 +109,57 @@ marie@compute$ spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOM
 
 The script `framework-configure.sh` is used to derive a configuration from a template. It takes two
 parameters:
 
-- The framework to set up (Spark, Flink, Hadoop)
+- The framework to set up (parameter `spark` for Spark, `flink` for Flink, and `hadoop` for Hadoop)
 - A configuration template
 
 Thus, you can modify the configuration by replacing the default configuration template with a
 customized one. This way, your custom configuration template is reusable for different jobs. You
 can start with a copy of the default configuration ahead of your interactive session:
 
-```console
-marie@login$ cp -r $SPARK_HOME/conf my-config-template
-```
+=== "Spark"
+    ```console
+    marie@login$ cp -r $SPARK_HOME/conf my-config-template
+    ```
+=== "Flink"
+    ```console
+    marie@login$ cp -r $FLINK_ROOT_DIR/conf my-config-template
+    ```
 
 After you have changed `my-config-template`, you can use your new template in an interactive job
 with:
 
-```console
-marie@compute$ source framework-configure.sh spark my-config-template
-```
+=== "Spark"
+    ```console
+    marie@compute$ source framework-configure.sh spark my-config-template
+    ```
+=== "Flink"
+    ```console
+    marie@compute$ source framework-configure.sh flink my-config-template
+    ```
 
 ### Using Hadoop Distributed Filesystem (HDFS)
 
 If you want to use Spark and HDFS together (or in general more than one framework), a scheme
 similar to the following can be used:
 
-```console
-marie@compute$ module load Hadoop
-marie@compute$ module load Spark
-marie@compute$ source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop
-marie@compute$ source framework-configure.sh spark $SPARK_HOME/conf
-marie@compute$ start-dfs.sh
-marie@compute$ start-all.sh
-```
+=== "Spark"
+    ```console
+    marie@compute$ module load Hadoop
+    marie@compute$ module load Spark
+    marie@compute$ source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop
+    marie@compute$ source framework-configure.sh spark $SPARK_HOME/conf
+    marie@compute$ start-dfs.sh
+    marie@compute$ start-all.sh
+    ```
+=== "Flink"
+    ```console
+    marie@compute$ module load Hadoop
+    marie@compute$ module load Flink
+    marie@compute$ source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop
+    marie@compute$ source framework-configure.sh flink $FLINK_ROOT_DIR/conf
+    marie@compute$ start-dfs.sh
+    marie@compute$ start-cluster.sh
+    ```
 
 ## Batch Jobs
 
@@ -122,41 +171,76 @@ that, you can conveniently put the parameters directly into the job file and sub
 
 Please use a [batch job](../jobs_and_resources/slurm.md) with a configuration, similar to the
 example below:
 
-???
example "spark.sbatch" - ```bash - #!/bin/bash -l - #SBATCH --time=00:05:00 - #SBATCH --partition=haswell - #SBATCH --nodes=2 - #SBATCH --exclusive - #SBATCH --mem=60G - #SBATCH --job-name="example-spark" +??? example "example-starting-script.sbatch" + === "Spark" + ```bash + #!/bin/bash -l + #SBATCH --time=01:00:00 + #SBATCH --partition=haswell + #SBATCH --nodes=2 + #SBATCH --exclusive + #SBATCH --mem=60000M + #SBATCH --job-name="example-spark" + + module load Spark/3.0.1-Hadoop-2.7-Java-1.8-Python-3.7.4-GCCcore-8.3.0 + + function myExitHandler () { + stop-all.sh + } + + #configuration + . framework-configure.sh spark $SPARK_HOME/conf + + #register cleanup hook in case something goes wrong + trap myExitHandler EXIT - ml Spark/3.0.1-Hadoop-2.7-Java-1.8-Python-3.7.4-GCCcore-8.3.0 + start-all.sh + + spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 1000 - function myExitHandler () { stop-all.sh - } - #configuration - . framework-configure.sh spark $SPARK_HOME/conf + exit 0 + ``` + === "Flink" + ```bash + #!/bin/bash -l + #SBATCH --time=01:00:00 + #SBATCH --partition=haswell + #SBATCH --nodes=2 + #SBATCH --exclusive + #SBATCH --mem=60000M + #SBATCH --job-name="example-flink" - #register cleanup hook in case something goes wrong - trap myExitHandler EXIT + module load Flink/1.12.3-Java-1.8.0_161-OpenJDK-Python-3.7.4-GCCcore-8.3.0 - start-all.sh + function myExitHandler () { + stop-cluster.sh + } - spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 1000 + #configuration + . framework-configure.sh flink $FLINK_ROOT_DIR/conf - stop-all.sh + #register cleanup hook in case something goes wrong + trap myExitHandler EXIT - exit 0 - ``` + #start the cluster + start-cluster.sh + + #run your application + flink run $FLINK_ROOT_DIR/examples/batch/KMeans.jar + + #stop the cluster + stop-cluster.sh + + exit 0 + ``` ## Jupyter Notebook You can run Jupyter notebooks with Spark on the ZIH systems in a similar way as described on the -[JupyterHub](../access/jupyterhub.md) page. +[JupyterHub](../access/jupyterhub.md) page. Interaction of Flink with JupyterHub is currently +under examination and will be posted here upon availability. ### Spawning a Notebook diff --git a/doc.zih.tu-dresden.de/docs/software/data_analytics.md b/doc.zih.tu-dresden.de/docs/software/data_analytics.md index 245bd5ae1a8ea0f246bd578d4365b3d23aaaba64..44414493405bc36ffed74bb85fb805b331308af7 100644 --- a/doc.zih.tu-dresden.de/docs/software/data_analytics.md +++ b/doc.zih.tu-dresden.de/docs/software/data_analytics.md @@ -10,7 +10,7 @@ The following tools are available on ZIH systems, among others: * [Python](data_analytics_with_python.md) * [R](data_analytics_with_r.md) * [RStudio](data_analytics_with_rstudio.md) -* [Big Data framework Spark](big_data_frameworks_spark.md) +* [Big Data framework Spark](big_data_frameworks.md) * [MATLAB and Mathematica](mathematics.md) Detailed information about frameworks for machine learning, such as [TensorFlow](tensorflow.md) diff --git a/doc.zih.tu-dresden.de/docs/software/mathematics.md b/doc.zih.tu-dresden.de/docs/software/mathematics.md index 21aab2856a7b9582c3f6b8d5453d7ea2f8b6895b..5b8e23b2fd3ed373bdf7bf6394ae3b2faf98ce74 100644 --- a/doc.zih.tu-dresden.de/docs/software/mathematics.md +++ b/doc.zih.tu-dresden.de/docs/software/mathematics.md @@ -21,9 +21,9 @@ font manager. 
You need to copy the fonts from ZIH systems to your local system and expand the font path -```bash -localhost$ scp -r taurus.hrsk.tu-dresden.de:/sw/global/applications/mathematica/10.0/SystemFiles/Fonts/Type1/ ~/.fonts -localhost$ xset fp+ ~/.fonts/Type1 +```console +marie@local$ scp -r taurus.hrsk.tu-dresden.de:/sw/global/applications/mathematica/10.0/SystemFiles/Fonts/Type1/ ~/.fonts +marie@local$ xset fp+ ~/.fonts/Type1 ``` #### Windows Workstation @@ -93,29 +93,29 @@ interfaces with the Maple symbolic engine, allowing it to be part of a full comp Running MATLAB via the batch system could look like this (for 456 MB RAM per core and 12 cores reserved). Please adapt this to your needs! -```bash -zih$ module load MATLAB -zih$ srun -t 8:00 -c 12 --mem-per-cpu=456 --pty --x11=first bash -zih$ matlab +```console +marie@login$ module load MATLAB +marie@login$ srun --time=8:00 --cpus-per-task=12 --mem-per-cpu=456 --pty --x11=first bash +marie@compute$ matlab ``` With following command you can see a list of installed software - also the different versions of matlab. -```bash -zih$ module avail +```console +marie@login$ module avail ``` Please choose one of these, then load the chosen software with the command: ```bash -zih$ module load MATLAB/version +marie@login$ module load MATLAB/<version> ``` Or use: -```bash -zih$ module load MATLAB +```console +marie@login$ module load MATLAB ``` (then you will get the most recent Matlab version. @@ -126,8 +126,8 @@ zih$ module load MATLAB If X-server is running and you logged in at ZIH systems, you should allocate a CPU for your work with command -```bash -zih$ srun --pty --x11=first bash +```console +marie@login$ srun --pty --x11=first bash ``` - now you can call "matlab" (you have 8h time to work with the matlab-GUI) @@ -138,8 +138,9 @@ Using Scripts You have to start matlab-calculation as a Batch-Job via command -```bash -srun --pty matlab -nodisplay -r basename_of_your_matlab_script #NOTE: you must omit the file extension ".m" here, because -r expects a matlab command or function call, not a file-name. +```console +marie@login$ srun --pty matlab -nodisplay -r basename_of_your_matlab_script +# NOTE: you must omit the file extension ".m" here, because -r expects a matlab command or function call, not a file-name. ``` !!! info "License occupying" @@ -160,7 +161,7 @@ You can find detailed documentation on the Matlab compiler at Compile your `.m` script into a binary: ```bash -mcc -m name_of_your_matlab_script.m -o compiled_executable -R -nodisplay -R -nosplash +marie@login$ mcc -m name_of_your_matlab_script.m -o compiled_executable -R -nodisplay -R -nosplash ``` This will also generate a wrapper script called `run_compiled_executable.sh` which sets the required @@ -172,41 +173,35 @@ Then run the binary via the wrapper script in a job (just a simple example, you [sbatch script](../jobs_and_resources/slurm.md#job-submission) for that) ```bash -zih$ srun ./run_compiled_executable.sh $EBROOTMATLAB +marie@login$ srun ./run_compiled_executable.sh $EBROOTMATLAB ``` ### Parallel MATLAB #### With 'local' Configuration -- If you want to run your code in parallel, please request as many - cores as you need! -- start a batch job with the number N of processes -- example for N= 4: `srun -c 4 --pty --x11=first bash` -- run Matlab with the GUI or the CLI or with a script -- inside use `matlabpool open 4` to start parallel - processing +- If you want to run your code in parallel, please request as many cores as you need! 
+- Start a batch job with the number `N` of processes, e.g., `srun --cpus-per-task=4 --pty
+  --x11=first bash -l`
+- Run Matlab with the GUI or the CLI or with a script
+- Inside Matlab use `matlabpool open 4` to start parallel processing
 
-- example for 1000*1000 matrix multiplication
-
-!!! example
+!!! example "Example for 1000*1000 matrix-matrix multiplication"
 
     ```bash
    R = distributed.rand(1000);
    D = R * R
    ```
 
-- to close parallel task:
-`matlabpool close`
+- Close the parallel pool using `matlabpool close`
 
 #### With parfor
 
-- start a batch job with the number N of processes (e.g. N=12)
-- inside use `matlabpool open N` or
-  `matlabpool(N)` to start parallel processing. It will use
+- Start a batch job with the number `N` of processes (e.g., `N=12`)
+- Inside use `matlabpool open N` or `matlabpool(N)` to start parallel processing. It will use
   the 'local' configuration by default.
-- Use `parfor` for a parallel loop, where the **independent** loop
-  iterations are processed by N threads
+- Use `parfor` for a parallel loop, where the **independent** loop iterations are processed by `N`
+  threads
 
 !!! example
 
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index 79057c1d6770f69e13f6df3bdbcff4a3693851ad..4efbb60c85f44b6cb8d80c33cfb251c7a52003a3 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -45,7 +45,7 @@ nav:
     - Data Analytics with R: software/data_analytics_with_r.md
     - Data Analytics with RStudio: software/data_analytics_with_rstudio.md
     - Data Analytics with Python: software/data_analytics_with_python.md
-    - Apache Spark: software/big_data_frameworks_spark.md
+    - Big Data Analytics: software/big_data_frameworks.md
   - Machine Learning:
     - Overview: software/machine_learning.md
     - TensorFlow: software/tensorflow.md
@@ -187,6 +187,8 @@ markdown_extensions:
       permalink: True
   - attr_list
   - footnotes
+  - pymdownx.tabbed:
+      alternate_style: true
 
 extra:
   homepage: https://tu-dresden.de
diff --git a/doc.zih.tu-dresden.de/util/check-empty-page.sh b/doc.zih.tu-dresden.de/util/check-empty-page.sh
new file mode 100755
index 0000000000000000000000000000000000000000..7c4fdc2cd07b167b39b0b0ece58e199df0df6d84
--- /dev/null
+++ b/doc.zih.tu-dresden.de/util/check-empty-page.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+set -euo pipefail
+
+scriptpath=${BASH_SOURCE[0]}
+basedir=`dirname "$scriptpath"`
+basedir=`dirname "$basedir"`
+
+if find $basedir -name \*.md -exec wc -l {} \; | grep '^0 '; then
+  exit 1
+fi
diff --git a/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh b/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh
index 280e4003dc951164c86b44560d6c81e3a5dc640c..7895f576e46e66caa9e14f3d77a74deb918fdab0 100755
--- a/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh
+++ b/doc.zih.tu-dresden.de/util/grep-forbidden-patterns.sh
@@ -6,41 +6,53 @@ scriptpath=${BASH_SOURCE[0]}
 basedir=`dirname "$scriptpath"`
 basedir=`dirname "$basedir"`
 
-#This is the ruleset. Each line represents a rule of tab-separated fields.
+#This is the ruleset. Each rule consists of a message (first line), a tab-separated list of files to skip (second line) and a pattern specification (third line).
+#A pattern specification is a tab-separated list of fields:
 #The first field represents whether the match should be case-sensitive (s) or insensitive (i).
 #The second field represents the pattern that should not be contained in any file that is checked.
 #Further fields represent patterns with exceptions.
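+#Schematically, a single rule in the ruleset below therefore consists of three lines (this is only
+#an illustration, not a rule itself):
+#  <message that is printed when the pattern matches>
+#  <tab-separated list of files for which this rule is skipped (may be empty)>
+#  <s or i>  <forbidden pattern>  <exception pattern> ...   (fields separated by tabs)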
#For example, the first rule says: # The pattern \<io\> should not be present in any file (case-insensitive match), except when it appears as ".io". ruleset="The word \"IO\" should not be used, use \"I/O\" instead. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i \<io\> \.io \"SLURM\" (only capital letters) should not be used, use \"Slurm\" instead. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md s \<SLURM\> \"File system\" should be written as \"filesystem\", except when used as part of a proper name. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i file \+system HDFS Use \"ZIH systems\" or \"ZIH system\" instead of \"Taurus\". \"taurus\" is only allowed when used in ssh commands and other very specific situations. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i \<taurus\> taurus\.hrsk /taurus /TAURUS ssh ^[0-9]\+:Host taurus$ \"HRSKII\" should be avoided, use \"ZIH system\" instead. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i \<hrskii\> The term \"HPC-DA\" should be avoided. Depending on the situation, use \"data analytics\" or similar. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i hpc[ -]\+da\> \"ATTACHURL\" was a keyword in the old wiki, don't use it. + i attachurl Replace \"todo\" with real content. + i \<todo\> <!--.*todo.*--> +Replace \"Coming soon\" with real content. + +i \<coming soon\> Avoid spaces at end of lines. + i [[:space:]]$ When referencing partitions, put keyword \"partition\" in front of partition name, e. g. \"partition ml\", not \"ml partition\". +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i \(alpha\|ml\|haswell\|romeo\|gpu\|smp\|julia\|hpdlf\|scs5\)-\?\(interactive\)\?[^a-z]*partition Give hints in the link text. Words such as \"here\" or \"this link\" are meaningless. +doc.zih.tu-dresden.de/docs/contrib/content_rules.md i \[\s\?\(documentation\|here\|this \(link\|page\|subsection\)\|slides\?\|manpage\)\s\?\] Use \"workspace\" instead of \"work space\" or \"work-space\". 
+doc.zih.tu-dresden.de/docs/contrib/content_rules.md i work[ -]\+space" -# Whitelisted files will be ignored -# Whitespace separated list with full path -whitelist=(doc.zih.tu-dresden.de/README.md doc.zih.tu-dresden.de/docs/contrib/content_rules.md) - function grepExceptions () { if [ $# -gt 0 ]; then firstPattern=$1 @@ -55,22 +67,29 @@ function checkFile(){ f=$1 echo "Check wording in file $f" while read message; do + IFS=$'\t' read -r -a files_to_skip + skipping="" + if (printf '%s\n' "${files_to_skip[@]}" | grep -xq $f); then + skipping=" -- skipping" + fi IFS=$'\t' read -r flags pattern exceptionPatterns while IFS=$'\t' read -r -a exceptionPatternsArray; do if [ $silent = false ]; then - echo " Pattern: $pattern" + echo " Pattern: $pattern$skipping" fi - grepflag= - case "$flags" in - "i") - grepflag=-i - ;; - esac - if grep -n $grepflag $color "$pattern" "$f" | grepExceptions "${exceptionPatternsArray[@]}" ; then - number_of_matches=`grep -n $grepflag $color "$pattern" "$f" | grepExceptions "${exceptionPatternsArray[@]}" | wc -l` - ((cnt=cnt+$number_of_matches)) - if [ $silent = false ]; then - echo " $message" + if [ -z "$skipping" ]; then + grepflag= + case "$flags" in + "i") + grepflag=-i + ;; + esac + if grep -n $grepflag $color "$pattern" "$f" | grepExceptions "${exceptionPatternsArray[@]}" ; then + number_of_matches=`grep -n $grepflag $color "$pattern" "$f" | grepExceptions "${exceptionPatternsArray[@]}" | wc -l` + ((cnt=cnt+$number_of_matches)) + if [ $silent = false ]; then + echo " $message" + fi fi fi done <<< $exceptionPatterns @@ -123,7 +142,7 @@ branch="origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME:-preview}" if [ $all_files = true ]; then echo "Search in all markdown files." - files=$(git ls-tree --full-tree -r --name-only HEAD $basedir/docs/ | grep .md) + files=$(git ls-tree --full-tree -r --name-only HEAD $basedir/ | grep .md) elif [[ ! -z $file ]]; then files=$file else @@ -138,10 +157,6 @@ if [[ ! -z $file ]]; then else for f in $files; do if [ "${f: -3}" == ".md" -a -f "$f" ]; then - if (printf '%s\n' "${whitelist[@]}" | grep -xq $f); then - echo "Skip whitelisted file $f" - continue - fi checkFile $f fi done diff --git a/doc.zih.tu-dresden.de/util/pre-commit b/doc.zih.tu-dresden.de/util/pre-commit index eb63bbea24052eb1dff4ec16a17b8b5aba275e18..1cc901e00efbece94209bfa6c4bbbc54aad682e9 100755 --- a/doc.zih.tu-dresden.de/util/pre-commit +++ b/doc.zih.tu-dresden.de/util/pre-commit @@ -75,6 +75,13 @@ then exit_ok=no fi +echo "Looking for empty files..." +docker run --name=hpc-compendium --rm -w /docs --mount src="$(pwd)",target=/docs,type=bind hpc-compendium ./doc.zih.tu-dresden.de/util/check-empty-page.sh +if [ $? -ne 0 ] +then + exit_ok=no +fi + if [ $exit_ok == yes ] then exit 0
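
For reference, the new empty-page check can also be invoked on its own, outside of the pre-commit
hook and without Docker. This is only a usage sketch and assumes that you are in the root of the
repository checkout:

```console
marie@local$ ./doc.zih.tu-dresden.de/util/check-empty-page.sh
```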